[whatwg] accessibility management for timed media elements, proposal

Sun Jun 10 04:05:55 PDT 2007

Dave Singer wrote:

> At 16:35  +0100 9/06/07, Benjamin Hawkes-Lewis wrote:

[snip]

>> The proposal does not describe how conflicts such as the following
>>  would be resolved:
>> 
>> User specifies:
>> 
>> captions: want
>> high-contrast-video: want
>> 
>> Author codes:
>> 
>> <video ... >
>>   <source media="all and (captions: want;high-contrast-video: dont-want)" ... />
>>   <source media="all and (captions: dont-want;high-contrast-video: want)" ... />
>> </video>
> 
> There is no suitable source here;  it's best to have something (late)
> in the list which is less restrictive.

But if UAs can apply accessibility preferences to a catch-all <source>
listed last, then what's the advantage of creating multiple <source>
elements in the first place? Current container formats can
include captions and audio descriptions. So is the problem we're trying
to solve that container formats don't contain provision for alternate
visual versions (high contrast and not high contrast)? Or are we trying 
to cut down on bandwidth wastage by providing videos containing only the 
information the end-user wants?
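
To make the conflict concrete, here is a rough sketch of first-match source selection. It is only an illustration under assumptions of mine: the file names are invented, each <source> is modelled as a dict of required feature values (an empty dict standing for a catch-all with no media query), and "either" is treated as matching anything. The proposal itself does not define this algorithm.

```python
from typing import Dict, List, Optional, Tuple

Source = Tuple[str, Dict[str, str]]  # (hypothetical file name, required feature values)


def feature_matches(required: str, preference: str) -> bool:
    """Assumed rule: values match when equal, or when either side is 'either'."""
    return required == preference or "either" in (required, preference)


def pick_source(sources: List[Source], prefs: Dict[str, str]) -> Optional[str]:
    """Return the first source whose every requirement matches the user's preferences."""
    for name, required in sources:
        if all(
            feature_matches(value, prefs.get(feature, "either"))
            for feature, value in required.items()
        ):
            return name
    return None  # no suitable source


prefs = {"captions": "want", "high-contrast-video": "want"}
conflicting = [
    ("captioned.mov", {"captions": "want", "high-contrast-video": "dont-want"}),
    ("high-contrast.mov", {"captions": "dont-want", "high-contrast-video": "want"}),
]

print(pick_source(conflicting, prefs))
print(pick_source(conflicting + [("everything.mov", {})], prefs))
```

With only the two restrictive sources nothing is selected; appending the unrestricted source makes it the match, which is exactly the case where the UA would have to apply the preferences inside the chosen resource anyway.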

>> a) I should think sign-language interpretation needs to be in
>> there.
>> 
>> sign-interpretation: want | dont-want | either (default: want)
>> 
>> Unless we want to treat sign interpretation as a special form of 
>> subtitling. How is subtitling in various languages to be handled?
> 
> I think we assume that a language attribute can also be specified, as
>  today.

The lang attribute specifies "the primary language for the element's 
contents and for any of the element's attributes that contain text", not 
the referenced resource. hreflang "gives the language of the linked 
resource" as a single "valid RFC 3066 language code." So we'd need a new 
attribute or to change the content model of hreflang to explicitly 
specify the separate multiple languages of a resource.

http://www.whatwg.org/specs/web-apps/current-work/multipage/section-global.html#the-lang

http://www.whatwg.org/specs/web-apps/current-work/multipage/section-links.html#hreflang3

I note in passing that these attributes should be updated to use RFC 
4646 not RFC 3066 as per:

http://www.w3.org/TR/i18n-html-tech-lang/#ri20030112.224623362

> I have to confess I saw the BBC story about sign-language soon after
> sending this round internally.  But I need to do some study on the 
> naming of sign languages and whether they have ISO codes.  Is it true
> that if I say that the human language is ISO 639-2 code XXX, and
> that it's signed, there is only one choice for what the sign language
> is (I don't think so -- isn't american sign language different from
> british)? Alternatively, are there ISO or IETF codes for sign
> languages themselves?

Brian Campbell has eloquently answered some of these questions.

The reason I was thinking of using a CSS property was that signed 
interpretation is not the same as signing featured in the original 
video. But it's true that information about what sign languages are 
available is important, so a CSS property alone wouldn't solve the 
problem. Maybe we need new attributes to crack this nut:

<source contentlangs="en,sgn-en" captionlangs="sgn-en-sgnw,fr,de,it,sgn" 
dubbinglangs="fr" subtitlelangs="de,it" 
signedinterpretationlangs="sgn-en,sgn-fr,sgn-de,sgn-it" ...>

This would indicate that the main video content features people talking 
in English and people signing in English; the video is captioned in 
English, French, German, Italian, and their SignWriting analogues 
(American Sign Language in the case of English), dubbed in French, 
subtitled in German and Italian, and provided with signed interpretation 
in American, French, German and Italian Sign Languages.

Granted it's a sledgehammer, but it does provide the fine-grained 
linguistic information we need. It would also seemingly remove the need 
for putting a caption media query on <source>. While this markup looks 
complicated, most videos currently on the web could be marked up like:

<source hreflang="en" ...>

as all they provide is a single-language spoken track.
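
As a rough sketch of how a UA might consult such attributes, the snippet below splits the comma-separated lists and checks a requested language by simple prefix matching. The attribute names are the hypothetical ones suggested above, and the helper names and the matching rule are my own assumptions rather than anything specified.

```python
from typing import Dict, List


def parse_langs(value: str) -> List[str]:
    """Split a comma-separated list of language tags, normalising case and whitespace."""
    return [tag.strip().lower() for tag in value.split(",") if tag.strip()]


def offers(attrs: Dict[str, str], attribute: str, wanted: str) -> bool:
    """True if `attribute` lists `wanted` itself or a more specific tag.

    Assumed matching rule: 'sgn' matches 'sgn' and 'sgn-en', but not 'sgnx'.
    """
    wanted = wanted.lower()
    return any(
        tag == wanted or tag.startswith(wanted + "-")
        for tag in parse_langs(attrs.get(attribute, ""))
    )


# Attribute values copied from the hypothetical <source> example above.
source = {
    "contentlangs": "en,sgn-en",
    "captionlangs": "sgn-en-sgnw,fr,de,it,sgn",
    "dubbinglangs": "fr",
    "subtitlelangs": "de,it",
    "signedinterpretationlangs": "sgn-en,sgn-fr,sgn-de,sgn-it",
}

print(offers(source, "subtitlelangs", "de"))
print(offers(source, "dubbinglangs", "de"))
print(offers(source, "signedinterpretationlangs", "sgn-fr"))
```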

<digression>

I should add a little note about "sgn-en-sgnw". The IANA language tag 
registry includes the following entry:

> Type: script
> Subtag: Sgnw
> Description: SignWriting
> Added: 2006-10-17

http://www.iana.org/assignments/language-subtag-registry

One might want to omit the sgnw subtag on the basis that other sign 
language transliterations are academic not everyday (just as one omits 
the latn subtag for en, fr, and so on). However, those who work on such 
things have yet to come up with an entirely settled formulation. See 
this thread on the IETF languages mailing list:

http://www.alvestrand.no/pipermail/ietf-languages/2006-October/005126.html

Meanwhile, people are already creating SignWriting captions:

http://www.webcitation.org/5PUMLS0mp

</digression>

>> b) Would full descriptive transcriptions (e.g. for the deafblind)
>> fit into this media feature-based scheme or not?
>> 
>> transcription: want | dont-want | either (default: either)
> 
> how are these presented to a deafblind user?

Depends. I think the ideal would be to have transcriptions inside a 
container format, so that /everyone/ could access them and so that 
deafblind people who still have some sight can see some of the video. The 
transcriptions could be dispatched to a braille display. And, yeah, with 
my sledgehammer system that would necessitate yet another language 
attribute to indicate what languages transcriptions are provided in.

The crudest way of doing this would be to provide transcriptions of 
audio descriptions to supplement the captions. I believe one can do that 
with SMIL; I don't know what the situation is with other container formats 
or player UIs, however.

Alternatively and suboptimally:

<source src="mytranscription.html" hreflang="en" type="text/html" 
media="all and (transcription:desired)">

(Here I'm leaning towards the ideal of catering to all comers with a 
single container. But sometimes end-users prefer a simple format, e.g. 
some PDF haters prefer to be given a version in plain text. This tells 
you more about broken authoring practices and UAs, and about the digital 
divide between broadband and dialup, than about intrinsic problems with 
PDF, however. Still, if it's possible to provide plain text and plain 
HTML versions of whatever content you're producing, that's a good idea 
as a supplement, not just a fallback.)

>> c) How about screening out visual content dangerous to those with 
>> photosensitive epilepsy, a problem that has just made headlines in
>>  the UK:
>> 
>> http://news.bbc.co.uk/2/hi/uk_news/england/london/6724245.stm
>> 
>> Perhaps:
>> 
>> max-flashes-per-second: <integer> | any (default: 3)
>> 
>> Where the UA must not show visual content if the user is selecting
>> for a lower number of flashes per second. By default UAs should be
>>  configured not to display content which breaches safety levels;
>> the default value should be 3 /not/ any.
> 
> I think we'd prefer not to get into quantitative measures here, but a
>  boolean "this program is unsuitable for those prone to epilepsy
> induced by flashing lights" might make sense.  epilepsy: dont-want
> -:)

Why? The reason I suggested a quantitative measure was to ensure that 
encoded information remains relevant even as our medical understanding 
of the condition changes. WCAG 1.0 stipulated that 4 flashes per second 
would be dangerous, but WCAG 2.0 stipulates that more than 3 flashes per 
second would be dangerous:

http://www.w3.org/TR/WAI-WEBCONTENT/#tech-avoid-flicker

http://www.w3.org/TR/2007/WD-WCAG20-TECHS-20070517/Overview.html#G19

So 3.1 flashes per second would pass WCAG 1.0 but flunk WCAG 2.0. (Thinking 
about it, this probably indicates the value should be decimal not integer.)

I can see there's an ease-of-use issue for authors who know their 
content contains lots of flashes but haven't actually tested it to find 
out the flash rate. But if UAs default to 3 as I emphasized they 
/should/ do, such authors can use the "any" value to avoid having to 
quantify the number of flashes. If in the future we decide 2 flashes 
should be the threshold or add further criteria, contemporary UAs could 
be reconfigured and future UAs could be revised, but the content would 
go on working.
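
A minimal sketch of the check I have in mind follows. The function name and the treatment of "any" are my own assumptions, not part of any specification: content declares its maximum flash rate (or "any" if unmeasured), and the UA compares that against the user's limit, which defaults to 3.

```python
from typing import Union

Limit = Union[float, str]  # flashes per second, or the string "any"


def may_display(content_max: Limit, user_limit: Limit = 3.0) -> bool:
    """Decide whether content may be shown, given the user's flash limit."""
    if user_limit == "any":
        return True  # the user has opted out of screening
    if content_max == "any":
        return False  # unquantified flashing is withheld from users with a limit
    return content_max <= user_limit


# Content authored today as 2.5 flashes per second...
print(may_display(2.5))        # ...is shown under the default limit of 3
print(may_display(2.5, 2.0))   # ...and withheld if the limit is later lowered to 2
print(may_display("any"))      # unmeasured content is withheld by default
```

The point of the sketch is that lowering the threshold changes only the UA's side of the comparison; the value the author encoded stays valid.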

>> d) Facilitating people with cognitive disabilities within a media 
>> query framework is trickier. Some might prefer content which has
>> been stripped down to simple essentials. Some might prefer content
>> which has extra explanations. Some might benefit from a media query
>> based on reading level. Compare the discussion of assessing
>> readability levels at:
>> 
>> http://juicystudio.com/services/readability.php
>> 
>> reading-level: <integer> | basic | average | complex | any
>> (default: any)
>> 
>> Where the integer would be how many years of schooling it would
>> take an average person to understand the content: basic could be
>> (say) 9, average could be 12, and complex could be 17
>> (post-graduate).
>> 
>> This wouldn't be easily testable, but it might be useful
>> nevertheless.
> 
> Yes, this isn't testable, and is quantitative.

For verbal content, it /is/ testable (otherwise WCAG 2.0 would not 
include it). The Juicy Studio page I referenced included tools to test 
it with various formulae. Many of these formulae are good enough for 
governmental use. For example, the US government tends to test documents 
for readability with Flesch-Kincaid:

http://en.wikipedia.org/wiki/Flesch-Kincaid_Readability_Test
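
For illustration, the Flesch-Kincaid grade level is 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59. The sketch below computes it for English text; the syllable counter is a crude vowel-group heuristic of my own, so treat the results as approximate rather than as what any particular tool would report.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic: count runs of consecutive vowels, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text: str) -> float:
    """Approximate Flesch-Kincaid grade level of English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


simple = "The cat sat on the mat. The dog ran."
dense = "Accessibility preferences necessitate comprehensive negotiation between heterogeneous implementations."

print(flesch_kincaid_grade(simple))
print(flesch_kincaid_grade(dense))
```

The score approximates years of schooling, so it could feed the integer form of the reading-level feature directly.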

The uneasiness comes from the fact that WCAG 2.0 does not stipulate 
precisely which formula(e) to use for testing. See:

http://www.w3.org/TR/UNDERSTANDING-WCAG20/Overview.html#meaning-supplements

I assume that's because:

1) The WCAG guideline needs to continue to be relevant even as formulae 
improve.

2) Different languages need different formulae, so if they were to 
mandate a solution they'd have to list a different solution for each 
language. Here's one for Japanese, for example:

http://www.utexas.edu/research/accessibility/resource/readability/manual/formulas-English.html#jap

My guess is that /any/ of these readability indices would be good enough 
for our purposes in practice. We don't need the same reproducibility 
here as we do when it comes to not triggering epileptic fits!

--
Benjamin Hawkes-Lewis