[whatwg] Video feedback

Silvia Pfeiffer silviapfeiffer1 at gmail.com
Fri Jun 3 18:39:58 PDT 2011

I'll be replying to WebVTT-related stuff in a separate thread. Here is
just feedback on the other stuff.

(Incidentally: why is there <details> element feedback in here with
video? I don't really understand the connection.)

On Fri, Jun 3, 2011 at 9:28 AM, Ian Hickson <ian at hixie.ch> wrote:
> On Thu, 16 Dec 2010, Silvia Pfeiffer wrote:
>> I do not know how the change of stream composition works technically in
>> MPEG, but in Ogg we have to end a current stream and start a new one to
>> switch compositions. This has been called "sequential multiplexing" or
>> "chaining". In this case, stream setup information is repeated, which
>> would probably lead to creating a new stream handler and possibly a new
>> firing of "loadedmetadata". I am not sure how chaining is implemented in
>> browsers.
> Per spec, chaining isn't currently supported. The closest thing I can find
> in the spec to this situation is handling a non-fatal error, which causes
> the unexpected content to be ignored.
> On Fri, 17 Dec 2010, Eric Winkelman wrote:
>> The short answer for changing stream composition is that there is a
>> Program Map Table (PMT) that is repeated every 100 milliseconds and
>> describes the content of the stream.  Depending on the programming, the
>> stream's composition could change on entering or exiting each advertisement.
> If this is something that browser vendors want to support, I can specify
> how to handle it. Anyone?

Icecast streams have chained files, so streaming Ogg to an audio
element would hit this problem. There is a bug in Firefox for this:
https://bugzilla.mozilla.org/show_bug.cgi?id=455165 (and a duplicate
bug at https://bugzilla.mozilla.org/show_bug.cgi?id=611519). There's
also a WebKit bug for Icecast streaming, which is probably related:
https://bugs.webkit.org/show_bug.cgi?id=42750 . I'm not sure how Opera
handles Icecast streams, but it seems to cope with them.

The thing is: you can implement playback and seeking without any
further changes to the spec. But then the browser-internal metadata
state will change depending on the chained file you're in. Should that
also update the metadata exposed through the API? Probably yes, because
otherwise the JS developer may end up working with contradictory
information. Maybe we need a "metadatachange" event for this?
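To make the "metadatachange" idea concrete, here is a minimal TypeScript sketch, assuming the UA knows the start time of each link in a chained Ogg stream. ChainLink, ChainedStreamModel and onMetadataChange are hypothetical names, and "metadatachange" is only the event suggested above, not something in the spec. The model fires whenever playback (or a seek) crosses into a different link, so script always sees the metadata of the link that is actually playing.

```typescript
// Hypothetical description of one link in a chained (sequentially multiplexed) Ogg stream.
interface ChainLink {
  startTime: number; // seconds from the start of the whole stream
  metadata: { tracks: string[]; sampleRate: number };
}

type MetadataListener = (link: ChainLink) => void;

// Minimal model of a player that tracks which chain link is current and
// notifies listeners when playback moves into a different link.
class ChainedStreamModel {
  private currentIndex = -1;
  private listeners: MetadataListener[] = [];

  constructor(private readonly links: ChainLink[]) {}

  // Stand-in for addEventListener("metadatachange", ...); the event is only a proposal.
  onMetadataChange(listener: MetadataListener): void {
    this.listeners.push(listener);
  }

  // Called on every playback position update, including right after a seek.
  updatePosition(time: number): void {
    const index = this.linkIndexAt(time);
    if (index !== this.currentIndex) {
      this.currentIndex = index;
      for (const listener of this.listeners) listener(this.links[index]);
    }
  }

  // Index of the last link whose start time is <= time (links sorted by startTime).
  private linkIndexAt(time: number): number {
    let index = 0;
    for (let i = 0; i < this.links.length; i++) {
      if (this.links[i].startTime <= time) index = i;
      else break;
    }
    return index;
  }
}
```

Firing on every link change, rather than only when the tracks actually differ, keeps the sketch simple; a UA could compare metadata and suppress events for identical links.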

> On Tue, 24 May 2011, Silvia Pfeiffer wrote:
>> Ian and I had a brief conversation recently where I mentioned a problem
>> with extended text descriptions with screen readers (and worse still
>> with braille devices) and the suggestion was that the "paused for user
>> interaction" state of a media element may be the solution. I would like
>> to pick this up and discuss in detail how that would work to confirm my
>> sketchy understanding.
>> *The use case:*
>> In the specification for media elements we have a <track> kind of
>> "descriptions", which are:
>> "Textual descriptions of the video component of the media resource,
>> intended for audio synthesis when the visual component is unavailable
>> (e.g. because the user is interacting with the application without a
>> screen while driving, or because the user is blind). Synthesized as a
>> separate audio track."
>> For now, I'm assuming that the synthesis will be done through a screen
>> reader and not through the browser itself, thus making the
>> descriptions available to users as synthesized audio or as braille if
>> the screen reader is set up for a braille device.
>> The textual descriptions are provided as chunks of text with a start
>> and an end time (so-called "cues"). The cues are processed during video
>> playback as the video's playback time starts to fall within the time
>> frame of the cue. Thus, it is expected that the cues are consumed
>> during the cue's time frame and are not present any more when the end
>> time of the cue is reached, so they don't conflict with the video's
>> normal audio.
>> However, on many occasions, it is not possible to consume the cue text
>> in the given time frame, in particular in the following
>> situations:
>> 1. The screen reader takes longer to read out the cue text than the
>> cue's time frame provides for. This is particularly the case with long
>> cue text, but also when the screen reader's reading rate is slower
>> than what the author of the cue text expected.
>> 2. The braille device is used for reading. Since reading braille is
>> much slower than listening to read-out text, the cue time frame will
>> invariably be too short.
>> 3. The user seeked right into the middle of a cue and thus the time
>> frame that is available for reading out the cue text is shorter than
>> the cue author planned for.
>> Correct me if I'm wrong, but it seems that what we need is a way for
>> the screen reader to stop the video element from continuing to play
>> while the screen reader is still busy delivering the cue text. (In
>> a11y talk: what is required is a means to deal with "extended
>> descriptions", which extend the timeline of the video.) Once it's
>> finished presenting, it can resume the video element's playback.
> Is it a requirement that the user be able to use the regular video pause,
> play, rewind, etc, controls to seek inside the extended descriptions

No, the audio descriptions (which are only text to the browser and
turn into audio only through the screen reader) are controlled by the
screen reader, not by the video controls. When the user navigates using
the video controls, the cues of the audio description change and are
handed to the screen reader, too, so they can be read out in sync. But
the video controls have no direct control over the read-out audio.
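As an illustration of what the screen reader receives after the user navigates, and why seeking shrinks the reading window (situation 3 in the quoted list), here is a rough TypeScript sketch. It assumes a cue is active while startTime <= t < endTime and estimates reading time from a fixed words-per-minute rate, which is only a crude stand-in for real screen reader or braille reading speeds. DescriptionCue, activeCuesAt, estimatedReadingSeconds and overrunSeconds are hypothetical names.

```typescript
// Hypothetical shape of a description cue: start/end time in seconds plus text.
interface DescriptionCue {
  startTime: number;
  endTime: number;
  text: string;
}

// Cues whose time frame contains position t (start inclusive, end exclusive).
function activeCuesAt(cues: DescriptionCue[], t: number): DescriptionCue[] {
  return cues.filter((cue) => cue.startTime <= t && t < cue.endTime);
}

// Rough reading-time estimate, assuming a constant words-per-minute rate.
function estimatedReadingSeconds(text: string, wordsPerMinute: number): number {
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0).length;
  return (words * 60) / wordsPerMinute;
}

// Seconds by which reading the cue from position t would overrun its end time
// (0 if it fits). This is the extension the screen reader would have to request.
function overrunSeconds(cue: DescriptionCue, t: number, wordsPerMinute: number): number {
  const available = cue.endTime - Math.max(t, cue.startTime);
  return Math.max(0, estimatedReadingSeconds(cue.text, wordsPerMinute) - available);
}
```

With an 8-word cue spanning 0 to 4 seconds and a reader at 120 words per minute, reading from the cue start fits exactly, but after a seek to 2 seconds the reader needs 2 seconds more than the cue provides.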

> , or
> should they literally pause the video while playing, with the audio
> descriptions being controlled by the same UI as the screen reader?

The audio descriptions cannot control the video, since they are just
text cues with start and end times that are supposed to be in sync
with the video. The only component that actually knows whether the
user has heard the full text of a text cue is the screen reader, since
it is turning the text into sound. So, the control over pausing the
video must come from there. Indeed, the user should be able to control
this through the screen reader UI - e.g. hit a button to skip reading
a cue and let the video continue playing uninterrupted.

>> IIUC, a video is "paused for user interaction" basically when the UA has
>> decided to pause the video without the user asking to pause it (i.e. the
>> paused attribute is false) and the pausing happened not for network
>> buffering reasons, but for other reasons. IIUC one concrete situation
>> where this state is used is when the UA has reached the end of the
>> resource and is waiting for more data to come (e.g. on a live stream).
> That latter state is not "paused for user interaction", it's just stalled
> due to lack of data. The rest is accurate though.

Do you have an example, then, of when a video actually goes into the
state "paused for user interaction"? Is it for ads? I just wonder
because if you do ads through JavaScript, you will probably call the
pause() function, and then the state is paused and not "paused for user
interaction".

Anyway, it does seem to be the right state for the screen reader
interaction, since it's an internal state and not a user-controlled
one.

>> To use "paused for user interaction" for extending descriptions, we need
>> to introduce a means for the screen reader to tell the UA to pause the
>> video when it reaches the end of the cue and it's still busy delivering
>> a cue's text. Then, as it finishes, it will un-pause the video to let it
>> continue playing.
>> To me it sounds like a feasible solution.
>> The screen reader could even provide a user setting and a short-cut so a
>> user can decide that they don't want this pausing to happen or that they
>> want to move on from the current cue.
>> Another advantage of this approach is that e.g. a deaf-blind user could
>> hook up their braille device such that it will deliver the extended
>> descriptions and also deliver captions through braille with such
>> extension pausing happening. (Not sure that such a user would even want
>> to play the video, but it would be possible.)
>> Now, I think there is one problem though (at least as far as I can
>> tell). Right now, IIUC, screen readers are only passive listeners on the
>> UA. They don't influence the behaviour of the UA. The accessibility API
>> is basically only a one-way street from the UA to the AT. I wonder if
>> that is a major inhibitor of using this approach or whether it's easy
>> for UAs to overcome this limitation? (Or if such a limitation even
>> exists - I don't know enough about how AT work...).
>> Is that an issue? Are there other issues that I have overlooked?
> That seems to be entirely an implementation issue.

Excellent, so I guess we agree that this is the way in which it should
be implemented?
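A minimal TypeScript sketch of the mechanism discussed above, purely as an illustration under assumptions: MediaModel, ExtendedDescriptionController and their methods are hypothetical, not DOM or accessibility API calls, and the sketch assumes the screen reader has a channel back to the UA, which is exactly the open question in the quoted mail. The `paused` attribute stays false while the element is held in the "paused for user interaction" state.

```typescript
// Hypothetical model of a media element's playback state; not a real DOM API.
class MediaModel {
  currentTime = 0;
  paused = false;                   // author/user-visible "paused" attribute
  pausedForUserInteraction = false; // internal UA state; "paused" stays false

  get potentiallyPlaying(): boolean {
    return !this.paused && !this.pausedForUserInteraction;
  }

  // Advance the clock by dt seconds unless playback is held.
  tick(dt: number): void {
    if (this.potentiallyPlaying) this.currentTime += dt;
  }
}

// Hypothetical bridge through which a screen reader tells the UA whether it
// is still presenting a description cue (assumes a reverse a11y channel).
class ExtendedDescriptionController {
  private readerBusy = false;
  private cueEnd = Infinity;

  constructor(
    private readonly media: MediaModel,
    public extendDescriptions = true, // user setting offered by the screen reader
  ) {}

  // Screen reader starts presenting a cue whose time frame ends at cueEnd.
  readerStarted(cueEnd: number): void {
    this.readerBusy = true;
    this.cueEnd = cueEnd;
  }

  // Screen reader finished the cue, or the user skipped it: let playback continue.
  readerFinished(): void {
    this.readerBusy = false;
    this.media.pausedForUserInteraction = false;
  }

  // Called by the UA on each time update.
  onTimeUpdate(): void {
    if (this.extendDescriptions && this.readerBusy && this.media.currentTime >= this.cueEnd) {
      // Coarse ticks may overshoot, so snap back and hold at the cue's end time.
      this.media.currentTime = this.cueEnd;
      this.media.pausedForUserInteraction = true;
    }
  }
}
```

Constructing the controller with extendDescriptions set to false models the screen reader setting that lets a user opt out of the pausing; calling readerFinished() early models the shortcut for moving on from the current cue.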

