[whatwg] Comments about the track element

Cyril Concolato cyril.concolato at telecom-paristech.fr
Thu Jul 26 05:03:14 PDT 2012

Hi Silvia,

Thank you for your reply. Your comments are quite helpful for 
understanding how WebVTT can and cannot be used. See my comments below.

On 7/26/2012 12:44 AM, Silvia Pfeiffer wrote:
> On Wed, Jul 25, 2012 at 11:45 PM, Cyril Concolato
> <cyril.concolato at telecom-paristech.fr> wrote:
>>> Right now it is fully defined how data in a TextTrack (of the defined
>>> kinds) is displayed on top of the video. As this is as yet unclear for
>>> SVG resources,
>> I wouldn't say it's unclear, I'd say it needs to be specified ;) meaning
>> that it probably doesn't require much specification. I was thinking that we
>> could use the CSS box of the video element to position the SVG, as if the
>> SVG was put in a div.
> Let's work on this basis and see where we get. There's also
> positioning issues
What do you mean here by "positioning issues"? SVG handles the 
positioning within its viewbox, and what I propose is to define the size 
and position of this viewbox in the parent coordinate system, i.e. with 
respect to the video. I don't see what else is needed. Or do you mean: 
when SVG is transported in a cue, how do you use the cue settings?
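
To make the idea concrete, here is a minimal sketch (not taken from any 
spec) of how the viewbox could be mapped into the CSS box of the video. 
It assumes the same "meet" behaviour as SVG's default 
preserveAspectRatio (xMidYMid meet). The function name fitViewBox and 
the plain {x, y, width, height} objects are assumptions made for 
illustration only.

```javascript
// Hypothetical helper: scale a viewbox uniformly so that it fits entirely
// inside the video's CSS box, and centre it (like "xMidYMid meet").
function fitViewBox(viewBox, videoBox) {
  var scale = Math.min(videoBox.width / viewBox.width,
                       videoBox.height / viewBox.height);
  var width = viewBox.width * scale;
  var height = viewBox.height * scale;
  return {
    scale: scale,
    // Offsets are expressed in the coordinate system of the video's parent.
    x: videoBox.x + (videoBox.width - width) / 2,
    y: videoBox.y + (videoBox.height - height) / 2,
    width: width,
    height: height
  };
}
```

With a 320x180 viewbox and a 640x480 video box, this would scale the 
SVG by 2 and leave equal bands above and below it.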

> etc. so it's not as simple as just putting the SVG
> in a cue.
>>> I would suggest using the @metadata track kind for now
>>> and providing the SVG as markup in a TextTrackCue (either from WebVTT
>>> cues
>> I've tried this option but I'm facing several problems (Tested with Chrome
>> Version 22.0.1216.0 canary).
>> The first problem is how to embed SVG in a cue? Should the '<', '>' and
>> other characters be escaped or not? According to Anne's validator,
> So, I assume you created WebVTT files. (You don't have to - you can
> directly use the TextTrack API.)
> Anne's validator validates the WebVTT rules for caption and subtitle
> kinds. For "metadata" kinds, there should be no parsing of the cues in
> browsers.
Reading the spec again, I see that the parsing rules for "WebVTT 
metadata text" are different indeed. My mistake.

> A validator can only decide whether to parse the cues
> according to "captions"/"subtitles", or "chapters", or "metadata"
> rules if the WebVTT file has such an indicator. I've asked for such
> information to be included in WebVTT, but we don't currently have such
> markup/metadata.
Do you mean that you would like to have some signaling in the WebVTT 
file (for instance in the header) to indicate the type of the cue 
payload? I think that would be useful. Failing that, it would be helpful 
to have a type selector in the validator.

>> they
>> should be.
> Actually, for @kind=metadata you don't need to escape '<' or '>'.
Yes, I had missed that.

>> But if I use them, then the parsing of the escaped string returns
>> 'empty document'
>> (http://perso.telecom-paristech.fr/~concolat/html5_tests/getcueasSVG-escaped.html).
> Which parsing? Anne's validator? Have you tried Chrome directly?
Here I meant using a DOMParser object in JavaScript using Chrome.
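
In the escaped case, the DOMParser presumably receives the cue text as 
character data rather than markup. A minimal sketch of a workaround, 
assuming that only '<', '>' and '&' were escaped, would be to undo the 
escaping before parsing. The helper name unescapeCueText is made up for 
this illustration.

```javascript
// Hypothetical helper: undo the escaping of '<', '>' and '&' in a cue payload.
// Only these three entities are handled; anything else is left untouched.
function unescapeCueText(text) {
  var entities = { lt: '<', gt: '>', amp: '&' };
  // A single pass avoids unescaping twice ("&amp;lt;" becomes "&lt;", not "<").
  return text.replace(/&(lt|gt|amp);/g, function (match, name) {
    return entities[name];
  });
}

// In a browser, the result could then be handed to a DOMParser:
//   var doc = new DOMParser().parseFromString(
//       unescapeCueText(cue.text), 'image/svg+xml');
```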

> http://perso.telecom-paristech.fr/~concolat/html5_tests/svg-escaped.vtt
> does look very ugly.
Indeed, but as you said, for the @kind=metadata it is not needed.

>> However, if I don't escape them, the parsing doesn't fail and returns an SVG
>> document
>> (http://perso.telecom-paristech.fr/~concolat/html5_tests/getcueasSVG.html).
> cue.text is the SVG code? That's what we want, right?
> (http://perso.telecom-paristech.fr/~concolat/html5_tests/svg.vtt looks
> much nicer)
Yes, cue.text seems to be the best option when using WebVTT.
>> In any case, I think embedding the SVG in WEBVTT does not really make sense.
> Why not?
I should have said that forcing the embedding of SVG in WebVTT does not 
make sense, since there is some overlap between the two (timing, 
positioning...), some overhead, and limitations (see below).
>> Another problem is in terms of design. SVG has a timing model (similar to
>> TTML), WebVTT another. For instance, SVG can express things like repetitions
>> of animations that WebVTT cannot. Are you saying that TTML should be carried
>> in a WebVTT file?
> TTML in WebVTT probably doesn't make sense. But SVG's timing model can
> be applied within the timeframe of a cue, so that does make sense.
Maybe, yes. It might make sense if your cue has a long duration, 
otherwise the overhead of loading an SVG document for each cue might be 
too big. But in general, since you can structure an SVG document with a 
frame-based structure (see this cartoon for instance: 
<http://perso.telecom-paristech.fr/%7Econcolat/SVG/flash10.svg>), I 
don't see the added value of WebVTT to carry SVG.
> How would you specify this with TTML? It would run into the same
> problems, wouldn't it?
I think so, the problems would be similar. But again, TTML can also 
express frame-based animations, why should you add the WebVTT layer?
>> Similarly, in terms of design, embedding SVG in cues requires repeating a
>> lot of SVG content at each cue (see
>> http://perso.telecom-paristech.fr/~concolat/html5_tests/svg.vtt), as this
>> approach requires parsing an entire document at each cue. You could probably
>> envisage overlapping cues but that would require a lot of overhead.
>> Leveraging the progressive loading of SVG cannot be done this way either.
>> In general, I think it would make sense to leverage the browsers' support
>> for SVG and not stack different technologies.
> Sure, it should use existing SVG support. I'm not so sure I agree with
> not stacking - that depends.
The only possible added value (I think) for carrying SVG in WebVTT would 
be to use the cue settings to position the SVG with respect to the 
video, like the line positioning. But I'm not sure, and that probably 
should be part of CSS and be applicable without WebVTT. Do you have 
other examples of the added value of carrying SVG in WebVTT?
> What would your preferred markup for
> http://perso.telecom-paristech.fr/~concolat/html5_tests/svg.vtt be ?
> How would you avoid the duplication?
For instance, you would want to be able to construct the SVG document 
progressively, to have only one document that you modify by adding more 
data. One way to do it would be to have the first cue contain the 
beginning of the document and the following cues contain more data, but 
since modifying the document after its load is tricky, this would 
require concatenating all previous cue texts and then parsing that as a 
new document (ugly!). I'd like to have the parsing step done under the 
hood by the browser, as it usually does.
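
For illustration, the concatenate-and-reparse approach could be 
sketched as follows. This is only a sketch under assumptions: the name 
createProgressiveDocument, the injected parse function and the 
closingMarkup parameter (needed so that the partial source is 
well-formed XML) are all hypothetical.

```javascript
// Hypothetical helper: accumulates cue payloads and re-parses the whole
// source each time a new cue arrives.
//   parse:         function turning a string into a document
//   closingMarkup: markup appended so that the partial source is well-formed
function createProgressiveDocument(parse, closingMarkup) {
  var source = '';
  return function appendCue(cueText) {
    source += cueText;
    // The entire accumulated source is parsed again on every cue,
    // which is exactly the overhead discussed above.
    return parse(source + closingMarkup);
  };
}

// In a browser, the parse function could wrap a DOMParser:
//   var append = createProgressiveDocument(function (s) {
//     return new DOMParser().parseFromString(s, 'image/svg+xml');
//   }, '</svg>');
```

Each call re-parses everything received so far, so the cost grows with 
the number of cues.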
>> Another problem is that I don't know if it's possible to display the SVG
>> content in a layer between the video and the UI controls. Currently, I
>> display the SVG on top of the video element, therefore the UI controls are
>> not accessible for clicks. Having to embed my own UI controls for that is a
>> bit of a pain. And, semantically, when reading the spec, 'metadata' tracks
>> say " Not displayed by the user agent. " so I think this might be a bit
>> confusing for users/authors.
> All publishers that want the same controls in all browsers make their
> own controls anyway. If you make a library for SVG display on top of a
> video, you can also make one for the controls (or use one of the many
> existing ones).
That's an option, but that shouldn't be the only one.
>> The third problem is performance-wise. In my example, the blue line (in
>> SVG), when synchronized with the video, should be aligned with the moving
>> (white-gray) edge of the pie. As you can see, this is not the case. Only 4-5
>> cuechange events seem to be processed properly. I noticed the same problem
>> with 'timeupdate' events. Also, I've noticed that even though my WebVTT file
>> is designed to have only one active cue at a time, for some cuechange
>> events, there are 2. This might be an implementation issue, or it might be
>> a problem of reentrant code (the cuechange callback being called while it's
>> not finished). In general, I'm not sure it's a good idea to go through
>> the JavaScript engine to do that, given the processing overhead.
> TextTrack support is still very new. I agree that its update frequency
> should be more often than the timeupdate events. Your example is
> indeed pushing the boundaries. Basically you are asking it to draw a
> clock handle in synch with a video that is updating its clock pie
> every video frame. TextTrack was built for relatively "rare" events
> along the timeline of a video - certainly not for something that needs
> an update with every video frame. Going through WebVTT makes this
> particularly slow.
If you try my example here 
(http://perso.telecom-paristech.fr/~concolat/html5_tests/getcueasSVG.html), 
you'll see that changing the playback speed (even to 0.1) does not 
guarantee synchronization either. By the time the JS has processed the 
content, it's already too late. It might be an implementation issue but 
it's symptomatic of the stacking, that's why I think we should leverage 
the native parsing, synchronization and support for SVG rendering (not 
through JS). The clock might be a (not so) extreme case, but I don't 
think I'm trying to do very fancy things here, just trying to reproduce 
existing technologies (proprietary or not) with existing web standards.
>>> or from JavaScript calls to addTextTrack()).
>> Can you elaborate on this one? However, I suspect it'll have the same
>> processing overhead.
> I'm not sure. Having to repeatedly parse WebVTT cues and draw the SVG
> image makes this particularly slow. Have you tried to paint the SVG
> just once on the video and using TextTrackCues just to change the
> transform value using JavaScript? Upon a cuechange event, you re-draw
> the SVG.
I could give it a try if I have some time but I'm not really sure I 
understand what you're suggesting. Do you mean using addCue? Could you 
give an example? Are you suggesting something similar to the example in 
the spec with

var sounds = sfx.addTextTrack('metadata');
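
If the suggestion is to draw the SVG once and let each cue carry only a 
transform value, a minimal sketch could look like the following. It 
assumes that the cue text is a complete value for the SVG transform 
attribute (for instance "rotate(90 50 50)"). The function name 
applyActiveCue and that payload format are illustrative assumptions, 
not something taken from the spec.

```javascript
// Hypothetical handler body: copy the transform carried by the active cue
// onto an SVG element that was drawn once, up front.
function applyActiveCue(track, element) {
  var cues = track.activeCues;
  if (!cues || cues.length === 0) {
    return false; // nothing active: leave the drawing as it is
  }
  // If several cues are active at once, use the last one in the list.
  var cue = cues[cues.length - 1];
  element.setAttribute('transform', cue.text);
  return true;
}

// In a browser (VTTCue today; TextTrackCue in the 2012 draft):
//   var track = video.addTextTrack('metadata');
//   track.addCue(new VTTCue(0, 1, 'rotate(0 50 50)'));
//   track.oncuechange = function () { applyActiveCue(track, hand); };
// where "video" and "hand" are assumed to be the video element and the
// SVG element to rotate.
```

No document is parsed per cue here, but the update still goes through 
JavaScript on every cuechange event.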

>>>> for
>>>> instance reusing the viewport/viewbox negotiation phase. There would also be
>>>> a need to make a more generic Track API or to replace the TextTrack API by
>>>> the SVG API when the track is of kind 'graphics'.
>>> I don't understand this requirement. What API needs are there aside
>>> from the synchronization? Trying to replicate SVG APIs through the
>>> TextTrack API seems like a repetition of the API and thus fragile.
>> Sorry for the confusion here. I didn't mean to replicate the SVG APIs here
>> but I just meant that the TextTrack API is very specific to 'pure' text
>> tracks (and even to WebVTT text tracks). You might want to expose the SVG
>> API when SVG content is used for the overlay to control it.
> Can you make an example? How do you think that should look?
I was thinking of having something like the following. Pardon my IDL 
mistakes. Also note that it is not really a proposal: I haven't thought 
through all the consequences, and it is just meant to give an idea.

enum TextTrackMode { "disabled", "hidden", "showing" };

interface Track : EventTarget {
   readonly attribute DOMString kind;
   readonly attribute DOMString label;
   readonly attribute DOMString language;
   readonly attribute DOMString inBandMetadataTrackDispatchType;

            attribute TextTrackMode mode;
};

interface TextTrack : Track {
   readonly attribute TextTrackCueList? cues;
   readonly attribute TextTrackCueList? activeCues;

   void addCue(TextTrackCue cue);
   void removeCue(TextTrackCue cue);

            attribute EventHandler oncuechange;
};

interface GraphicsDocumentTrack : Track {
            attribute Document trackDocument;
};

The basic Track interface would be almost the same as the VideoTrack or 
AudioTrack. The GraphicsDocumentTrack interface would be used for tracks 
which have an underlying document (TTML, SVG, SMIL?, HTML?...) with a 
visual representation and not necessarily based on cues. For these 
documents, you are not interested in cues or cue changes (and it might 
not make sense to define cues). For these, you're only interested in:
- the dispatch of the track content to the parser being done 
automatically by the browser (no need to use a JS DOMParser);
- the rendering of the underlying document being synchronized (natively) 
by the browser, i.e. the timeline of the underlying document should be 
locked with the timeline of the video (or audio). No need to monitor cue 
changes to render the right SVG.
You could discriminate between a TextTrack and a GraphicsDocumentTrack 
by a mime type or the inBandMetadataTrackDispatchType (not sure...). 
When the track carries SVG, the trackDocument object could be an 
SVGDocument. This would allow controlling the SVG as if it were 
embedded in the HTML, except that the synchronization would be done by 
the browser. What do you think?
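
As a rough illustration only: with such an interface, a script could 
reach into the overlay document directly instead of listening for cue 
changes. GraphicsDocumentTrack and trackDocument are the hypothetical 
names from the IDL above and are not implemented anywhere; the helper 
name setOverlayFill and the element id passed to it are likewise made 
up.

```javascript
// Hypothetical usage of the sketched GraphicsDocumentTrack interface.
// Returns true if the element was found in the track's document and updated.
function setOverlayFill(track, elementId, color) {
  // A cue-based TextTrack would not expose trackDocument in this sketch.
  var doc = track && track.trackDocument;
  if (!doc) {
    return false;
  }
  var element = doc.getElementById(elementId);
  if (!element) {
    return false;
  }
  // Timing stays with the browser; the script only changes presentation.
  element.setAttribute('fill', color);
  return true;
}
```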

Hoping I'm clear,

Cyril Concolato
Maître de Conférences/Associate Professor
Groupe Multimedia/Multimedia Group
Telecom ParisTech
46 rue Barrault
75 013 Paris, France
