[whatwg] Reconsidering how we deal with text track cues

Wed Jun 12 10:08:58 PDT 2013

On Wed, 12 Jun 2013, Silvia Pfeiffer wrote:
>
> As we continue to evolve the functionality of text tracks, we will
> introduce other, more complex structured content into cues and we will
> want browsers to parse and interpret it.

I think it's a mistake to try to solve problems before they exist. We 
don't know exactly what we'll be adding in the future, so we don't know 
what we'll need yet.

> For example, I expect that once we have support for speech synthesis in 
> browsers [1], cues of kind descriptions will be voiced by speech 
> synthesis, and eventually we want to influence that speech synthesis 
> with markup (possibly a subpart of SSML [2] or some other simpler markup 
> that influences prosody).

I think it's highly unlikely that we'll actually ever want that, but if we 
ever do, then we should fix the problem then.

> All of these new cue settings would end up as new attributes on the 
> WebVTTCue object. This is a dangerous design path that we have taken.

This is wrong on two points. One, there's nothing forcing a text track 
format to only generate one kind of object -- just like HTML generates 
different objects for different elements, WebVTT could generate different 
objects for different cues. Two, it's not dangerous to have an object with 
lots of fields.
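
As a minimal sketch of the first point, a parser could hand back different
object types depending on what a cue contains. Everything below is
hypothetical: the class names, the makeCue() helper, and the "payload that
parses as JSON becomes a metadata cue" rule are assumptions made for
illustration, not anything defined by the WebVTT or HTML specs.

```typescript
// Hypothetical cue hierarchy; none of these names come from a spec.
abstract class Cue {
  readonly startTime: number;
  readonly endTime: number;
  constructor(startTime: number, endTime: number) {
    this.startTime = startTime;
    this.endTime = endTime;
  }
}

class CaptionCue extends Cue {
  readonly text: string;
  constructor(startTime: number, endTime: number, text: string) {
    super(startTime, endTime);
    this.text = text;
  }
}

class MetadataCue extends Cue {
  readonly data: unknown;
  constructor(startTime: number, endTime: number, data: unknown) {
    super(startTime, endTime);
    this.data = data;
  }
}

// Assumed dispatch rule, for illustration only: a payload that starts with
// "{" and parses as JSON becomes a MetadataCue; anything else is a CaptionCue.
function makeCue(startTime: number, endTime: number, payload: string): Cue {
  const trimmed = payload.trim();
  if (trimmed.startsWith("{")) {
    try {
      return new MetadataCue(startTime, endTime, JSON.parse(trimmed));
    } catch (_err) {
      // Not valid JSON after all; fall through and treat it as caption text.
    }
  }
  return new CaptionCue(startTime, endTime, payload);
}
```

A caller would then branch on the object type (for example with
instanceof), much as script does with different HTML element interfaces.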

> What we have done with WebVTT is actually two-fold:
> 1. we have created a file format that serializes arbitrary content
> that is time-synchronized with a media element.
> 2. and we have created a simple caption/subtitle cue format.
> 
> That both are called "WebVTT" is the cause of a lot of confusion and not 
> a good design approach.

I think it's a mistake to view these as distinct. It's just one format. 
But as you're that spec's editor, that's your choice. :-)

> Firstly, there are consequences on the WebVTT spec.
> 
> I suggest we rename WebVTTCue [1] to VTTCaptionCue and allow such cues
> only on tracks of kind={captions, subtitles}.

I don't think that makes any sense. Any WebVTT file can be used for any 
"kind" of <track>. These are orthogonal contexts.

It would be like having a different DOM for an HTML file in an <iframe> 
and in a top-level browsing context. You don't necessarily know, when 
parsing the WebVTT file or HTML file, what it's going to be used for. In 
the case of WebVTT, it could even change from one to another.
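
The orthogonality can be sketched as follows: the cues are parsed once,
without reference to any track kind, and the kind only decides what is
done with them afterwards. SketchTrack, ParsedCue, describeUse() and the
usage strings are hypothetical names invented for this illustration; only
the five kind keywords are taken from the HTML <track> element.

```typescript
// The five track kinds defined for the HTML <track> element.
type TrackKind = "subtitles" | "captions" | "descriptions" | "chapters" | "metadata";

// Hypothetical parsed cue: note that it carries no notion of track kind.
interface ParsedCue {
  startTime: number;
  endTime: number;
  text: string;
}

// Hypothetical track wrapper; the kind is mutable, the cue list is not.
class SketchTrack {
  kind: TrackKind;
  readonly cues: ParsedCue[];
  constructor(kind: TrackKind, cues: ParsedCue[]) {
    this.kind = kind;
    this.cues = cues;
  }
}

// Illustrative descriptions of what each kind does with the same cues.
const usage: Record<TrackKind, string> = {
  subtitles: "render over video",
  captions: "render over video",
  descriptions: "speak",
  chapters: "navigation entry",
  metadata: "expose to script only",
};

// The interpretation is chosen at use time from the current kind.
function describeUse(track: SketchTrack): string[] {
  return track.cues.map((cue) => `${usage[track.kind]}: ${cue.text}`);
}
```

Changing track.kind changes the result of describeUse() without the cue
list being parsed again or replaced.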

> Also, we separate out the WebVTT serialisation format syntax 
> specification from the cue syntax specification [2] and introduce 
> separate parsers [3] for the different cue syntax formats. The rendering 
> section [4] has already started distinguishing between cue rendering for 
> chapters and for captions/subtitles. This will easily fit with the now 
> separated cue syntax formats.

This sounds like a lot of complication for no particularly good reason, 
but again, you're the editor. :-)

> Secondly, there are consequences for the TextTrackCue object hierarchy 
> in the HTML spec.
> 
> I suggest we rename TextTrackCue to AbstractCue (or just Cue). It is 
> simply the abstract result of parsing a serialisation of cues (e.g. a 
> WebVTT file) into its individual cues.
>
> Similarly TextTrackCueList should be renamed to CueList and should be a 
> cue list of only one particular type of cue. Thus, the parsing and 
> rendering algorithm in use for all cues in a CueList is fixed. Also, a 
> CueList of e.g. ChapterCues should only be allowed to be attached to a 
> track of kind=chapters, etc.

I don't understand the value in changing these names. This seems quite 
orthogonal to the rest of this e-mail.

In general, I am strongly against changing names unless there's a 
seriously compelling reason, like compatibility requirements. Churn in a 
specification is extremely negative, as it leads implementors to lose 
respect for the spec, and makes them think there's no point in following 
specs in the first place.

This is one of the core requirements of a Living Standard: that things 
*not change arbitrarily*. We can't just change our minds on things every 
few weeks. We have to pick a direction and then stick with it. Basically, 
we have to have confidence in our decisions. This doesn't mean we can't 
change things, but it means that to change things we should have a 
compelling reason. I don't see one for this proposed change.

> Doing this will make WebVTT and the TextTrack API extensible for new cue 
> formats, such as cues in SSML format, or ThumbnailCues, or MidrollAdCues 
> or whatever else we may find necessary in the future.

It's already plenty extensible.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'