[whatwg] Google Feedback on the HTML5 media a11y specifications

Silvia Pfeiffer silviapfeiffer1 at gmail.com
Tue Feb 15 03:21:46 PST 2011


Philip,

As promised here is the summarized list of things that after this
discussion I still think we should add/change:

* the file magic string should not be “WEBVTT FILE”, but “WEBVTT” only
(or alternatively "WebVTT", but typically magic identifiers are all
caps)

* allow for name-value pairs as file-wide metadata underneath the file
magic string and specify the format for providing name-value pairs -
only an empty line determines the end of the header section

* allow the use of shorter time specifiers, in particular:
 - "[[h*:]mm:]ss[.d[c[m]]] | s*[.d[c[m]]]" as the start and end time
 - "-" as the separator between start and end time instead of “-->”
 - "+s*[.d[c[m]]]" as a possible end time specifier, or a relative
mid-cue timestamp; the relative mid-cue timestamp works in
aggregation

* allow commenting out whole lines after a “//” (or a "#") at the line start

* use more verbose cue settings: direction (instead of D),
linePosition (instead of L),  textPosition (instead of T), size
(instead of S), align (instead of A)

* introduce default cue settings in the header part of the file,
possibly as a name-value pair, or some alternative dedicated form

* allow the use of the <u> element for underlined sections (assuming I
can find some examples for this)

All the rest of the issues seem to be covered through the <c> element
and classes.
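
To make the header and timing proposals above a bit more concrete, here
is a rough Python sketch of how a line-based tool might read them. It is
only an illustration under my own assumptions: the function names
(parse_header, parse_time, parse_timing_line) are made up, times are
returned as integer milliseconds, "//" is assumed as the comment marker,
and nothing here reflects the current parsing spec.

```python
import re

# Hypothetical grammar taken from the proposal above; not in any spec.
_CLOCK = re.compile(r"^(?:(?:(\d+):)?(\d{2}):)?(\d{2})(?:\.(\d{1,3}))?$")
_SECONDS = re.compile(r"^(\d+)(?:\.(\d{1,3}))?$")


def _millis(digits):
    # ".5" means 500 ms, ".95" means 950 ms, ".950" means 950 ms.
    return int(digits.ljust(3, "0")) if digits else 0


def parse_time(spec):
    """Parse '[[h*:]mm:]ss[.d[c[m]]]' or 's*[.d[c[m]]]' into milliseconds."""
    m = _CLOCK.match(spec)
    if m:
        hours, minutes, seconds, frac = m.groups()
        total = int(hours or 0) * 3600 + int(minutes or 0) * 60 + int(seconds)
        return total * 1000 + _millis(frac)
    m = _SECONDS.match(spec)
    if m:
        return int(m.group(1)) * 1000 + _millis(m.group(2))
    raise ValueError("bad time specifier: %r" % spec)


def parse_timing_line(line):
    """Parse 'start-end' or 'start+duration' into (start_ms, end_ms)."""
    line = line.strip()
    if "+" in line:
        start, duration = line.split("+", 1)
        if not _SECONDS.match(duration):
            raise ValueError("bad relative end time: %r" % duration)
        begin = parse_time(start)
        return begin, begin + parse_time(duration)
    start, sep, end = line.partition("-")
    if not sep:
        raise ValueError("no separator in timing line: %r" % line)
    return parse_time(start), parse_time(end)


def parse_header(lines):
    """Return (metadata dict, index of the first line after the header)."""
    if not lines or lines[0].strip() != "WEBVTT":
        raise ValueError("missing WEBVTT magic string")
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "":
            return meta, i + 1      # only an empty line ends the header
        if line.startswith("//"):
            continue                # assumed start-of-line comment marker
        name, sep, value = line.partition("=")
        if sep:
            meta[name.strip()] = value.strip()  # a repeated name: last wins
    return meta, len(lines)
```

With this, "15.000-17.950" and "15.000+2.950" would describe the same
cue interval, and a tool could read Language and Kind without looking
past the first blank line.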


Cheers,
Silvia.



On Tue, Feb 15, 2011 at 10:20 PM, Silvia Pfeiffer
<silviapfeiffer1 at gmail.com> wrote:
> On Tue, Feb 15, 2011 at 9:09 PM, Philip Jägenstedt <philipj at opera.com> wrote:
>> On Tue, 15 Feb 2011 04:28:36 +0100, Silvia Pfeiffer
>> <silviapfeiffer1 at gmail.com> wrote:
>>
>>> Hi Philip,
>>>
>>> On Tue, Feb 15, 2011 at 3:27 AM, Philip Jägenstedt <philipj at opera.com>
>>> wrote:
>>>>
>>>> On Wed, 09 Feb 2011 03:57:37 +0100, Silvia Pfeiffer
>>>> <silviapfeiffer1 at gmail.com> wrote:
>>>>
>>>>>>> A. Feedback on the WebVTT format
>>>>>>
>>>>>>> 1. Introduce file-wide metadata
>>>>>>>
>>>>>>> WebVTT requires a structure to add header-style metadata. We are here
>>>>>>> talking about lists of name-value pairs as typically in use for header
>>>>>>> information. The metadata can be optional, but we need a defined means
>>>>>>> of adding them.
>>>>>>>
>>>>>>> Required attributes in WebVTT files should be the main language in use
>>>>>>> and the kind of data found in the WebVTT file - information that is
>>>>>>> currently provided in the <track> element by the @srclang and @kind
>>>>>>> attributes. These are necessary to allow the files to be interpreted
>>>>>>> correctly by non-browser applications, for transcoding or to determine
>>>>>>> if a file was created as a caption file or something else, in
>>>>>>> particular the @kind=metadata. @srclang also sets the base
>>>>>>> directionality for BiDi calculations.
>>>>>>
>>>>>> Are there non-browsers that use the language for font-selection or
>>>>>> bidi?
>>>>>> Is
>>>>>> auto-detection not likely to give a better user experience? Are there
>>>>>> any
>>>>>> other use cases for knowing the language of the captions *after*
>>>>>> they've
>>>>>> been opened?
>>>>>
>>>>>
>>>>> I can't see a different way to let non-browser applications know what
>>>>> font to choose, even how to provide the user with a menu of available
>>>>> caption tracks for a video, or to set the base directionality for
>>>>> BiDi. Also, language auto-detection is a huge burden to put onto
>>>>> non-browser applications. Having a readable language tag at the
>>>>> beginning of the file is useful to quickly figure it all out.
>>>>>
>>>>> The language set in <track> would certainly overrule what is in the
>>>>> file. Also, the last language attribute in the header would probably
>>>>> win.
>>>>>
>>>>> I guess it would also be ok to have language and kind optional -
>>>>> different applications may then default to interpreting WebVTT files
>>>>> differently, such as by default English and Captions - or English and
>>>>> Descriptions, but that's probably acceptable from context.
>>>>
>>>> Given that most existing subtitle formats don't have any language
>>>> metadata,
>>>> I'm a bit skeptical. However, if implementors of non-browser players want
>>>> to
>>>> implement WebVTT and ask for this I won't stand in the way (not that I
>>>> could
>>>> if I wanted to). For simplicity, I'd prefer the language metadata from
>>>> the
>>>> file to not have any effect on browsers though, even if no language is
>>>> given
>>>> on <track>.
>>>
>>> There is also the Content-Language response header of HTTP, which
>>> could have an influence on the browser, too. I'm not sure about the
>>> best way to deal with all this overlapping information, but I'm sure
>>> it can be sorted out.
>>
>> My preference is ignoring everything except what is given in <track>. In
>> particular language can't be given in the resource or its headers, because
>> then one has to fetch all the tracks in order to provide a track selection
>> menu with language information or to automatically activate the suitable
>> tracks.
>
> Ah yes, that makes sense. I'd have to agree.
>
>
>
>>>>>> Why do non-browser players need to know the kind? All kinds are
>>>>>> processed
>>>>>> in
>>>>>> the same way except metadata, and there's no reason to use metadata
>>>>>> tracks
>>>>>> with external players.
>>>>>
>>>>> Maybe I have a different view of what applications will make use of
>>>>> WebVTT files than most. My thinking is that there will also be uses
>>>>> for metadata tracks in external applications. Aside from this, there
>>>>> will be authoring applications and players, yes, but there will also
>>>>> be automated processing tools. So, to know what type of content is
>>>>> inside a file without having to look at more than the file's headers
>>>>> is really important.
>>>>
>>>> For both of these cases, putting some magic strings inside comments that
>>>> are
>>>> ignored by browsers sounds like it would be sufficient. Name-value
>>>> metadata
>>>> that is ignored by browsers would be fine as well.
>>>
>>> I'm for the second option: name-value metadata that is ignored by the
>>> browser. I think in fact the browser should in general ignore all
>>> name-value metadata with the exception of file-wide cue settings.
>>
>> I agree, browsers should ignore in-file metadata. (That's one reason I think
>> using comments for it is quite fine most of the time.)
>
> Maybe then we should find a different way to set the default settings
> for the cues and not in a CueSettings=... metadata field. It seemed
> elegant, but I am not so sure any more.
>
>
>
>>>>>>> Further metadata fields that are typically used by authors to keep
>>>>>>> specific authoring information or usage hints are necessary, too. As
>>>>>>> examples of current use see the format of MPlayer mpsub’s header
>>>>>>> metadata [2], EBU STL’s General Subtitle Information block [3], and
>>>>>>> even CEA-608’s Extended Data Service with its StartDate, Station,
>>>>>>> Program, Category and TVRating information [4]. Rather than specifying
>>>>>>> a specific subset of potential fields we recommend to just have the
>>>>>>> means to provide name-value pairs and leave it to the negotiation
>>>>>>> between the author and the publisher which fields they expect of each
>>>>>>> other.
>>>>>>
>>>>>> This approach has worked very well with Vorbis Comments, probably
>>>>>> mostly
>>>>>> because all interesting fields have been pre-defined in
>>>>>> http://www.xiph.org/vorbis/doc/v-comment.html
>>>>>>
>>>>>> For a web format though, wouldn't some kind of wiki registry be good to
>>>>>> avoid total mayhem, especially if there are some predefined fields?
>>>>>> (Not
>>>>>> having file-wide metadata would also avoid such mayhem.)
>>>>>
>>>>> It might be good to define a base set - the Vorbis Comments one or the
>>>>> ID3 ones could be appropriate. Even the old Dublin Core set (the first
>>>>> ones, not the current chaos) could be good. I could also analyse the
>>>>> sets used in current typical caption formats and propose a superset of
>>>>> those.
>>>>>
>>>>> While I think you're right with suggesting a predefined set of fields,
>>>>> I am mostly keen right now to agree on the general format of the
>>>>> fields and how we need to parse them rather than what they actually
>>>>> are.
>>>>>
>>>>> So, I would suggest we allow lines of "name=value" under the WEBVTT
>>>>> magic string. A blank line defines the end of the header section and
>>>>> the beginning of the cues. Would be simple enough to parse, right?
>>>>
>>>> Sure, it's already handled by the current parsing spec, since it ignores
>>>> everything up to the first blank line.
>>>
>>> That's not quite how I'm reading the spec.
>>>
>>>
>>> http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#webvtt-0
>>> allows
>>> "Optionally, either a U+0020 SPACE character or a U+0009 CHARACTER
>>> TABULATION (tab) character followed by any number of characters that
>>> are not U+000A LINE FEED (LF) or U+000D CARRIAGE RETURN (CR)
>>> characters."
>>> after the "WEBVTT FILE" magic.
>>> To me that reads like all of the extra stuff has to be on the same line.
>>> I'd prefer if this read "any character except for two WebVTT line
>>> terminators", then it would all be ready for such header-style
>>> metadata.
>>
>> See steps 12-17 of
>> <http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#parsing-0>,
>> it just skips all lines up to the first blank line. Syntax and parsing are
>> different :)
>
> So it's not in the syntax spec, but acceptable input, hmmm. I think we
> should add it explicitly to the spec and define the general way in
> which metadata is supposed to be given, such as in the form
> <name>=<value>. We don't have to parse it, but it should be in the
> syntax specification.
>
>
>
>>>>>>> 4. Cue formatting requirements
>>>>>>>
>>>>>>> In analysing the available cue formatting functionality, we have found
>>>>>>> that some features are missing. Most of these features can be added
>>>>>>> through using CSS on cues that have received a <b>, <i>, <c> or <v>
>>>>>>> marker. The following features are core to traditional TV and exist in
>>>>>>> EBU STL and CEA-608/708 captions. Support of these will be a core
>>>>>>> requirement for browsers as well as non-browser applications and it
>>>>>>> makes sense to add these to WebVTT rather than relying on external CSS
>>>>>>> which cannot be used for non-browser captions:
>>>>>>
>>>>>> The unstated requirement here seems to be that WebVTT needs to work as
>>>>>> an
>>>>>> interchange format for various TV captioning formats even in user
>>>>>> agents
>>>>>> without any support for CSS (or JavaScript). I'm trying to not make a
>>>>>> straw
>>>>>> man argument, but if we want an interchange format, we should pick TTML,
>>>>>> which
>>>>>> is explicitly designed to be just that and doesn't depend on CSS.
>>>>>>
>>>>>> Is it not enough that a lossy conversion can be made from various
>>>>>> formats
>>>>>> into WebVTT+CSS(+JavaScript)? If not, the "Web" in "WebVTT" is highly
>>>>>> misleading...
>>>>>
>>>>>
>>>>> We're trying to avoid the need for multiple transcodings and are
>>>>> trying to achieve something like the following pipeline:
>>>>> broadcast captions -> transcode to WebVTT -> show in browser ->
>>>>> transcode to broadcast devices -> show
>>>>>
>>>>> If we have to plug TTML into this pipeline, too, it will be much
>>>>> slower and we would need to additionally define a mapping from TTML to
>>>>> WebVTT and back.
>>>>>
>>>>> I'm sure with SMPTE-TT around we will end up seeing things like
>>>>> broadcast->TTML->WebVTT->browser, but even then we don't want WebVTT
>>>>> to be a lossy format.
>>>>
>>>> I can only disagree. Trying to make WebVTT into an interchange format
>>>> will
>>>> inevitably turn it into a highly presentational format with lots of
>>>> legacy
>>>> baggage. I can certainly see the use cases for an interchange format, but
>>>> I
>>>> don't think it's worth the added complexity. I'd prefer an approach where
>>>> any format quirks that can't be mapped to WebVTT are expressed using
>>>> <c.foo>
>>>> and if it turns out lots of people want the feature, we can add it to a
>>>> future revision.
>>>
>>> I wouldn't go as far as to say it needs to become an interchange
>>> format. But I can see us specifying what the browser parses, while
>>> given options such as the header-metadata and span classes that allow
>>> with some extra information to fully recover the broadcast
>>> functionality. I actually think that is almost possible already.
>>
>> After this thread has run for a while, it'd be nice to hear where you think
>> <c.foo> isn't enough and new markup is needed, if anything.
>
> I'll give a summary in a separate email so it's easier to see.
>
>
>
>>>>>>> * underline: EBU STL, CEA-608 and CEA-708 support underlining of
>>>>>>> characters. The underline character is also particularly important for
>>>>>>> some Asian languages. Please make it possible to provide text
>>>>>>> underlines without the use of CSS in WebVTT.
>>>>>>
>>>>>> Which Asian languages? If it's just the Chinese
>>>>>> <http://en.wikipedia.org/wiki/Proper_name_mark>, then I don't think
>>>>>> that
>>>>>> needs <u> or similar. In my experience, use of the Chinese proper name
>>>>>> mark
>>>>>> is in fact extremely rare in Chinese captions, at least in movies and
>>>>>> TV
>>>>>> series from the mainland and Taiwan. It would be best to use e.g.
>>>>>> 我來自<c.pnm>中國</c> to make it easy to change the style between
>>>>>> single/double/wavy/no underline.
>>>>>
>>>>> OK. So if we need underlined text, it will need to be
>>>>> <c.underline>..</c> and CSS underline? I guess in a Web context
>>>>> underline text is usually a hyperlink so it makes sense to discourage
>>>>> <u> for the Web. But is that also an argument for
>>>>> captions/subtitles/descriptions? What is the argument against using
>>>>> <u> in captions?
>>>>
>>>> I don't really have an argument against it, I just questioned that it is
>>>> important for Asian languages in particular. Adding <u> would be really
>>>> simple, it's just a question of why. I've seldom seen underlining in
>>>> captions, so it's not clear to me how it's usually used.
>>>
>>> I'm told <u> is fairly common in traditional captions. We don't do
>>> <c.italics> either for such common stuff.
>>> But if we really don't want this, I guess <c.u> would work, too and is
>>> not that much longer.
>>
>> I can't see any underlining when scanning through the samples at
>> <http://wiki.whatwg.org/wiki/Use_cases_for_timed_tracks_rendered_over_video_by_the_UA>.
>> If it is in fact common in some contexts, it'd be great to have samples
>> added to the wiki, I'm sure we could learn something from it. If <u> is
>> actually useful for something, then we should just add it.
>
> I've asked for examples - I personally don't have any either, unfortunately.
>
>
>>>>> With "-" you are referring to replacing "-->" with "-" to arrive at
>>>>> things
>>>>> like:
>>>>> 15.000-17.950
>>>>> At the left we can see...
>>>>>
>>>>> as compared to:
>>>>> 15.000+2.950
>>>>> At the left we can see...
>>>>
>>>> Yes, that's what I meant.
>>>>
>>>>> I actually think they read fairly well given that people are used to the
>>>>> double meaning of "-": to mean both "from ... to" and "minus".
>>>>> But we could use a different character for "absolute time" if you
>>>>> prefer, e.g. "/".
>>>>> 15.000/17.950
>>>>> At the left we can see...
>>>>>
>>>>> I find this fairly readable, too.
>>>>
>>>> Either would work for me. As I mentioned, the room for improvement here
>>>> isn't only the syntax of the timing line, but also to make it obvious
>>>> that
>>>> cue timestamps like <00:01.000> are relative. Using + for relative
>>>> timestamps is potentially confusing too, as one might think that many
>>>> consecutive <+00:01.000> are cumulative, rather than all being 1 second
>>>> from
>>>> the start time of the cue.
>>>
>>> That's true and in fact the way in which I have authored my examples,
>>> now that I look back at them. It makes the timings smaller and I think
>>> it's a bit more logical. But really we just have to decide on one
>>> meaning:
>>>
>>> 5-10
>>> This <+1>is <+1>a <+1>simple <+1>example.
>>>
>>> I find I actually prefer this over
>>>
>>> 5-10
>>> This <+1>is <+2>a <+3>simple <+4>example.
>>
>> Right, we just have to pick something. I'd like to get the basic structure
>> down soon, though, as changing the timestamp parsing will be very difficult
>> once there are implementations.
>
>
> Agreed. Which one would you prefer?
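
For reference, the two readings of the relative mid-cue timestamps can
be written down as a small Python sketch. This is only an illustration:
the function name resolve_marks and the "<+s>" pattern are assumptions
based on the examples quoted here, not anything in the spec.

```python
import re

# Matches the hypothetical relative mid-cue timestamps, e.g. <+1> or <+0.5>.
_MARK = re.compile(r"<\+(\d+(?:\.\d+)?)>")


def resolve_marks(cue_start, text, cumulative):
    """Return the absolute time in seconds of each <+s> mark in a cue.

    cumulative=True:  each offset is added to the previous mark's time.
    cumulative=False: each offset is measured from the cue start time.
    """
    times = []
    current = float(cue_start)
    for match in _MARK.finditer(text):
        offset = float(match.group(1))
        if cumulative:
            current += offset
            times.append(current)
        else:
            times.append(cue_start + offset)
    return times
```

Under the cumulative reading "This <+1>is <+1>a <+1>simple <+1>example."
and under the start-relative reading "This <+1>is <+2>a <+3>simple
<+4>example." both resolve to the same four absolute times for a cue
starting at 5 seconds, so the choice is purely about authoring
convenience.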
>
>
>
>>>>>>> 7. Comments
>>>>>>
>>>>>>> we recommend the introduction of comments.
>>>>>>
>>>>>> I agree and think it needs to happen before WebVTT starts to get
>>>>>> implemented
>>>>>> and used on the web. In other words: now.
>>>>>
>>>>> Agreed. I'm happy for the previously suggested "//" at the line start
>>>>> to be comments, or, for that matter, "#" or ";" or any other special
>>>>> character. I would prefer not to use "/*" since it implies a "*/" is
>>>>> required to end the comment. Similarly we should avoid "<!--" and
>>>>> "-->" or anything else that requires a special comment end mark and
>>>>> more than one or two characters.
>>>>
>>>> I'd quite like to have block comments, so I think the best options are:
>>>>
>>>> 1. // and /* */ like JavaScript
>>>> 2. <!-- --> like HTML/XML
>>>
>>> If the main use case for the comments is to comment out a line,
>>> something at the line start alone would be sufficient. If we have to
>>> have both, I would prefer the shorter first option.
>>>
>>>> I think that the main difficulty is actually not picking a syntax, but
>>>> deciding how it works in the parser. Unlike HTML, I don't think we want
>>>> the
>>>> comments to show up in the "DOM", since that would only work for
>>>> intra-cue
>>>> comments. Ideally it would be preprocessor-ish, but yet the magic bytes
>>>> ("WEBVTT FILE") should be checked first as otherwise identifying WebVTT
>>>> would require implementing its preprocessor steps :/
>>>
>>> As I would not want the comments not to be handed into the DOM or to
>>> JavaScript, it doesn't matter if they are not like HTML. I would
>>> regard them more as pre-processor style comments.
>
>
> Ups, there was a surplus second "not". :-) I also don't want them
> handed into the DOM.
>
>
>> For simplicity, perhaps it would be better to have line-comments only. On my
>> wishlist I have a less convoluted parser definition which operates on lines
>> instead of sprinkling CR/LF all over, and it'd be easy to add line-comments
>> to such a parser. Wish-list item requested at
>> <http://www.w3.org/Bugs/Public/show_bug.cgi?id=12076>.
>
> I agree. It was with a line-based parsing in mind that I preferred the
> start-of-line comments. I don't really want to make them more
> complicated than that.
>
>
>
>>>>>>> 8. Line wrapping
>>>>>>>
>>>>>>> CEA-708 captions support automatic line wrapping in a more
>>>>>>> sophisticated way than WebVTT -- see
>>>>>>> http://en.wikipedia.org/wiki/CEA-708#Word_wrap.
>>>>>>>
>>>>>>> In our experience with YouTube we have found that in certain
>>>>>>> situations this type of automatic line wrapping is very useful.
>>>>>>> Captions that were authored for display in a full-screen video may
>>>>>>> contain too many words to be displayed fully within the actual video
>>>>>>> presentation (note that mobile / desktop / internet TV devices may
>>>>>>> each have a different amount of space available, and embedded videos
>>>>>>> may be of arbitrary sizes). Furthermore, user-selected fonts or font
>>>>>>> sizes may be larger than expected, especially for viewers who need
>>>>>>> larger print.
>>>>>>>
>>>>>>> WebVTT as currently specified wraps text at the edge of their
>>>>>>> containing blocks, regardless of the value of the 'white-space'
>>>>>>> property, even if doing so requires splitting a word where there is no
>>>>>>> line breaking opportunity. This will tend to create poor quality
>>>>>>> captions.  For languages where it makes sense, line wrapping should
>>>>>>> only be possible at carriage return, space, or hyphen characters, but
>>>>>>> not on &nbsp; characters.  (Note that CEA-708 also contains
>>>>>>> non-breaking space and non-breaking transparent space characters to
>>>>>>> help control wrapping.) However, this algorithm will not necessarily
>>>>>>> work for all languages.
>>>>>>>
>>>>>>> We therefore suggest that a better solution for line wrapping would be
>>>>>>> to use the existing line wrapping algorithms of browsers, which are
>>>>>>> presumably already language-sensitive.
>>>>>>>
>>>>>>> [Note: the YouTube line wrapping algorithm goes even further by
>>>>>>> splitting single caption cues into multiple cues if there is too much
>>>>>>> text to reasonably fit within the area. YouTube then adjusts the times
>>>>>>> of these caption cues so they appear sequentially.  Perhaps this could
>>>>>>> be mentioned as another option for server-side tools.]
>>>>>>
>>>>>> Yeah, with SRT people are manually line-wrapping when authoring the
>>>>>> captions
>>>>>> and often enough the end result is that you get something rendered:
>>>>>>
>>>>>> - Who could have guessed that not all fonts are the same
>>>>>> size?
>>>>>> - That's news to me, so I get four lines of text where I
>>>>>> wanted two!
>>>>>>
>>>>>> I'm inclined to say that we should normalize all whitespace during
>>>>>> parsing
>>>>>> and not have explicit line breaks at all. If people really want two
>>>>>> lines,
>>>>>> they should use two cues. In practice, I don't know how well that would
>>>>>> fare, though. What other solutions are there?
>>>>>
>>>>> I don't think I would go that far. The concern has mostly been with
>>>>> the line wrapping of lines that are too long and the possibility of
>>>>> splitting words that way. The particular concern was with this
>>>>> paragraph:
>>>>>
>>>>> "Text runs must be wrapped at the edge of their containing blocks,
>>>>> regardless of the value of the 'white-space' property, even if doing
>>>>> so requires splitting a word where there is no line breaking
>>>>> opportunity."
>>>>> see
>>>>>
>>>>> http://www.whatwg.org/specs/web-apps/current-work/multipage/rendering.html#timed-text-tracks-0
>>>>>
>>>>> So we want to avoid splitting mid-word and we suggest introducing the
>>>>> ability to have non-breaking spaces.
>>>>
>>>> I think splitting in the middle of words would only happen for words that
>>>> are longer than the whole line.
>>>
>>> Ah ok - I guess you can interpret the sentence above in this way as
>>> in "splitting a word ONLY where there is no line breaking opportunity".
>>> Then it's probably ok. It would still make sense to accept
>>> non-breaking spaces.
>>
>> Perhaps Hixie would like to clarify in the spec precisely what is meant?
>>
>> There's already a non-breaking space in Unicode: NO-BREAK SPACE (U+00A0)
>
> Ah, ok, that's covered then, too. Good to know, thanks. I was thinking
> about &nbsp; of course, but we don't need that in a UTF-8 document
> then.
>
>
>>>> There's still plenty of room for improvements in line wrapping, though.
>>>> It
>>>> seems to me that the main reason that people line wrap captions manually
>>>> is
>>>> to avoid getting two lines of very different length, as that looks quite
>>>> unbalanced. There's no way to make that happen with CSS, and AFAIK it's
>>>> not
>>>> done by the WebVTT rendering spec either.
>>>
>>> People split manually when they want quality captions and can visually
>>> test what it will look like.
>>>
>>> This endeavor has one big problem: when you change the video size,
>>> e.g. go to full screen, your optimisation for the previous size is
>>> likely to not be optimal for the new size any more. There, an
>>> automatic line balancing that makes use of commas and "and"s for
>>> choosing likely good line break positions would be nice.
>>>
>>> A completely different situation appears when the captions are not
>>> manually created, as is the case in YouTube. Even when you submit a
>>> perfect transcript and time-align it through speech recognition, you
>>> will only do the line breaks as you have to render cues. To achieve a
>>> better quality there, a better line-break algorithm would help
>>> massively.
>>>
>>> So, I agree with you about improving the line wrapping. I also think
>>> it is likely something that we have to leave to the browsers - at
>>> least for now.
>>
>> Right, some experimentation here would be great, as I haven't seen any
>> feature like this in any media players. In the hope of inspiring someone,
>> perhaps myself, here's how I tentatively would like things to work:
>>
>> 1. Authors are encouraged to not manually line-break
>> 2. UAs render the text at whatever width the <video> container allows, with
>> margins and all
>> 3. The text will have been rendered on n lines.
>> 4. Decrease the width on the container as much as possible while having n
>> lines.
>> 5. Use that line-breaking and then do whatever left/center/right-alignment
>> relative to the original width.
>>
>> I really should get around to reading the rendering section for WebVTT to
>> see what it actually does, perhaps it's already clever...
>
>
> It is quite clever indeed. And now that we have cleared up the line
> breaking issue and the non-breaking space, I think it's as good as
> needs be right now.
>
>
>
>>>>>>> 4. Addressing individual cues through CSS
>>>>>>>
>>>>>>> As far as we understand, you can currently address all cues through
>>>>>>> ::cue and you can address a cue part through ::cue-part(<voice> ||
>>>>>>> <part> || <position> || <future-compatibility>). However, if we
>>>>>>> understand correctly, it doesn’t seem to be possible to address an
>>>>>>> individual cue through CSS, even though cues have individual
>>>>>>> identifiers. This is either an oversight or a misunderstanding on our
>>>>>>> parts. Can you please clarify how it is possible to address an
>>>>>>> individual cue through CSS?
>>>>>>
>>>>>> Since I've been arguing against the id's in WebVTT, I'm curious about
>>>>>> the
>>>>>> use case here. Isn't using a unique class good enough?
>>>>>
>>>>> This links in with the discussion above on CSS styling and classes.
>>>>> Rather than define classes of cue settings and reference them from the
>>>>> cues, this allows them to be applied to individual cues in style
>>>>> sheets. I thought the whole reason of cue identifiers was to have this
>>>>> addressing functionality, so this would just close the loop.
>>>>>
>>>>> For example:
>>>>>
>>>>> Style sheet of the Web page:
>>>>> <style>
>>>>> video track#t1 ::cue(cue10) {
>>>>>  text-decoration: blink;
>>>>> }
>>>>> </style>
>>>>>
>>>>> The Web page (extract):
>>>>> <video src="video.webm" controls>
>>>>>  <track id="t1" label="captions" kind="captions" srclang="en-US"
>>>>> src="cap1.vtt"/>
>>>>> </video>
>>>>>
>>>>> The caption file cap1.vtt:
>>>>> WEBVTT
>>>>> Language=en-US
>>>>> Kind=Captions
>>>>>
>>>>> cue1
>>>>> 0.000-5.000
>>>>> blab blah
>>>>>
>>>>> cue10
>>>>> 40.000-60.000
>>>>> ALERT: Your basement is flooding - evacuate!
>>>>>
>>>>>
>>>>> Cue10 is addressed through CSS and turned into a blinking text without
>>>>> a need to change the markup at all.
>>>>
>>>> My point was that you could just as well do this:
>>>>
>>>> 0.000-5.000
>>>> <c.cue1>blab blah</c>
>>>>
>>>> In my view of things, id's in HTML are primarily for addressing via
>>>> #fragments and as hooks for scripts, for styling class is quite
>>>> sufficient,
>>>> so I'm thinking it would be for WebVTT as well.
>>>
>>> I quite like the idea of using the identifiers for named media
>>> fragment URIs: e.g. http://example.org/video.webm#cue10 . We need
>>> identifiers for this. Also, I find them less intrusive in the text
>>> than <c.cue1> which defines a class that is only ever used on this
>>> single cue.
>>
>> Hmm, isn't that what we have chapters for? Or do you want to use id's for
>> some kind of inline chapters?
>
>
> AFAICT we would address chapters also by their identifier, so there is
> no difference between the kinds of tracks that we have and the way in
> which we would address into them.
>
>
>>>>>>> 5. Ability to move captions out of the way
>>>>>>>
>>>>>>> Our experience with automated caption creation and positioning on
>>>>>>> YouTube indicates that it is almost impossible to always place the
>>>>>>> captions out of the way of where a user may be interested to look at.
>>>>>>> We therefore allow users to dynamically move the caption rendering
>>>>>>> area to a different viewport position to reveal what is underneath. We
>>>>>>> recommend such drag-and-drop functionality also be made available for
>>>>>>> TimedTrack captions on the Web, especially when no specific
>>>>>>> positioning information is provided.
>>>>>>
>>>>>> This would indeed be rather nice, but wouldn't it interfere with text
>>>>>> selection? Detaching the captions into a floating, draggable window via
>>>>>> the
>>>>>> context menu would be a theoretically possible solution, but that's
>>>>>> getting
>>>>>> rather far ahead of ourselves before we have basic captioning support.
>>>>>
>>>>> On YouTube you can only move them within the video viewport. You
>>>>> should try it - it's really awesome actually.
>>>>>
>>>>> When you say "interfere with text selection" are you suggesting that
>>>>> the text of captions/subtitles should be able to be cut and pasted? I
>>>>> wonder what copyright holders think about that.
>>>>
>>>> Being able to select the captions just like any other text is a great
>>>> thing
>>>> that I wouldn't want to disable. It's very useful if you want to pause
>>>> and
>>>> look up the definition of a word or to report a typo in the captions
>>>> without
>>>> having to retype the whole text.
>>>
>>> I guess you can have all of that as you can have it on Web pages, too.
>>> If you click and hold, it will be grabbing for moving. If you double
>>> click it is text selection for cut and paste. So, I don't think there
>>> would be a problem.
>>
>> That would work, but I have to admit I've never seen a web page/browser
>> combination that does what you suggest. Just single clicking and dragging is
>> certainly the most discoverable form of text selection.
>
> I actually meant: you select a piece of text (either with double click
> or with click and pull and release) and when you click and hold the
> selected text, you can move it. But also, text that is in a block and
> clearly discerned as an entity (e.g. with a line around it or such)
> can often be moved by just clicking on the box/block (outside the text
> itself) and moving the pointer. It's these kinds of interactions I had
> in mind.
>
> Cheers,
> Silvia.
>


