[whatwg] Google Feedback on the HTML5 media a11y specifications

Mon Jan 24 03:28:36 PST 2011

On Mon, Jan 24, 2011 at 4:32 AM, Philip Jägenstedt <philipj at opera.com> wrote:
> Wouldn't a more sane approach here be to have each language in its own file,
> each marked up with its own language, so that they can be enabled/disabled
> individually? I'd certainly appreciate not having the screen cluttered with
> languages I don't understand...

Personally I'd prefer that, but it would require a good deal of metadata
support--marking which tracks are meant to be used together, tagging
auxiliary track types so browsers can choose (e.g. an "English subtitles
with no song caption tracks" option), and so on.  I'm sure that's a
non-starter (and I'd agree).

A much more realistic method would be to mark the transcription cues with a
class, and to enable or disable them with CSS.
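
As a rough sketch of the class-plus-CSS idea (not anything from the spec
text itself): the snippet below just builds the two pieces of text involved,
a cue payload wrapped in a class span and a style rule that hides that class.
The class name "song", the helper names, and the exact selector form
`::cue(.song)` are my assumptions for illustration; the selector syntax in
particular may not match the current draft.

```python
def tag_cue_text(text: str, cls: str) -> str:
    """Wrap cue text in a WebVTT class span, e.g. <c.song>...</c>."""
    return f"<c.{cls}>{text}</c>"


def hide_rule(cls: str) -> str:
    """Build a CSS rule that hides cue text carrying the given class.

    Assumption: the UA accepts a selector of the form ::cue(.classname)
    and honors `visibility` on it.
    """
    return f"::cue(.{cls}) {{ visibility: hidden; }}"


if __name__ == "__main__":
    # Hypothetical song-transcription cue and the rule that turns it off.
    print(tag_cue_text("la la la", "song"))
    print(hide_rule("song"))
```

A viewer who doesn't want song transcriptions would get the rule applied;
everyone else would simply not have it in their stylesheet.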

> More generally, I kind of doubt any solution we come up with will be good
> enough for the most hardcore fansubbers, as they obviously think they need
> pixel-perfect control of everything -- an anti-goal when separating
> semantics from presentation, as WebVTT tries to do. So either they have to
> use pre-rendered captions (boo!), or use a crazy format that is especially
> tailored to anime fansubbing (it already exists).

I don't know if they all think they need that, but maybe the set of people
who think they need pixel-perfect control is the same set of people who
clutter the screen with transcriptions of songs.  (Either way, since I don't
actually like subtitles doing this, I'm not inclined to argue on this use
case's behalf very hard.)

> (Also, we're not going to see <video><track> used for anime fansubbing on
> the public Web until copyright terms are shortened to below the attention
> span of anime fans.)

Maybe so.  I don't know if professional subtitles ever do this.  I'm
guessing (and hoping) not, but I'll ask around as a data point--they've
taken on other practices of fansubbers in the past.

> Yeah, the monospace Latin glyphs in most CJK look pretty bad. Still, if one
> wants really fine-grained font control, it should already be possible using
> webfonts and targeting specific glyphs with <c.foo>, etc.

I don't think you should need to resort to fine-grained font control to get
reasonable default fonts.  If you need to specify a font explicitly because
UAs choose incorrectly, something has gone wrong.  It doesn't help if things
are expected to work without CSS, either--I don't know how optional CSS
support is meant to be for WebVTT.

The above--semantics vs. presentation--brings something else to mind.  One
of the harder things to subtitle well is when you have two conversations
happening on top of each other.  This is generally done by choosing a vertical
spot for each conversation (generally augmented with a color), so the viewer
can easily follow one or the other.  Setting the line position *sort of*
lets you do this, but that's hard to get right, since you don't know how far
apart to put them.  You'd have to err towards putting them too far apart
(guessing the maximum number of lines text might be wrapped to, and covering
up much more of the screen than usually needed), or putting one set on the
top of the screen (making it completely impossible to read both at once,
rather than just challenging).

If I remember correctly, SSA files do this with a hack: wherever there's a
blank spot in one or the other conversation, a transparent dummy cue is
added to keep the other conversation in the correct relative spot, so the
two conversations don't swap places.
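
To make the hack concrete, here's a minimal sketch of the padding step as I
described it above.  It is not taken from any SSA tool; the `Cue` type and
`pad_with_placeholders` are hypothetical names, times are plain seconds, and
it assumes cues within a single conversation don't overlap each other.  An
empty `text` stands in for the transparent dummy cue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str     # "" marks an invisible placeholder cue


def pad_with_placeholders(track, other):
    """Return `track` plus empty placeholder cues covering every moment
    where `other` is on screen but `track` is not, so that `track` keeps
    occupying its vertical slot."""
    ordered = sorted(track, key=lambda c: c.start)
    padded = list(ordered)
    for o in other:
        cursor = o.start  # earliest time in o's span not yet covered
        for c in ordered:
            if c.end <= cursor:
                continue  # cue ends before the uncovered region
            if c.start >= o.end:
                break     # cue starts after o has finished
            if c.start > cursor:
                # Gap between cursor and the next real cue: fill it.
                padded.append(Cue(cursor, c.start, ""))
            cursor = max(cursor, c.end)
        if cursor < o.end:
            # Tail of o's span with nothing from this track on screen.
            padded.append(Cue(cursor, o.end, ""))
    return sorted(padded, key=lambda c: c.start)
```

You'd run it once in each direction, e.g.
`pad_with_placeholders(upper, lower)` and
`pad_with_placeholders(lower, upper)`, so that whichever conversation goes
quiet still holds its place while the other one is showing.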

I mention this because it comes to mind as something well-authored,
well-rendered subtitles need to get right, and I'm curious if there's a
reliable way to do this currently with WebVTT.  If this isn't handled, some
scenes just fall apart.

-- 
Glenn Maynard