[whatwg] Introduction of media accessibility features

Jonas Sicking jonas at sicking.cc
Wed Apr 14 14:05:15 PDT 2010

On Wed, Apr 14, 2010 at 10:19 AM, Tab Atkins Jr. <jackalmage at gmail.com> wrote:
> On Tue, Apr 13, 2010 at 11:33 PM, Silvia Pfeiffer
> <silviapfeiffer1 at gmail.com> wrote:
>> On Wed, Apr 14, 2010 at 1:28 PM, Robert O'Callahan <robert at ocallahan.org> wrote:
>>> On Mon, Apr 12, 2010 at 12:47 PM, Silvia Pfeiffer
>>> <silviapfeiffer1 at gmail.com> wrote:
>>>> Understood. But what is actually the cost of implementing all of TTML?
>>>> The features in TTML all map onto existing Web technology, so all it
>>>> takes is a bit more parsing code over time.
>>> When implementing one complex spec (TTML + XSL-FO) in terms of another
>>> complex spec (HTML + CSS), you have to be very very lucky to find that all
>>> the features map perfectly, even if the specs were designed to work together
>>> that way, which in this case they are not. Even if you're lucky today,
>>> evolution of the specs could easily accidentally break things.
>> I believe it is possible today, but of course cannot prove it right
>> now. Also, future evolution of TTML will be directed by the Web in the
>> future if it gets integrated, as it would be its main use. Also: it's
>> a W3C standard, so probably would have the requirement not to break
>> the Web. So, I don't buy that latter argument. But I guess until there
>> is a mapping for all of DFXP, there probably aren't enough facts to
>> support/reject DFXP.
> I'd rather not be in charge of keeping them aligned perfectly.  I'd
> also never want to be put in a situation where someone objects to a
> useful change in CSS because it doesn't work for TTML.  Just
> integrating CSS and SVG is a pain, and there's measurable *benefit*
> there.

W3C has a long history of creating incompatible specs, so I would not
rely on them to "not break the web". Just look at the
incompatibilities between SVG and CSS (unitless lengths, for example)
and between SVG and DOM ('evt' vs 'event'). And this is still going on
(the XHTML WG reusing both the 'text/html' MIME type and the XHTML 1.1
namespace for an incompatible XHTML2 language).

I would be very surprised if XSL-FO is compatible with CSS. I think
there was little to no cooperation between the two efforts.

>>> We could make that problem go away by normatively defining something that
>>> looks like TTML in terms of a translation to HTML + CSS. It wouldn't really
>>> be TTML though, and where's the added value for authors?
>>> I understand the deep political problems here, but I think it's most logical
>>> for styled content for the Web to use (possibly a subset of) HTML and CSS.
>>> Server-side tools to translate between TTML and HTML+CSS would be one way to
>>> address the desire to interoperate with TTML.
>> I personally have no issue with introducing a new format - even
>> experimented with one some time ago, see
>> http://wiki.xiph.org/Timed_Divs_HTML . There are challenges with this,
>> too, and they are not only political. For example, I think we would
>> need to restrict some things from appearing in timed sections: e.g.
>> would we really want to allow multiple layers of video rendered on top
>> of video? But we could develop a new format - just like we have
>> developed microdata.
>> Introducing a new format would indeed be mainly a political problem.
>> Not just would it go "against" all other existing formats. It would
>> also be a challenge to get other applications to support it, in
>> particular applications that do not contain a Web framework.
>> Thus, my thinking was that what we do internally is basically HTML+CSS
>> on time sections. And all formats that we read from externally will be
>> mapped to that. We would already do that with SRT, too.
> +1 to Henry's suggestion of just using two formats: SRT, and SRT +
> (possibly some subset of) HTML+CSS, where the latter is simply a
> normal SRT file with HTML allowed where it would normally only allow
> plaintext for the caption.
> That seems to be the minimal change that can address this case, and
> appears to be a fairly logical extension of an existing widespread
> format.

I like this approach, though I wonder how a stylesheet is meant to be
attached to the SRT+HTML file?
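
To make the SRT+HTML idea concrete, here is a minimal sketch of reading
such a file, assuming the only change from plain SRT is that the cue
payload is kept as an HTML fragment instead of plain text. The names
Cue and parse_srt_html are made up for illustration, and the sketch
does nothing to answer the stylesheet question.

```python
import re
from dataclasses import dataclass

# SRT timestamps look like 00:00:01,500 (hours:minutes:seconds,milliseconds).
_TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    html: str     # cue payload, treated as an HTML fragment (assumption)


def _seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt_html(text):
    """Split an SRT file into cues, keeping each payload as raw HTML."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        # Skip the optional numeric counter line.
        if lines and lines[0].strip().isdigit():
            lines = lines[1:]
        if not lines:
            continue
        match = _TIMING.match(lines[0].strip())
        if match is None:
            continue  # not a cue; ignore it rather than guess
        g = match.groups()
        cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), "\n".join(lines[1:])))
    return cues
```

The payload is passed through untouched, so which subset of HTML+CSS is
allowed would have to be enforced by whatever renders the cue.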

An alternative approach is to simply use HTML+microdata. I.e. take an
HTML file, and add microdata to describe which element should be
displayed when.
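
As a rough sketch of how a consumer might read such a file: assume a
hypothetical vocabulary in which every timed element carries itemscope
and contains <meta itemprop="start"> and <meta itemprop="end"> children
whose content attribute gives a time in seconds. Those property names,
TimedItemParser, and visible_at are all invented here, and nested items
are not handled.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not affect nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TimedItemParser(HTMLParser):
    """Collect (start, end, text) tuples from itemscope elements."""

    def __init__(self):
        super().__init__()
        self.items = []       # finished (start, end, text) tuples
        self._current = None  # item being collected, if any
        self._depth = 0       # open non-void elements inside the item

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._current is None:
            if "itemscope" in attrs and tag not in VOID:
                self._current = {"start": None, "end": None, "text": []}
                self._depth = 1
            return
        if tag == "meta" and attrs.get("itemprop") in ("start", "end"):
            # Hypothetical properties: times in seconds.
            self._current[attrs["itemprop"]] = float(attrs.get("content") or "nan")
        if tag not in VOID:
            self._depth += 1

    def handle_endtag(self, tag):
        if self._current is None or tag in VOID:
            return
        self._depth -= 1
        if self._depth == 0:  # the itemscope element itself just closed
            item = self._current
            text = "".join(item["text"]).strip()
            self.items.append((item["start"], item["end"], text))
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"].append(data)


def visible_at(items, t):
    """Return the text of every item whose interval contains time t."""
    return [text for start, end, text in items
            if start is not None and end is not None and start <= t < end]
```

A player would call visible_at on each time update and show or hide the
matching elements; here only their text is returned to keep the sketch
short.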

Of course, even better would be to have a markup language for marking
up the meaning of the timed text. For example, it's unfortunate that
the DFXP markup contains things like

[Water dropping]<br/>
[<span tts:fontStyle="italic" tts:color="lime">plop, plop, plop, …</span>]

Here the brackets clearly mean that the enclosed text isn't being
said, but describes sound effects. This would be much better done
with markup like:

<descriptive>Water dropping</descriptive>
<soundeffect>plop, plop, plop</soundeffect>

It strikes me as interesting that DFXP is as complicated as it is
without solving what I, at least, perceive as a very basic need.
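
One benefit of marking up meaning is that the brackets and italics
become a presentation choice made at render time. A small sketch,
using the <descriptive> and <soundeffect> elements suggested above; the
PRESENTATION table and render_cue are hypothetical, and a stylesheet
could play the same role.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical presentation rules; a user agent or stylesheet could pick others.
PRESENTATION = {
    "descriptive": lambda text: f"[{text}]",
    "soundeffect": lambda text: f"[<i>{text}</i>]",
}


def render_cue(xml_fragment):
    """Turn semantic cue markup into presentational HTML, one line per element."""
    # Wrap the fragment so it has a single root element for the XML parser.
    root = ET.fromstring(f"<cue>{xml_fragment}</cue>")
    lines = []
    for child in root:
        fmt = PRESENTATION.get(child.tag, lambda text: text)
        lines.append(fmt(escape((child.text or "").strip())))
    return "<br/>".join(lines)
```

Changing how sound effects look, or dropping them for users who do not
want them, then means editing one table entry, not every caption.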

On a separate note, the DFXP file seems to be tied to a specific size
of the video. If I resize the video, the captions that go on top of
the video don't move appropriately. This could very well simply be due
to this being a demo. Or due to a bug in the "implementation". Or a
simple mistake on the part of the author of the specific DFXP file.
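
For what it's worth, if the caption positions were authored in pixels
for one video size (an assumption on my part; I have not checked the
file), the fix on the rendering side is a simple proportional scale.
The function name and the tuple layout below are made up for the
example.

```python
def scale_region(region, authored_size, rendered_size):
    """Scale a pixel-based caption region to a resized video.

    region is (x, y, width, height) in pixels of the authored video;
    authored_size and rendered_size are (width, height) in pixels.
    """
    authored_w, authored_h = authored_size
    rendered_w, rendered_h = rendered_size
    sx = rendered_w / authored_w
    sy = rendered_h / authored_h
    x, y, w, h = region
    return (x * sx, y * sy, w * sx, h * sy)
```

Authoring positions as fractions of the video size in the first place
would avoid the problem, since nothing would need rescaling.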

/ Jonas
