[whatwg] Introduction of media accessibility features

Thu Apr 15 23:49:38 PDT 2010

On Fri, Apr 16, 2010 at 3:32 PM, Anne van Kesteren <annevk at opera.com> wrote:
> On Thu, 15 Apr 2010 09:59:06 +0900, Silvia Pfeiffer
> <silviapfeiffer1 at gmail.com> wrote:
>>
>> On Thu, Apr 15, 2010 at 3:19 AM, Tab Atkins Jr. <jackalmage at gmail.com>
>> wrote:
>>>
>>> +1 to Henry's suggestion of just using two formats: SRT, and SRT +
>>> (possibly some subset of) HTML+CSS, where the latter is simply a
>>> normal SRT file with HTML allowed where it would normally only allow
>>> plaintext for the caption.
>>>
>>> That seems to be the minimal change that can address this case, and
>>> appears to be a fairly logical extension of an existing widespread
>>> format.
>>
>> A spec would need to be written for this new extended SRT format.
>
> A spec would also need to be written if we go for this new
> TTML-minus-certain-features-and-using-CSS-rather-than-XSL-FO format. That
> would probably be worse since we would be forking an existing format in an
> incompatible way.

No forking - just specifying a mapping of the things that are
supportable. And yes: that needs to be written too.

>> Also, if we are introducing HTML markup inside SRT time cues, then it
>> would make sense to turn the complete SRT file into markup, not just
>> the part inside the time cue. Further, SRT has no way to specify which
>> language it is written in and further such general mechanisms that
>> already exist for HTML.
>
> What general mechanisms are needed exactly? Why is language needed? Isn't
> that already specified by the embedder?

I guess the problem is more with character sets.
For HTML pages and other Web content, there is typically information
inside the resource that tells you what character set the document is
written in. E.g. HTML pages have
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">.
Such functionality is not available for SRT, so a browser cannot tell
from the file itself which charset to use to decode the content.

And yes, we have made an adjustment in the Media Associations spec for
<track> to contain a hint on what MIME type and charset the external
document is written in. But that only works around SRT's problem.
The charset should be available inside the file so that any application
can use the SRT file without requiring additional information.
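
To make the charset problem concrete, here is a minimal sketch in Python of
what a consumer of a plain SRT file is left with. The function name, the
`hinted_charset` parameter (standing in for whatever the embedder supplies)
and the windows-1252 fallback are assumptions made for illustration, not
part of any spec.

```python
import codecs

# Byte-order marks are the only in-file encoding signal a plain SRT file can carry.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def decode_srt(data, hinted_charset=None, fallback="windows-1252"):
    """Decode raw SRT bytes and return (text, charset_used)."""
    # 1. Use a BOM if the file happens to start with one.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return data.decode(name), name
    # 2. Otherwise trust the hint from the embedder (hypothetical parameter).
    if hinted_charset:
        return data.decode(hinted_charset), hinted_charset
    # 3. With no information at all, try UTF-8 and then guess a legacy charset.
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return data.decode(fallback), fallback
```

The same bytes decode without error under several legacy charsets and give
different text each time, which is why a hint kept outside the file is
fragile once the file is moved to another application.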

>> I don't think SRT is the right base format to start extending from.
>> That extension doesn't give us anything anyway, since no existing SRT
>> application would be able to do much with it. It is not hard to
>> replicate the SRT functionality in something new. If we really want to
>> do "SRT + HTML + CSS", then we should start completely from a blank
>> page.
>
> It makes things easier for people familiar with authoring SRT. It also makes
> it easier to change existing SRT files into rich SRT files.

The extended SRT files will have barely anything in common with the
original ones. There is more HTML markup to learn than SRT markup. And
having HTML markup encapsulated in a non-HTML file is just weird.
Also, the sequential numbering of the captions is honestly not very
useful.

For example:

(1) original SRT file:

---
1
00:00:15,000 --> 00:00:17,951
At the left we can see...

2
00:00:18,166 --> 00:00:20,083
At the right we can see the...
---
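
For reference, everything in a file like (1) can be read with a few lines
of code. The following is a rough sketch in Python, assuming well-behaved
input (blank-line separated cues, `hh:mm:ss,mmm` timestamps); the `Cue`
type and the function names are made up for this illustration.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    index: int    # the running number in front of each cue
    start: float  # seconds
    end: float    # seconds
    text: str


_TIME = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def _seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as 00:00:17,951 to seconds."""
    m = _TIME.fullmatch(stamp.strip())
    if m is None:
        raise ValueError(f"bad SRT timestamp: {stamp!r}")
    hours, minutes, secs, millis = (int(g) for g in m.groups())
    return hours * 3600 + minutes * 60 + secs + millis / 1000


def parse_srt(text: str) -> List[Cue]:
    """Split SRT text into cues; cues are separated by blank lines."""
    cues = []
    normalized = text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        if len(lines) < 2:
            continue  # not a complete cue
        start, _, end = lines[1].partition("-->")
        cues.append(Cue(int(lines[0]), _seconds(start), _seconds(end),
                        "\n".join(lines[2:])))
    return cues
```

Note that the running number is parsed but carries no information that the
order of the cues does not already give.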

(2) possibly new extended SRT file:

---
Content-Type: text/html; charset=UTF-8
Content-Language: en-US
Styles:
{
  div.left-align {
    font-family:Arial,Helvetica,sans-serif;
    text-align: left;
  }
  div.right-align {
    font-family:Courier New, monospace;
    text-align: right;
  }
  div.speaker {
    font-family:Courier New, monospace;
    text-align: left;
    font-weight: bold;
  }
}

1
00:00:15,000 --> 00:00:17,951
<div class="left-align speaker"><img src="proof.png"
role="presentation" alt="Proog icon"/>Proog:</div>
<div class="left-align" style="color: green;">At the <span
style="font-style:italic;">left</span> we can <a
href="looking_left.html">see</a>...</div>

2
00:00:18,166 --> 00:00:20,083
<div class="right-align" style="color: blue;">At the right we can <a
href="looking_right.html">see</a> the...</div>
---

(3) TTML file: (no hyperlinks, no images - just for comparison)

---
<?xml version="1.0" encoding="utf-8"?>
<tt xml:lang="en-US" xmlns="http://www.w3.org/ns/ttml"
    xmlns:tts="http://www.w3.org/ns/ttml#styling">
  <head>
    <styling>
      <style xml:id="left-align"
        tts:fontFamily="proportionalSansSerif"
        tts:textAlign="left"
      />
      <style xml:id="right-align"
        tts:fontFamily="monospaceSerif"
        tts:textAlign="right"
      />
      <style xml:id="speaker"
        tts:fontFamily="monospaceSerif"
        tts:textAlign="left"
        tts:fontWeight="bold"
      />
    </styling>
    <layout>
      <region xml:id="subtitleArea"
        tts:extent="560px 62px"
        tts:padding="5px 3px"
      />
    </layout>
  </head>
  <body region="subtitleArea">
    <div>
      <p style="left-align" begin="15s" end="17.951s">
        <span style="speaker">Proog:</span><br/>
        <span tts:color="green">At the <span
tts:fontStyle="italic">left</span> we can see...</span>
      </p>
      <p style="right-align" begin="18.166s" end="20.083s">
        <span tts:color="blue">At the right we can see the...</span>
      </p>
    </div>
  </body>
</tt>
---

(4) possibly new xml/html-ish file:

---
<?xml version="1.0" encoding="utf-8"?>
<xxxx lang="en-US">
<head>
<style type="text/css">
  div.left-align {
    font-family:Arial,Helvetica,sans-serif;
    text-align: left;
  }
  div.right-align {
    font-family:Courier New, monospace;
    text-align: right;
  }
  div.speaker {
    font-family:Courier New, monospace;
    text-align: left;
    font-weight: bold;
  }
</style>
</head>

<body>
<cue start="00:00:15,000" end="00:00:17,951">
  <div class="left-align speaker"><img src="proof.png"
role="presentation" alt="Proog icon"/>Proog:</div>
  <div class="left-align" style="color: green;">At the <span
style="font-style:italic;">left</span> we can <a
href="looking_left.html">see</a>...</div>
 </cue>

<cue start="00:00:18,166" end="00:00:20,083">
  <div class="right-align" style="color: blue;">At the right we can <a
href="looking_right.html">see</a> the...</div>
</cue>
</body>
</xxxx>
---

I think (4) is preferable to (2) for its more consistent markup and
actual XML parsability.
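
As a rough illustration of the parsability point, a well-formed file along
the lines of (4) can be read with a stock XML parser and no custom
tokenizer. This is only a sketch in Python: the `xxxx` and `cue` element
names are the placeholders from the example above, and `read_cues` is a
name made up here.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<xxxx lang="en-US">
<head>
<style type="text/css">
  div.speaker { font-weight: bold; }
</style>
</head>
<body>
<cue start="00:00:15,000" end="00:00:17,951">
  <div class="left-align speaker">Proog:</div>
  <div class="left-align">At the <span
style="font-style:italic;">left</span> we can see...</div>
</cue>
<cue start="00:00:18,166" end="00:00:20,083">
  <div class="right-align">At the right we can see the...</div>
</cue>
</body>
</xxxx>
"""


def read_cues(xml_bytes):
    """Return (language, [(start, end, plain_text), ...]) for a cue file."""
    root = ET.fromstring(xml_bytes)  # raises ET.ParseError if not well-formed
    cues = []
    for cue in root.iter("cue"):
        # itertext() walks all nested markup; collapse whitespace afterwards.
        text = " ".join("".join(cue.itertext()).split())
        cues.append((cue.get("start"), cue.get("end"), text))
    return root.get("lang"), cues


if __name__ == "__main__":
    language, cues = read_cues(SAMPLE.encode("utf-8"))
    print(language)
    for start, end, text in cues:
        print(start, end, text)
```

The flip side is that an XML parser rejects the whole file on a single
unclosed tag, so authoring tools would have to be strict about
well-formedness.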

Cheers,
Silvia.