[whatwg] @aria-labelledby | Re: @generator-unable-to-provide-required-alt, figure with figcaption

Wed Jun 19 09:05:21 PDT 2013

On 17.06.2013 at 22:58, Ian Hickson wrote:
> On Mon, 17 Jun 2013, Martin Janecke wrote:
>> On 17.06.2013 at 11:35, Steve Faulkner wrote:
>>> 
>>> the restriction on figure/figcaption is only in the WHATWG spec, not 
>>> the W3C HTML spec as it was not deemed a useful or practical 
>>> restriction when reviewed by the HTML WG.
>> 
>> Sounds lovely, this would indeed solve my use case.
> 
> Could you elaborate on what your use case is and why it's not handled?

Yes. The use case begins with a markup generator that does not have a suitable alt text for images. In my case it's actually a converter that turns light-weight markup into HTML, but I don't think the discussion should go too deep into the details, as the problem applies to other markup generators as well -- I named WYSIWYG editors and automatic digitizers as examples. It is an established fact that some markup generators don't have a suitable image description for the required alt attribute.^[1]

Without the required alt-attribute the generator's output is non-conforming or "invalid" markup.

It seems that (or at least markup generator creators seem to think that) a notable number of users prefer generators whose output passes conformance checker tests over those whose output gets big red error marks. This can pressure markup generator creators to trick conformance checkers into thinking their output is conforming. Methods to achieve this include using bogus alt texts, or empty alt texts that suggest a purely presentational image when it's actually not.

These methods are in a way successful, as conformance checkers today fall for the tricks. However, these tricks are considered harmful for accessibility. This motivated defining circumstances under which conformance checkers should silently ignore missing alt attributes.^[2] I'll now go through this list from the WHATWG spec to discuss where its items cover my use case and where they do not, hopefully making my use case clearer in the process:

(a) "The img element has a title attribute with a value that is not the empty string (also as described above)."^[3]

A title attribute text is usually not available to the light-weight markup converter I maintain. This applies to other markup generators as well: an OCR digitizer does not find anything equivalent to the typical implementation of the title attribute (i.e. a mouse-over text) in a book scan.

While my markup generator usually has access to a caption for an image that is immediately visible to anyone and which could theoretically be included as a title attribute, that would mean redundant text as in the following example:

<div><img src="tree.jpg" title="Tree in Fantasia"> Tree in Fantasia</div>

I've actually seen captions re-used for title or alt attributes like this quite often in the wild. I do not consider this a desirable output. Why should visually impaired persons have to read everything twice?

---

(b) "The img element is in a figure element that …"^[3] has a figcaption

So now to the question:

> I don't understand why <figure> as defined in the WHATWG spec doesn't work 
> for your case. What does the page look like?

The problem for markup generators is that they do not know exactly what the pages look like. My light-weight markup converter has to use solutions that work in many cases, favorable and less favorable ones. Again, this applies to other markup generators as well. They don't understand the semantics implied in the text they handle. I'll provide two examples in pseudo code -- the task for the markup generator is to translate them into HTML. Example 1:

| The funny finch is a well known bird of Fantasia.
|
| [fig src="funny-finch.jpg"]Fig. 1: Funny finch on a fig twig[/fig]
|
| It frolics freely in Fantasia's famous forests.
|
| [fig src="feeding.jpg"]Fig. 2: Funny finch feeding a fledgling[/fig]
|
| The funny finch feeds on fruits and flies (fig. 2). Thanks to
| reforestation, the funny finch population has flourished in the past
| forty years (fig. 3).
|
| [fig src="demographics.png"]Fig. 3: Finch population 1970--2010[/fig]
|
| The funny finch is closely related to the freaky finches of Florida.

The figure element with a figcaption is perfectly suitable for the images in example 1. Let's look at example 2:

| The funny finch is a well known bird of Fantasia.
|
| [fig src="funny-finch.jpg"]Funny finch on a fig twig[/fig]
|
| It frolics freely in Fantasia's famous forests.
|
| [fig src="feeding.jpg"]Funny finch feeding a fledgling[/fig]
|
| The funny finch feeds on fruits and flies, as shown in the photograph
| above. Thanks to reforestation, the funny finch population has
| flourished in the past forty years, which the following diagram
| illustrates.
|
| [fig src="demographics.png"]Finch population 1970--2010[/fig]
|
| The funny finch is closely related to the freaky finches of Florida.

Example 2 conveys the same message as example 1, and the two look almost the same. However, while moving all the figures to the bottom of the page won't break example 1, example 2 will suffer badly from it. In example 2 figures are referenced by their location, whereas they are referenced by tokens in example 1.

As I understand the WHATWG HTML spec, the figure element is not suitable in example 2. Hence it is not usable for any markup generator that does not understand human-written text well enough to differentiate between the two cases. Even some people coding HTML by hand will probably have difficulty always getting it right.

The reason why I think example 2 does not conform to the spec is the following paragraph from WHATWG HTML spec 4.5.11:

"The element can thus be used to annotate illustrations, diagrams, photos, code listings, etc, that are referred to from the main content of the document, but that could, without affecting the flow of the document, be moved away from that primary content, e.g. to the side of the page, to dedicated pages, or to an appendix."

The paragraph starts with a "can", indicating options. Then it continues with a "but" clause, which negates the optional character making the following words normative until "e.g." again switches to listing options. The normative part is that figures must be movable away from the primary content without affecting the flow of the document. They don't have to be moved, hence the word "could", but it must be possible without breaking anything. Did I understand that correctly?

My markup generator never knows if a text is written in a way that won't break when moving included figures, so it can never use the figure element.
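To make the problem concrete, here is a minimal sketch of such a conversion in Python. It is not my actual converter: the `[fig]` token syntax is the pseudo code from the examples above, and the names `FIG_TOKEN`, `fig_to_html` and `convert` are made up for illustration. The point is that the converter only ever sees the token and its caption, so both examples necessarily produce the same kind of markup.

```python
import html
import re

# Hypothetical pattern for the [fig src="..."]caption[/fig] pseudo markup
# used in examples 1 and 2 (an assumption, not a real markup language).
FIG_TOKEN = re.compile(r'\[fig src="([^"]*)"\](.*?)\[/fig\]', re.DOTALL)


def fig_to_html(match):
    """Turn one [fig] token into a figure element with a figcaption."""
    src = html.escape(match.group(1), quote=True)
    caption = html.escape(match.group(2).strip())
    return '<figure><img src="{}"><figcaption>{}</figcaption></figure>'.format(
        src, caption)


def convert(text):
    """Replace every [fig] token; all other text is passed through as-is."""
    return FIG_TOKEN.sub(fig_to_html, text)


example_1 = '[fig src="funny-finch.jpg"]Fig. 1: Funny finch on a fig twig[/fig]'
example_2 = '[fig src="funny-finch.jpg"]Funny finch on a fig twig[/fig]'
print(convert(example_1))
print(convert(example_2))
```

Nothing in the input tells `convert` whether the surrounding prose says "fig. 2" or "the photograph above", so it cannot decide whether the figure is movable.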

---

(c) "The conformance checker has been configured to assume that the document is an e-mail or document intended for a specific person who is known to be able to view images."^[3]

Usually, this won't be the case for documents generated by the markup generator.

---

(d) "The img element has a (non-conforming) generator-unable-to-provide-required-alt attribute whose value is the empty string."^[3]

Well, that is an option for any use case a markup generator runs into. But it seems unattractive to me in all its verbosity. Unfortunately -- although its verbosity is there to prevent any misunderstanding about its use -- it might leave the impression that a generator writing

<img src="..." generator-unable-to-provide-required-alt="">

is not as good as a generator writing

<img src="..." alt="an image">

The former is definitely not able, but the latter seems to be able to include an alt attribute. And conformance checkers will tag the latter as valid, too. The latter might actually be worse, but that might not be the impression to the uninformed eye. Still, that impression is what matters here. The whole point was to prevent markup generators from being pressured into bad practices for fear of negative reactions from their customers.

Please do not make the attribute even more verbose trying to clarify, though.

As said before, my markup generator usually has a caption for an image. I might implement generator-unable-to-provide-required-alt for the rare exceptions where that is not the case. I won't do it as a general solution, though.
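The four circumstances (a) to (d) can be sketched as a single check. This is only an illustration in Python, not how any real conformance checker is implemented: the function name `alt_check_passes` and its parameters are invented here, and condition (b) is reduced to a boolean flag because the spec's full wording for it is more detailed than what I quoted above.

```python
def alt_check_passes(attrs, in_captioned_figure=False, private_document=False):
    """Return True if a checker following (a)-(d) would not report an error.

    attrs maps the img element's attribute names to their values.
    in_captioned_figure stands in for condition (b); private_document
    stands in for the checker configuration described in (c).
    """
    if "alt" in attrs:
        # Any alt text passes, because an automated checker cannot tell
        # phony alternative text from correct alternative text.
        return True
    if attrs.get("title"):  # (a) title attribute that is not empty
        return True
    if in_captioned_figure:  # (b) img in a figure with a figcaption
        return True
    if private_document:  # (c) e-mail or document for a known recipient
        return True
    # (d) the explicit opt-out attribute with an empty value
    return attrs.get("generator-unable-to-provide-required-alt") == ""


print(alt_check_passes({"src": "x.jpg", "alt": "an image"}))
print(alt_check_passes(
    {"src": "x.jpg", "generator-unable-to-provide-required-alt": ""}))
print(alt_check_passes({"src": "x.jpg"}))
```

The first two calls are both accepted, which is exactly why the honest attribute does not look any better to a customer than the bogus alt text.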

On 17.06.2013 at 22:58, Ian Hickson wrote:
> On Mon, 17 Jun 2013, Martin Janecke wrote:
>> 
>> I don't know how to assess that the restriction is dropped in a W3C 
>> draft but still included in the WHATWG version, though. Is this 
>> consensus and likely to become standard or still very uncertain 
>> territory?
> 
> The W3C spec is a fork of the WHATWG standard. I can't speak for what 
> they're doing (and this would be the wrong list to discuss it, anyway). As 
> far as the WHATWG spec goes, we try to base decisions on use cases.
> 
> For <figure>, the idea is that it should be reasonable to style <figure> 
> with something like:
> 
>   figure { float: right; }
> 
> If we don't say that <figure>'s contents can be moved in this way, then 
> that becomes much less viable.

Most elements can be moved like that as a purely presentational decision -- when it makes sense. I'm okay with this. My markup generator does not force a particular presentation. An author or webmaster can apply stylesheets to the generated markup. Including a restriction in the HTML spec that the figure element must always be movable without breaking the document's flow is a semantic requirement, though.

Dropping this semantic restriction might make the element less special in one way and therefore less viable for those who use this peculiarity. I have no insight into which applications rely on this restriction and make active use of it.

Dropping the restriction would allow using the element in other use cases and make it more viable in that respect. Figure with figcaption is nevertheless special then -- it clearly associates an image (or another kind of figure) with an immediately readable caption.

The latter is more useful for my markup generator. To me it is more valuable to be able to associate an image with a caption than to mark an image semantically as movable. Therefore I am happy to hear about the way the W3C HTML version went. Other people might have different priorities, of course, and I cannot judge which is more important.
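Under the relaxed reading, the per-image decision in a converter could be as small as the following Python sketch. The function name `image_markup` is hypothetical, and the fallback branch is the rare-exception case I mentioned for generator-unable-to-provide-required-alt, not something I have implemented.

```python
import html


def image_markup(src, caption=None):
    """Emit markup for one image, given only what the generator knows."""
    src_attr = html.escape(src, quote=True)
    if caption and caption.strip():
        # A visible caption exists: associate it with the image.
        return '<figure><img src="{}"><figcaption>{}</figcaption></figure>'.format(
            src_attr, html.escape(caption.strip()))
    # No caption available: state openly that no alt text could be provided
    # instead of inventing a phony one.
    return '<img src="{}" generator-unable-to-provide-required-alt="">'.format(
        src_attr)


print(image_markup("demographics.png", "Finch population 1970--2010"))
print(image_markup("demographics.png"))
```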

> If what you want is just an inline image followed by some text, though, 
> you don't need <figure> or title="" -- you can just put in the image and 
> the text, as in:
> 
>   <img src="the image"> <!-- FIXME: replacement text is missing -->
>   <p>More text...

This causes big red error messages in conformance checkers which could in theory motivate users of some markup generators to learn HTML and add lots of alt-attributes to fix all these errors, but in practice leads to markup generators using inappropriate attributes to silence conformance checkers.

In my case it is not applicable anyway: The converter generates markup for instant display -- the output is not saved to be edited.

Think of light-weight markup for posts in an online forum as one application. Forum members must not be able to edit the HTML directly, for security reasons. Even if the light-weight markup were enhanced to include alt attributes, the forum members are the ones who would have to write appropriate alt texts, but the forum owner is the one who gets nervous when his site doesn't validate.

It's easier to motivate people not interested in markup to write a meaningful caption that everybody sees than to write a good description almost no one will see. I know, laziness is not a valid argument for missing alt attributes. But that's not the point here. The point is to prevent an outcome in practice that is worse than a missing alt attribute.

Regards
Martin

___
[1]: "Markup generators (such as WYSIWYG authoring tools) should, wherever possible, obtain alternative text from their users. However, it is recognized that in many cases, this will not be possible." -- WHATWG HTML spec 4.8.1.1.13
[2]: "This is intended to avoid markup generators from being pressured into replacing the error of omitting the alt attribute with the even more egregious error of providing phony alternative text [...]"  -- WHATWG HTML spec 4.8.1.1.13
[3]: quoted from WHATWG HTML spec 4.8.1.1.14 Guidance for conformance checkers