[whatwg] many messages regarding image captions

Mon Nov 27 17:49:16 PST 2006

Based on a lot of the feedback, I wrote up a first draft of how to do 
image captions in HTML5:

   <figure>
     <img ...>
     <legend> ... </legend>
   </figure>

It's a block-level element, at the same level as <p>. The image and the 
legend can come in any order, but there must be exactly one of each. The 
image doesn't have to be <img>; it can be any of <img>, <iframe>, <embed>, 
<object>, and <canvas>. There are special rules for what to do with 
fallback that basically make the caption disappear (though of course this 
won't happen in legacy UAs).
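
For illustration, here is a sketch of two figures that should fit the 
rules just described (the file names and caption text are made up for the 
example and are not from the spec):

   <figure>
     <legend>Figure 1: The legend can come before the image.</legend>
     <img src="sales.png" alt="Bar chart of monthly sales">
   </figure>

   <figure>
     <object data="clip.mpg" type="video/mpeg"></object>
     <legend>Figure 2: The image doesn't have to be an img element.</legend>
   </figure>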

There was a lot of feedback on this issue; I've tried to reply to all the 
points made below. Several of the e-mails basically made the same point or 
suggestion, so I sometimes only replied the first time that a proposal was 
made. Please let me know if I missed anything; if I did, it was unintentional.

On Tue, 27 Jun 2006, dolphinling wrote:
>
> What's wrong with
> 
> <div>
>   <img src="">
>   <p>This is the caption.</p>
> </div>
> 
> ? That's how I think of it semantically, and I don't see it as being 
> common enough to warrant a separate element of its own.

I think based on the number of people asking for it, there's enough 
demand!

The problem with the above is that it is exactly the same as:

   <img>
   <p>This is the caption.</p>

...which (a) is non-conforming, and (b) doesn't strongly associate the two 
elements.

On Tue, 27 Jun 2006, dolphinling wrote:
> > 
> > There's nothing "wrong" with that. Yet, that's not exactly an image 
> > caption. There's no explicit association between the <img> and the 
> > following <p>.
> 
> But there's the implicit association given by the fact that they're 
> there, together, in the <div>, and nothing else is. Do you really need 
> anything more than that?

The difference is that with an actual caption, you can distinguish it from 
just a paragraph in the same section, allowing you to draw out more data. 
For example, you could imagine Windows Live Image Search using image 
captions to get a better idea of what the images they index contain.

On Tue, 27 Jun 2006, David Walbert wrote:
> 
> For scripting purposes I have occasionally needed to tie an image and 
> its caption explicitly by giving them related IDs that a particular 
> javascript function can call (e.g. img id="img1" and p 
> id="img-caption"), and it would be helpful in these situations to have a 
> standard syntax for that explicit relation.

Agreed.

> But the availability of such syntax should not imply that everyone needs 
> to use it for all markup, all the time; I can't see it being useful in 
> ordinary web design (to the reader/user, which should ultimately be the 
> point) unless there are actions associated with the image and caption.

Certainly, as with all markup, nobody is going to be forcing authors to 
mark up every last semantic.

On Tue, 27 Jun 2006, Anne van Kesteren wrote:
> 
> If the relationship between the image and caption is well defined. For 
> example, when the next sibling _element_ of <img> is <p>, <p> is the 
> caption of <img> you can easily introduce some DOM attribute on 
> HTMLImageElement named caption that gives you back the 
> HTMLParagraphElement associated with it.

We could do that at some future point, yes.

On Tue, 27 Jun 2006, Mihai Sucan wrote:
> 
> You could also use:
> 
> <div class="image-caption">
> <img />
> <p>caption</p>
> </div>
> 
> However, this looks like a microformat. Nonetheless, it's better than leaving
> out the class attribute.

That's basically the structure that's now in the spec, though with element 
names instead of class names.

On Tue, 27 Jun 2006, Michel Fortin wrote:
> 
> I think something like:
> 
>     <aside>
>        <img src="...">
>        <p>This is the caption.</p>
>     </aside>
> 
> would already be more appropriate than what you suggested as it removes 
> the paragraph from the main flow of the document and thus implicitly 
> links it with the image.
>
> But I think illustrative figures should be disambiguated a little more, 
> as they are not always appropriate as asides[1] in the sense of 
> "tangentially related to the content around", they are usually tied to 
> the content, they're not really "separate from that content".

Agreed.

> This construct is common enough if you follow news sites. Here is my 
> morning harvest of a couple of use cases in the news:
> 
> <http://www.cnn.com/2006/WORLD/meast/06/27/iran.us.reut/index.html>
> <http://news.bbc.co.uk/2/hi/middle_east/5119732.stm>
> <http://politics.guardian.co.uk/homeaffairs/story/0,,1806799,00.html>
> <http://radio-canada.ca/nouvelles/regional/modele.asp?page=/regions/Montreal/2006/06/26/006-crise-logement-iris.shtml>
> <http://www.cbc.ca/story/world/national/2006/06/27/saddam-new-trail.html>
> <http://www.salon.com/ent/feature/2006/06/27/911_conspiracies/index_np.html>
> <http://www.macworld.com/2006/06/firstlooks/flciviv/index.php>
> 
> 
> Such image captions are also widely used in many publications in the 
> scientific and technical fields to denote figures (hence the <figure> 
> element I proposed on May 3):
> 
> <http://www.stanford.edu/group/hopes/basics/basichd/a1.html>
> <http://www.ess.gov.si/eng/AnnaulReport/lp98/2_4.htm>
> <http://www.centrelink.gov.au/internet/internet.nsf/ar0001/chp4.2.htm#figure9>
> <http://www.ai-junkie.com/ann/som/som1.html>
> <http://japanfocus.org/article.asp?id=413>
> <http://medicine.plosjournals.org/perlserv?request=get-document&doi=10.1371/journal.pmed.0020228>
> 
> 
> There are a lot of figures in computer technology articles too:
> 
> <http://www.onlamp.com/pub/a/onlamp/2005/01/20/rails.html>
> <http://www.symbian.com/developer/techlib/v9.1docs/doc_source/N10356/BuildTools/native/abiv1v2migration.html>
> <http://evonet.lri.fr/CIRCUS2/node.php?node=148>
> <http://developer.apple.com/documentation/GraphicsImaging/Conceptual/drawingwithquartz2d/dq_paths/chapter_4_section_2.html>
> <http://www.autodesk.fr/adsk/servlet/index?siteID=458335&id=6013861>
> <http://www-sop.inria.fr/semir/personnel/Laurent.Mirtain/ldap-livre.html>
> <http://www.shirky.com/writings/powerlaw_weblog.html>
> 
> 
> This one is about string figures:
> 
> <http://www.darsie.net/string/>
> 
> 
> And image captions are an integral part of Wikipedia as you can see on many
> pages:
> 
> <http://en.wikipedia.org/wiki/Amerindians>
> <http://en.wikipedia.org/wiki/Oroonoko>
> <http://en.wikipedia.org/wiki/Earthquake>
> <http://en.wikipedia.org/wiki/Lithosphere>
> ...

Wow! Thank you for this research!

This one in particular:

> http://politics.guardian.co.uk/homeaffairs/story/0,,1806799,00.html

...suggests we may want to have multiple <legend> elements per <figure>, 
to allow for a caption and photo credit to be given. What do people think? 
Would some other way of giving photo credit metadata inline be better?
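
As a sketch only (this is not allowed by the current draft, which requires 
exactly one <legend>; the file name, caption, and credit are made up), 
the multiple-<legend> idea might look like:

   <figure>
     <img src="march.jpg" alt="A crowd holding banners in the rain">
     <legend>Protesters gather outside parliament.</legend>
     <legend>Photograph: A. N. Other</legend>
   </figure>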

On Tue, 27 Jun 2006, Mihai Sucan wrote:
> 
> Quick idea:
> 
> <img src="" caption="fig1">
> <p id="fig1">This is the caption.</p>
> 
> How about this? It's backwards compatible. Maybe this has already been
> suggested (apologies if this is the case).
> 
> [...]
> 
> That's why I came with the suggestion of <img 
> caption="caption-element-id" />. It does not require the isolation of 
> the image and caption in another tag. Thus, the markup is less verbose, 
> without losing the benefit of image and caption elements association. 
> Another benefit would be that the <img> and <p> can both be optionally 
> isolated with <aside>, <div> or anything else (if the author considers 
> this required, for styling or other purposes).

Unfortunately, experience with that kind of structure (e.g. <label 
for="">, <object declare>, <td headers="">, <img usemap="">) indicates 
that it is fraught with problems, for authors, implementors, conformance 
checkers, and spec writers. I'd rather avoid this kind of model if 
possible.

On Wed, 28 Jun 2006, Hugh Winkler wrote:
> 
> How about an anchor with rel="caption":
> 
> <a rel="caption" href="#caption1"><img src="x.jpg"/></a>
> <span id="caption1">man bites dog</span>
> 
> or
> 
> <img id="img1" src="x.jpg"/>
> <span id="caption1"><a rel="caption" rev="#img1">man bites
> dog</a></span>

This seems a bit counter-intuitive, at least compared to <figure>.

On Wed, 28 Jun 2006, Clayton Scott wrote:
> 
> No expanding of the <caption> element with a |for| attribute or nested
> inside any block element?
> 
> <img id='photograph' src='...'>
> <caption for='photograph'>Photo of Foo herders courtesy of Jimmy</caption>
> 
> <caption for='myFavList'>My List of Favourite Things</caption>
> <ol id='myFavList'>
>  <li></li>
> </ol>
> 
> <ol id='myFavList'>
>  <caption>My List of Favourite Things</caption>
>  <li></li>
> </ol>

Unfortunately, <caption> has weird parsing requirements which basically 
mean we can't use it outside tables. But <legend> works (and means the 
same thing, basically), so we can use that.

Regarding the for=""/id idea, see above.

I'll look into the labelling of lists at a separate juncture (it's been 
suggested before, though the use cases for it are a little vague to me -- 
if you have input on this, please start a new thread).

On Wed, 28 Jun 2006, Ben Meadowcroft wrote:
> 
> Perhaps a better method would be using the longdesc attribute to 
> associate a caption with an image. Specifically we could point the image 
> to fragment within the current page and give an explicit association in 
> this manner.
> 
> <img src="man.gif" alt="A Man" longdesc="#manCaption" />
> <p id="manCaption">A more full description of the image</p>
> 
> Does anyone know how screenreaders and other assistive technologies 
> handle longdesc URIs pointing to fragments within the same page?

longdesc was intended for a description, not a caption, so I wouldn't 
like to reuse it in this way.

On Wed, 28 Jun 2006, Michel Fortin wrote:
> 
> Interesting idea, but longdesc cannot be used with <object>, <embed> or 
> embedded XML (SVG for instance). Moreover, a caption isn't always a 
> description.

Indeed.

> Also, if the caption isn't identified as a caption by some means (say 
> with a "caption" class), styling is not easy and the "caption paragraph" 
> can be confused with some other paragraph. The common practice of 
> having the illustration and its caption together, either floating on the 
> side, indented, or spanning into the margin cannot be easily realised 
> without additional, non-standard markup. People would then come again 
> with different markups to solve the same problem of displaying a 
> captioned image on screen, and we wouldn't have achieved much.

I agree.

On Thu, 29 Jun 2006, Lachlan Hunt wrote:
> 
> I think accessibility experts would have serious issues with this.  A 
> caption doesn't even come close to being a good long description.  See 
> what Joe Clark has to say about writing good long descriptions.
> 
> http://joeclark.org/book/sashay/serialization/Chapter06.html#h1-1715

Indeed.

On Wed, 28 Jun 2006, David Walbert wrote:
> 
> Agreed, and conversely, a literal description of the image makes a poor 
> caption -- it would be redundant to anyone who can see the image. A good 
> caption enhances the content of the image, usually by telling the 
> reader/viewer/listener why the image was included in the page.

Indeed.

On Wed, 5 Apr 2006, Lachlan Hunt wrote:
> > >
> > > I'm wondering what WA1 considers appropriate markup for a figure 
> > > with a caption.
> > 
> >    <p><img src="image-equivalent-of-text" alt="text" title="caption"></p>
>
> That's fairly limited because it doesn't allow markup within the title 
> attribute.

Agreed.

> What about extending the <caption> element, currently used with <table>, 
> to <img>, <object> and <embed>?

Sadly, this would break table parsing. But <legend> is safe to use, so we 
can use that (and "legend" means "caption" anyway).

On Wed, 5 Apr 2006, Michel Fortin wrote:
> 
> When you read some text, a figure is an illustration of what the text 
> says; it can be an image but is not necessarily: it could be a code 
> snippet, illustrative text or a mathematical equation. Its main 
> characteristic is that it is separated from the flow of the text: you 
> can read the text ignoring figures (and captions) and it should still 
> make sense.
> 
> So what did I just describe? It seems to me that what I've described is 
> the aside element. Would it make sense to simply lay out figures that 
> way?
> 
>     <aside>
>       <h1>Figure 1: Some image found <a href="...">here</a></h1>
>       <p><img src="..."></p>
>     </aside>
> 
> What about placing the image above the header? And is my interpretation 
> of the semantics correct? I'm wondering whether I may be abusing aside a 
> little.

I think there are definitely examples of figures that aren't asides. I 
prefer your original proposal.

On Thu, 6 Apr 2006, Alexey Feldgendler wrote:
> >
> >      <aside>
> >        <h1>Figure 1: Some image found <a href="...">here</a></h1>
> >        <p><img src="..."></p>
> >      </aside>
> 
> I'm afraid this won't degrade gracefully: the <h1> would confuse the 
> document outline facilities in today's user agents.

Indeed.

On Thu, 6 Apr 2006, fantasai wrote:
> > > 
> > >      <aside>
> > >        <h1>Figure 1: Some image found <a href="...">here</a></h1>
> > >        <p><img src="..."></p>
> > >      </aside>
> > 
> > I'm afraid this won't degrade gracefully: the <h1> would confuse the 
> > document outline facilities in today's user agents.
> 
> You could use <h6>. Just always use <h6> for the figure caption.
> 
> Actually, I think WA should generally allow the use of <h6> as a 
> terminal-level header, similar to the simplesect element in DocBook. 
> It's a useful markup construct sometimes, at least in my experience.

That would seriously complicate the current definition of how to handle 
headers, which is IMHO quite complicated enough as is.

On Thu, 6 Apr 2006, mozer wrote:
> 
> Proposition 1 :
> -----------------
> And what about just giving to <img> a content ?

This, sadly, wouldn't be very backwards compatible.

> Proposition 2 :
> -----------------------
> or just giving img an id
> then using @for attribute
> <p><caption for="img1">this is <b>the</b> caption</caption><img
> id="img1" src="image-equivalent-of-text" alt="text"></p>

Such structures are, unfortunately, the source of many bugs and confusion; 
I think we should avoid using ID references where we can.

On Thu, 6 Apr 2006, Alexey Feldgendler wrote:
> 
> This heading shouldn't be within the document's main tree of headings. 
> It should be completely taken out, that's what "aside" means. But it 
> can't be done in a backwards compatible way.

Indeed.

On Fri, 7 Apr 2006, Alexey Feldgendler wrote:
> 
> As a user, I wouldn't want Opera to include image captions when 
> navigating headers. I don't feel that the caption is a header -- it's no 
> more of a header than a table caption is.

Agreed.

On Fri, 7 Apr 2006, Alexey Feldgendler wrote:
> 
> Actually, I tend to treat images and tables the same. Tables have 
> <caption>s, and a user agent can make a list of tables for navigation. 
> Why can't an image have a caption? I think images and tables are quite 
> similar.

I agree.

> And I don't think that "heading" is the appropriate semantic entity for 
> marking up captions. Rather than making them headers and at the same 
> time taking measures so that they don't interfere with UA's outlining 
> facilities, I'd rather say that headings should be left entirely for 
> document outline, and captions are marked up explicitly as captions.

Agreed.

On Fri, 7 Apr 2006, Michel Fortin wrote:
> 
> Personally, I can live with a caption element that doesn't show up in 
> the DOM of legacy user-agents. But given all the attention given to 
> backward compatibility, it just seems a little out of place to ignore 
> such an issue.

Indeed. The even bigger problem is <caption> inside <td> or <caption>, 
where the parsing is even weirder.

> Yes, that was an accident, and not the first. I'm used to some other 
> lists where I can just hit reply.

Just hit reply-all. :-)

On Wed, 12 Apr 2006, Ben Meadowcroft wrote:
> 
> I'd prefer using a caption element, even if it isn't backwards 
> compatible. However as an alternative why not reuse the label element?

<label> is an option, but it already has quite a lot of requirements 
related to interaction, etc (e.g. clicking it is supposed to transfer 
focus), and it has a for="" attribute that would make this more complex 
than necessary. Basically, we want to avoid overloading elements.

Luckily, <legend> (which previously was just for <fieldset>) is actually 
quite safe to use, since it hasn't had any important implementation 
requirements in the past (with <fieldset>, the requirements rest with 
<fieldset>, not <legend>, generally).

On Wed, 12 Apr 2006, Michel Fortin wrote:
> 
> Apparently, the label element used outside a fieldset has the same 
> problems as caption outside a table (not present in the DOM, impossible 
> to style). At least on Gecko and WebKit.

Indeed. However, since we are already requiring that the <legend> element 
be handled more sanely by the parser, and since <details> is using 
<legend> too, it seems reasonable to use <legend>.

> I've updated my test cases with a figure using the legend element and 
> added a stylesheet to set color of caption and legend elements to better 
> demonstrate that it doesn't work in HTML; with XML it works fine 
> however.
> 
> <http://www.michelf.com/docs/figure.html>
> <http://www.michelf.com/docs/figure.xml>

Thanks, this is indeed helpful.

On Sat, 22 Apr 2006, Simon Pieters wrote:
> 
> HTML+ actually already has markup for image captions[1]:
> 
>   <fig src="image-equivalent-of-text">
>    <caption>caption</caption>
>    text
>   </fig>
> 
> Since no current browser supports the HTML+ <fig> element (AFAIK), we 
> could reuse it for image captions something like this:
> 
>   <fig>
>    <caption>caption</caption>
>    <img src="image-equivalent-of-text" alt="text">
>   </fig>
> 
> [1] http://www.w3.org/MarkUp/HTMLPlus/htmlplus_35.html

Except for the parsing problems with <caption>, and the use of <figure> 
instead of <fig>, I agree, and this is what has now been added to the 
spec. (I don't think it makes sense to abbreviate <figure> particularly.)

On Wed, 3 May 2006, Michel Fortin wrote:
> 
> I just found out that at least one important HTML parser (Gecko) doesn't 
> create the right DOM tree out of something as simple as this:
> 
>     <section>
>       <p>Some paragraph</p>
>     </section>
> 
> The tree will be as if you wrote this:
> 
>     <section>
>       </section><p>Some paragraph</p>
> 
> Why? Because unknown elements are considered inline and won't accept any 
> block element within. This problem applies to <nav>, <header>, <footer>, 
> and <aside> too (and maybe others).

Indeed, this behaviour is considered wrong by the HTML parser required by 
HTML5.

> The point I want to make is that because simple elements like section 
> aren't really backward-compatible, there is probably no point in 
> requiring that from image captions either. Hence it could be acceptable 
> to allow the following markup even if current HTML parsers are ignoring 
> caption:
> 
>     <figure>
>       <caption>Some image</caption>
>       <img src="...">
>     </figure>

The problem isn't that UAs handle the above badly, it's that they are 
required to handle situations such as the following in ways that aren't 
compatible with the proposal:

   <table>
    <tr>
     <td>
      <figure>
       <caption>

(The <caption> ends the <td>.)
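
That is, assuming a UA that follows the usual table parsing rules, the 
markup above ends up roughly as if the author had written the following, 
with the <caption> as a child of the <table> and the <figure> left empty 
(a sketch; the exact tree varies between legacy UAs):

   <table>
    <tr>
     <td>
      <figure></figure>
     </td>
    </tr>
    <caption>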

> Block-level element, and structured inline-level element.

I couldn't really work out why it should be inline. It seems equivalent to 
a <p>.

> Content model:
>     Zero or one caption element followed by inline-level content.

Why not just embedded elements?

> I've chosen an inline-level content model because it allows not only 
> img, but also structured inline-level elements like pre. I'm not so sure 
> about that choice however.

Hm, <pre> seems like an interesting thing to put in a <figure>. What do 
people think? Right now it's only embedded content.

On Thu, 29 Jun 2006, Lachlan Hunt wrote:
> 
> I agree that structure is the best approach.  That allows for good 
> styling, by setting the following:
> 
> figure { display: table; caption-side: bottom; }
> figure img { display: block; }
> caption { display: table-caption; }
> 
> The only problem is that it isn't very backwards compatible.  Firefox 
> doesn't include <caption> in the DOM outside of a table.  Moving the 
> caption after the image in the source and setting display: block; on the 
> image gives reasonable results in Firefox and Opera because the img is a 
> child of figure, but not in IE because figure and img are treated as 
> siblings.

For anything with new elements to work in IE, we're going to have to rely 
on JavaScript to wrap the elements around.

> Using the microformat approach instead, like the following, gives better
> backwards compatibility, but at the expense of proper semantic elements.
> 
> <div class="figure">
>   <img ... >
>   <p class="caption">Figure 1: Some image</p>
> </div>
> 
> We could also use a single-cell table for this, but this approach could be
> considered abuse and IE doesn't seem to support 'caption-side', but does
> support the deprecated align attribute on caption.
> 
> <table class="figure">
>   <caption>Figure 1: Some image</caption>
>   <tr>
>     <td><img ... ></td>
>   </tr>
> </table>

Yeah, those are probably what people will have to keep doing while we wait 
for <legend> and <figure> to be better implemented.

> Whatever approach we eventually decide upon, it should be able to handle
> captions for a variety of structures, not just images.  This includes:
> * Blocks of code and sample input/output
> * Lists
> * <object>, <embed>, <img> and maybe <iframe>

Interesting. At the moment it only allows embedded content (<object>, 
<embed>, <img>, <iframe>). I certainly could see us also allowing <figure> 
to label <pre><samp> and <pre><code>. People often ask for labelling 
lists, which I guess is a figure of sorts too, so <ul> and <ol> might make 
sense. What do people think? Should we give a list of elements that can be 
used there? Or just make it any block-level element? The latter seems a 
bit vague... it's not clear what

   <figure>
     <p>...</p>
     <legend> ... </legend>
   </figure>

...means.

What should we do with the rules for handling fallback if we allow <pre> 
et al? I suppose in those cases, there simply isn't any fallback, but what 
if the <object> element falls back to an <ol>?

   <figure>
     <object ...> <ol> ... </ol> </object>
     <legend> ... </legend>
   </figure>

Currently that's either an object and its legend, or a list; but if we 
also allow:

   <figure>
     <ol>...</ol>
     <legend> ... </legend>
   </figure>

...then it would make sense that the <figure> would fall back to that. But 
then how do you distinguish that case from the case where it doesn't want 
to fall back to a list-with-a-legend?

On Thu, 29 Jun 2006, Michel Fortin wrote:
> 
> Then I realised that, contrary to my first belief, other elements in 
> the current draft -- namely <section>, <nav>, <header>, <footer>, 
> <aside>, and maybe others -- aren't backward compatible either because 
> they can't contain even a paragraph, at least in Safari and Firefox.
> 
> Will this prevent <section> and others from being adopted? I doubt it. 
> Should this prevent a <figure> element from being adopted? I don't think 
> so.

Agreed.

On Wed, 1 Nov 2006, Michel Fortin wrote:
>
> I see no reason to be restrictive on the kind of content that can be 
> captioned.

Well, we want the semantics to be well-defined. It's not clear to me what 
the semantics will be in all cases if we allow anything to be captioned.

On Wed, 1 Nov 2006, Jonathan Worent wrote:
> 
> Taking cues from the label element for forms you could either make the association explicit by
> wrapping the caption around the element it's captioning
> <caption>
>    <embed ...>
>    A funny video of a man being hit in the groin by a football
> </caption>
> 
> or make the association implicit by using the for attribute
> <embed id="funnyVid" ...>
> <caption for="funnyVid">A funny video of a man being hit in the groin by a football</caption>

I don't really understand the use case for associating the caption in this 
way.

On Sat, 4 Nov 2006, Matthew Paul Thomas wrote:
> > >
> > > or make the association implicit by using the for attribute
> > > <embed id="funnyVid" ...>
> > > <caption for="funnyVid">A funny video of a man being hit in the groin by a
> > > football</caption>
> 
> That would work for the current page layouts of YouTube and Google Video.
> 
> > I think what would work best for this is the <figure> element I proposed
> > back in June:
> > 
> >     <figure>
> >       <caption>Some caption here</caption>
> >       ...
> >     </figure>
> > ...
> 
> That would not. (At least, not without some tricky CSS.)

Could you elaborate on that? I don't really understand why you think that. 
Unless you mean just because of the order, but we could easily just allow 
the caption to go at the end of the <figure> element.

On Thu, 9 Nov 2006, Jeff Seager wrote:
>
> What's clearly missing from the IMG specification is an appropriate 
> means for pairing each picture or graphic with a caption. [...]

Agreed.

On Fri, 10 Nov 2006, Michel Fortin wrote:
> >
> > The difference is that <caption> will never work, because of things 
> > like this:
> > 
> >    <table>
> >      <caption>
> >         <figure>
> >            <img ...>
> >            <caption> ...A... </caption>
> >         </figure>
> >      </caption>
> >      ...
> >    </table>
> > 
> > ...which, for legacy compatibility reasons, must result in a DOM where the
> > text with "A" ends up in a second <caption> element that is a child of the
> > <table> element.
> 
> I don't get it. Are you saying that <caption> cannot work outside <table>
> because it has to work a certain way when inside a <table> element?

Yes. See also the examples earlier in this e-mail.

> Or are you simply saying that <figure> cannot work because it cannot 
> work inside a table caption?

Um yes. That's the same thing. :-)

> If supporting <figure> inside a caption is so important, browsers could 
> treat <figure> in the same way they treat <table> when inside a caption. 

I really don't want to change the table parsing model. :-)

> But either of these two samples seems completely ridiculous and confusing 
> to me. Personally, I don't think any of the above cases should be 
> allowed (caption has inline-level content in HTML4 by the way), and 
> it'd also be fine if browsers continue to do whatever they do when 
> inside a <caption> element.
> 
> And I don't see how any of this could prevent <caption> from creating a 
> caption element in the DOM when *outside* a table.

It would have the same problem in a table cell, and there's a lot of 
content in table cells, whether we like it or not. :-)

> I'm not sure I like "legend" as a word for captions. A legend -- in the 
> context of a graph, a map, or a schema -- is a list of symbols or colors 
> followed by some definition of what they represent on the figure. It's 
> far different from a caption or the title of a figure.

Actually, <legend> basically just means <caption>. The case you mention is 
a specific type of <legend> (also known as a key).

> But <legend>, as an element, is worse: image captions and table captions 
> are much more similar in concept and in default presentation than 
> fieldset legends, which are some kind of title for a thematic group of 
> form controls.

The way <legend> is defined now it's just a general captioning-like 
element to be used in various places.

> By the way, I think <legend> for <details> is a perfect choice, because 
> like <fieldset>, <details> is a functional regroupment, so <legend> 
> becomes some sort of title for a group of related user interface 
> elements. That definition works for both <fieldset> and <details>. If 
> you add <figure> into the mix, <legend> becomes a title for something on 
> the page. I think this would dilute the semantics and make the language 
> less coherent.

Well, it just means you get to work out the semantics from the parent 
element instead of from just the tag name. It's not really diluted.

> Even if it's a little more difficult, I think using <caption> is the 
> right thing to do.

As far as I can tell, it's more than a little more difficult.

On Sat, 11 Nov 2006, Matthew Paul Thomas wrote:
> 
> Anyway, I support the idea of a caption *element* to accompany images. This
> would have two benefits over an attribute:
> 1.  It could contain markup, which an attribute cannot.
> 2.  With a for= attribute, it could apply to an image elsewhere in the
>     document, which would be useful for the print medium. For example:
>         <p>
>           <legend for="classphoto"><i class="printonly">Top left:</i>
>           The class of 2006.</legend>
>           <legend for="bandseniors"><i class="printonly">Top right:</i>
>           Simone with her parents on graduation day.</legend>
>         </p>
>     (For the screen medium, ideally UAs would place a caption adjacent
>     to the relevant image, regardless of where the caption occurred in
>     the document.)

That's an interesting use case. In general, though, I'd say that the 
content shouldn't assume the location (e.g. by mentioning it in the 
markup). That should be generated content. No?

> I suggest that this element behave in the opposite way from alt=: 
> whereas alt= should be presented only if the image itself is *not* 
> presented, the caption element should be presented only if the image 
> itself *is* presented. Otherwise there would be the same problem with 
> non-sequiturs in non-visual media as there is with descriptive alt=.

Agreed; spec now requires this. Not sure how to make this jibe with the 
idea of allowing <pre>/<ol>/etc, though; see above.
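
A sketch of how the current rule would play out (the file name and text 
are made up for the example):

   <figure>
     <object data="orbit.svg" type="image/svg+xml">
       <p>The planet moves in an ellipse with the sun at one focus.</p>
     </object>
     <legend>Figure 3: Kepler's first law.</legend>
   </figure>

If the UA renders the <object>, the legend is shown with it; if the 
<object> falls back to the paragraph, the legend is dropped, since a 
caption with no image to go with it would read as a non-sequitur.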

On Mon, 13 Nov 2006, Matthew Raymond wrote:
> 
>    I was actually thinking of something like this:
> 
> | <figure>
> |   <img id="imageid" [...]>
> |   <label for="imageid">
> |     Image caption text.
> |   </label>
> | </figure>

The for/id is redundant in this example.

>    ...Where fallback content is ignored by <figure>:
> 
> | <figure>
> |   <table>
> |     <tr><td>
> |       <img id="imageid" [...]>
> |     </td></tr>
> |     <tr><th>
> |       <label for="imageid">
> |         Image caption text.
> |       </label>
> |     </th></tr>
> |   </table>
> | </figure>
> 
>    So, in the above, the UA would treat the second example as if it 
> were the first.

That seems a bit complex just to handle fallback, though. (I'm not sure 
how it would work with CSS, either.)

On Mon, 13 Nov 2006, Jeff Seager wrote:
> 
> I apologize if I've gone over old ground on this. I'm new to this list, 
> and others have indicated that this has been discussed before. Has it 
> been decided, though? To me, it seems a very basic and urgent need in 
> the HTML/XHTML specs.

All feedback is welcome!

On Tue, 14 Nov 2006, Alexey Feldgendler wrote:
> 
> I believe HTML should have an element for every attribute intended to 
> hold human-readable text. A raw idea can go like this:
> 
> <img id="img1" src="...">
> <label for="img1" type="title">...</label>
> 
> Here, <label> holds a value which should be treated the same way as 
> the title attribute on <img>, except that it can contain nested markup. 
> This would be useful for all attributes defined as %Text in HTML -- in 
> HTML4, these are ABBR, ALT, LABEL, STANDBY, SUMMARY, TITLE. However, 
> doing a full straightforward solution like this may be bad for backward 
> compatibility, especially in the case of ALT. But the idea is: whenever 
> we write an attribute of type %Text, we want text with markup, so an 
> element instead of attribute is needed.

I think that leads to far too many complications. Attributes have nice 
properties, in that they are very simple to deal with. Elements introduce 
all kinds of complications. For many things, e.g. tooltips, you really 
don't want complications.

On Sun, 19 Nov 2006, Andrew Fedoniouk wrote:
> |
> |   I don't see the point of replacing attribute-based tooltips with
> | elements. Many platforms/OSes don't have tooltips that support anything
> | other than text. If you really want to replace text tooltips with
> | elements that support HTML content, you might as well go all the way and
> | allow people to use it as a means of creating popup content. Once you go
> | down that road, people will be asking for it anyways.
> 
> "...Many platforms/OSes don't have tooltips that support anything
> other than text...."
> 
> How is the task of tooltip creation in HTML related to what the OS uses
> for tooltips?

It's quite important that HTML not violate platform conventions. Being 
consistent with the OS is one of the most important usability concerns of 
writing applications.

> As an example: here is a screenshot of HTML/CSS defined
> tooltip in HTMLayout:
> http://www.terrainformatica.com/htmlayout/images/tooltip-balloon.jpg
> as you may see this is far from what OS is using.

It's also quite horrifying from a UI perspective. :-) But that's a topic 
for another mailing list, as are the various XBL and CSS proposals that 
spawned from this particular part of the caption thread (and which I have 
therefore skipped here -- please let me know if there was something 
on-topic that I missed and that I should reply to; I read all the e-mails 
in question but didn't see anything that I should reply to).

On Wed, 22 Nov 2006, Michel Fortin wrote:
> 
> So I propose a new <fcaption> element -- for "figure caption" -- in
> replacement for the <caption> element in my previous figure construct:
> 
>     <figure>
>       <fcaption>Caption Text</fcaption>
>       <img src="...">
>     </figure>

I really want to avoid introducing new elements for concepts we already 
have... We already have four or so ways of doing "caption"-like things: 
<caption>, <legend>, <label>, and title="", not to mention the <hx> 
elements, <th>, and <title>, which are similar enough that people on this 
thread have sometimes mentioned them. The more we introduce the more 
confusing the language becomes.

On Wed, 22 Nov 2006, Alexey Feldgendler wrote:
> 
> <figure> cannot be used like this:
> 
> <table>
>    <thead>
>      <tr>
>        <th>Painting</th>
>        <th>Title</th>
>        <th>Author</th>
>      </tr>
>    </thead>
>    <tbody>
>      <tr>
>        <td><img id="img1" src="..."></td>
>        <td><label for="img1" type="title">Mona Lisa</label></td>
>        <td>Leonardo da Vinci</td>
>      </tr>
>      ...
>    </tbody>
> </table>

This is a table, not a set of figures, so you would do:

 <table>
    <thead>
      <tr>
        <th>Title</th>
        <th>Painting</th>
        <th>Author</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <th>Mona Lisa</th>
        <td><img id="img1" src="..."></td>
        <td>Leonardo da Vinci</td>
      </tr>
      ...
    </tbody>
 </table>

...or some such (note the use of <th>).

IMHO, anyway.

On Wed, 22 Nov 2006, James Graham wrote:
> 
> Another issue to consider is the possibility of multiple images with a single
> caption (this is very common in scientific papers, print magazines, etc.). A
> construct like
> <figure>
> <img>
> <img>
> <img>
> <imgcaption>
> </figure>
> might be enough to support this (the details are, I think, non-trivial);
> something that requires the caption to point to exactly one image cannot.

Hm. This is currently not possible either. I didn't see many (any?) 
examples of this in the list of sample pages that Michel provided, though, 
so I'm not sure we need to address this in the first version. What do 
people think?

On Wed, 22 Nov 2006, Steve Runyon wrote:
> 
> One minor point I would clarify: Alexey, you stated that <label for="XX" 
> type="title"> would replace the "title" attribute.  I assume you meant 
> that it should *supplement* it, since you wouldn't want to preclude its 
> use or mess with backward compatibility.
> 
> It sounds like <label for="XX" type="title"> would be a *terrific* 
> addition to HTML5, along with a new value for the "display" property, 
> "tooltip". (I'm thinking of all the JS that I wouldn't have to write 
> anymore! :-)

The complexity would be moved to the implementations, though, and the 
proposal here seems complex enough that I could easily imagine browser 
vendors never getting it right enough for authors to use it reliably.

On Wed, 22 Nov 2006, Alexey Feldgendler wrote:
> > 
> > and I think the table makes the association pretty clear by itself.
> 
> It's not clear for Google Images which needs to extract (image, title) 
> pairs from documents.

With the use of <th>, it seems like it would be.

Thanks to everyone who took part in this discussion (bcc'ed). Obviously I 
haven't been able to do what everyone suggested since there were many 
contradictory proposals! Please send feedback on <figure> and company. 
Hopefully having a concrete proposal in the spec will help focus the 
discussion going forward.

Cheers,
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'