[whatwg] The IMG element, proposing a CAPTION attribute

Sun Nov 26 13:57:39 PST 2006

As far as a rendering engine is concerned, HTML is made up of five atoms:
div
span
col
tr
td

The web browser converts the img element into:
<div style='content:url(image)'></div>

If you add a caption, it has to generate this instead:
<div>
<div style='content:url(image)'></div>
<div style='_$caption'>caption</div>
</div>

Needless to say, it's really just setting some attributes or properties,
maybe a bitmask or two to indicate the atom.  A more modern implementation
might instead attach some function pointers (delegates) to handle the
behavior of the atom.

As a result, to the web browser the caption is indeed presentational.  If
the HTML standard only had to deal with web browsers, if it only had to
worry about what Gecko and Opera want, the story would end there.

However, the web browser is not the only user-agent.

Web browsers don't care about structural versus presentational markup.  They
care about structure precisely to the point that it triggers CSS rules, and
no farther than that.  Some might infer things based on errors in the
tag soup.  (Most people on the list probably don't remember what the web
was like with Gopher and the first generation of HTTP browsers, where a
markup error would actually crash the browser, and sometimes the host OS
along with it.)

Indexing services, on the other hand, care only about the relationships 
between data.  They want to form cross reference tables that they can use to 
implement features such as search engines.

To an indexing service, the caption is the single most important thing about 
an image.  By separating the caption from the IMG element, you force the 
search engine to apply a heuristic of some variety to infer the connection.

Consider a page of thumbnails with captions, for example, being indexed by
Google.  Google needs to know which caption belongs to which thumbnail.  This
is trivial if the caption is an attribute, a child element, or has an IDREF
association with the image.  In any other scenario, the indexer has to cope
with whatever markup the author happened to use, and that markup is diverse.
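
To make "trivial" concrete, here is a rough sketch of what an indexer could
do if the relationship were explicit.  It is only an illustration under
stated assumptions: "caption" is the attribute proposed in this thread,
"captionref" is a hypothetical name I made up for the IDREF variant, and the
CaptionIndexer class is likewise invented for the example.  It also assumes
the referenced caption element contains plain text only.

```python
from html.parser import HTMLParser


class CaptionIndexer(HTMLParser):
    """Pairs images with captions using explicit structure only.

    Assumptions (not part of any standard): <img caption="..."> carries the
    caption inline, and <img captionref="some-id"> points at an element
    whose id is "some-id" and whose text is the caption.
    """

    def __init__(self):
        super().__init__()
        self.captions = {}      # image src -> caption text
        self._refs = {}         # image src -> id of the caption element
        self._text_by_id = {}   # element id -> text found inside it
        self._open_id = None    # id of the element currently being read

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "img":
            src = a.get("src")
            if src is None:
                return
            if a.get("caption") is not None:
                self.captions[src] = a["caption"]
            elif a.get("captionref") is not None:
                self._refs[src] = a["captionref"]
            return
        # Remember the id (if any) so the text that follows can be filed.
        self._open_id = a.get("id")

    def handle_data(self, data):
        if self._open_id is not None:
            previous = self._text_by_id.get(self._open_id, "")
            self._text_by_id[self._open_id] = previous + data

    def handle_endtag(self, tag):
        self._open_id = None

    def close(self):
        super().close()
        # Resolve IDREFs last, so the caption may appear before or after
        # the image in the document.
        for src, ref in self._refs.items():
            if ref in self._text_by_id:
                self.captions[src] = self._text_by_id[ref].strip()


def index_captions(html):
    indexer = CaptionIndexer()
    indexer.feed(html)
    indexer.close()
    return indexer.captions
```

No layout analysis and no guessing is involved: an image with neither
attribute simply has no caption as far as the indexer is concerned.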

The images could be floated divs with the caption inside the div.  They
could be td elements, with a separate td element in the next row for the
caption.  They could be position:absolute, with another position:absolute
element somewhere else in the document, positioned wherever some GUI tool
put it.

The indexing service user agent has to make sense of all of this in order
to figure out which caption goes with which image, and that is going to be
extremely difficult with no actual structural relationship between the
caption and the image.

I don't think it matters if it is an attribute, a child element, or a 
separate element associated via an IDREF, but one of those things must 
happen in order to maintain the structural relationship, so that an indexing 
service can leverage that to provide better cross references, and ultimately 
better search engine results.

----- Original Message -----
From: "Michel Fortin" <michel.fortin at michelf.com>
To: "Alexey Feldgendler" <alexey at feldgendler.ru>
Cc: "WHATWG List" <whatwg at whatwg.org>
Sent: Thursday, November 23, 2006 7:43 AM
Subject: Re: [whatwg] The IMG element, proposing a CAPTION attribute

On 23 Nov 2006, at 3:32, Alexey Feldgendler wrote:

> Anyway, "caption" is presentational.

Oh, please. If "caption" is presentational, then "paragraph" and
"table" are as much, if not more. According to my dictionary:

paragraph
     a distinct section of a piece of writing, usually dealing
     with a single theme and indicated by a new line,
     indentation, or numbering.

table
     a set of facts or figures systematically displayed, esp.
     in columns.

caption
     a title or brief explanation appended to an article,
     illustration, cartoon, or poster.

If there is a definition in this list which doesn't suggest some kind
of visual presentation, it's the caption. Surely you have a different
definition than me.

The semantic relation between a caption and its image, or figure,
should be exactly what is defined above: "a title or a brief
explanation".

(Definitions from the New Oxford American Dictionary, 2nd edition)

Michel Fortin
michel.fortin at michelf.com
http://www.michelf.com/