[whatwg] the cite element

Ian Hickson ian at hixie.ch
Thu Aug 27 17:08:08 PDT 2009

On Sun, 16 Aug 2009, Benjamin Hawkes-Lewis wrote:
> On 16/08/2009 12:21, Ian Hickson wrote:
> > Italics is the right format for almost all titles of works.
> How are you measuring that?
> For example, chapters in collections and articles are works and have 
> titles, and those titles aren't typically distinguished with italics, at 
> least in English.
> "Titles of works are commonly distinguished from surrounding text" and
> "italics is a common format for many titles of works" would be statements
> that would be hard to argue with.

The spec lists these cases:

# a book, a paper, an essay, a poem, a score, a song, a script, a film, a 
# TV show, a game, a sculpture, a painting, a theatre production, a play, an 
# opera, a musical, an exhibition, a legal case report, etc

Of those, all would typically be marked up in italics except maybe games 
and exhibitions.

I'm not saying it's _always_ right. Just that it's right often enough to 
be the default.

On Mon, 17 Aug 2009, Brian Campbell wrote:
> On Jul 19, 2009, at 5:58 AM, Ian Hickson wrote:
> > Certainly there are situation-specific cases where names might be 
> > styled, but I think it's mostly as a side-effect of location rather 
> > than because the text is a name. Consider:
> > 
> > <aside class="testimonial">
> >   <q>Best value for the money!</q>
> >   J. Random User
> > </aside>
> > 
> > <aside class="bookquote">
> >   <q>Best value for the money!</q>
> >   A Random Book
> > </aside>
> > 
> > <aside class="review">
> >   <q>Best value for the money!</q>
> >   Newspaper
> > </aside>
> > 
> > <aside class="logfiles">
> >   <q>[23:02] evaluator: best value</q>
> >   filename.log
> > </aside>
> Hmm. Isn't the common theme here that those names are a source that is 
> being cited (either a work or person)? For many authors, when writing 
> stylesheets to apply to these types of uses, it makes more sense or is 
> easier to have a specific element to style, rather than simply a text 
> node that is a sibling of a <q> and/or a descendant of a particular 
> class of <aside>.

I think these cases would typically be styled in all kinds of ways that 
are going to require class attributes anyway, so the need for a <span> for 
the second part of these examples is a non-issue.

> Earlier, when justifying why you changed the definition of <cite> from 
> HTML 4.01, you said:
> > I don't think it makes sense to use the <cite> element to refer to 
> > people, because typographically people aren't generally marked up 
> > anyway. I don't really see how you'd use it to refer to untitled 
> > works.
> This usage is an example of when people are typographically marked up. 

It's a minor case. The semantic here wouldn't be "name of person", it 
would be "name of person when immediately following a quote in a 
pullquote", which is far too specific to deserve a whole element.

> And there are numerous examples of this use, which seem to contradict 
> this argument:
> > HTML4 actually defined <cite> more like what you describe above; we 
> > changed it to be a "title of work" element rather than a "citation" 
> > element because that's actually how people were using it.
> Among them (selected from some I have run across myself, as well as some 
> from Philip Taylor's data):
> * http://www.webporter.com (from Philip Taylor's data)
>   <cite> is used to mark up the source of a testimonial.

The markup in this case is all sorts of wrong -- e.g. the citation is 
inside the quote -- and, more importantly, the element's style is made 
non-italics, thus completely defeating the entire point of marking up the 
element in the first place.

This page is an argument to not have <cite> cover people's names.

> * http://www.thesentencegame.com/ (from Philip Taylor's data)
>   <cite> is used to mark up the user who wrote or drew a particular piece of
>   content.

Yup, this is one of the very few examples of marking up names with <cite>.

> * http://en.wikipedia.org/wiki/RNA_interference (from Philip Taylor's data)
>   <cite> is used to mark up a full bibliographic citation. Also used on other
>   pages on Wikipedia.

This is a good example of <cite> being more useful if used only for 
titles: here <cite> is being un-styled, and then a <span> inside it is 
being restyled to italics. The HTML5 definition would have <cite> used 
only for the italics part, thus making the styling simpler.
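
As a rough sketch of the difference (the class name, the inline styles, 
and the placeholder citation text are made up for illustration and are 
not copied from the Wikipedia page), the first fragment wraps the whole 
citation in <cite> and the second wraps only the title:

```html
<!-- Whole citation in <cite>: the default italics have to be switched
     off, then switched back on for the title. -->
<cite style="font-style: normal">
  A. Author (2009). "Title of article".
  <span style="font-style: italic">Title of Journal</span>.
</cite>

<!-- Title only in <cite>: the default styling already does the job. -->
<span class="citation">
  A. Author (2009). "Title of article".
  <cite>Title of Journal</cite>.
</span>
```

The second form needs no style overrides at all, assuming the user 
agent renders <cite> in italics by default.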

> * http://www.igofigure.com/page/testimonials/
>   <cite> is used for the source of a testimonial.
> * http://thelede.blogs.nytimes.com/2009/07/14/running-with-the-bulls-in-pamplona/
>   (and other articles on the NY Times Blogs)
>   <cite> is used to mark up the author of a comment.
> * http://www.w3.org/TR/html401/struct/text.html#h-9.2.1
>   In the very example given in HTML 4.01, <cite> is used to mark up the author
>   of a quote.
> * http://diveintomark.org/archives/2009/04/07/hhgregg-doa
>   <cite> is used to mark up the author of a comment.
> * http://diggingintowordpress.com/ThemePlayground/index.php?wptheme=H5%20Theme%20Template
>   Even some folks who are trying to use HTML5 are using <cite> to mark up the
>   author of a comment; take a look at the comments on one of the example
>   articles.
> * http://microformats.org/wiki/posh-patterns
>   Another recommendation to use <cite> to mark up a person who is the source
>   of a quote (as well as to use <cite> for a bibliographic citation).

Sure, there are examples of people doing this. As you will have seen from 
looking at Philip's data, they are in the minority.

When examining pages, you have to first pick a random sample, then study 
those, because otherwise you get sampling bias. With a trillion pages on 
the Web, it's easy to find thousands of examples of any particular use of 
HTML elements; the question is what is the most useful definition, not 
what is used at all.

> By changing the definition of <cite> in HTML5, you are saying that numerous
> users of the HTML4 definition of <cite> are no longer conforming, and not
> really giving any alternative that does the same job.

<span> does the job fine, in the rare cases where someone really wants to 
mark up someone's name.

> I suppose ideally we would have <cite>, <title> and <author> (among 
> others) that could be nested in such a way as to express exactly what 
> the author means.

Ideally, we'd probably not have any of them, frankly.

> In the absence of that, having <cite> mean simply a source being cited, 
> and allowing the author to determine whether they want to use it for 
> titles of works, authors, or entire citations, seems to be both 
> reasonable and compatible with existing content.

I think having it mean "title of work" only is more useful. Having it mean 
all three will mislead authors into using it for all three, and then cause 
them undue pain as they work around the default styling.

> > What's the alternative? Just say "em, i, cite and dfn mean 'italics'"? 
> > That doesn't seem particularly useful either. Why not just drop all 
> > but <i> if that's what we do?
> > 
> > No, it seems useful to have elements that people can use for specific 
> > purposes, so that style sheets can be shared, so that tools can make 
> > use of the elements, if only in limited circles.
> No, I don't believe that you should remove all mention of semantics that 
> aren't machine checkable from the spec; just that the tightening of the 
> semantics in this case does not seem to be gaining anything (what is 
> actually going to change if people use <cite> only for titles, and 
> resort to spans to mark up authors or full bibliographic citations?), 
> while simultaneously ruling out usages that are currently valid and 
> don't seem to cause any harm.

People are actively overriding the style of <cite> because they think it's 
the right element, but it has the wrong effect. I don't know what more 
harm we could be causing here. The element is failing at its only purpose, 
because people think they're being Semantically Right.

> > Backwards compatibility (with legacy documents, which use it to mean 
> > "title of work") is the main reason.
> > People who use <cite> seem to use it for titles
> > In the 15 or more years that <cite> has supposedly been used for 
> > citations, I'm only aware of one actual use of that semantic, and that 
> > use has since been discontinued. Meanwhile, lots of people use <cite> 
> > for "title of work".
> You claim that people seem to use it for titles many times, but in 
> practice, while that is the most common use, it is also used to refer to 
> authors or speakers, and sometimes also used for full bibliographic 
> citations. How many sites using <cite> for other purposes, including 
> quite prominent ones, would it take to convince you that this is indeed 
> a common pattern?

A random sample of the Web would need to show more uses of this than uses 
of other things.

How few sites using <cite> for people's names would it take to convince 
you that it _wasn't_ a common case?

On Mon, 17 Aug 2009, Brian Campbell wrote:
> >
> > What is the problem solved by marking up people's names?
> > 
> > Why is this:
> > 
> >   <p>I live with <name>Brett</name> and <name>Damian</name>.</p>
> > 
> > ...better than this?:
> > 
> >   <p>I live with Brett and Damian.</p>
> Has anyone claimed that the <cite> element should be used in such a case?


> The only usage I've seen offered is that the <cite> element may be used 
> to mark up a person's name when that person is the source of a quotation; 
> as in, when you are citing that person (hence, the term "cite").

People have argued that merely mentioning something is a citation.

> > > > Existing behaviour of authors is not to mark up names with <cite>.
> > > 
> > > Except for the authors that do mark up names with <cite>
> > 
> > There are some, but they are not the majority.
> Should only the majority usage ever be allowed?

That is a concern, yes. Another is what is most useful for authors.

> Or if there is another usage, that is somewhat less common, but is still 
> logically consistent, usefully takes advantage of fallback styling in 
> the absence of CSS, and meets the English language definition of the 
> term, should that also be allowed?

Whether it is useful, what problems it solves, and how it works in 
existing implementations are more important concerns than all the ones you 
listed, IMHO.

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
