[whatwg] the cite element

Ian Hickson ian at hixie.ch
Wed Aug 12 16:21:23 PDT 2009


On Mon, 3 Aug 2009, Erik Vorhes wrote:
> On Mon, Aug 3, 2009 at 6:29 AM, Ian Hickson <ian at hixie.ch> wrote:
> > Not all titles are citations, actually. For example, I've heard of the 
> > /Pirates of Penzance/, but I'm not citing it, just mentioning it in 
> > passing.
>
> No, that actually is a citation, whether you realize it or not. You are 
> making reference to a musical and are therefore citing it, even in 
> passing.

Your definition of "citation" is far looser than my dictionary's ("a 
quotation from or reference to"). In fact your definition seems to be 
basically the same as HTML5's -- a title of a work. Unless you think that 
this should be valid use of <cite>:

   <p>I picked up <cite>my favourite book</cite>, and put it next to 
   <cite>the painting I got from my aunt</cite>.</p>

I don't think that those references to works should use <cite>. Doing so 
has zero benefit, as far as I can tell.


> > > See <http://www.four24.com/>; note near the top of the source: 
> > > <blockquote id="verse" cite="John 4:24">...
> >
> > My statement stands, on the aggregate:
> >
> > On Mon, 27 Jul 2009, Philip Taylor wrote:
> > >
> > > See http://philip.html5.org/data/cite-attribute-values.txt for some 
> > > data. (Looks like non-URI values are quite rare.)
> 
> I agree that @cite is rarely used as anything other than a URI; I was 
> attempting to demonstrate that even very recent uses of HTML don't 
> necessarily "get" that it is for URIs (the site I referenced launched 
> last month, as I recall).

Mistakes are common with HTML, sure.


> > While we're at it, Philip had other data:
> >
> > > Also maybe relevant: see http://philip.html5.org/data/cite.txt for 
> > > some older data about <cite>. (Looks like non-title uses are very 
> > > common.)
> >
> > This seems to support my point that <cite> is used for a whole variety 
> > of purposes, like <em>, <i>, <q>, HTML4's <cite>, and HTML5's <cite>. 
> > Very few, actually much fewer than I had remembered from my last look 
> > at the data, are names of people, citations or otherwise.
> 
> I actually took this information the other way, that there are indeed 
> other uses for <cite> out there beyond titles.

I don't think anyone has argued otherwise. I've only argued that of the 
uses that <cite> is put to, the only ones that are common but have no 
other more appropriate elements (i.e. aren't flat out mistakes) are 
citations and titles, and not people's names.


> > On Mon, 27 Jul 2009, Erik Vorhes wrote:
> > >
> > > > A new element wouldn't work in legacy UAs, so it wouldn't be as 
> > > > compelling a solution. Also, <cite> is already being used for this 
> > > > purpose.
> > >
> > > My preference would be for <cite> to retain the flexibility it has 
> > > in pre-HTML5 specifications, which would include referencing titles.
> >
> > The flexibility doesn't seem as useful as limiting it to titles. What 
> > is the problem solved by allowing names to be marked up in the same 
> > manner as titles? The problem solved by allowing titles specifically 
> > to be marked up is that titles are usually typographically offset from 
> > the surrounding text in a distinctive fashion. This doesn't apply to 
> > names. Reusing the same element for both encourages authors to use 
> > <cite> for both which makes it harder for them to get the right 
> > typographic effect, leading to a lower quality of typography overall. 
> > I think this is a bad thing.
> 
> This is not just about names. It allows other (non-title) text to be 
> identified as a citation. If <cite> is identified as "title of work," 
> you can't cite many major orchestral arrangements at all, nor can you 
> cite legal decisions.

Why not? An orchestral arrangement is a work, and has a title -- the spec 
explicitly lists "score", "song", and "opera" as possible works, for 
instance.

I've added "legal case report" to the list, to clarify that you can use 
<cite> to name such reports.


> Unless by "title of work" you mean "standard citation for an item, 
> usually its title"; but then <cite> really means what it is defined as 
> in the HTML 4.01 specification.

Unless you have a very loose definition of "citation", or unless you 
consider a person to be a possible "source", <cite> in HTML5 is a strict 
superset of HTML4's definition.

For example, the following is valid HTML5 but wouldn't be valid HTML4, 
since it's not a citation or reference to another source, but merely 
something mentioned in passing:

   <p>Today, as I was moving my copy of <cite>Dreamer's Void</cite>, I
   hurt my back.</p>


> > > If backwards compatibility is that big a concern, why does HTML5 use 
> > > <legend> outside of <fieldset> elements?
> >
> > There were no existing elements that could be reused for many of the 
> > new semantics. When there were, we used them (e.g. <i>, <b>, <cite>, 
> > <menu>, <legend>, <h1>).
> 
> I agree that there aren't always existing elements for the new semantics 
> included in HTML5, but I don't believe that backwards compatibility is 
> as big a concern as you claim it is.

Ok.


> HTML5's re-use of <legend>, for example, is completely broken in every 
> extant browser.

Yeah, <legend> is a complicated case where a number of factors have 
prevented an ideal solution. (The alternative, introducing yet another 
element that means the same as <legend>/<label>/<caption>/<h1>/<th>/etc, 
is worse, in the long run, than simply waiting a few years to introduce 
<figure> and <details>.)


> Besides, there's already <tt>, which could be used to identify "title 
> text" or something like that.

It has the wrong default styles.


> > > > What is the pressing need for an element for citations, which 
> > > > would require that we overload <cite> with two uses?
> > >
> > > A title can be a citation, but not all citations are titles. What's 
> > > the pressing need for limiting <cite> only to titles?
> >
> > As described above, the need to have an element for titles is that 
> > there are typographic conventions that apply to titles. What is the 
> > pressing need for an element for citations, which would require that 
> > we overload <cite> with two uses?
> 
> As I have said previously, there aren't consistent typographic 
> conventions that apply to titles.

There are widely used conventions, though, for which <cite> has 
appropriate default styles.


> The "pressing need" is that <cite> is already used to define citations.

<cite> is also used to mark up titles that aren't citations, as shown by 
Philip's data.


> There's no reason to limit it to a subset of citation (more below).

I honestly don't understand how HTML5 is a subset of HTML4 here, unless 
you mean people's names, which as far as I can tell aren't commonly used 
with <cite>, and for which there is no benefit to using <cite>.


> > But why does that have value? How would you use this information?
> 
> To collect citation information. I don't see how that has any less value 
> than collecting titles of works, especially since not all works have 
> titles or means of reference that would constitute a conventional 
> "title."

Virtually nobody either collects citation information _or_ collects titles 
of works. If that is the use case that we have to deal with, then please 
provide evidence that there is actually a significant need for this. So 
far I'm not aware of anyone actually doing this other than Mark Pilgrim, 
and he stopped doing it years ago.

Currently, <cite> in HTML5 isn't for collecting anything, it's purely to 
provide a hook for styling.


> > > >> > Note that HTML5 now has a more detailed way of marking up 
> > > >> > citations, using the Bibtex vocabulary. I think this removes 
> > > >> > the need for using the <cite> element in the manner you 
> > > >> > describe.
> > > >>
> > > >> Since this is supposed to be the case, why shouldn't HTML5 just 
> > > >> ditch <cite> altogether? (Aside from "backward compatibility," 
> > > >> which is beside the point of the question.)
> > > >
> > > > Backwards compatibility (with legacy documents, which use it to 
> > > > mean "title of work") is the main reason.
> > >
> > > I'd beg to differ, regarding "legacy documents." See, for example 
> > > the automated citation generation at Wikipedia: 
> > > http://en.wikipedia.org/wiki/Wikipedia:Citation_templates
> >
> > What specifically am I looking for here? This doesn't seem to have any 
> > relevance to HTML.
> 
> Wikipedia automatically wraps citations in the <cite> element. View 
> source on any of the Example sections.

Wikipedia's output is not an argument for consuming <cite>. In fact, what 
they're doing is an argument against keeping <cite> for that purpose: they 
are explicitly overriding the only behaviour <cite> gives them (italics) 
and then going out of their way to reintroduce that effect on a <span>! If 
that's not an argument for changing the meaning of <cite> to something 
more convenient, I don't know what is.


> > > In addition, the comments at zeldman.com use <cite> to reference 
> > > authors of comments. While that specific example is younger than 
> > > HTML5, this is merely an example of a relatively common use-case for 
> > > <cite> that does not use it to signify "title of work."
> >
> > As I said, the most common use of <cite> is to mark up italics. I 
> > agree entirely that it's misused.
> 
> I haven't said that it's misused. I apologize that you have 
> misunderstood me. I have repeatedly and consistently contended that 
> <cite> should be used for more than just titles. I believe that 
> Zeldman's use is perfectly appropriate and correctly used.

I disagree. I view it as an example of semantic markup for the sake of it. 
We can be more helpful to authors.


> > Blog commenters don't need to be marked up any differently than the 
> > number of the comment -- that's a stylistic issue that varies from 
> > blog to blog. I don't see the need for an element specifically for 
> > people commenting on blogs. In most blogs that I've seen, the name 
> > isn't even highlighted in any particular fashion.
> 
> Again, this isn't just about citing people or "blog commenters"; this 
> was just an example of a current, non-title, and correct use of <cite> 
> according to current specifications. (And why does it matter if 
> something is particularly highlighted? Is HTML supposed to be a 
> presentational language? Why limit <cite> to the place of a 
> presentational element?)

The only use case I'm aware of for <cite> is as a media-independent 
presentation hook, yes.


> > > Existing tools that treat <cite> exclusively as "title of work" do 
> > > so against every HTML specification out there (i.e., HTML 4.01 and 
> > > earlier).
> >
> > Existing tools generally have had very few problems in finding ways to 
> > do things against every HTML specification out there. Over 90% of all 
> > content on the Web is syntactically invalid in some way, and I'm sure 
> > that more than 10% of content on the Web is generated by tools.
> 
> Yes, and one of those tools is Wikipedia, which wraps entire citations 
> in the <cite> element, not just titles. It correctly follows current 
> HTML specifications in using <cite> to identify a citation.

Upgrading Wikipedia to HTML5's definitions will simplify Wikipedia. This 
seems like a net win.


> > > > Indeed, there is a lot of misuse of the element -- as alternatives 
> > > > for <q>, <i>, <em>, and HTML5's meaning of <cite>, in particular.
> > > >
> > > > Expanding it to cover the meanings of <q>, <i>, and <em> doesn't 
> > > > seem as useful as expanding it just to cover works.
> > >
> > > I believe you mean "limiting it just to cover works" here.
> >
> > I meant expanding it, since not all titles of works are citations.
> 
> Any reference to a title of a work is by definition a citation. 
> Therefore you are limiting <cite> to a subset of citation.

I disagree with your definition of "citation".


> > As a first approximation, titles are italics, and names are not. I 
> > think that's a far closer approximation of typographical conventions 
> > than lumping titles and names together into one default style.
> 
> This doesn't seem to be an issue for you with the reuse of <legend> in 
> another context, even though it is broken. So why is it an issue here? 
> (And again, titles are not always in italics.)

<legend> is an example of the worst possible end result. It's not an 
example of best practice. It's an issue with <legend> too; there are just 
other factors at work there.


> > I haven't changed the spec. I continue to hold the position that 
> > covering titles of works is more useful than covering titles of works 
> > and names of people, and more useful than covering only names of 
> > people or works that are explicitly cited.
> 
> You are misconstruing my argument. This isn't about including names of 
> people; that is just the most obvious non-title form of citation. This 
> is about properly understanding what a citation can be and writing the 
> specification for the <cite> element to account for those possibilities. 
> Citations are references to works, people, etc. By limiting it to "title 
> of work" you are actually limiting it to a subset of a subset, as many 
> objects worth citing don't have conventional titles.

Unless you can demonstrate that there is a concrete benefit to doing what 
you describe, I do not think it is a good idea. There are concrete 
benefits to the definition currently in HTML5, namely it provides a good 
first approximation of common typographic effects at a very low cost.


On Mon, 3 Aug 2009, Jeremy Keith wrote:
> Hixie asked:
> > What is the problem solved by allowing names to be marked up in the 
> > same manner as titles?
> 
> They are both entities being referenced (cited). It seems arbitrary to 
> me to forbid referencing names with the <cite> element. HTML 4 already 
> allows it, authors would have to change their existing behaviour 
> (something to be avoided wherever possible) and when the meanings of 
> other existing elements—<i>, <b>, <small>—are being *expanded*, I can't 
> follow the logic in *restricting* the meaning of an element already 
> being used broadly.

As noted above, I believe that this is an expansion as well (I don't think 
HTML4's use of "source" was meant to include people). But in any case, 
what you describe here isn't a problem.

What is the _problem_ solved by allowing names to be marked up in the same 
manner as titles?


> > The problem solved by allowing titles specifically to be marked up is 
> > that titles are usually typographically offset from the surrounding 
> > text in a distinctive fashion. This doesn't apply to names.
> 
> That's what CSS is for.

CSS is optional. We need the media-independent layer to make sure that we 
get a reasonable rendering even without CSS. (Otherwise, why wouldn't we 
just be using <span> for everything?)


> Okay, but it won't make any difference to authors like myself who will 
> continue to use <cite> to mark up names.
> 
> We can do this either by applying a Kenobian interpretation of the spec 
> (e.g. a person is the work of their parents/peers/society and a person's 
> name is therefore a "title of work")

The spec explicitly says people's names aren't titles of works.


> When it comes to language features, the browser makers don't have to do 
> much—just make sure the element shows up in the DOM. However, if authors 
> refuse to implement a language feature as described in the spec, then 
> the spec becomes fiction.

Agreed; that's why I base a lot of the spec on research about what authors 
are doing. In practice, most authors aren't marking up names with <cite>.


> Authors use the <cite> element to mark up names.

Only a small minority do. Certainly not enough to make this a language 
feature.


> It is often the most semantically appropriate element for marking up a 
> name

There is no need to mark up a name at all.


> (and that in itself is a good enough reason to use it

No, that's a cargo-cult approach to semantic markup.


> I don't think it makes sense to ignore the existing behaviour of 
> authors.

Existing behaviour of authors is not to mark up names with <cite>.


> Authors such as myself will continue to use the <cite> element to mark 
> up names; our markup will still be conforming; validators won't flag up 
> our choices as errors.

Your markup won't be conforming, though you are correct that the validator 
won't catch this error.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'