[whatwg] the cite element
Ian Hickson
ian at hixie.ch
Wed Aug 12 16:21:23 PDT 2009
On Mon, 3 Aug 2009, Erik Vorhes wrote:
> On Mon, Aug 3, 2009 at 6:29 AM, Ian Hickson <ian at hixie.ch> wrote:
> > Not all titles are citations, actually. For example, I've heard of the
> > /Pirates of Penzance/, but I'm not citing it, just mentioning it in
> > passing.
>
> No, that actually is a citation, whether you realize it or not. You are
> making reference to a musical and are therefore citing it, even in
> passing.
Your definition of "citation" is far looser than my dictionary's ("a
quotation from or reference to"). In fact your definition seems to be
basically the same as HTML5's -- a title of a work. Unless you think that
this should be valid use of <cite>:
   <p>I picked up <cite>my favourite book</cite>, and put it next to
   <cite>the painting I got from my aunt</cite>.</p>
I don't think that those references to works should use <cite>. Doing so
has zero benefit, as far as I can tell.
> > > See <http://www.four24.com/>; note near the top of the source:
> > > <blockquote id="verse" cite="John 4:24">...
> >
> > My statement stands, on the aggregate:
> >
> > On Mon, 27 Jul 2009, Philip Taylor wrote:
> > >
> > > See http://philip.html5.org/data/cite-attribute-values.txt for some
> > > data. (Looks like non-URI values are quite rare.)
>
> I agree that @cite is rarely used as anything other than a URI; I was
> attempting to demonstrate that even very recent uses of HTML don't
> necessarily "get" that it is for URIs (the site I referenced launched
> last month, as I recall).
Mistakes are common with HTML, sure.
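A minimal sketch of how cite="" values like the one quoted above could be
checked, assuming Python's standard html.parser and urllib.parse. The class
and function names are hypothetical, and the "http(s) scheme plus host" test
is only a rough heuristic: the attribute also permits other valid URLs,
including relative ones. This is not the method used to gather Philip's data.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class CiteAttrCollector(HTMLParser):
    """Collect cite="" attribute values from elements that allow one."""

    # Elements that take a cite="" attribute (hypothetical constant name).
    CITE_ELEMENTS = {"blockquote", "q", "ins", "del"}

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag in self.CITE_ELEMENTS:
            for name, value in attrs:
                if name == "cite" and value is not None:
                    self.values.append(value)


def looks_like_absolute_url(value):
    """Rough heuristic: an http(s) scheme followed by a host."""
    parts = urlparse(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def classify_cite_values(html):
    """Return (value, looks_like_url) pairs for every cite="" found."""
    collector = CiteAttrCollector()
    collector.feed(html)
    collector.close()
    return [(v, looks_like_absolute_url(v)) for v in collector.values]


if __name__ == "__main__":
    sample = (
        '<blockquote id="verse" cite="John 4:24">...</blockquote>'
        '<q cite="http://example.com/source">...</q>'
    )
    for value, ok in classify_cite_values(sample):
        print(repr(value), "URL-like" if ok else "not URL-like")
```

Run over a crawl, a check of this kind would flag "John 4:24" as a non-URL
value while accepting ordinary http(s) addresses.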
> > While we're at it, Philip had other data:
> >
> > > Also maybe relevant: see http://philip.html5.org/data/cite.txt for
> > > some older data about <cite>. (Looks like non-title uses are very
> > > common.)
> >
> > This seems to support my point that <cite> is used for a whole variety
> > of purposes, like <em>, <i>, <q>, HTML4's <cite>, and HTML5's <cite>.
> > Very few, actually much fewer than I had remembered from my last look
> > at the data, are names of people, citations or otherwise.
>
> I actually took this information the other way, that there are indeed
> other uses for <cite> out there beyond titles.
I don't think anyone has argued otherwise. I've only argued that of the
uses that <cite> is put to, the only ones that are common but have no
other more appropriate elements (i.e. aren't flat out mistakes) are
citations and titles, and not people's names.
> > On Mon, 27 Jul 2009, Erik Vorhes wrote:
> > >
> > > > A new element wouldn't work in legacy UAs, so it wouldn't be as
> > > > compelling a solution. Also, <cite> is already being used for this
> > > > purpose.
> > >
> > > My preference would be for <cite> to retain the flexibility it has
> > > in pre-HTML5 specifications, which would include referencing titles.
> >
> > The flexibility doesn't seem as useful as limiting it to titles. What
> > is the problem solved by allowing names to be marked up in the same
> > manner as titles? The problem solved by allowing titles specifically
> > to be marked up is that titles are usually typographically offset from
> > the surrounding text in a distinctive fashion. This doesn't apply to
> > names. Reusing the same element for both encourages authors to use
> > <cite> for both which makes it harder for them to get the right
> > typographic effect, leading to a lower quality of typography overall.
> > I think this is a bad thing.
>
> This is not just about names. It allows other (non-title) text to be
> identified as a citation. If <cite> is identified as "title of work,"
> you can't cite many major orchestral arrangements at all, nor can you
> cite legal decisions.
Why not? An orchestral arrangement is a work, and has a title -- the spec
explicitly lists "score", "song", and "opera" as possible works, for
instance.
I've added "legal case report" to the list, to clarify that you can use
<cite> to name such reports.
> Unless by "title of work" you mean "standard citation for an item,
> usually its title"; but then <cite> really means what it is defined as
> in the HTML 4.01 specification.
Unless you have a very loose definition of "citation", or unless you
consider a person to be a possible "source", <cite> in HTML5 is a strict
superset of HTML4's definition.
For example, the following is valid HTML5 but wouldn't be valid HTML4,
since it's not a citation or reference to another source, but merely
something mentioned in passing:
   <p>Today, as I was moving my copy of <cite>Dreamer's Void</cite>, I
   hurt my back.</p>
> > > If backwards compatibility is that big a concern, why does HTML5 use
> > > <legend> outside of <fieldset> elements?
> >
> > There were no existing elements that could be reused for many of the
> > new semantics. When there were, we used them (e.g. <i>, <b>, <cite>,
> > <menu>, <legend>, <h1>).
>
> I agree that there aren't always existing elements for the new semantics
> included in HTML5, but I don't believe that backwards compatibility is
> as big a concern as you claim it is.
Ok.
> HTML5's re-use of <legend>, for example, is completely broken in every
> extant browser.
Yeah, <legend> is a complicated case where a number of factors have
prevented an ideal solution. (The alternative, introducing yet another
element that means the same as <legend>/<label>/<caption>/<h1>/<th>/etc,
is worse, in the long run, than simply waiting a few years to introduce
<figure> and <details>.)
> Besides, there's already <tt>, which could be used to identify "title
> text" or something like that.
It has the wrong default styles.
> > > > What is the pressing need for an element for citations, which
> > > > would require that we overload <cite> with two uses?
> > >
> > > A title can be a citation, but not all citations are titles. What's
> > > the pressing need for limiting <cite> only to titles?
> >
> > As described above, the need to have an element for titles is that
> > there are typographic conventions that apply to titles. What is the
> > pressing need for an element for citations, which would require that
> > we overload <cite> with two uses?
>
> As I have said previously, there aren't consistent typographic
> conventions that apply to titles.
There are widely used conventions, though, for which <cite> has
appropriate default styles.
> The "pressing need" is that <cite> is already used to define citations.
<cite> is also used to mark up titles that aren't citations, as shown by
Philip's data.
> There's no reason to limit it to a subset of citation (more below).
I honestly don't understand how HTML5 is a subset of HTML4 here, unless
you mean people's names, which as far as I can tell aren't commonly used
with <cite>, and for which there is no benefit to using <cite>.
> > But why does that have value? How would you use this information?
>
> To collect citation information. I don't see how that has any less value
> than collecting titles of works, especially since not all works have
> titles or means of reference that would constitute a conventional
> "title."
Virtually nobody either collects citation information _or_ collects titles
of works. If that is the use case that we have to deal with, then please
provide evidence that there is actually a significant need for this. So
far I'm not aware of anyone actually doing this other than Mark Pilgrim,
and he stopped doing it years ago.
Currently, <cite> in HTML5 isn't for collecting anything; it's purely to
provide a hook for styling.
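To make concrete what "collecting" from <cite> would involve, here is a
minimal sketch, assuming Python's standard html.parser. The class and
function names are hypothetical; this is not a tool anyone in this thread
is known to use.

```python
from html.parser import HTMLParser


class CiteTextCollector(HTMLParser):
    """Gather the text content of every <cite> element in a document."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting depth of open <cite> elements
        self.buffer = []    # text fragments seen inside the current <cite>
        self.cites = []     # collected, whitespace-normalised contents

    def handle_starttag(self, tag, attrs):
        if tag == "cite":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "cite" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Collapse line breaks and runs of spaces into single spaces.
                text = " ".join("".join(self.buffer).split())
                self.cites.append(text)
                self.buffer = []

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)


def collect_cites(html):
    """Return the text of each outermost <cite> element, in document order."""
    collector = CiteTextCollector()
    collector.feed(html)
    collector.close()
    return collector.cites


if __name__ == "__main__":
    doc = (
        "<p>I picked up <cite>my favourite book</cite>, and put it next to\n"
        "<cite>the painting I got from my aunt</cite>.</p>"
    )
    print(collect_cites(doc))
```

A collector like this only sees element contents, so it cannot tell a title
from a loose reference such as "my favourite book"; what it gathers is only
as meaningful as the markup it is fed.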
> > > >> > Note that HTML5 now has a more detailed way of marking up
> > > >> > citations, using the Bibtex vocabulary. I think this removes
> > > >> > the need for using the <cite> element in the manner you
> > > >> > describe.
> > > >>
> > > >> Since this is supposed to be the case, why shouldn't HTML5 just
> > > >> ditch <cite> altogether? (Aside from "backward compatibility,"
> > > >> which is beside the point of the question.)
> > > >
> > > > Backwards compatibility (with legacy documents, which use it to
> > > > mean "title of work") is the main reason.
> > >
> > > I'd beg to differ, regarding "legacy documents." See, for example
> > > the automated citation generation at Wikipedia:
> > > http://en.wikipedia.org/wiki/Wikipedia:Citation_templates
> >
> > What specifically am I looking for here? This doesn't seem to have any
> > relevance to HTML.
>
> Wikipedia automatically wraps citations in the <cite> element. View
> source on any of the Example sections.
Wikipedia's output is not an argument for consuming <cite>. In fact, what
they're doing is an argument against keeping <cite> for that purpose: they
are explicitly overriding the only behaviour <cite> gives them (italics)
and then going out of their way to reintroduce that effect on a <span>! If
that's not an argument for changing the meaning of <cite> to something
more convenient, I don't know what is.
> > > In addition, the comments at zeldman.com use <cite> to reference
> > > authors of comments. While that specific example is younger than
> > > HTML5, this is merely an example of a relatively common use-case for
> > > <cite> that does not use it to signify "title of work."
> >
> > As I said, the most common use of <cite> is to mark up italics. I
> > agree entirely that it's misused.
>
> I haven't said that it's misused. I apologize that you have
> misunderstood me. I have repeatedly and consistently contended that
> <cite> should be used for more than just titles. I believe that
> Zeldman's use is perfectly appropriate and correctly used.
I disagree. I view it as an example of semantic markup for the sake of it.
We can be more helpful to authors.
> > Blog commenters don't need to be marked up any differently than the
> > number of the comment -- that's a stylistic issue that varies from
> > blog to blog. I don't see the need for an element specifically for
> > people commenting on blogs. In most blogs that I've seen, the name
> > isn't even highlighted in any particular fashion.
>
> Again, this isn't just about citing people or "blog commenters"; this
> was just an example of a current, non-title, and correct use of <cite>
> according to current specifications. (And why does it matter if
> something is particularly highlighted? Is HTML supposed to be a
> presentational language? Why limit <cite> to the place of a
> presentational element?)
The only use case I'm aware of for <cite> is as a media-independent
presentation hook, yes.
> > > Existing tools that treat <cite> exclusively as "title of work" do
> > > so against every HTML specification out there (i.e., HTML 4.01 and
> > > earlier).
> >
> > Existing tools generally have had very few problems in finding ways to
> > do things against every HTML specification out there. Over 90% of all
> > content on the Web is syntactically invalid in some way, and I'm sure
> > that more than 10% of content on the Web is generated by tools.
>
> Yes, and one of those tools is Wikipedia, which wraps entire citations
> in the <cite> element, not just titles. It correctly follows current
> HTML specifications in using <cite> to identify a citation.
Upgrading Wikipedia to HTML5's definitions will simplify Wikipedia. This
seems like a net win.
> > > > Indeed, there is a lot of misuse of the element -- as alternatives
> > > > for <q>, <i>, <em>, and HTML5's meaning of <cite>, in particular.
> > > >
> > > > Expanding it to cover the meanings of <q>, <i>, and <em> doesn't
> > > > seem as useful as expanding it just to cover works.
> > >
> > > I believe you mean "limiting it just to cover works" here.
> >
> > I meant expanding it, since not all titles of works are citations.
>
> Any reference to a title of a work is by definition a citation.
> Therefore you are limiting <cite> to a subset of citation.
I disagree with your definition of "citation".
> > As a first approximation, titles are italics, and names are not. I
> > think that's a far closer approximation of typographical conventions
> > than lumping titles and names together into one default style.
>
> This doesn't seem to be an issue for you with the reuse of <legend> in
> another context, even though it is broken. So why is it an issue here?
> (And again, titles are not always in italics.)
<legend> is an example of the worst possible end result. It's not an
example of best practice. It's an issue with <legend> also; there are just
other factors at work there.
> > I haven't changed the spec. I continue to hold the position that
> > covering titles of works is more useful than covering titles of works
> > and names of people, and more useful than covering only names of
> > people or works that are explicitly cited.
>
> You are misconstruing my argument. This isn't about including names of
> people; that is just the most obvious non-title form of citation. This
> is about properly understanding what a citation can be and writing the
> specification for the <cite> element to account for those possibilities.
> Citations are references to works, people, etc. By limiting it to "title
> of work" you are actually limiting it to a subset of a subset, as many
> objects worth citing don't have conventional titles.
Unless you can demonstrate that there is a concrete benefit to doing what
you describe, I do not think it is a good idea. There are concrete
benefits to the definition currently in HTML5, namely it provides a good
first approximation of common typographic effects at a very low cost.
On Mon, 3 Aug 2009, Jeremy Keith wrote:
> Hixie asked:
> > What is the problem solved by allowing names to be marked up in the
> > same manner as titles?
>
> They are both entities being referenced (cited). It seems arbitrary to
> me to forbid referencing names with the <cite> element. HTML 4 already
> allows it, authors would have to change their existing behaviour
> (something to be avoided wherever possible) and when the meanings of
> other existing elements -- <i>, <b>, <small> -- are being *expanded*, I can't
> follow the logic in *restricting* the meaning of an element already
> being used broadly.
As noted above, I believe that this is an expansion as well (I don't think
HTML4's use of "source" was meant to include people). But in any case,
what you describe here isn't a problem.
What is the _problem_ solved by allowing names to be marked up in the same
manner as titles?
> > The problem solved by allowing titles specifically to be marked up is
> > that titles are usually typographically offset from the surrounding
> > text in a distinctive fashion. This doesn't apply to names.
>
> That's what CSS is for.
CSS is optional. We need the media-independent layer to make sure that we
get a reasonable rendering even without CSS. (Otherwise, why wouldn't we
just be using <span> for everything?)
> Okay, but it won't make any difference to authors like myself who will
> continue to use <cite> to mark up names.
>
> We can do this either by applying a Kenobian interpretation of the spec
> (e.g. a person is the work of their parents/peers/society and a person's
> name is therefore a "title of work")
The spec explicitly says people's names aren't titles of works.
> When it comes to language features, the browser makers don't have to do
> much -- just make sure the element shows up in the DOM. However, if authors
> refuse to implement a language feature as described in the spec, then
> the spec becomes fiction.
Agreed; that's why I base a lot of the spec on research about what authors
are doing. In practice, most authors aren't marking up names with <cite>.
> Authors use the <cite> element to mark up names.
Only a small minority do. Certainly not enough to make this a language
feature.
> It is often the most semantically appropriate element for marking up a
> name
There is no need to mark up a name at all.
> (and that in itself is a good enough reason to use it
No, that's a cargo-cult approach to semantic markup.
> I don't think it makes sense to ignore the existing behaviour of
> authors.
Existing behaviour of authors is not to mark up names with <cite>.
> Authors such as myself will continue to use the <cite> element to mark
> up names; our markup will still be conforming; validators won't flag up
> our choices as errors.
Your markup won't be conforming, though you are correct that the validator
won't catch this error.
--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'