[whatwg] the cite element

Ian Hickson ian at hixie.ch
Mon Oct 5 19:13:12 PDT 2009

On Tue, 22 Sep 2009, Jim Jewett wrote:
> On Tue, Sep 22, 2009 at 8:46 PM, Ian Hickson <ian at hixie.ch> wrote:
> > On Wed, 16 Sep 2009, Erik Vorhes wrote:
> >> On Wed, Sep 16, 2009 at 4:16 AM, Ian Hickson <ian at hixie.ch> wrote:
> >> >> Unless there is some semantic value to the name being more than 
> >> >> "just" a name, yes.
> >> > Is there?
> >> Yes
> > What is it?
> <cite> points to a primary source of the statement, as opposed to 
> someone merely named by the statement.

I hate to be so repetitive, but why is that beneficial? What is the 
semantic value of this?

Is there as much semantic value in pointing to the primary source of a 
statement as there is in knowing that the word "earth" refers to the 
planet and not the dirt, for example? If so, what is that extra value?

> >> and with the removal of the <dialog> element (of which I was unaware 
> >> when I sent my last message) makes a compelling case for the 
> >> re-expansion of <cite> for dialog.
> > 
> > Why?
> dialogues and transcripts and credits and theatrical scripts are all 
> arguably too fine-grained for a "citation", as opposed to a "label" or 
> "attribution", but they are certainly real use cases where the 
> attribution is important.

Why? This is not a rhetorical question, I'm trying to get to the use case 
that means that there is an actual benefit to what you are asking for. 
Just saying that it's important doesn't say _why_ it is important. I'm not 
denying that it is important, I'm just trying to work out _why_, so that 
the proposal (e.g. to use <cite> for this) can be properly evaluated.

What does <cite> do that you want?

> These three are even cases where print sources will typically shift
> font in some way between the attribution (<b>Mephistopheles</b>) and
> the actual statement, though not always in the same manner.  Of the
> three that I found first,
>   Indented lines, said
>   or sung aloud.
> <i>Name.</i>  Statement begins here.
> Q.    Attorney's question.
> A.    Witness answers.
> Q.    Attorney's next question.
> A.    Next response.

I'm not sure what you're saying here.

> >> On October 31, 2006, Michael Fortin suggested the following pattern:
> >> <p><cite>Me:</cite> <q>Can I say something?</q>
> >> Which Jeremy Keith also recommends. [1]
> ...
> >> Aside from the current definition of <cite>, I think this would be a
> >> good use of the element, since it makes more sense than <b> or <span>
> >> (what do those signify in this context?) and there's nothing wrong with
> >> an italicized name in this context. Moreover, there are examples of
> >> Fortin/Keith's usage in the wild.
> > I don't understand why we need an element here at all, and I don't
> > understand why we would want to reuse <cite>, of all elements, if we did
> > in fact need one.
> That "Me:" isn't pronounced; it is metadata so important that it gets 
> written (in an odd style) in printed form.

I don't buy that at all. It's just one way that people write dialogs, but 
as far as I can tell this is perfectly adequate:

   <p>Me: Can I say something?</p>

...and you need neither <q> nor <cite>. I really feel that you are trying 
too hard to solve a problem that really doesn't exist here.

> The punctuation (followed by a new sentence, complete with initial 
> capitals) is the closest a typewriter can come to markup, and scripts 
> will typically make the difference more emphatic.

If it's _important_, then use <strong>. If it's just a keyword, then <b> 
is fine. If you're saying that the name is something that is in a 
different voice, then either the name or the text could be in <i>.

If you need even more fine-grained styling, <span> with class="" seems 
fine here.

I don't really see the need for more than that though. It's not like there 
is a style so common that a new element would be useful.
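
As a rough sketch of those three options (the class name "speaker" is just 
made up for illustration, it isn't defined anywhere):

   <p><b>Me:</b> Can I say something?</p>
   <p><i>Me:</i> Can I say something?</p>
   <p><span class="speaker">Me:</span> Can I say something?</p>

Any of these can be restyled from CSS as needed, and none of them touches 
the definition of <cite>.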

> I'll agree that it seems odd to have that many <cite> elements in such 
> close proximity, but it is the closest match I can find in the spec, and 
> it doesn't seem to be actually wrong.  Searching for lines by a 
> particular character is a fairly common use case.

Doesn't "find in page" handle that fine?

> >> > ...  How do you define "citation"? What problem does it solve?
> >>  <cite> should be allowed for markup in the following instances:
> >> - titles of works - full citations - names and other sources of quote 
> >> attribution (including identifying speakers in dialog) - names of 
> >> blog post commenters and authors (in the context of their comments, 
> >> posts, etc.)
> > That seems like a really strange and eclectic variety of uses.
> All boil down to "says who?".  A title of a work indicates something 
> about when they said it, and how (formally enough to have a title), but 
> ... so does a hyperlink to the author.

"title of work" doesn't boil down to "says who":

   <p>My favourite book is <cite>Pandora's Star</cite>.</p>

...so if that is bundled with the others, I stand by my statement that 
this is a really strange and eclectic variety of uses.

> > For example, it seems odd to say that in the following, the third <cite>
> > is non-conforming, but the other two are fine:
> >
> >   <article>
> >    <footer>Comment by <cite>John Adams</cite></footer>
> >    <p>I think that the following comment by <cite>Fred Fox</cite> is
> >    right:</p>
> >    <blockquote>
> >     <p>Tomatoes are juicy.</p>
> >    </blockquote>
> >    <p>However, I like to visit <cite>Ian</cite> and he does not like them
> >    at all.</p>
> >   </article>
> Please do some hallway testing on this.  Ask half a dozen people what 
> they think of this markup.  If you have to prompt, ask about the use of 
> cite in particular.
> I'm guessing that most won't even really notice the cites to John
> Adams or Fred Fox, but almost all will wonder about the cite to Ian.
> The difference is that John Adams and Fred Fox were the ones saying
> something -- the cite was attributing something to them.  They were
> "actors" as opposed to "objects" in the linguistic sense.  Ian was
> simply an "object" (a direct object, in this case) that happens to be
> human.

I've started asking people what they think the errors are in the following 
markup:

   <h1>Welcome to my home page</h1>
   <p>My name is <cite>Bob Smith</cite>.</p>
   <p>I like the book <cite>Pandora's Star</cite>.</p>
   <p>What do you think?</p>
    <cite>James Smith</cite>
    <p>I'm with you <cite>Bob</cite>!</p>
    <p><cite>James</cite> wrote:</p>
    <blockquote><p>I'm with you <cite>Bob</cite>!</p></blockquote>
    <p>But I disagree, I think <cite>Pat</cite>'s blog post is better.

...but frankly I'm having trouble working out which you are proposing to 
have valid and not, which is not a good sign.

Given that I don't see the use case of marking up any of the <cite>s in 
the above except the book title (which would be styled differently), I 
really don't see the point of having this level of complexity.

> > It seems like it would be better to not have any elements for the 
> > bottom three definitions you list, or to introduce a new element for 
> > those that have use cases. However, no compelling use cases have been 
> > mentioned as far as I am aware.
> Are you seriously saying that there is no need to attribute to "names 
> and other sources of quote attribution (including identifying speakers 
> in dialog)", or to markup the user name of "names of blog post 
> commenters and authors (in the context of their comments, posts, etc.)"

As far as I can tell, there is no need, no. What is the need?

> I haven't yet seen a forum that didn't style usernames of the 
> commentators differently (generally either bold or as a link, rather 
> than italics, but still differently).

Sure, but they also style the number of days that the user has been a 
member, and their signature, and all kinds of other things; are you 
suggesting we introduce elements for all those also?

I don't see why <span>, or <b> and <i>, with the class="" attribute, 
doesn't work for these cases.
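
For instance, a forum post header could be marked up along these lines 
(the class names and the values are hypothetical, picked only for the 
example):

   <p class="post-meta">
    <b class="username">James Smith</b>
    <span class="member-days">member for 30 days</span>
   </p>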

I _really_ don't see why we'd want to use <cite> here, given that as you 
say, it doesn't even give the right styling.

> Nor have I yet seen a script (or published play) that didn't use some 
> styling variation to distinguish the character names from their words. 
> (Usually -- but not quite always -- I see additional variations to 
> indicate character actions, and generic stage directions such as scene 
> endings.)

Most actual scripts I've seen have one font with no styling whatsoever, 
just indenting and all-caps in places.

> The original purpose of a citation was so that readers could, if they 
> wished, go back to the original.  That is much easier when the original 
> is only a click away, and so even more important.

That's what <a> is for. No need for <cite> for that purpose.

> >> My own interpretation of (a fraction of) 
> >> http://philip.html5.org/data/cite.txt did not support narrowing the 
> >> definition only to titles.  For example
> >> 
> >> (1)  Examples of citing a person, arguably the creator.
> >> 
> >> (1a)  http://www.hiddenmickeys.org/Movies/MaryPoppins.html
> >> 
> >> The cite element is used to give credit to the person who
> >> found/verified each "Hidden Mickey":
> >>     <CITE>REPORTED: <A HREF="mailto:...">Beverly O'Dell</A> 12 MAR 98</CITE>
> >>     <CITE>UPDATE: Greg Bevier 29 JUL 98</CITE>
> > 
> > I don't think that's a usage anyone is actually arguing for though, is 
> > it?
> Yes, I do think so.  The person in the cite element is the source of the 
> information.  This is similar to using cite for the author of a comment 
> at a blog.

But with the word "REPORTED:" inside it? With the date inside it? Surely 
that isn't what you are requesting. It doesn't match any of the 
definitions you gave earlier, as far as I can tell.

> >> (1b)  http://www.webporter.com -- they give the author of the 
> >> article.  But it looks like they (at least sometimes) include the 
> >> title as well, which fits under full citation.
> > 
> > Right, this is the "full citation" feature. Notice their stylings, 
> > though: they are overriding the default font styles, and instead 
> > treating the whole thing as a block-level element. They would be 
> > better off using <p> with a class, or having us introduce a 
> > block-level element like <credit> or <dc> (which we might add to 
> > <figure>).
> I agree that they would be better off with a <credit> element.  I also 
> believe that <credit> would be better for some of the use cases that 
> seem to be contentious, like blog-comments-author.  (1a, 1c, and 1d 
> would also be better off with <credit>, in my opinion.)  An <attrib> 
> element might be better still, as that would also work sensibly in 
> dialogues.
> But <cite> is clearly the best option unless/until the more specialized 
> <credit> (or attrib) is added.

No, <p> is "clearly" the best option: it has the right styling, and 
doesn't require us to make the definition of an element more complicated.

Why would <cite> be a better option than <p>?

> [snip discussion of possible future changes to <figure>; let's leave 
> those for the future]

> >> (2)  Several uses -- and several *non-uses* for titles from
> >> http://www.growndodo.com/wordplay/oulipo/
> >>
> >> The page begins with carefully attributed blockquotes.  These are
> >> *not* done with cite, presumably because it didn't seem flexible
> >> enough.  Instead, it was marked up as
> >>
> >>     <p class="quote">...
> >>     <p class="citation">
> >>       <span class="citationauthor">François Le Lionnais</span>,
> >>       <span class="citationsource">Lipo: First Manifesto</span></p>
> >>
> >> Within the text, <cite> was used to point to source materials, but
> >> there didn't seem to be anything quoted; in most cases the texts were
> >> used as example objects of study; if they actually need a title
> >> markup, then so does the specific Viking ship in Leif's example.
> >> Sample usage:   <cite>S + 7</cite> (substrata ("novelette" +
> >> 7) does appear to be a title.
> >>
> >> At the end of the page, there is a further readings section.
> >>     <dt>author<cite>title</cite>publisher</dt> is used for printed
> >> reference books
> >> but
> >>     <p class="linklist"><a href ...> is used for equivalent references
> >> on the web,
> >> and cite is also used to name the professor of a course
> >>     <cite>4-5 units, <a
> >> href="http://www.centerforbookculture.org/dalkey/bio_gsorrentino.html">Sorrentino</a></cite>
> > That page seems pretty close to what HTML5 specifies now, though it's not
> > fully consistent, as you say.
> That almost sounds as though the real specification were:
>    "Book Title, even if you aren't quoting or
>     paraphrasing anything -- this isn't really about
>     citations; we just call it cite for historical reasons."

That's exactly what HTML5 says, yes.

> I'm trying to imagine keeping a straight face as I say that books get 
> special markup because their names often need to be italicized, but this 
> doesn't apply to ships, because, well, ships aren't written down.

Ships get <i>. Search for "ship name" in the spec (it's mentioned twice).
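
Something like this, assuming a made-up ship name and an optional class 
as a styling hook:

   <p>We sailed on the <i class="ship-name">Morning Star</i> last year.</p>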

> And whether to <cite>The Gettysburg Address</cite> sort of depends on 
> how you want it styled.

Well, you're always allowed to omit the markup -- I mean, you don't _have_ 
to surround the word "WARNING!" with <strong>, for instance, and in 
practice whether you do or not is more or less up to how you want it 
styled. So I don't see why that's a problem.

On Wed, 23 Sep 2009, Jim Jewett wrote:
> Smylers wrote:
> > If authors are spending time on using an element which has no effect 
> > on users (and Hixie's pointed out that in many cases where <cite> is 
> > used other than for titles of works authors use CSS to remove the 
> > default italics, to ensure that users don't actually have the presence 
> > of the <cite> conveyed to them) then there's no reason for HTML5 to 
> > continue to support it.
> If they are merely changing the styling to some other distinctive form, 
> there is still reason to support it.  If they are truly going to the 
> effort of adding it, then working to make it indistinguishable, that 
> tells me the element is *very* important (if perhaps only for 
> bureaucratic reasons), and the problem is with the default styling and 
> UI.

You have an odd use of the word "important". To me, it seems like if 
authors are going out of their way to use an element which has zero effect 
on anything, then they are in fact wasting their time, not doing something 
important. Then again, I also think that "bureaucratic reasons" and "very 
important" are contradictory.

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
