[whatwg] the cite element

Ian Hickson ian at hixie.ch
Thu Oct 15 17:51:45 PDT 2009


On Mon, 5 Oct 2009, tjeddo wrote:
> 
> I believe that the current HTML5 spec is heading in the right direction 
> by narrowing the meaning of the cite element compared to its ambiguous 
> use in HTML documents in the past. Overloading the meaning of the cite 
> element further by using it to distinguish speaker's names would not 
> only add ambiguity but would require developers (who want to honor 
> typographical convention) to undo the default italics styling that would 
> be applied to the speaker's name when enclosed by <cite> tags.

Indeed.


> I feel it is an improvement to HTML that the cite element is being 
> focused to specify the "title of a work." It is however unfortunate that 
> the element's name is 'cite' for legacy HTML reasons.

True. The name is pretty close -- closer for most people than I had 
realised, in fact, since people interpret HTML4's definition as a 
superset of HTML5's, and not a subset, as I do -- but as with many things 
in HTML5, it's certainly not ideal.


> I would much prefer the name of the cite element be reserved for a 
> purpose equivalent to the use of \cite{} in LaTeX.
> 
> However, given the ambiguity of the HTML4 specification as to the 
> correct usage of the 'cite' element, I'm wondering if we shouldn't align 
> the 'cite' element with a more intuitive use case matching that 
> satisfied by \cite{} in LaTeX. And introduce a new inline element called 
> 'tow' (title of work) or 'tor' (title of reference), for example, to 
> explicitly specify the "title of a work."

\cite{} in LaTeX is basically a cross-referencing mechanism: you define a 
bibliographic entry, and then you can generate a cross-reference to it 
using \cite{} in the main text.

This use case is already handled by <a href=""> -- for example, the HTML5 
spec has a bunch of bibliographic entries at the bottom, written as:

   <dt id="refsX690">[X690]</dt>
   <dd><cite><a href="http://www.itu.int/ITU-T/studygroups/com17/languages/X.690-0207.pdf">Recommendation
   X.690 &mdash; Information Technology &mdash; ASN.1 Encoding Rules &mdash;
   Specification of Basic Encoding Rules (BER), Canonical Encoding
   Rules (CER), and Distinguished Encoding Rules
   (DER)</a></cite>. International Telecommunication Union, July
   2002.</dd>

...and then to reference them, I just write:

   <a href="#refsX690">[X690]</a>

I think this is adequate. It doesn't support more advanced use cases like 
automatic reference generation when copying and pasting, but that can be 
handled by microdata.

Now, if we do in fact conclude that the use case for \cite{} is already 
handled by <a href="">, that leaves us with the choice of what to do for 
titles of works. We could make <cite> obsolete and introduce a new 
element, with the same default styles, and with a better name, like <work> 
or <wtitle> or <tow> or something. However, support for the element would 
take years to be deployed enough to be usable, and in the meantime 
browsers would still have the support of <cite>, so this would not be a 
cheap solution. On the other hand, if we just reuse <cite>, by slightly 
adjusting the definition in HTML4, we end up with a solution more or less 
for free.


> In fact, the two examples given in the HTML4 spec for using <cite> are 
> both incorrect according to the current HTML5 definition:
> 
>      - "As <CITE>Harry S. Truman</CITE> said,"
>      - "More information can be found in <CITE>[ISO-0000]</CITE>."

Sure, but then many of the examples in HTML4 aren't even conforming to 
HTML4, so that's not necessarily a big concern. :-)


> By the way, what is the reasoning in the HTML5 spec for stating that 
> ship names should not be marked up with <cite> but should use <i> 
> instead?
> 
> I guess I'm saying, why are ships not considered "works?"

I suppose one could indeed make a somewhat convincing argument that a ship 
is a work. I'm not sure I'd want to try to sell that though. I've removed 
the sentence that says a ship isn't a work, but I haven't added ships to 
the list of works.


> Here are three references that indicate specifically that Ship names
> receive the same typographic treatment as other titles of works.

Oh I agree that they do; that's why <i> is suggested. So should emphasis.


On Tue, 6 Oct 2009, Erik Vorhes wrote:
> >
> > Is there as much semantic value in pointing to the primary source of a 
> > statement as there is in knowing that the word "earth" refers to the 
> > planet and not the dirt, for example? If so, what is that extra value?
> 
> Identifying speakers and other sources of attribution have multiple 
> use-cases, as identified previously to this list. Such uses are often 
> extra-contextual, unlike your example of "earth." I don't know how 
> otherwise to respond to such laughably obvious "reductio ad absurdum" 
> arguments.

I'm not aware of any use cases that have been put forward that <cite> 
addresses adequately and that are compelling enough to need solving. Could 
you list them explicitly?


> It may not need to be <cite>, per se, but that is the element that has 
> been used in examples of multiple kinds of quote + attribution markup 
> patterns. And since the WG has a general aversion to creating new 
> elements (except when it doesn't), using <cite> makes the most sense.

We don't need elements at all for quote + attribution markup, as far as I 
can tell.


> To me, recommending <b> or <i> or <span> for such contexts is a 
> nonstarter, as these all appear to be designated for marking up text 
> "without conveying any extra importance." The desire is to have 
> speakers' names and other sources of attribution marked up in such a way 
> that sets them apart from the surrounding context. Especially in the 
> cases of dialog and transcription, their being "special" is important. 
> For example, listen to any of Nina Totenberg's reports on US Supreme 
> Court proceedings, or read just about any printed play text in 
> existence.

Just to get a different presentational effect, <b> is fine.


> Above other sources of attribution, it is important for speakers'
> names to be marked up as distinct from its surrounding context.

Why?


> > I don't buy that at all. It's just one way that people write dialogs, 
> > but as far as I can tell this is perfectly adequate:
> >
> >   <p>Me: Can I say something?</p>
> >
> > ...and you need neither <q> nor <cite>. I really feel that you are 
> > trying too hard to solve a problem that really doesn't exist here.

> Surely you jest.
> 
> Have you ever read a play? In every instance I have run across, speakers 
> and their words are clearly demarcated (not to mention stage directions, 
> etc.).

Why are <b>, <i>, <span> not sufficient for this purpose?

(And actually, in all the scripts I've seen, the only formatting has been 
alignment and uppercase vs normal case. I haven't seen bold or italics 
used in scripts at all.)
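
As a rough sketch of that point, a script-style layout needs only 
generic elements plus CSS. The class names "script" and "speaker" below 
are made up for illustration, as is the particular choice of padding and 
text-transform; nothing here comes from the spec:

   <style>
    .script p { padding-left: 4em; }
    .script .speaker { text-transform: uppercase; }
   </style>
   <div class="script">
    <p><span class="speaker">Pete</span> I'm joining a gang.</p>
    <p><span class="speaker">Meredith</span> You can't!</p>
   </div>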


> > I've started asking people what they think the errors are in the 
> > following snippet:
> >
> >  <article>
> >   <h1>Welcome to my home page</h1>
> >   <p>My name is <cite>Bob Smith</cite>.</p>
> >   <p>I like the book <cite>Pandora's Star</cite>.</p>
> >   <p>What do you think?</p>
> >   <article>
> >    <cite>James Smith</cite>
> >    <p>I'm with you <cite>Bob</cite>!</p>
> >   </article>
> >   <article>
> >    <cite>Fred</cite>
> >    <p><cite>James</cite> wrote:</p>
> >    <blockquote><p>I'm with you <cite>Bob</cite>!</p></blockquote>
> >    <p>But I disagree, I think <cite>Pat</cite>'s blog post is better.
> >   </article>
> >  </article>
> >
> > ...but frankly I'm having trouble working out which you are proposing 
> > to have valid and not, which is not a good sign.
> >
> > Given that I don't see the use case of marking up any of the <cite>s 
> > in the above except the book title (which would be styled 
> > differently), I really don't see the point of having this level of 
> > complexity.
> 
> Your example hardly dignifies a response, but here goes:
> 
> 1. The proposal, as far as I can tell, is to allow <cite> (or some 
> nonexistent element whose name would likely be less logical) to mark up 
> text for attribution, which often would be a name. I don't believe 
> *anyone* is arguing that every name should be marked up with <cite>. Who 
> are you trying to argue against here? You're not arguing against those 
> of us advocating for additional allowable uses for <cite>.
> 
> 2. If you want to play the "reductio ad absurdum" game, I propose we 
> eliminate <article> from the specification, because some stupid content 
> author might try to create a document with the following markup:
> 
> <p><article>The</article> rain in Spain falls mainly on 
> <article>the</article> plain.</p>

The point of the example is that the definition you _are_ proposing isn't 
clear to me. I don't know where you draw the line on what is a citation 
and what isn't.


> >> > It seems like it would be better to not have any elements for the 
> >> > bottom three definitions you list, or to introduce a new element 
> >> > for those that have use cases. However, no compelling use cases 
> >> > have been mentioned as far as I am aware.
> >>
> >> Are you seriously saying that there is no need to attribute to "names 
> >> and other sources of quote attribution (including identifying 
> >> speakers in dialog)", or to markup the user name of "names of blog 
> >> post commenters and authors (in the context of their comments, posts, 
> >> etc.)"
> >
> > As far as I can tell, there is no need, no. What is the need?
> 
> You repeatedly assert that you don't see a need, but I and others on 
> this list have repeatedly demonstrated a need for this. See above and 
> below, for example, or in numerous other mailing-list messages on this 
> topic.

The only need you listed in this e-mail is formatting scripts.


> >> Nor have I yet seen a script (or published play) that didn't use some 
> >> styling variation to distinguish the character names from their 
> >> words. (Usually -- but not quite always -- I see additional 
> >> variations to indicate character actions, and generic stage 
> >> directions such as scene endings.)
> >
> > Most actual scripts I've seen have one font with no styling 
> > whatsoever, just indenting and all-caps in places.
> 
> 1. When did "indenting and all-caps" cease to be "styling"? Someone 
> should notify the CSS3 working group.

We don't need any elements for uppercasing words. We don't need a phrasing 
element for indenting.

Furthermore, attributing lines in a script isn't citation! Indeed, giving 
attributions to fictional characters is even further from what HTML4 
allows than what HTML5 says. If the goal is to mark up scripts, then that 
would be an argument for reintroducing <dialog>, possibly with the 
suggested additional <ds> element. It wouldn't be an argument for <cite>.


> 2. From 2002 until 2007 I was a graduate student in English literature, 
> specializing in textual criticism (that is, the nature of texts and the 
> production of editions, among other things). I routinely taught plays in 
> class and studied plays as part of my own academic pursuits. Most actual 
> scripts that I've seen, including plays and liturgical texts that date 
> back to the early twelfth century, demarcate speakers' names through 
> "styling."
> 
> So I need to call BS on any claim that speakers' names aren't "styled."

What styling do they use? Can you show us some examples?


> >> >> My own interpretation of (a fraction of) 
> >> >> http://philip.html5.org/data/cite.txt did not support narrowing 
> >> >> the definition only to titles.  For example
> >> >>
> >> >> (1)  Examples of citing a person, arguably the creator.
> >> >>
> >> >> (1a)  http://www.hiddenmickeys.org/Movies/MaryPoppins.html
> >> >>
> >> >> The cite element is used to give credit to the person who 
> >> >> found/verified each "Hidden Mickey":     <CITE>REPORTED: <A 
> >> >> HREF="mailto:...">Beverly O'Dell</A> 12 MAR 98</CITE>     
> >> >> <CITE>UPDATE: Greg Bevier 29 JUL 98</CITE>
> >> >
> >> > I don't think that's a usage anyone is actually arguing for though, 
> >> > is it?
> >>
> >> Yes, I do think so.  The person in the cite element is the source of 
> >> the information.  This is similar to using cite for the author of a 
> >> comment at a blog.
> >
> > But with the word "REPORTED:" inside it? With the date inside it? 
> > Surely that isn't what you are requesting. It doesn't match any of the 
> > definitions you gave earlier, as far as I can tell.
> 
> You answered your own question. Why ask the question except to ridicule 
> the person making the suggestion?

The example was put forward as support for what was being proposed. I'm 
just trying to understand the proposal. If the example is not an example 
of what is being proposed, then why was it put forward as an argument in 
favour?


> > Well, you're always allowed to omit the markup -- I mean, you don't 
> > _have_ to surround the word "WARNING!" with <strong>, for instance, 
> > and in practice whether you do or not is more or less up to how you 
> > want it styled. So I don't see why that's a problem.
> 
> When did HTML5 become a set of styling hooks

About 1996.


> for sighted users?

Not for sighted users. For all users.


On Tue, 6 Oct 2009, Gordon P. Hemsley wrote:
> 
> I was discussing the <cite> element with TabAtkins on IRC and I proposed 
> analyzing the actual word 'cite'. Using it as a verb, the definition of 
> 'cite' applies to quotes/quotations, titles, and people, depending on 
> the context.

Defining <cite> to be only applicable for actual citations, and not any 
references to works, is what HTML5 used to say:

   http://www.whatwg.org/specs/web-apps/2005-09-01/#the-cite

We changed it because that's simply not a useful definition. It also is a 
very odd line to draw:

   <p>I dropped <cite>The New Hacker's Dictionary</cite> on the floor 
   today.</p> <!-- wrong, not the source or reference for a quotation or 
   statement made in the document -->

   <p>When I dropped <cite>The New Hacker's Dictionary</cite> on the floor 
   today, I remembered that "drop on the floor" is a term defined in that 
   very book!</p> <!-- correct, since the book is used as the source of a
   statement in the document -->

Much simpler, more useful, and more or less equivalent except for 
disallowing people's names, is to say it's the title of a work.


> That leaves usages of 'cite' for both titles of works and authors of 
> works. Putting aside the issue of styling for a moment, these two pieces 
> of data both fall under the semantic meaning of 'cite'.

No, not at all. Consider:

   <p>I met Ian today. He was carrying The New Hacker's Dictionary.</p>

This contains no citation, but it contains both the name of an author and 
the title of a work.


> Thus, I propose the following (which TabAtkins generally agrees with):
> 
> Leave the default styling of <cite> to be italicized for legacy 
> implementations and allow any reference to any work or author, with the 
> granularity decided by the individual web developer.

I understand the use of making <cite> apply to titles of works, whether 
cited or not: they get styled as italics, which is the default rendering 
of <cite>.

I don't understand the use of making <cite> apply to names of authors.


> I also propose allowing parenthetical citations and footnote markers (as 
> is used in the various W3C/WHATWG specifications) to also be marked up 
> with <cite>, though I'm not sure if TabAtkins agrees with me on that 
> point.

I don't understand the use of this either.


What authoring problems would these two definitions solve?


On Wed, 7 Oct 2009, Hugh Guiney wrote:
> 
> I like the idea but I'd go for @href instead, e.g.:
> 
> <p>As <cite for="gettysburg"
> href="http://www.whitehouse.gov/about/presidents/abrahamlincoln/">Abraham
> Lincoln</cite> said, <q id="gettysburg">Four score and seven years ago
> ...</q></p>

Given how rarely people use cite="" on <q> as it is, I don't think we'd be 
solving a real problem by adding this.


> <p><cite href="http://www.imdb.com/title/tt0800080/">The Incredible
> Hulk</cite> (2008) is a reboot of <cite
> href="http://www.imdb.com/title/tt0286716/">Hulk</cite> (2003).</p>

Why isn't <a href=""> combined with <cite> enough for this?
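
For instance, nesting the existing elements would give something like 
the following (a sketch reusing the URLs from the quoted example, not 
markup taken from the spec):

   <p><cite><a href="http://www.imdb.com/title/tt0800080/">The Incredible
   Hulk</a></cite> (2008) is a reboot of <cite><a
   href="http://www.imdb.com/title/tt0286716/">Hulk</a></cite> (2003).</p>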


> Dialogs would also benefit from this, as in:
> 
> <cite id="pete">Pete</cite>: <q for="pete">I'm joining a gang.</q>
> <cite id="meredith">Meredith</cite>: <q for="meredith">You can't!</q>
> <cite>Pete</cite>: <q for="pete">Don't try to stop me.</q>

What problem does this solve that the following doesn't?

  <p>Pete: I'm joining a gang.</p>
  <p>Meredith: You can't!</p>
  <p>Pete: Don't try to stop me.</p>


> Of course the downside to that is being unable to create a relationship 
> between all of a speaker's quotes and attributions without inventing 
> superfluous @ids. For that I propose an "alias" attribute for <cite> 
> which allows it to represent another instance of that same attribution:
> 
> <cite id="pete">Pete</cite>: <q for="pete">I'm joining a gang.</q>
> <cite id="meredith">Meredith</cite>: <q for="meredith">You can't!</q>
> <cite alias="pete">Pete</cite>: <q for="pete">You can't stop me.</q>
> 
> which would also be useful in the informal abbreviation of titles of works:
> 
> <h1><cite id="borat"
> href="http://www.imdb.com/title/tt0443453/">Borat: Cultural Learnings
> of America for Make Benefit Glorious Nation of Kazakhstan</cite>
> (2006)</h1>
> <h2>My Review</h2>
> <p><cite alias="borat">Borat</cite> is a hilarious film about...</p>

I do not believe enough authors would use this to justify it.



On Fri, 9 Oct 2009, tjeddo wrote:
> 
> I've sampled a variety of passages containing real citations to markup 
> in the emerging citation scheme that is being discussed on this mailing 
> list. This way I don't have to overly contrive my examples. My goal here 
> is to illustrate how the cite element can be revised to support first 
> class citation support in HTML5.  Also, all these examples are taken 
> from sources about writing so there is a good chance we will all agree 
> they are valid examples.
> 
> Example 1A [1]:
>     Human beings have been described as "symbol-using
>     animals" (Burke 3).
> 
> Candidate HTML5 Markup:
>     <span id="symbols">Humans have been described as
>     <q>symbol-using animals</q></span>
>     <cite for="symbols" href="#bib-burke">(Burke 3)</cite>.

Why not just this?:

   <p>Human beings have been described as "symbol-using
   animals" <a href="#bib-burke">(Burke 3)</a>.</p>

I don't understand what the for="" is adding that is of any use to anyone.


>     Note: The cite element is used here to make the citation
>     relationship between the paraphrased/quoted content
>     and the original source explicit.

But why? What use will be made of this?


> Example 1B [1]:
>     Human beings have been described by Kenneth Burke as
>     "symbol-using animals" (3).
> 
>     Note: An MLA-valid variant of Example 1A
> 
> Candidate HTML5 Markup:
>     [Option 1]
> 
>     Human beings have been described by
>     <cite for="symbols" href="#bib-burke">Kenneth Burke</cite>
>     as <q id="symbols">symbol-using animals</q> (3).

I don't understand what this adds. Why do you need any markup at all here? 
What problem does this solve?


> Candidate HTML5 Markup:
>     <p>
>     Students having a hard time finding databases isn't a new
>     phenomenon. At the University of Washington, they have
>     problems too.
>     </p>
>     <blockquote id="student-difficulty">
>         <p>
>         With the addition of so many new databases to the
>         campus online system, many students were having
>         difficulty locating the database they needed. At
>         the same time, the role of Session manager had
>         evolved. The increased importance of the Session
>         Manager as a selection tool made it a part of the
>         navigation process itself.
>         <cite for="student-difficulty" href="#bib-Eli">
>         (Eliasen, 1997, p. 510)
>         </cite>
>         </p>
>     </blockquote>

Consider that <blockquote> has a cite="" attribute that nearly nobody 
uses. Why would this even more complicated scheme be used?
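
For comparison, the existing mechanism could look roughly like this. 
The cite="" URL is a placeholder (the source's address isn't given 
above), and the quotation is abbreviated:

   <blockquote cite="http://example.org/eliasen-1997">
    <p>With the addition of so many new databases to the campus
    online system, many students were having difficulty locating
    the database they needed. [...]</p>
   </blockquote>
   <p><a href="#bib-Eli">(Eliasen, 1997, p. 510)</a></p>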


> I believe there is a significant value proposition for adding true
> citation support to HTML5, for example:
> 
> * Search engines will have structured citation content to index.
> Algorithms can be developed to better associate content with authors,
> specific quotes with their speakers. This ultimately means more
> relevant searches for the Internet community.

I am very skeptical that search engines would make use of this. It is very 
likely to be used too rarely to give enough of a signal, and it is very 
likely to be highly unreliable even when it _is_ used.


> * If a standardized microdata vocabulary emerges for marking up
> bibliography entries to complement this citation approach, crawlers
> can be updated to traverse these citation structures and extract out
> specific information more readily.

That would be possible regardless of the element used -- microdata defines 
a data model that is independent of the underlying DOM, so no link can 
really be made from the microdata layer to the <cite> element.
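
To illustrate, here is a sketch of a bibliography entry annotated with 
microdata's itemscope, itemtype and itemprop attributes. The vocabulary 
URL and the property names "title" and "publisher" are hypothetical; no 
such vocabulary is assumed to exist. The extracted item would be the same 
if the <cite> below were a <span>:

   <dd itemscope itemtype="http://example.org/vocab#bibentry">
    <cite itemprop="title">Recommendation X.690</cite>.
    <span itemprop="publisher">International Telecommunication
    Union</span>, July 2002.
   </dd>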


> * Professors might ask their students to write their papers in wiki-like 
> content management systems that encode the citation content in this 
> approach; thereby making it possible to use tools that check for 
> plagiarism.

Why wouldn't the students just lie? Anyway, plagiarism detection is a 
mostly solved problem at this point. There's a whole industry built up 
around it. I doubt we'd effect change in this area with <cite>.


> * Using CSS, authors can readily highlight all their content that
> contains citations to do a double check before publishing.

On the contrary, because <cite>'s styling is wrong for names, they would 
end up doing what Wikipedia does -- removing the styling from <cite>. This 
would have the exact opposite effect, effectively just wasting authors' 
time, IMHO.
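
The reset in question would be a one-line rule along these lines (a 
sketch of the kind of rule meant, not copied from any particular site's 
style sheet):

   <style>
    cite { font-style: normal; }
   </style>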


> * Dialogs can be marked up to make explicit who a statement belongs to. 
> Once again this structure can be exploited by search engines to provide 
> more relevant searches.

I think we should let search engines ask for the features they want, 
rather than assuming that we can solve their problems.


> * Overall we have a chance to standardize how authors encode citations 
> in HTML, which should further encourage Web authors to adopt the 
> encouraged practice of providing support for their claims.

Your optimism is heart-warming, but I am skeptical.


On Thu, 8 Oct 2009, Jim Jewett wrote:
> > 
> > I hate to be so repetitive, but why is that beneficial? What is the 
> > semantic value of this?
> 
> You are welcome to say that argument by authority is so weak as to be 
> invalid, but it still happens.
>
> Similarly, you are welcome to say that the academic habit of crediting 
> other authors (sometimes but not always for specific publications) is 
> silly, but it still happens.

I'm not saying either of those things. (Well, I'm saying the first, but 
that's neither here nor there -- it's not an argument against <cite>).

What I'm saying is that <cite> doesn't help with either of the above. It's 
quite possible to cite people in text/plain, without any markup. What does 
<cite> _add_ that solves a real problem?


> > Is there as much semantic value in pointing to the primary source of a 
> > statement as there is in knowing that the word "earth" refers to the 
> > planet and not the dirt, for example? If so, what is that extra value?
> 
> I recently saw a .sig (where, by who?) with a quotation of one character 
> asking whether another character had said something.  I could link to 
> the archived email by title, but it has nothing to do with .sig.  I 
> could fake up a title, such as "Steven Bethard's .sig". But that can get 
> really awkward when referring to something informal. "The 
> Hiphopopotamus, in something that I couldn't identify even if I saw it, 
> but which I am titling as the original source of the .sig quote".  The 
> .sig itself (if the message weren't in plaintext) could refer to an 
> episode title, but ... that would be a little too pedantic for a .sig 
> quote.
> 
> "<cite>The Hiphopopotamus</cite>" seems a much more reasonable solution.

I do not understand your argument here.


> >> dialogues and transcripts and credits and theatrical scripts are all 
> >> arguably too fine-grained for a "citation", as opposed to a "label" 
> >> or "attribution", but they are certainly real use cases where the 
> >> attribution is important.
> > 
> > Why? This is not a rhetorical question, I'm trying to get to the use 
> > case that means that there is an actual benefit to what you are asking 
> > for.
> 
> They are all cases where "who said it" or "who did it" is important -- 
> sometimes far more important than what they actually said or did.

Sure, but <cite> doesn't help determine that. English helps determine 
that. (Or whatever natural language is used.) What is <cite> doing?


> > What does <cite> do that you want?
> 
> It says who to praise/blame/question for the original thought and/or 
> expression, as opposed to the decision to repeat (and possibly ridicule) 
> it.
> 
> That may not matter much in a technical discussion, but matters in 
> lawsuits and it matters (for different reasons) in academics.

I disagree that <cite> achieves this goal.

A judge isn't going to look at an HTML page and say "oh my yes, the <cite> 
element is clearly specified, therefore the attribution is correct" if the 
page is misleading when printed. In fact for any static page I would be 
pretty surprised if any legal system gave different weight to the HTML 
source than it did to the printed copy of the document, where all markup 
nuance has been lost.



> >> These three are even cases where print sources will typically shift 
> >> font in some way between the attribution (<b>Mephistopheles</b>) and 
> >> the actual statement, though not always in the same manner.  Of the 
> >> three that I found first,
> ...
> > I'm not sure what you're saying here.
> 
> I was pointing out that attribution (to a person by name, not to a work 
> by title) was important enough that print sources distinguished the way 
> they presented the name from the way they presented the content.

Yes, the <b> element in HTML explicitly talks about this, in fact.


> >> >> On October 31, 2006, Michael Fortin suggested the following 
> >> >> pattern: <p><cite>Me:</cite> <q>Can I say something?</q>
> >> ...
> >> >> Aside from the current definition of <cite>, I think this would be 
> >> >> a good use of the element, ...
> >> > 
> >> > I don't understand why we need an element here at all, and I don't 
> >> > understand why we would want to reuse <cite>, of all elements, if 
> >> > we did in fact need one.
> >> 
> >> That "Me:" isn't pronounced; it is metadata so important that it gets 
> >> written (in an odd style) in printed form.
> > 
> > I don't buy that at all. It's just one way that people write dialogs, 
> > but as far as I can tell this is perfectly adequate:
> > 
> >   <p>Me: Can I say something?</p>
> > 
> > ...and you need neither <q> nor <cite>.
> 
> You *never* need q -- you could just use quotation marks.  And you 
> *never* need <li> -- you could just use the entity for a bullet.  But 
> being explicit is often judged worthwhile.

<li> and <q> are both useful because of their default styling, as is 
<cite>, when used for something that works with its default styling. But 
that doesn't apply to people's names -- they are almost never made 
italics.


> >> The punctuation (followed by a new sentence, complete with initial 
> >> capitals) is the closest a typewriter can come to markup, and scripts 
> >> will typically make the difference more emphatic.
> > 
> > If it's _important_, then use <strong>. If it's just a keyword, then 
> > <b> is fine. If you're saying that the name is something that is in a 
> > different voice, then either the name or the text could be in <i>.
> 
> Typically, the name would be entirely silent; in a proper audio 
> rendition, it would be inferred from the change in voice.  Alas, those 
> of us reading (as opposed to hearing) the dialogue need some hints.  A 
> cite (or a hypothetical <attrib>) element is the right semantic hook 
> from which to hang this styling.

I am not at all convinced that this occurs even remotely enough to justify 
an element. As far as I'm aware, _none_ of the examples of current usages 
of <cite> in Philip's study were cases where the name would be hidden from 
audio media representations.


> But if attribution requires hoops like that, then there is really no 
> justification for an element like <cite> that would really just mean <i 
> class="title">

The only justification is that browsers are going to support it anyway. If 
it wasn't for that, I wouldn't have it in HTML5 at all.


> > I don't really see the need for more than that though. It's not like 
> > there is a style so common that a new element would be useful.
>
> It is very common for the distinction to be made obvious through 
> styling.

No, it really isn't. At least, I've never seen data showing that names are 
commonly styled differently, and it certainly hasn't been my experience. 


> >> I'll agree that it seems odd to have that many <cite> elements in 
> >> such close proximity, but it is the closest match I can find in the 
> >> spec, and it doesn't seem to be actually wrong.  Searching for lines 
> >> by a particular character is a fairly common use case.
> > 
> > Doesn't "find in page" handle that fine?
> 
> Not in my opinion.

What are you expecting Web browsers to do that would make <cite> a better 
solution? Has a browser vendor expressed an interest in some UI feature 
for searching by citation name or some such?


> But as long as you're minimizing the markup, that suggestion does bring 
> up another question:
> 
> Should the character names be invisible, because they aren't spoken 
> aloud?  And does this mean they'll need an element (perhaps only span 
> with a specific class) anyhow?  And that this element-class combination 
> should trigger very different behavior depending on the output device 
> and the user's preference?

If the use case is marking up scripts of fictional dialogue (as opposed to 
real quotes with citations), then we should address that using dedicated 
markup such as the <dialog> element that was recently removed. However, as 
shown by that element's removal, there isn't enough demand for such a 
feature at this time to justify adding it at all.


> >   <p>My favourite book is <cite>Pandora's Star</cite>.</p>
> > 
> > ...so if that is bundled with the others, I stand by my statement that 
> > this is a really strange and eclectic variety of uses.
> 
> The only reason to mark it up at all (in this case) would be that the 
> page author is singling it out.  There may not be much detail on *why* 
> it is worthy of special attention, but it is worthy, and the author went 
> to some effort to say so.

No, the reason to mark it up would be to make it italics, as is needed 
for correct typographic effect.

I stand by my statement that the list of use cases you provided was 
eclectic, and that the spec's definition is far simpler.


> > I've started asking people what they think the errors are in the following
> > snippet:
> >
> >  <article>
> >   <h1>Welcome to my home page</h1>
> >   <p>My name is <cite>Bob Smith</cite>.</p>
> 
> Wrong, but probably not harmful in practice.  Sort of like messing up
> a rev=made.
>
> >   <p>I like the book <cite>Pandora's Star</cite>.</p>
> 
> I wouldn't personally use it, unless I were also using the cite
> element to provide ISBN information or some such, but I consider it
> tolerable.
> 
> If I were only using it for styling, I would write either <i>Pandora's
> Star</i> or <i class="title">Pandora's Star</i>, depending on how
> careful I was feeling.
> 
> >   <p>What do you think?</p>
> >   <article>
> >    <cite>James Smith</cite>
> 
> Fine.
> 
> >    <p>I'm with you <cite>Bob</cite>!</p>
> 
> Invalid.  Since you haven't said what Bob suggested, Bob is just a
> name, not an actual source.  If earlier text had explained what the
> idea (with which James Smith agrees) actually was, then that *earlier*
> text could reasonably be wrapped in a cite.
> 
> >   </article>
> >   <article>
> >    <cite>Fred</cite>
> 
> Fine.  Fred is the author/instigator of the next portion.
> 
> >    <p><cite>James</cite> wrote:</p>
> 
> Fine -- Fred is himself crediting (citing) James.
> 
> >    <blockquote><p>I'm with you <cite>Bob</cite>!</p></blockquote>
> 
> Still wrong, for the same reasons.
> 
> >    <p>But I disagree, I think <cite>Pat</cite>'s blog post is better.
> 
> I would change that to <cite>Pat's blog post</cite>, because the
> citation really is to that specific work, and the detailed information
> is available.
> 
> >   </article>
> >  </article>
> 
> > ...but frankly I'm having trouble working out which you are proposing to
> > have valid and not, which is not a good sign.
> 
> It doesn't matter that something is a proper noun; it matters that 
> something is the (linguistic) Agent responsible for whatever is being 
> credited.

I think this is far too subtle for people to understand. People have 
trouble (lots of trouble!) understanding the subtlety of "em is for 
emphasis"; the chances of most people understanding "cite is for the 
linguistic agent responsible for something being credited" are basically 
nil, as far as I can tell.

As noted above, this used to be what HTML5 said. We changed it because it 
just isn't simple enough. There also really isn't a need for this, as far 
as I can tell. There is a related need -- styling titles, as you noted 
above -- and so the element is twisted a bit into addressing that need. If 
it wasn't for that need, I think we would have dropped the element 
altogether.



> >> Are you seriously saying that there is no need to attribute to "names 
> >> and other sources of quote attribution (including identifying 
> >> speakers in dialog)", or to mark up the user name of "names of blog 
> >> post commenters and authors (in the context of their comments, posts, 
> >> etc.)"
> > 
> > As far as I can tell, there is no need, no. What is the need?
> 
> Because readers often do care who said it, or who said it first.

Readers aren't seeing the markup.


> > I _really_ don't see why we'd want to use <cite> here, given that as 
> > you say, it doesn't even give the right styling.
> 
> Because we don't have an <attrib> or even a <credit> element, and so 
> <cite> is the closest match.

You're still not saying why you want this element. What would <attrib> be 
good for? What UI would it trigger? How would users or authors benefit?


> Defining it as a synonym for <i class="title"> seems wrong in both 
> directions -- both promoting something that shouldn't be an element, 
> *and* preventing sensible use of an appropriately named element.

Why would it be wrong to have an element to style titles?



> >> The original purpose of a citation was so that readers could, if they 
> >> wished, go back to the original.  That is much easier when the 
> >> original is only a click away, and so even more important.
> > 
> > That's what <a> is for. No need for <cite> for that purpose.
> 
> If you want to say it should be <a class="cite"> then I'll mostly agree 
> -- except that the need for credits does sometimes appear even when 
> hyperlinks are not available.

When hyperlinks aren't available, the use case described above -- to go 
back to the original by clicking -- is not applicable.


> >> >> The cite element is used to give credit to the person who
> >> >> found/verified each "Hidden Mickey":
> >> >>     <CITE>REPORTED: <A HREF="mailto:...">Beverly O'Dell</A> 12 MAR 98</CITE>
> >> >>     <CITE>UPDATE: Greg Bevier 29 JUL 98</CITE>
> > 
> > But with the word "REPORTED:" inside it? With the date inside it? 
> > Surely that isn't what you are requesting. It doesn't match any of the 
> > definitions you gave earlier, as far as I can tell.
> 
> I see them as a "full citation" variant.  They not only say who, they 
> also say when and in what manner/with what certainty.

So the same as the Wikipedia example? This is the case where we have seen 
the element being actively harmful -- causing people to use an element for 
the sake of it and then remove the styles. That's not a use case, it's 
the opposite.


> >> I agree that they would be better off with a <credit> element.  I 
> >> also believe that <credit> would be better for some of the use cases 
> >> that seem to be contentious, like blog-comments-author.  (1a, 1c, and 
> >> 1d would also be better off with <credit>, in my opinion.)  An 
> >> <attrib> element might be better still, as that would also work 
> >> sensibly in dialogues.
> >>
> >> But <cite> is clearly the best option unless/until the more 
> >> specialized <credit> (or attrib) is added.
> >
> > No, <p> is "clearly" the best option: it has the right styling, and 
> > doesn't require us to make the definition of an element more 
> > complicated.
> > 
> > Why would <cite> be a better option than <p>?
> 
> For the same reason that <aside> or <footer> is better than <div>.  A 
> byline may technically be a paragraph (or a span), but it is a very 
> specialized and odd type of paragraph.

<aside> and <footer> are better than <div> because they map to things that 
people style. We looked at the top ten or so classes that people were 
using to mark up their documents, and "footer" was #1.

I agree that we should add a <credit> (or <dc>) element at some point in 
the future for use with <figure>. I haven't added it yet because we should 
see how <figure> fares first.


> >> That almost sounds as though the real specification were:
> >>
> >>    "Book Title, even if you aren't quoting or
> >>     paraphrasing anything -- this isn't really about
> >>     citations; we just call it cite for historical reasons."
> > 
> > That's exactly what HTML5 says, yes.
> 
> If, for some bizarre reason, it was deemed appropriate for HTML5 to 
> continue saying this, then the element should be deprecated in favor of 
> <i>, the same way that <acronym> is deprecated in favor of <abbr>.

Why?

We removed <acronym> because people couldn't work out the difference 
between an acronym and an abbreviation. <cite> and <i> are much more 
different and clearly separable.


> > You have an odd use of the word "important". To me, it seems like if 
> > authors are going out of their way to use an element which has zero 
> > effect on anything, then they are in fact wasting their time, not 
> > doing something important. Then again, I also think that "bureaucratic 
> > reasons" and "very important" are contradictory.
> 
> "Zero visual effect in most browsers" is very different from "zero 
> effect on anything."
> 
> <cite> -- particularly when restyled to not be visually apparent -- may 
> be one of the few aspects of HTML which is more important to other 
> classes of products.

Like what? Are there examples I could look at? That would be very helpful 
in terms of finding purposes for <cite> other than just italics.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

