[whatwg] Problems with the definition of <cite>

Sun Jan 21 15:06:32 PST 2007

Matthew Paul Thomas wrote:
> Citation is not something new, and there is no 
> obvious reason for styling it differently on the Web.

Citations designed to work within the constraints of expensive print
publishing and to enable manual retrieval from eccentric antiquarians
and musty libraries are clearly not optimized for hypertext. So isn't
there a prima facie case for evolution here?

> Second, as I demonstrated earlier, there is no clear boundary to decide 
> whether you are actually citing a particular person, or just mentioning 
> them.

Yes, this ambiguity exists because the word "cite" has two shades of
meaning. Originally, to "cite" an authority meant to use them as a
witness in a legal case, and hence it came to mean: "To quote (a
passage, book, or author); gen[erally] with implication of adducing as
an authority." But it also came to mean, more vaguely, "To call to mind;
make mention of or reference to". (See OED Online, s.v. "cite", if you
or your organization has a subscription.)

In terms of web functionality, I think HTML needs to provide at least
the ability to:

1) Jump directly to a discussed work/authority (or, at worst, to
directions to it) from a brief mention or detailed description of said
work/authority.

2) Jump directly to the sources of a quotation or statement (or, at
worst, to directions to or discussion of those sources) from the
quotation or statement, while still allowing the quotation or statement
to contain hyperlinks itself.

3) List works discussed or used as references by a given web document.
(Academics need to be able to track who is citing whom.)

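Function 2, and therefore Function 3, clearly requires something
additional to <a>: <a> elements cannot be nested, so a quotation that
contains hyperlinks cannot itself be wrapped in a link to its source.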

With advanced URI syntaxes comparable to OpenURL, <a> can perhaps cover
Function 1. But nothing will force authors to use such URI syntaxes, and
one fit for a given purpose may not exist. So if we expect HTML documents
to make sense when reserialized to deadtree, we may also need
functionality to cope specifically with the print legacy. For example,
web documents tend to give less context for brief mentions than their
print equivalents. Bloggers are notorious for
poor link text like: "but what about <a
href="http://www.example.com">this</a>?" In a print context, "this"
would be replaced by the name of a work, and in a formal context usually
backed up by a bibliography entry if not a note containing a fuller
citation. Such link text also suffers /severely/ from link rot.
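
As a rough sketch of how a tool might flag such link text before a
document goes to print: the word list VAGUE_LINK_TEXT and the names
VagueLinkFinder and find_vague_links are my own inventions for
illustration, and the only library assumed is Python's standard
html.parser.

```python
from html.parser import HTMLParser

# Hypothetical list of link texts that give a print reader no context.
VAGUE_LINK_TEXT = {"this", "here", "click here", "link"}


class VagueLinkFinder(HTMLParser):
    """Collect (href, text) pairs for <a href> elements with vague text."""

    def __init__(self):
        super().__init__()
        self.vague = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Normalize whitespace before comparing against the word list.
            text = " ".join("".join(self._text).split())
            if text.lower() in VAGUE_LINK_TEXT:
                self.vague.append((self._href, text))
            self._href = None


def find_vague_links(html):
    """Return the links in `html` whose text is on the vague-word list."""
    finder = VagueLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.vague
```

A real checker would need a better notion of "vague" than a fixed word
list, but this is enough to catch the "this" example above.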

> And third, there is no benefit for the reader. It doesn't really make 
> the text any easier to understand; and if the author's name is followed 
> by a title that is also in italics, it may actually be harder to see 
> which is the author and which is the work.

That's true.

> Most likely because it's a transcript. :-) 

Looking at the Oxford Guide's text again, I misread it and you're
entirely right. Sorry for introducing a red herring.

> The genius of HTML is that it gets authors to use many elements that 
> are simultaneously presentational *and* semantic. 

As far as I can tell, that aspect of HTML's genius is pretty theoretical
at present. Right from the start, web designers have been engaged in a
perpetual struggle against the default presentation of most semantic
HTML elements. In the early days, this struggle took the form of using
presentational elements instead (<i>, <b>), using proprietary
presentational features, and misusing semantic elements to achieve
presentational effects, most famously heading elements for bigger text,
<blockquote> for indent, <br> for leading and lists, and <table> for
grid layout. While many of these bad practices are still going strong,
greater knowledge of CSS and better browser support for it have
increasingly allowed web designers to treat all elements as mere hooks
for applying styles of their choice. Miscommunication about the reasons
to avoid old-style table layouts has led to the mass replacement of both
semantic and presentational elements with <div> ("divitis"), even to the
extent that newbies regularly attempt to mark up tabular data with <div>.
Miscommunication about the reasons to prefer semantic elements to <i>
and <b> has led to <em> and <strong> being misapplied to create italic
and bold effects.

In a page designed today, I'd guess only the following default stylings
are very likely to be preserved when semantic elements are used
correctly:

1. <p> is block.
2. <blockquote> is indented.
3. <h1> to <h6> are block, bold, and of graduated size.
4. <em> and <strong> are inline and respectively italic and bold.
5. <ol> is block and its list items are numbered.
6. <code> is inline and monospace.

Note however that italic, bold, and numbering styles may all need to be
replaced by something different in non-European languages. For example,
in Japanese, <em>'s italic styling should be replaced by CSS2 box
shading:

http://alistapart.textdrive.com/articles/worldgrowssmall

or better yet by special Asian CSS3 properties:

http://www.w3.org/International/questions/qa-css-lang

Hebrew doesn't even /have/ an italic.

That's not a roaring success if the idea was to match semantics with
presentation automatically, although at least heading elements have made
it far easier for screen readers to navigate longer documents.

Now we could perhaps save this idea if:

1) We make user agents' default styles handle internationalization of
HTML elements properly.

2) We modify the idea somewhat and suggest that the genius of HTML when
used with CSS is that its element set matches the components for which a
typical page will need style hooks. But even this would be hard to
sustain: where are the <banner>, <navigation>, <product>, <note>,
<comment>, and <advert> elements?

With its (as yet hypothetical) suggested default styles and broader
element set, HTML5 seems to be improving on HTML4 in these respects.

And in defence of HTML generally, I guess many of these problems are the
result of misconceived, buggy, and broken tools, not symptomatic of the
design of the language itself.

> Useful to readers *and* computers.

Until the robots take over, the purpose of markup useful to computers is
that computers can make it useful to readers. For example, citation data
can be extracted to reading lists, citations can be reformatted to suit
the reader's preferences, quotations can be checked against their
sources, and so forth.
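
To make the reading-list case concrete, here is a minimal sketch. It
assumes that <cite> wraps whatever the author considers the cited
work/authority and that quotations carry HTML4's cite attribute on <q>
or <blockquote>. The names CitationCollector and reading_list are
hypothetical, and the parser is Python's standard html.parser.

```python
from html.parser import HTMLParser


class CitationCollector(HTMLParser):
    """Gather <cite> element text and cite="" URIs from <q>/<blockquote>."""

    def __init__(self):
        super().__init__()
        self.cited = []    # text content of each outermost <cite> element
        self.sources = []  # cite attribute values found on quotations
        self._depth = 0    # > 0 while inside a <cite> element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "cite":
            if self._depth == 0:
                self._buffer = []
            self._depth += 1
        elif tag in ("q", "blockquote"):
            uri = dict(attrs).get("cite")
            if uri:
                self.sources.append(uri)

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "cite" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Normalize whitespace in the collected text.
                self.cited.append(" ".join("".join(self._buffer).split()))


def reading_list(html):
    """Return (cited names, quotation source URIs), de-duplicated in order."""
    collector = CitationCollector()
    collector.feed(html)
    collector.close()
    return (list(dict.fromkeys(collector.cited)),
            list(dict.fromkeys(collector.sources)))
```

Whether the strings gathered from <cite> are titles, people, or both is
exactly what the definition of <cite> would need to settle; the sketch
just collects whatever authors marked up.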

--
Benjamin Hawkes-Lewis