[whatwg] Proposal: automatic cross-reference attribute: xref=""

Sun Mar 25 15:13:53 PDT 2007

The current draft of HTML5 has an automatic cross-reference feature in  
which the span, abbr, code, var, samp, and i elements point to a matching  
<dfn> element.

    http://www.whatwg.org/specs/web-apps/current-work/#the-dfn

The current design has a number of problems, as discussed in #html-wg:

[22:46] <mjs> I think there should be a specific element for a  
cross-reference
[22:46] <mjs> the current design is easy to process statically but bad for  
dynamic updates, I think
[...]
[22:49] <mjs> to deal with dynamic updates, you need to have a hashtable  
of all dfn defined terms, and the text contents of all span, abbr, code,  
var, samp and i attributes (excluding ones that currently have an  
interactive element or dfn as an ancestor or descendant, etc etc
[22:49] <mjs> I mean, the rule defining what elements it applies to is a  
6-line sentence
[22:50] <mjs> the spec really needs to be fixed not to have such complex  
sentences
[...]
[22:51] <mjs> anyway, it would be much simpler if there was a <term>  
element
[22:51] <anne> there actually used to be <x>
[22:51] <mjs> then the rule can be much simpler, and you don't need to  
keep a big hashtable on the side for documents that use <i> just to  
italicize
[22:51] <anne> for cross references
[22:51] <mjs> x for cross-reference?
[22:51] <anne> but then it was dropped
[22:52] <mjs> I don't like one-letter tag names so much
[22:52] <mjs> HTML has enough of those
[...]
[22:54] <anne> Although I suppose implementing it is complex, introducing  
<term> makes authoring a lot more involved.
[22:54] <anne> Instead of typing <code>foobar</code> you have to type  
<term><code>foobar</code></term>
[22:54] <anne> that's an additional [13] characters each time you use the  
term
[22:55] <anne> (otoh, it saves you at least six characters if you don't  
want the cross referencing to happen)
[22:55] <anne> (but that's rare)
[22:55] <mjs> <x> would be better that way I guess
[...]
[22:57] <mjs> the problem w/ the current design isn't mainly that  
implementing is complex
[...]
[22:58] <mjs> it's more that you have a choice of either: (a) regenerating  
the cross-references on any dynamic update will be very slow or (b) you  
have to waste a lot of memory in documents that don't use cross-references
[22:58] <mjs> (to track the state needed in case you dynamically update)
[22:59] <mjs> those seem like a big cost for a feature with a pretty  
specialized use case
[22:59] <anne> well, you only need to start using memory when you  
encounter both a <dfn> and one of the others
[22:59] <anne> (or after they're both inserted)
[23:00] <mjs> but that means when you do encounter a <dfn> you now have to  
walk the whole document
[23:00] <anne> if you also encountered one of the others, yes
[23:01] <mjs> but maybe you have one of those rare documents that does not  
contain any <span> or <i> elements
[23:01] <mjs> those are the main problematic ones
[23:01] <mjs> the others are rare enough in normal documents that always  
hashing their contents is reasonable
[23:01] <mjs> but you still have the problem of how complex the rule is,  
presumably to avoid accidental cross-references
[...]
[23:03] <anne> the rule is complex due to potential nesting issues
[23:04] <mjs> with a specialized element you would not need to worry about  
that, since you could make nesting of that element non-conformant for  
documents, and then let both cross-references happen
[23:05] <anne> <dfn><x>test</x></dfn>
[23:06] <zcorpan> an attribute? <code xref>foo</code>
[23:07] <zcorpan> that's clearer than <code title>foo</code> when you  
don't want it to be a xref, imho
[23:07] <mjs> anne: I see no deep problem w/ making that a cross-reference  
to itself, but you could also make <x> in <dfn> non-conforming
[23:08] <anne> well, non-conforming doesn't help UAs
[23:08] <zcorpan> perhaps i should propose xref="" to the list (or is  
there a better name?), i like it
[23:08] <mjs> anne: well, then you don't need to worry about the weirdness  
of it being an xref to itself
[23:09] <mjs> anyway, <a id="foo" href="#foo">foo</a> is legal
[23:09] <anne> yes, I'm just saying that you have to define what the UA  
has to do
[23:09] <anne> the current draft does that
[23:09] <mjs> zcorpan: global attributes suck, though in this case it  
makes some sense
[23:09] <mjs> anne: yeah, it defines it with a very complex rule
[23:09] <zcorpan> mjs: i didn't say it should be global
[23:09] <mjs> anne: I'd rather have a design that has a simple rule
[23:09] <anne> it also avoids <a>test <code>test</code></a> for instance  
(although arguably <x> can be nested there)
[23:10]  anne sort of likes the boolean attribute idea
[23:10] <zcorpan> mjs: it would only do anything on the elements that are  
currently xref elements
[23:10] <mjs> <x> in <a> and vice versa could be disallowed, just as <a>  
in <a> is
[23:10] <mjs> zcorpan: that's not a bad idea
[23:10] <mjs> zcorpan: although it still leaves the complicated rules  
about interactive elements
[23:10] <anne> mjs, you're talking authoring and I'm talking UA criteria
[23:11] <anne> <a> in <a> has to be handled by the UA, for instance
[23:11] <zcorpan> mjs: how so? can't we just say that it's always a  
"link", regardless of where you put it? just like <a> inside <a>?
[23:12] <mjs> anne: I'm saying if something is non-conformant, then the UA  
handling can be something that gives a weird result
[23:12] <anne> yeah, with the attribute you could make stuff less complex
[23:12] <mjs> anne: just like <a> in <a>
[...]
[23:13] <anne> mjs, sure, as long as it's the same across UAs
[...]
[23:12] <mjs> anne: an example of a very simple rule would be to totally  
ignore nesting
[23:12] <mjs> I think that's a fine rule if you explicitly ask for an  
xref, whether it's with an xref attribute or an <x> element
[23:12] <anne> yeah
[23:14]  anne is ok with that too
[23:15] <mjs> the reason for the weirdness about nesting is really only  
needed because cross-ref semantics are overloaded onto elements that you  
may well be using for a totally different purpose

So. The proposal here is to replace the current xref design with a boolean  
attribute "xref" on the span, abbr, code, var, samp, and i elements that,  
when present, makes the element a cross-reference to the <dfn> element with  
a matching term. (The spec already defines what an element's term is, and  
that definition is fine.)

As noted in the discussion above, it would have the following advantages:

  * Simpler to implement in a way that performs well (especially in the  
case of dynamic updates).
  * Doesn't have the nesting case problem (just like <a> doesn't).
  * Potentially simpler to author, given that you don't need to actively  
disable xrefs on elements you *don't* want to be xrefs. (The current spec  
source has title=""s all over the place just to disable xrefs.)
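To make the performance argument concrete, here is a minimal Python sketch of the bookkeeping a UA might do under this proposal. Everything in it is hypothetical: Element, XrefIndex, term_of and is_xref are illustrative names, not a real DOM API, and term_of only approximates the spec's definition of a term (the title attribute if present, otherwise the text content). It also ignores text mutations that would change a term. The point it shows is that only <dfn> elements and elements carrying xref ever enter the index, so a document that uses <i> just to italicize pays nothing, and inserting or removing a relevant element is a single hash-table update.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Elements on which the proposed boolean xref attribute would be allowed.
XREF_ELEMENTS = {"span", "abbr", "code", "var", "samp", "i"}


@dataclass(eq=False)  # eq=False keeps identity hashing, so elements can go in sets
class Element:
    tag: str
    text: str = ""
    attrs: dict = field(default_factory=dict)


def term_of(el):
    # Simplified stand-in for the spec's definition of an element's term:
    # the title attribute if present, otherwise the text content.
    return el.attrs.get("title", el.text)


def is_xref(el):
    # Hypothetical helper: an element opts in only via the xref attribute.
    return el.tag in XREF_ELEMENTS and "xref" in el.attrs


class XrefIndex:
    """Hypothetical UA-side index. Only <dfn> elements and elements that opt
    in with xref are tracked; everything else is never hashed."""

    def __init__(self):
        self.dfns = {}                 # term -> defining <dfn>
        self.refs = defaultdict(set)   # term -> xref elements using that term

    def insert(self, el):
        # One hash-table update per relevant element on a dynamic insertion.
        if el.tag == "dfn":
            # First <dfn> for a term wins (a simplification).
            self.dfns.setdefault(term_of(el), el)
        elif is_xref(el):
            self.refs[term_of(el)].add(el)
        # A plain <i>, <code>, etc. without xref is ignored entirely.

    def remove(self, el):
        term = term_of(el)
        if el.tag == "dfn" and self.dfns.get(term) is el:
            # A later duplicate <dfn> is not promoted (a simplification).
            del self.dfns[term]
        elif is_xref(el):
            self.refs[term].discard(el)

    def target(self, el):
        # Nesting is ignored: an explicit xref always resolves by its term.
        return self.dfns.get(term_of(el)) if is_xref(el) else None


idx = XrefIndex()
dfn = Element("dfn", "name", {"title": "attr-name"})
ref = Element("code", "name", {"xref": "", "title": "attr-name"})
plain = Element("i", "name")  # italics only, no xref
for el in (dfn, ref, plain):
    idx.insert(el)
print(idx.target(ref) is dfn)     # True
print(idx.target(plain) is None)  # True
```

The sketch deliberately has no special cases for nesting, following mjs's suggestion in the log that an explicitly requested xref need not care where it appears.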

To illustrate the difference in markup, here are two extracts of the spec  
source:

    <li>It is an <code>a</code>, <code>applet</code>,
    <code>area</code>, <code>form</code>, <code>img</code>, or
    <code>object</code> element with a <code
    title="attr-name">name</code> attribute equal to <var
    title="">key</var>, or,</li>

...

       <p>Let <var title="">x<sub title=""><var
       title="">i</var></sub></var> be the <span>(2<var
       title="">i</var>)</span>th entry in <var title="">coords</var>,
       and <var title="">y<sub title=""><var
       title="">i</var></sub></var> be the <span>(2<var
       title="">i</var>+1)</span>th entry in <var title="">coords</var>
       (the first entry in <var title="">coords</var> being the one
       with index 0).</p>

With this proposal, they would look like:

    <li>It is an <code xref>a</code>, <code xref>applet</code>, <code
    xref>area</code>, <code xref>form</code>, <code xref>img</code>, or
    <code xref>object</code> element with a <code xref
    title="attr-name">name</code> attribute equal to <var>key</var>,
    or,</li>

...

       <p>Let <var>x<sub><var>i</var></sub></var> be the
       <span>(2<var>i</var>)</span>th entry in <var>coords</var>,
       and <var>y<sub><var>i</var></sub></var> be the
       <span>(2<var>i</var>+1)</span>th entry in <var>coords</var>
       (the first entry in <var>coords</var> being the one with index
       0).</p>

That is, the first extract becomes a bit longer, but the second becomes  
shorter. IMHO the markup is also easier to understand with this proposal.

-- 
Simon Pieters