[whatwg] Citing multiple <blockquote> elements in HTML5

Tue Dec 2 21:48:34 PST 2008

Benjamin Hawkes-Lewis wrote:
> Calogero Alex Baldacchino wrote:
>> [...]
>>
>
> I think you're confusing parsing rules that conforming user agents 
> must follow to associate identifiers with elements (even when ids are 
> duplicated) with the authoring rules that conforming documents must 
> follow (ids must be unique).

OK, so which is which?

When you read "The value must not contain any space characters.", is that
an authoring rule for conforming documents, in your view? OK.

When you read "*If the value is not the empty string, user agents must 
associate the element with the given value (exactly, including any space 
characters)* for the purposes of ID matching within the subtree the 
element finds itself (e.g. for selectors in CSS or for the 
|getElementById()| method in the DOM).", is it a parsing rule for 
conforming user agents, in your view? OK. But wouldn't it be worth saying
explicitly, everywhere in the spec, when a rule is a quirk kept for
backward compatibility, which might go away in the future, and when it
is not? And when a rule is a leftover from the past, shouldn't it be
considered in every aspect? After all, wasn't one of the main goals of
HTML 5 to turn unwritten and browser-specific rules into written and
standard behaviours?

I mean, if you allow space characters inside an id value as a parsing
rule, you can face something like '<div id="foo bar" >', that is, an id
consisting of more than one token. Is it good to leave it untouched?
Yes? OK, but what does that mean for CSS, given that selectors are cited
as one reason to allow space characters? That is, can a browser handle
an id selector that starts with the '#' character and is broken by a
blank space? Or better, is that legal in CSS? Honestly, I don't remember
well; I've never tried something like that (since it makes no sense to
me), and I think it's illegal. But let's say it's illegal for conforming
style sheets, while existing user agents may or may not allow it, each
with its own behaviour. If we "turn a blind eye" to '<div id="foo bar" >'
in a piece of HTML 5 code but leave its CSS counterpart up to each
implementation, we'll solve only half of the problem (where the problem
is turning unwritten rules into written, and possibly improved,
standards), won't we? Yet any kind of "CSS quirks" would be out of scope
for an HTML specification, so I believe '<div id="foo bar" >' is trouble.
(If instead "foo bar" is not a valid id selector for CSS in any browser,
then we're allowing user agents to parse as valid an id which is
inconsistent with CSS, and so CSS selectors cannot be a reason to allow
space characters inside an id string, at least with respect to any
direct reference to the identifier value.)

It might also be trouble per se, even just for HTML conformance by user
agents: a URL fragment might contain escaped space characters, but an
escaped space isn't the same thing as the space character itself, so the
rule of exact matching, applied to space characters inside an id, may
cause trouble unless the '<div id="foo bar" >' case is considered
extensively.
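
To make the escaped-space point concrete, here is a minimal Python
sketch. The function name exact_match is hypothetical (it is not from
the spec), and the sketch assumes the fragment arrives in the
percent-encoded form it has inside a URL.

```python
from urllib.parse import unquote

def exact_match(id_value: str, fragment: str) -> bool:
    """Hypothetical helper: compare an id and a fragment character by character."""
    return id_value == fragment

id_value = "foo bar"      # as in <div id="foo bar">
fragment = "foo%20bar"    # assumed form of the same name inside a URL

# Exact matching on the raw fragment fails: "%20" is three characters, not a space.
raw_result = exact_match(id_value, fragment)

# Matching only succeeds once the fragment has been percent-decoded.
decoded_result = exact_match(id_value, unquote(fragment))
```

So "exactly, including any space characters" only works for fragments if
it is also stated at which point the escaping is undone.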

Now let's say, instead, that a user agent conforming to the HTML 5
specification must cut off every token after the first one (I know that
"foo bar" is currently taken as is), so that <div id="foo bar"> becomes
<div id="foo "> and <div id=" foo "> is valid too. In such a case,
skipping any spaces too, and stating the same behaviour for strings
passed to .getElementById(), could be a nice form of graceful
degradation for documents that do not conform to the rule "the value [of
an id attribute] must not contain any space characters", but it might
fail with CSS selectors such as 'div[id="foo bar"]'.
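
A rough Python sketch of that first-token idea follows. It is only a
model of the behaviour described above, not anything a browser does
today; the names first_token and attribute_selector_matches are
hypothetical, and "space characters" is assumed to mean space, tab, LF,
FF and CR.

```python
import re

# Assumption: "space characters" are space, tab, LF, FF and CR.
SPACE_RUN = re.compile(r"[ \t\n\f\r]+")

def first_token(value: str) -> str:
    """Return the first run of non-space characters, or "" if there is none."""
    tokens = [t for t in SPACE_RUN.split(value) if t]
    return tokens[0] if tokens else ""

def attribute_selector_matches(stored_id: str, selector_value: str) -> bool:
    """Model of div[id="..."]: the attribute value is compared as a whole string."""
    return stored_id == selector_value

# The user agent keeps only the first token of <div id="foo bar">.
stored = first_token("foo bar")

# getElementById() arguments normalised the same way still find the element.
lookup_same = first_token("foo bar") == stored
lookup_padded = first_token(" foo ") == stored

# The attribute selector written against the authored value no longer matches.
attr_match = attribute_selector_matches(stored, "foo bar")
```

The lookups degrade gracefully, while the selector written against the
original attribute value stops matching, which is the failure mentioned
above.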

Perhaps a compromise, if acceptable for backward compatibility, might be:
- when the id value must be compared to a fragment identifier, strip any
trailing space characters; if the match fails, escape any other space
characters in both the id value and the fragment identifier and try
again;
- when an attribute is defined to hold a URL and its value has spaces
in its path/query/fragment, escape them before resolving the URL (not
sure if this is needed);
- for the purpose of ID matching through the DOM 'getElementById'
method, leave the id value untouched;
- for the purpose of ID matching through CSS selectors that access it as
an attribute, leave the id value untouched;
- for the purpose of ID matching through CSS selectors that access it
directly (e.g. '#foo'), either choose the first sequence of non-space
characters or let the match fail (I can't decide which is better, but
perhaps the former would fail as well, since I guess anyone coding
<div id="foo bar"> not only as a fragment identifier but also for
styling might have the nice idea of writing "#foo bar { font-weight :
bold; }" as well).
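
The first bullet of the compromise could be sketched like this in
Python. The names escape_spaces and matches_fragment are hypothetical,
only U+0020 is escaped (as "%20"), and other characters that would need
escaping in a real URL are ignored, so take it as an illustration of the
two-step comparison rather than a complete algorithm.

```python
# Assumption: "space characters" are space, tab, LF, FF and CR.
SPACE_CHARS = " \t\n\f\r"

def escape_spaces(value: str) -> str:
    """Simplified escaping: only U+0020 is replaced, by "%20"."""
    return value.replace(" ", "%20")

def matches_fragment(id_value: str, fragment: str) -> bool:
    """Two-step comparison between an id value and a fragment identifier."""
    # Step 1: strip trailing space characters from the id and compare exactly.
    trimmed = id_value.rstrip(SPACE_CHARS)
    if trimmed == fragment:
        return True
    # Step 2: escape the remaining spaces on both sides and try again.
    return escape_spaces(trimmed) == escape_spaces(fragment)

trailing_ok = matches_fragment("foo ", "foo")            # matched in step 1
escaped_ok = matches_fragment("foo bar", "foo%20bar")    # matched in step 2
no_match = matches_fragment("foo bar", "foo")            # first token alone is not enough
```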

Anyway, if the id value is also a fragment identifier, which might have
space characters (since the parsing rules prescribe adding such
characters to the unreserved production), does the (authoring) rule "the
value must not contain any space characters" make sense?

Now let's come to the duplicated ids issue. Again, which is which? When
the spec says, "The id attribute represents its element's unique
identifier. *The value must be unique in the subtree within which the
element finds itself and must contain at least one character.*", I think
that's what you call an authoring rule. So I don't think it was so bad
to ask for a clarification on the nature of the subtree. And if a
subtree could turn out to be an element's subtree inside a document, was
the suggestion of a getElementById method on the HTMLElement interface
so awful? Otherwise, let's consider (again) the second paragraph:

"If the value is not the empty string, user agents must associate the 
element with the given value (exactly, including any space characters) 
*for the purposes of ID matching within the subtree the element finds 
itself (e.g. for selectors in CSS or for the |getElementById()| method 
in the DOM).*"

It's a parsing rule, isn't it? But it also implies that the id must be
unique in the whole document for the purpose of ID matching through the
getElementById() method in the DOM, because the only object able to get
an element by its id is an instance of the Document interface. So a
choice has to be made about what to do with duplicated ids.

Solving the question at the parser level (i.e. defaulting any duplicated
id to the empty string) would be consistent with both the fragment
identifier behaviour (only the first occurrence is valid) and the
uniqueness rule, but might break some semantics (e.g. a hyperlink used
to create an instance of a <dfn>, or a <blockquote> with a cite
attribute referencing a <cite> element, in both cases with a duplicated
id that is not the first occurrence).

On the other hand, leaving the duplicated id in the document requires
some changes to the Document's getElementById() method, since the W3C
DOM Core does not define a single behaviour in such a case, and I've
expressed a few doubts about solving this by adding an equivalent method
on the HTMLDocument interface. In any case, the getElementById()
behaviour must be defined for such situations, and having it pick the
first match may be a solution. It might cause unwanted side effects if
misused in actual documents, and it leaves no way to access directly any
element with a duplicated id; but if I'm not careful when choosing an
ID, I have only myself to blame. That said, fulfilling the uniqueness
requirement might become problematic when dynamically putting together
pieces of code, perhaps from different sources, e.g. using
XMLHttpRequest or externally syndicated content, but this falls within
the scope of careful programming.
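
The two options for duplicated ids can be compared with a small Python
model. It is only a sketch: the document is reduced to a flat list of
elements in tree order, and Element, get_element_by_id and
blank_duplicate_ids are hypothetical names, not DOM interfaces.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Element:
    tag: str
    id: str  # "" means the element has no usable id

def get_element_by_id(elements: List[Element], wanted: str) -> Optional[Element]:
    """Option 2: keep duplicates and return the first match in tree order."""
    if wanted == "":
        return None
    for element in elements:
        if element.id == wanted:
            return element
    return None

def blank_duplicate_ids(elements: List[Element]) -> List[Element]:
    """Option 1: at parse time, reset every repeated id to the empty string."""
    seen = set()
    result = []
    for element in elements:
        if element.id and element.id in seen:
            result.append(Element(element.tag, ""))
        else:
            seen.add(element.id)
            result.append(Element(element.tag, element.id))
    return result

# Assumed example document: two <cite> elements sharing the id "src".
doc = [Element("cite", "src"), Element("p", "intro"), Element("cite", "src")]

first = get_element_by_id(doc, "src")            # always the first <cite>
blanked_ids = [e.id for e in blank_duplicate_ids(doc)]
```

With either option the second <cite> cannot be reached by id; the
difference is that blanking also removes the id from the attribute
itself, so anything else that refers to it there stops working.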

From the point of view of CSS, both choices may be consistent with
paired rules such as "#foo { font-size : 13; }" and "#foo { font-size :
14; }", since both would refer to the same element because of the
cascading rules. On the other hand, something like 'div[id="foo"]
{/*something here*/}', or an ID selector used as a descendant of
different elements, might isolate different elements in the document
(whether to allow this or not is outside the scope of HTML, but do such
cases exist in the wild?), and for compatibility with documents styled
that way, leaving duplicated ids in the document would be the better
choice. But in such cases, shouldn't DOM element selection be consistent
with CSS element selection (e.g. to avoid side effects when CSS rules
manipulate the DOM itself)? That is, if through CSS it were possible to
reach elements with duplicated ids in different subtrees of a document
tree (taking a subtree to be all the nodes descending from a non-leaf
node) and to manipulate their content, shouldn't that be possible
through the DOM too?

Anyway, I'm not that confused, no more than usual :-P

BR, Alex.
