[whatwg] id and xml:id

Henri Sivonen hsivonen at iki.fi
Mon Apr 3 08:37:33 PDT 2006

On Apr 3, 2006, at 00:00, fantasai wrote:

> Henri Sivonen wrote:
>> On Apr 2, 2006, at 18:56, fantasai wrote:
>>> I'd rather see the id attribute restricted to an NCName token
>>> insofar as possible. We can make an exception for Hixie's
>>> repetition templates, but otherwise I think it should be
>>> compatible with the XML ID syntax.
>> Do you mean common attrs should have a co-occurrence constraint
>> that changes the datatype of the id attribute if the repeat
>> attribute is present?
> Yes. Or, at the very least, if the repetition module is loaded.

Changing id in some cases to an attribute that does not have the ID  
nature would be problematic, but see below.

>> I wasn't even expecting to be able to do IDREF integrity checks in
>> RELAX NG. I was planning on doing it in Schematron or Java.
>> Besides, general IDREF integrity checking does not check that, for
>> example, the form attribute references only form elements and not
>> just any ids.
> I would want that in the RelaxNG schema because there are editing
> tools that hook into RelaxNG, but not many (or any besides
> validators) that can hook into Schematron (Glazou, for example, is
> working on a RelaxNG-driven editor.)

I agree that editor-friendliness is a worthy goal. I have been  
keeping it in mind, even though I have not actually been testing  
schemas in any RELAX NG-aware editor.

Schematron is not amenable to editor autocompletion features, but in  
*theory* it could be used for discovering errors by running the  
validation function over the document being edited from time to time.

> RelaxNG /can/ do IDREF integrity checks.

It turns out that the ID nature in RELAX NG DTD Compatibility does
*not* require the ID value to be an NCName. That is a further
restriction imposed by the
http://relaxng.org/ns/compatibility/datatypes/1.0 and
http://www.w3.org/2001/XMLSchema-datatypes datatype libraries. The ID
nature itself only requires that the ID value does not contain
whitespace.
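
To make the difference between the two constraints concrete, here is a
rough Python sketch. It is only an illustration, not how Jing or any
datatype library implements the check. The function names are made up,
the NCName pattern is a simplified approximation of the real
production, treating the empty string as invalid is an assumption, and
"row[order]" is a hypothetical template-style id.

```python
import re

# XML whitespace characters: space, tab, carriage return, line feed.
XML_WHITESPACE = " \t\r\n"

# Simplified approximation of NCName (assumption: the real production
# lists exact Unicode ranges). A letter or underscore first, then
# letters, digits, '_', '.', or '-'. A colon is never allowed.
_NCNAME_APPROX = re.compile(r"[^\W\d][\w.\-]*")


def is_whitespace_free_id(value):
    """Permissive check: non-empty (assumed) and no XML whitespace."""
    return value != "" and not any(ch in XML_WHITESPACE for ch in value)


def is_ncname_approx(value):
    """Stricter check, roughly what the NCName-based ID datatypes add."""
    return _NCNAME_APPROX.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["section-1", "row[order]", "1abc", "a b"]:
        print(candidate, is_whitespace_free_id(candidate),
              is_ncname_approx(candidate))
```

A value such as "row[order]" passes the permissive check but fails the
NCName approximation, which is the gap a more permissive ID datatype
library would fill.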

I spent quite a while today verifying (by implementing a more  
permissive ID datatype library) that James Clark's Jing agrees with  
my reading of the spec. It does, which is good evidence that my  
reading of the spec is correct. :-)

I don't know what kind of datatype library support Etna has or will
have, but theoretically, it could even allow using
Jing/MSV-compatible libraries via JNI. (That could actually be a
worthwhile feature considering that the Java API for datatype
libraries is probably the most popular one.)

There is a problem, however. One of the main features of RELAX NG is
that it allows ambiguous grammars: it is OK for a document to be
valid according to multiple derivations. RELAX NG DTD Compatibility
restricts grammar ambiguity, because the IDness of an attribute can't
remain ambiguous. It appears that enabling ID/IDREF checking wreaks
havoc with schemas that have not been written with this in mind.

I have not yet assessed the extent of the damage, but it could turn
out that ID/IDREF checking needs to go in a separate schema, like
exclusions. (Does Etna support multiple schemas at a time,
effectively ANDing them?)
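
As a rough illustration of what ANDing separate checks could look
like, here is a Python sketch that runs two independent passes over a
document and accepts it only if neither reports an error. Plain Python
functions stand in for the separate schemas here; this is not RELAX NG
or Schematron, and all function names are hypothetical. It only looks
at the form attribute and assumes un-namespaced element names.

```python
import xml.etree.ElementTree as ET


def check_form_refs_resolve(root):
    """Generic integrity pass: every form="..." value must match some id."""
    ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    return ["dangling reference: " + el.get("form")
            for el in root.iter()
            if el.get("form") is not None and el.get("form") not in ids]


def check_form_refs_target_forms(root):
    """Stricter pass: a resolved form="..." must point at a form element."""
    by_id = {el.get("id"): el for el in root.iter()
             if el.get("id") is not None}
    errors = []
    for el in root.iter():
        ref = el.get("form")
        target = by_id.get(ref) if ref is not None else None
        if target is not None and target.tag != "form":
            errors.append("not a form element: " + ref)
    return errors


def validate_all(root, checks):
    """AND the checks together: valid only if the combined list is empty."""
    errors = []
    for check in checks:
        errors.extend(check(root))
    return errors


if __name__ == "__main__":
    doc = ET.fromstring(
        '<body><form id="f"/><p id="p"/>'
        '<input form="f"/><input form="p"/><input form="x"/></body>')
    for message in validate_all(
            doc, [check_form_refs_resolve, check_form_refs_target_forms]):
        print(message)
```

Keeping the passes separate means the main schema can stay ambiguous
where that is convenient, while the reference checks run on their own.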

> The part about form attributes referencing only form elements can
> be checked by Schematron.


> From an authoring standpoint, the *most* useful part of IDREF
> integrity checking is to check against typos, not against
> misinterpretation of the idref attribute's intent. :)


Henri Sivonen
hsivonen at iki.fi
