[imps] HTML5 and libxml2

Ian Hickson ian at hixie.ch
Sat Apr 5 15:05:00 PDT 2008

On Sat, 5 Apr 2008, Edward Z. Yang wrote:
> Ian Hickson wrote:
> > That (erroneous, as it happens) paragraph is just describing a trend
> > in the spec's tag names; it's not a conformance criterion of any kind.
> Should I submit a patch fixing the error?

I have noted it and will fix it in due course. :-)

> > The conformance criterion is really just that the elements in the
> > document have to be the elements defined by the spec.
> But the spec also defines behavior when elements are outside of the 
> spec, i.e. an error-condition. I'd appreciate it if the allowed tag 
> names were made a normative requirement for such elements.

The normative requirement for such elements is that they are _all_ 
invalid, even if they just use a-z characters. The range of characters 
that can be used by elements that aren't allowed is the empty range.

> > The characters allowed in tag names are by far not the only area where 
> > XML and HTML differ, so if it is just a matter of libxml2 enforcing 
> > XML's requirements, it will not work well.
> What are these differences explicitly?

Well, for example, an XML comment cannot contain the string "--".
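
To make that difference concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module, which is backed by an XML parser. The helper name is_well_formed is made up for this illustration. The sketch only shows that an XML parser rejects a comment containing "--"; it does not model how an HTML parser treats the same input.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the XML parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A comment with a single hyphen in its body is fine in XML.
ok_comment = "<r><!-- a - b --></r>"
# A comment with "--" in its body is not well-formed XML.
bad_comment = "<r><!-- a -- b --></r>"

print(is_well_formed(ok_comment))
print(is_well_formed(bad_comment))
```

A library that hands comment text straight to an XML-only backend would need to deal with such comments in some documented way.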

> > If you can't implement what the spec requires, then make sure to 
> > document the limitations clearly in your documentation. Meanwhile, you 
> > can probably get away with replacing unusable characters with U+FFFD, 
> > or at a pinch, "_", so long as you still use the full tag names in the 
> > parser to determine which tags are open. However, make sure to 
> > document this as being a conformance problem in your documentation.
> This might be tricky, and it occurs to me that as long as the 
> substitution process works the same for the tags, <a@>t</a@> becomes 
> <a_>t</a_> which is equivalent. I will, of course, document it.

What I meant is that you should make sure your code handles:

    <a@> <a#> </a@> X

...by creating a DOM tree in which the third tag above closes the first 
one, not the second one. That is, keep the original tag names in your 
parser and its stack of open elements, and give only the munged tag names 
to the DOM tree.
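
A rough sketch of that idea in Python follows. Everything in it is an assumption made for illustration: the munge helper, the Node class, the pre-tokenized input, and the ASCII-only approximation of which characters are usable. It is far simpler than the tree construction the spec describes, and only shows the stack holding original names while the nodes hold munged ones.

```python
import re


def munge(name):
    """Hypothetical helper: swap characters outside a simplified,
    ASCII-only set of name characters for "_"."""
    munged = re.sub(r"[^A-Za-z0-9_.-]", "_", name)
    if not re.match(r"[A-Za-z_]", munged):
        munged = "_" + munged
    return munged


class Node:
    """Toy DOM element that only ever sees the munged name."""

    def __init__(self, dom_name):
        self.dom_name = dom_name
        self.children = []  # child Node objects and text strings


def build_tree(tokens):
    """tokens: list of ("start", name), ("end", name), ("text", data)."""
    root = Node("#root")
    # Each stack entry pairs the ORIGINAL tag name with its node.
    stack = [("#root", root)]
    for kind, value in tokens:
        if kind == "start":
            node = Node(munge(value))
            stack[-1][1].children.append(node)
            stack.append((value, node))
        elif kind == "end":
            # Match against original names, innermost element first.
            for i in range(len(stack) - 1, 0, -1):
                if stack[i][0] == value:
                    del stack[i:]  # closes it and anything nested inside
                    break
            # An end tag with no matching open element is ignored here.
        else:
            stack[-1][1].children.append(value)
    return root


# Tokens for:  <a@> <a#> </a@> X
tree = build_tree([
    ("start", "a@"), ("text", " "),
    ("start", "a#"), ("text", " "),
    ("end", "a@"), ("text", " X"),
])
```

Here both elements end up named "a_" in the tree, yet " X" becomes a child of the root, because </a@> was matched against the original name "a@". Matching on the munged names instead would have closed only the inner element and left " X" inside the outer one.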

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
