[imps] HTML5 and libxml2
ian at hixie.ch
Fri Apr 4 22:34:24 PDT 2008
On Sat, 5 Apr 2008, Edward Z. Yang wrote:
> HTML5 does not specify any validation mechanism in which to ensure the
> element has the form stipulated by tag name, i.e. [A-Za-z-]+
That (erroneous, as it happens) paragraph is just describing a trend in
the spec's tag names; it's not a conformance criterion of any kind. The
conformance criterion is really just that the elements in the document have
to be the elements defined by the spec.
You may find this post helpful in determining how to read the HTML5 spec:
> Unfortunately, certain tag names cause libxml2 to choke, and HTML5
> doesn't specify any way to:
> 1. Munge the name into something libxml2 finds acceptable
> 2. Ignore the tag as invalid
Indeed, both of these behaviours would be non-conforming.
Can you change libxml2 to support more characters? Is there a real
technical reason for the limitation, or is it just enforcing XML rules?
The characters allowed in tag names are by far not the only area where XML
and HTML differ, so if it is just a matter of libxml2 enforcing XML's
requirements, it will not work well.
> So, in short, due to underlying library limitations I can't put
> arbitrary characters in a tag (which is what Firefox actually seems to
> do), and I don't know exactly what characters I need to get rid of. Advice?
If you can't implement what the spec requires, then make sure to document
the limitations clearly in your documentation. Meanwhile, you can probably
get away with replacing unusable characters with U+FFFD, or at a pinch,
"_", so long as you still use the full tag names in the parser to
determine which tags are open. However, make sure to document this as
being a conformance problem in your documentation.
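
The workaround described above could be sketched roughly as follows, in
Python. This is only an illustration under stated assumptions, not libxml2's
behaviour or anything the spec defines: the names munge_tag_name and
OpenElements are hypothetical, the character ranges follow the Name
production of XML 1.0 (fifth edition), and the underlying library is assumed
to accept exactly those names. An older XML implementation may reject U+FFFD
itself, in which case "_" is the fallback. The stack is a simplification that
only shows open tags being matched on their original names.

```python
import re

# NameStartChar ranges from XML 1.0 (fifth edition), as a character-class body.
# Assumption: the target library enforces exactly this production.
_START = (
    ":A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    "\u037F-\u1FFF\u200C\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# NameChar adds "-", ".", digits, U+00B7 and two further ranges.
_REST = _START + "\\-.0-9\u00B7\u0300-\u036F\u203F\u2040"

_START_RE = re.compile("[" + _START + "]")
_REST_RE = re.compile("[" + _REST + "]")


def munge_tag_name(name, replacement="\uFFFD"):
    """Replace each character that is not allowed at its position in an
    XML name with `replacement` (U+FFFD by default, "_" at a pinch)."""
    out = []
    for i, ch in enumerate(name):
        allowed = (_START_RE if i == 0 else _REST_RE).fullmatch(ch)
        out.append(ch if allowed else replacement)
    return "".join(out)


class OpenElements:
    """Simplified stack of open elements that remembers the full,
    original tag names, so end tags are matched on what the author wrote
    rather than on the munged names handed to the XML library."""

    def __init__(self):
        self._stack = []  # original (unmunged) tag names

    def open(self, name):
        # Track the original name; return the munged one for the DOM.
        self._stack.append(name)
        return munge_tag_name(name)

    def close(self, name):
        # Search from the most recently opened element downwards.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i] == name:
                del self._stack[i:]
                return True
        return False  # no matching open element
```

With this sketch, "a$b" and "a%b" both end up as the same munged name in the
tree, but the stack still tells them apart, so an end tag for one does not
close the other. The loss of the distinction in the resulting DOM is exactly
the conformance problem that needs documenting.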
Ian Hickson
http://ln.hixie.ch/
Things that are impossible just take longer.