[imps] HTML5 and libxml2

Edward Z. Yang edwardzyang at thewritingpot.com
Sat Apr 5 15:48:30 PDT 2008


As an informative note, the tag name limitation is in PHP's DOM extension,
not in libxml2. I'm probably going to do an implementation similar to
Validator.nu's, although that assumes I stay interested enough to make
major architectural changes to the unmaintained PH5P.

Ian Hickson wrote:
> The normative requirement for such elements is that they are _all_ 
> invalid, even if they just use a-z characters. The range of characters 
> that can be used by elements that aren't allowed is the empty range.

I understand this; however, since HTML has graceful error handling, even
though such elements are invalid we should still have well-defined
handling for them. Which, I suppose, it does. :-)

> Well for example an XML comment cannot contain the string "--".

I took a look at the source code for Validator.nu and all the
differences are there.
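
To make the comment case concrete, here is a minimal sketch in Python of
one way to coerce comment data so it is legal in XML. The function name
coerce_comment and the choice of inserting a space are my own assumptions
for illustration; I am not claiming this is what Validator.nu does.

```python
def coerce_comment(data: str) -> str:
    """Return comment data that is safe to serialize as an XML comment.

    Assumption: a space is inserted between adjacent hyphens, and a
    trailing space is added if the data ends with a hyphen.
    """
    # Repeat until no two hyphens are adjacent; a single pass is not
    # enough for runs of three or more hyphens.
    while "--" in data:
        data = data.replace("--", "- -")
    # Data ending in "-" would run into the closing "-->" delimiter.
    if data.endswith("-"):
        data += " "
    return data
```

Note that this changes the comment text, so the result no longer
round-trips exactly; that seems an acceptable cost for a DOM that
refuses the original.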

> What I meant is make sure that your code handles:
>     <a@> <a#> </a@> X
> ...as creating a DOM tree where the third tag above closes the first one, 
> not the second one. i.e. in your parser and the stack of elements you 
> should keep the original tag names, and only give the munged tag names to 
> the DOM tree.

Duly noted.
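
As a note to myself, the requirement above can be sketched roughly like
this in Python. It is a simplified tree builder, not the full HTML5
algorithm: the token format, the Node class and the munge() scheme
(replace anything outside A-Z, a-z, 0-9 with "_") are all hypothetical
and chosen only so that "a@" and "a#" collide after munging.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    dom_name: str                       # munged name handed to the DOM
    children: list = field(default_factory=list)


def munge(name: str) -> str:
    # Hypothetical scheme: every disallowed character becomes "_".
    return re.sub(r"[^A-Za-z0-9]", "_", name)


def build(tokens):
    root = Node("root")
    # Each stack entry keeps the ORIGINAL tag name next to its DOM node.
    stack = [("root", root)]
    for kind, value in tokens:
        if kind == "start":
            node = Node(munge(value))
            stack[-1][1].children.append(node)
            stack.append((value, node))
        elif kind == "end":
            # Match on original names, searching from the top of the stack.
            for i in range(len(stack) - 1, 0, -1):
                if stack[i][0] == value:
                    del stack[i:]       # pop the match and everything above it
                    break
        else:                           # text token
            stack[-1][1].children.append(value)
    return root


# The example from above: <a@> <a#> </a@> X
tokens = [("start", "a@"), ("text", " "), ("start", "a#"),
          ("text", " "), ("end", "a@"), ("text", " X")]
root = build(tokens)
```

Both elements end up as "a_" in the tree, but the end tag still closes
the outer one, so " X" becomes a child of the root. Had the stack held
only munged names, </a@> would have closed the inner element instead.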

--
 Edward Z. Yang                        GnuPG: 0x869C48DA
 HTML Purifier <http://htmlpurifier.org> Anti-XSS Filter
 [[ 3FA8 E9A9 7385 B691 A6FC B3CB A933 BE7D 869C 48DA ]]