[Imps] True DOM TreeBuilder
Sam Ruby
rubys at intertwingly.net
Wed Jan 10 02:36:40 PST 2007
I just committed a minidom.getDOMImplementation() based TreeBuilder to
html5lib.
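For readers unfamiliar with the API, here is a minimal sketch of building a small tree through minidom.getDOMImplementation(). It is not the html5lib TreeBuilder code; the element names and text are illustrative assumptions.

```python
from xml.dom import minidom

# Obtain the DOM implementation object and use it to create the document.
impl = minidom.getDOMImplementation()
doctype = impl.createDocumentType("html", None, None)
doc = impl.createDocument(None, "html", doctype)

# Build a tiny tree the way a tree builder would: create nodes, then append.
root = doc.documentElement
body = doc.createElement("body")
root.appendChild(body)
para = doc.createElement("p")
para.appendChild(doc.createTextNode("Hello"))  # illustrative content only
body.appendChild(para)

# Serialize with minidom's default serializer.
xml_text = doc.toxml()
```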
Notes:
1) I had to monkey patch minidom in order to get text nodes that are
immediate children of the document node to work.
2) Based on how html5 is spec'ed, the doctype name becomes "HTML"
rather than the lowercase "html" that you would expect in an XML DOM
representation.
3) This implementation is not namespace aware, nor are the elements
placed in the XHTML namespace.
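Notes 1 and 3 can be seen with stock minidom. The following is a rough sketch of the unpatched behavior only; it does not reproduce the monkey patch itself, and the variable names are assumptions made for illustration.

```python
import xml.dom
from xml.dom import minidom

doc = minidom.getDOMImplementation().createDocument(None, "html", None)

# Note 1: unpatched minidom refuses a text node directly under the document.
text = doc.createTextNode("stray text")
try:
    doc.appendChild(text)
    text_rejected = False
except xml.dom.HierarchyRequestErr:
    text_rejected = True

# Note 3: an element created without a namespace has namespaceURI None.
div = doc.createElement("div")
plain_ns = div.namespaceURI

# For contrast, a namespace-aware builder would call createElementNS.
XHTML_NS = "http://www.w3.org/1999/xhtml"
ns_div = doc.createElementNS(XHTML_NS, "div")
```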
Demo:
http://code.google.com/p/html5lib/ is purportedly XHTML 1.0 Strict, but
is served as text/html and contains such dubious constructs as
"<div id=gaia>". You can obtain a cleaned-up version of this page
after a side trip through the DOM via:
$ python parse.py -b dom -x http://code.google.com/p/html5lib/
In particular, note what the DOM's default "toxml()" method does to the
script near the end of this page.
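As a hedged illustration of the kind of thing toxml() does to script elements: it serializes as XML, so an empty element is self-closed and "<" and "&" in text are escaped. The script contents and file name below are made up, not taken from the page.

```python
from xml.dom import minidom

doc = minidom.getDOMImplementation().createDocument(None, "html", None)

# An empty script element (hypothetical src) is written in self-closing form.
external = doc.createElement("script")
external.setAttribute("src", "x.js")
external_xml = external.toxml()

# Inline script text (made-up code) has its markup characters escaped.
inline = doc.createElement("script")
inline.appendChild(doc.createTextNode("if (a < b && c) go();"))
inline_xml = inline.toxml()
```

Neither form is what a text/html parser expects for a script, which is why the result deserves a look.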
- Sam Ruby