[Imps] True DOM TreeBuilder

Sam Ruby rubys at intertwingly.net
Wed Jan 10 02:36:40 PST 2007


I just committed a minidom.getDOMImplementation() based TreeBuilder to 
html5lib.

Notes:

1) I had to monkey patch minidom in order to get text nodes that are
    immediate children of the document node to work.

2) Based on how html5 is spec'ed, the doctype name becomes "HTML"
    rather than the lowercase "html" that you would expect in an XML
    DOM representation.

3) This implementation is not namespace aware, nor are the elements
    placed in the XHTML namespace.
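
To illustrate note 1, here is a minimal sketch of why a patch is needed
and one way it could be done. This is not necessarily the patch that was
committed to html5lib; it assumes that widening the private
Document._child_node_types tuple is enough, and the helper name
try_append_text is made up for the example.

```python
from xml.dom import minidom, Node, HierarchyRequestErr

# Empty document with no root element and no doctype.
impl = minidom.getDOMImplementation()
doc = impl.createDocument(None, None, None)


def try_append_text(document):
    """Hypothetical helper: report whether the document accepts a text child."""
    try:
        document.appendChild(document.createTextNode("hello"))
        return True
    except HierarchyRequestErr:
        return False


# Stock minidom rejects text nodes directly under the document node.
before = try_append_text(doc)

# Assumed monkey patch: allow TEXT_NODE as a child type of Document.
# _child_node_types is a private attribute, so this may break between versions.
minidom.Document._child_node_types += (Node.TEXT_NODE,)

after = try_append_text(doc)
print(before, after)
```

Note that the patch is global: it changes every minidom Document in the
process, not only the ones the TreeBuilder creates.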

Demo:

http://code.google.com/p/html5lib/ is purportedly XHTML 1.0 Strict, but 
is served as text/html and contains such dubious constructs as
"<div id=gaia>".  You can obtain a cleaned-up version of this page 
after a side trip through the DOM via:

$ python parse.py -b dom -x http://code.google.com/p/html5lib/

In particular, note what the DOM's default "toxml()" method does to the 
script near the end of this page.
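
As a rough illustration of the kind of thing toxml() does to script
content, the sketch below builds a tiny document by hand. The script
text is invented for the example and is not taken from the page above.
Because minidom serializes as XML, markup-significant characters inside
the script's text node are escaped.

```python
from xml.dom import minidom

# Build <html><script>...</script></html> directly with minidom.
doc = minidom.getDOMImplementation().createDocument(None, "html", None)
script = doc.createElement("script")
# Invented script body containing characters that XML must escape.
script.appendChild(doc.createTextNode("if (a < b && c) { run(); }"))
doc.documentElement.appendChild(script)

# toxml() escapes "<" and "&" in text nodes, so the script body
# no longer reads as JavaScript if the result is served as text/html.
serialized = doc.toxml()
print(serialized)
```

The serialized script body contains "&lt;" and "&amp;&amp;" in place of
"<" and "&&", which is correct XML but not what a text/html parser
expects inside a script element.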

- Sam Ruby


