[imps] Emulating the HTML DOM when actually parsed from XML

Henri Sivonen hsivonen at iki.fi
Fri Apr 4 03:53:29 PDT 2008


I was thinking that, especially with the MathML and SVG additions, it
would be great to be able to test the effect of the HTML5 parsing
algorithm in current browsers. Since hooking up a parser written in
Java or Python to a browser written in C++ is itself non-trivial,
I started considering an HTTP proxy that intercepts text/html and
converts it into application/xhtml+xml. (It could be built from Jetty,
the Validator.nu parser and Commons HttpClient.)
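
As a rough illustration of the interception step only, here is a minimal
sketch of how such a proxy might decide which responses to convert. The
function name rewriteContentType is hypothetical, and the charset=utf-8
parameter assumes the proxy re-serializes the parsed tree as UTF-8. It is
written in JavaScript for brevity; the proxy itself would more likely be
Java.

```javascript
// Hypothetical helper: decide how a proxy might rewrite a Content-Type header.
// Returns null when the response should pass through untouched.
function rewriteContentType(headerValue) {
  if (typeof headerValue !== "string") return null;
  // The media type is everything before the first ";" (parameters follow).
  const mediaType = headerValue.split(";")[0].trim().toLowerCase();
  if (mediaType !== "text/html") return null;
  // Assumption: the proxy re-serializes the parsed tree as UTF-8 XML.
  return "application/xhtml+xml; charset=utf-8";
}
```

The actual work (parsing the body with the HTML5 algorithm and serializing
the tree as XML) is not shown here.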

This approach might even work for static pages: selectors match element
names case-sensitively in XML, but many people already write their
selectors in lower case.

However, a big part of Web compat is script compat, and the proxy
would make browsers put the DOM in the XML mode. Would it be possible
to monkeypatch the features listed at
http://wiki.whatwg.org/wiki/HtmlVsXhtml#Scripts
using JS prototypes if the proxy injected a script into each document?
Except for document.write(), of course. Might someone already have
done this?
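
To make the question concrete, here is a minimal sketch of what such an
injected script might do for two of the differences: upper-case tagName
and a case-insensitive, namespace-less createElement(). The function
names patchTagName and patchCreateElement are hypothetical, the code
assumes a browser that supports Object.defineProperty on DOM prototypes,
and it is not a complete or tested emulation.

```javascript
// Sketch only: emulate two HTML-DOM behaviours in a document parsed as XML.
// All function names here are hypothetical.
const XHTML_NS = "http://www.w3.org/1999/xhtml";

// Make tagName report upper case for XHTML-namespace elements, as an
// HTML-mode DOM does. `elementProto` is expected to be Element.prototype.
function patchTagName(elementProto) {
  Object.defineProperty(elementProto, "tagName", {
    configurable: true,
    get: function () {
      // nodeName is left unpatched, so it still holds the XML-mode name.
      return this.namespaceURI === XHTML_NS
        ? this.nodeName.toUpperCase()
        : this.nodeName;
    },
  });
}

// Make createElement("DIV") create a lower-case element in the XHTML
// namespace. `documentProto` is expected to be Document.prototype.
function patchCreateElement(documentProto) {
  documentProto.createElement = function (name) {
    return this.createElementNS(XHTML_NS, String(name).toLowerCase());
  };
}

// Apply the patches only when running inside a browser.
if (typeof Element !== "undefined" && typeof Document !== "undefined") {
  patchTagName(Element.prototype);
  patchCreateElement(Document.prototype);
}
```

With the patches applied, document.createElement("DIV").tagName would be
expected to read "DIV", while SVG and MathML elements keep their
case-sensitive names. nodeName, getElementsByTagName(), innerHTML and the
rest of the list would need similar treatment.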

Then there's the form pointer issue. With Opera, setting the Web Forms
2.0 (WF2) form attribute would work, but what about Gecko and WebKit?
And then there's the issue that some behavior depends on the character
encoding and might break if the document is promoted to UTF-8. Would
setting accept-charset on <form> work around this sufficiently?
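
For illustration, the accept-charset idea could be sketched as below.
The helper name pinFormCharset is hypothetical, and the sketch assumes
the proxy knows the encoding the page had before conversion. Whether
this is sufficient is exactly the open question.

```javascript
// Hypothetical helper: pin each form's submission encoding to the encoding
// the page had before the proxy re-serialized it as UTF-8.
// `forms` is any array-like of form elements; returns how many were changed.
function pinFormCharset(forms, originalEncoding) {
  let changed = 0;
  for (const form of Array.from(forms)) {
    // Leave forms alone if the author already chose an accept-charset.
    if (!form.hasAttribute("accept-charset")) {
      form.setAttribute("accept-charset", originalEncoding);
      changed += 1;
    }
  }
  return changed;
}
```

In an injected script this might be called with the result of
document.getElementsByTagNameNS("http://www.w3.org/1999/xhtml", "form")
and, for example, "windows-1252". The proxy could equally add the
attribute while serializing.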

Any ideas on whether quirks mode CSS and document.write() would make the
whole exercise futile?

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/




