[Imps] Liberal XML parsing

Mon Jan 8 09:41:41 PST 2007

Sam Ruby wrote:
> I've provided one way: by refactoring it so that all the lowercasing of 
> element names is done in exactly one place, and that the lowercasing of 
> attribute names is also done in exactly one place.  That class can be 
> subclassed to provide a different behavior.

That sounds fine to me. We need to add some Unicode tests, though, to be sure 
we're not lowercasing where we shouldn't be.
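
To make the concern concrete, here is a rough sketch of the kind of test I
have in mind. It is not html5lib code: ascii_lower, NameNormalizer and
CasePreservingNormalizer are names made up for illustration, and the
assumption is that only ASCII A-Z should ever be folded.

```python
import string

# Translation table that touches only ASCII A-Z (assumption: nothing outside
# that range should be case-folded in element or attribute names).
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(name):
    """Lowercase ASCII letters only, leaving every other character alone."""
    return name.translate(_ASCII_LOWER)


class NameNormalizer:
    """Hypothetical single place where names get lowercased."""

    def element_name(self, name):
        return ascii_lower(name)

    def attribute_name(self, name):
        return ascii_lower(name)


class CasePreservingNormalizer(NameNormalizer):
    """Hypothetical subclass that keeps names exactly as written."""

    def element_name(self, name):
        return name

    def attribute_name(self, name):
        return name


KELVIN = "\u212a"  # KELVIN SIGN, which str.lower() folds to a plain "k"

# A full Unicode lowercase would silently turn this into an ASCII letter.
assert KELVIN.lower() == "k"
# The ASCII-only version leaves it untouched.
assert NameNormalizer().element_name(KELVIN) == KELVIN
assert NameNormalizer().element_name("DIV") == "div"
assert CasePreservingNormalizer().attribute_name("viewBox") == "viewBox"
```

The Kelvin sign is just one convenient character for this; a real test suite
would want a handful of others as well.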

> I'm in no particular rush, but if after a few days it turns out that 
> people are OK with something *like* this going into the html5lib 
> repository, I'd love to put it in there -- at which point it would be 
> free to evolve, be renamed, refactored, and enhanced.  One thing I would 
> love to work on is a true DOM builder (at which point, I could throw 
> away my XMLDocument, XMLElement, and XMLComment classes), but I would 
> need changes to TreeBuilder so that I could provide my own Text class 
> (for example).

FWIW I consider supporting one of the Python DOM implementations a priority for 
the 0.3 release of html5lib (of course we need to release 0.2 first -- at this 
point that is basically a case of uploading the source archive). With the 
current treebuilder interface it should be possible to support DOM-like text 
nodes without any changes, but it's non-trivial, so maybe the current interface 
is in need of improvement (the problem is that we also need to support 
ElementTree, which treats text as attributes of elements rather than as nodes).
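
For anyone not familiar with the difference, a small sketch using only the
standard library (xml.etree.ElementTree and xml.dom.minidom) shows it.
append_text is a hypothetical helper, not part of the treebuilder interface;
it only illustrates the bookkeeping an ElementTree backend has to do when it
is handed a run of text.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

markup = "<p>Hello <b>world</b>!</p>"

# ElementTree: text lives in the .text and .tail attributes of elements.
root = ET.fromstring(markup)
assert root.text == "Hello "
assert root[0].text == "world"
assert root[0].tail == "!"  # the "!" belongs to <b>, not to <p>

# DOM: the same text shows up as separate child nodes of <p>.
dom_root = minidom.parseString(markup).documentElement
names = [node.nodeName for node in dom_root.childNodes]
assert names == ["#text", "b", "#text"]


def append_text(parent, data):
    """Hypothetical helper: add text at the end of an ElementTree element."""
    if len(parent):
        # There are child elements, so the text goes on the last child's tail.
        last = parent[-1]
        last.tail = (last.tail or "") + data
    else:
        # No children yet, so the text goes on the element itself.
        parent.text = (parent.text or "") + data


p = ET.Element("p")
append_text(p, "Hello ")
ET.SubElement(p, "b").text = "world"
append_text(p, "!")
assert ET.tostring(p, encoding="unicode") == markup
```

A DOM backend can simply append a text node to the parent, which is why a
single interface covering both cases is awkward.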

-- 
"Eternity's a terrible thought. I mean, where's it all going to end?"
  -- Tom Stoppard, Rosencrantz and Guildenstern are Dead