[Imps] Liberal XML parsing

Sam Ruby rubys at intertwingly.net
Mon Jan 8 08:42:49 PST 2007


Anne van Kesteren wrote:
> 
>> What I WOULD be interested in hearing opinions on is what would be the
>> best way to maintain this code going forward: could it live as a
>> separate module within the html5lib repository?  Should it be a separate
>> repository?  If separate, are there some changes to the tokenizer in
>> particular that could be made that would either directly enable this
>> usage or would make it easier to monkey-patch for usage by xhtml5lib?
> 
> Can't you subclass the tokenizer? (I don't mind it being in the same 
> repository as html5lib by the way. Not sure what the best location is.)

The current tokenizer has ".lower()" calls sprinkled throughout and doesn't 
expose in any meaningful way the difference between empty-element tags and 
start tags.

For the tokenizer to be meaningfully subclassed (and by that, I mean 
without requiring wholesale duplication of a number of methods), these 
behaviors would need to be factored out into separate methods that could 
be overridden.
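
As a rough illustration of that refactoring, here is a minimal sketch. It is
not html5lib's actual tokenizer or API: the class names, the hook methods
normalize_name() and tag_type(), and the token dictionaries are all
hypothetical, and the regex-based scanning is only a stand-in for a real
state machine. The point is just that the case folding and the empty/start
distinction each live in one overridable method.

```python
import re


class SketchTokenizer:
    """Toy tag scanner (hypothetical, not html5lib) with HTML behavior in hooks."""

    # Matches opening tags only; captures the tag name and an optional
    # trailing "/" that marks XML-style empty-element syntax.
    TAG_RE = re.compile(r"<([A-Za-z][^\s/>]*)[^>]*?(/?)>")

    def normalize_name(self, name):
        # HTML behavior: tag names are case-insensitive, so fold to lowercase.
        return name.lower()

    def tag_type(self, self_closing):
        # HTML behavior in this sketch: the trailing "/" is ignored.
        return "StartTag"

    def tokenize(self, text):
        for match in self.TAG_RE.finditer(text):
            name, slash = match.groups()
            yield {
                "type": self.tag_type(bool(slash)),
                "name": self.normalize_name(name),
            }


class SketchXMLTokenizer(SketchTokenizer):
    """Subclass that overrides only the two hooks, with no duplicated logic."""

    def normalize_name(self, name):
        # XML names are case-sensitive, so leave them untouched.
        return name

    def tag_type(self, self_closing):
        # XML distinguishes <br/> (empty) from <br> (start).
        return "EmptyTag" if self_closing else "StartTag"
```

With hooks like these, a subclass changes both behaviors without copying
tokenize() itself, e.g. list(SketchXMLTokenizer().tokenize("<P><Br/>")).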

- Sam Ruby


