[Imps] Liberal XML parsing

Anne van Kesteren annevk at opera.com
Mon Jan 8 08:48:12 PST 2007

On Mon, 08 Jan 2007 17:42:49 +0100, Sam Ruby <rubys at intertwingly.net> wrote:
> The current tokenizer has ".lower()" sprinkled throughout and doesn't  
> expose in any meaningful way the difference between empty and start tags.

Because there is no difference between them. See the HTML5 specification.

> For the tokenizer to be meaningfully subclassed (and by that, I mean  
> without requiring wholesale duplication of a number of methods), these  
> behaviors would need to be factored out into separate methods that could  
> be overridden.

You could subclass it and change processSolidusInTag. Instead of throwing
a parse error, you would change the type of the token to "empty" or something similar.
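
A minimal sketch of that subclassing approach, in Python to match the `.lower()` calls mentioned above. `MiniTokenizer`, `XmlishTokenizer`, the token dictionaries and the `parse_errors` list are hypothetical stand-ins invented for illustration, not the real tokenizer's API. Only the method name `processSolidusInTag` and the "empty" token type come from the message.

```python
import re

# Hypothetical, heavily simplified tag tokenizer (not the real tokenizer's API).
# Group 1 is the tag name, group 2 is an optional "/" just before ">".
TAG_RE = re.compile(r"<([A-Za-z][A-Za-z0-9:]*)\s*(/?)>")


class MiniTokenizer:
    def __init__(self):
        self.parse_errors = []
        self.current_token = None

    def tokenize(self, source):
        """Return one token dict per tag found in source."""
        tokens = []
        for match in TAG_RE.finditer(source):
            self.current_token = {"type": "StartTag", "name": match.group(1)}
            if match.group(2):
                # The solidus handling is isolated in one overridable method.
                self.processSolidusInTag()
            tokens.append(self.current_token)
        return tokens

    def processSolidusInTag(self):
        # Base behaviour: report a parse error; the token stays a start tag.
        self.parse_errors.append("unexpected solidus in tag")


class XmlishTokenizer(MiniTokenizer):
    def processSolidusInTag(self):
        # Override: no parse error, mark the token as an empty tag instead.
        self.current_token["type"] = "empty"


if __name__ == "__main__":
    print(MiniTokenizer().tokenize("<br/>"))
    print(XmlishTokenizer().tokenize("<br/>"))
```

Only the one method changes in the subclass; the rest of the tokenizing loop is inherited unchanged, which is the point of keeping the solidus handling in its own method.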

I'm not sure how to handle the .lower() stuff. I had guessed that the reason you
wanted to change that was a project like this :-)
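
One possible way to handle the `.lower()` calls, following the suggestion quoted above of factoring such behaviour into separate methods that can be overridden, is to route every case fold through a single hook. This is a sketch under assumptions: `CaseFoldingTokenizer`, `CasePreservingTokenizer`, `normalize_name` and the token shape are hypothetical names chosen for illustration.

```python
import re

# Hypothetical toy tokenizer. Group 1 is "/" for end tags, group 2 the name;
# anything else inside the tag (attributes, a trailing "/") is skipped.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9:]*)[^>]*>")


class CaseFoldingTokenizer:
    def tokenize(self, source):
        tokens = []
        for match in TAG_RE.finditer(source):
            kind = "EndTag" if match.group(1) else "StartTag"
            # Every name goes through one hook instead of an inline .lower().
            name = self.normalize_name(match.group(2))
            tokens.append({"type": kind, "name": name})
        return tokens

    def normalize_name(self, name):
        # HTML: tag names are case-insensitive, so fold them to lowercase.
        return name.lower()


class CasePreservingTokenizer(CaseFoldingTokenizer):
    def normalize_name(self, name):
        # XML: names are case-sensitive, so keep them exactly as written.
        return name


if __name__ == "__main__":
    print(CaseFoldingTokenizer().tokenize("<foreignObject></foreignObject>"))
    print(CasePreservingTokenizer().tokenize("<foreignObject></foreignObject>"))
```

With a single hook, an XML-oriented subclass overrides one method rather than duplicating every method that currently calls `.lower()`.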

Anne van Kesteren
