[Imps] Liberal XML parsing

Anne van Kesteren annevk at opera.com
Mon Jan 8 10:28:22 PST 2007

On Mon, 08 Jan 2007 18:23:40 +0100, Sam Ruby <rubys at intertwingly.net> wrote:
>>  Because there is no difference between them. See the HTML5  
>> specification.
> My point is that by "baking in" that behavior into the tokenizer, it  
> essentially limits that tokenizer to just supporting HTML5.  By  
> providing one extra "bit" of information, the potential for reuse is  
> increased.
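
A minimal sketch of the "one extra bit" idea follows. It assumes the two things being compared are a start tag with and without a trailing solidus (e.g. `<br>` vs `<br/>`). Instead of discarding the solidus, the tokenizer records it as a flag. An HTML5-style consumer can ignore the flag, and an XML-style consumer can rely on it. `StartTag`, `tokenize_start_tag`, `html_is_empty` and `xml_is_empty` are hypothetical names, not html5lib's API.

```python
from dataclasses import dataclass

# Hypothetical token type for this sketch; it is not html5lib's real token format.
@dataclass
class StartTag:
    name: str
    self_closing: bool = False  # the extra "bit": was the tag written as <name/>?

def tokenize_start_tag(source: str) -> StartTag:
    """Parse one bare start tag such as '<br>', '<br/>' or '<foo />'.

    Attributes are deliberately not handled; the point is only to record
    the trailing solidus instead of throwing it away.
    """
    inner = source.strip()[1:-1].strip()  # drop '<' and '>'
    self_closing = inner.endswith("/")
    if self_closing:
        inner = inner[:-1].rstrip()
    return StartTag(name=inner, self_closing=self_closing)

# A partial list of HTML void elements, enough for this example.
HTML_VOID = frozenset({"br", "hr", "img", "input", "link", "meta"})

def html_is_empty(tag: StartTag) -> bool:
    # HTML5-style consumer: ignores the solidus; the element name decides.
    return tag.name.lower() in HTML_VOID

def xml_is_empty(tag: StartTag) -> bool:
    # XML-style consumer: the solidus alone decides.
    return tag.self_closing
```

With this sketch, `<foo/>` is treated as empty by the XML-style consumer but not by the HTML-style one, and `<br>` is treated the other way round. The same tokenizer serves both.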

Well, the next "bit" would probably be processing instructions. That's why
it would be nice to have some formalization or standardization first, to see
exactly how many changes are required.

Currently html5lib maps rather closely to the specification, which (imho)
makes the code a lot more readable. I'd like to know how many changes
we're looking at and how they would affect the code.

> From a maintenance point of view, that is suboptimal.  As  
> processSolidusInTag changes, that maintenance would need to occur in two  
> places.

Well, the method isn't that big :-)

>> Not sure how to do the .lower() stuff. I kind of guessed the reason you  
>> wanted to change that was because of a project like this :-)
> I've provided one way: by refactoring it so that all the lowercasing of  
> element names is done in exactly one place, and that the lowercasing of  
> attribute names is also done in exactly one place.  That class can be  
> subclassed to provide a different behavior.
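
The refactoring described above might look roughly like this. It is a sketch, not the actual patch: the tokenizer calls two hook methods instead of calling `.lower()` inline, and a subclass overrides them. The class and method names (`Tokenizer`, `CasePreservingTokenizer`, `normalize_element_name`, `normalize_attribute_name`, `start_tag`) are hypothetical.

```python
class Tokenizer:
    """Hypothetical sketch: all name normalization goes through two hooks."""

    def normalize_element_name(self, name: str) -> str:
        # The single place where element names are lowercased.
        return name.lower()

    def normalize_attribute_name(self, name: str) -> str:
        # The single place where attribute names are lowercased.
        return name.lower()

    def start_tag(self, name: str, attributes: dict) -> tuple:
        # Tokenizer code calls the hooks rather than lowercasing inline.
        attrs = {self.normalize_attribute_name(k): v for k, v in attributes.items()}
        return (self.normalize_element_name(name), attrs)


class CasePreservingTokenizer(Tokenizer):
    """XML-oriented subclass: names are case-sensitive, so keep them as written."""

    def normalize_element_name(self, name: str) -> str:
        return name

    def normalize_attribute_name(self, name: str) -> str:
        return name
```

For example, `Tokenizer().start_tag("SVG", {"viewBox": "0 0 1 1"})` yields `("svg", {"viewbox": "0 0 1 1"})`, while the subclass keeps `"SVG"` and `"viewBox"` unchanged.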

Do you have this as a standalone patch somewhere? As mentioned before, I'd like
to see how it deals with non-ASCII characters.
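
To illustrate why non-ASCII characters matter here, note that Python's `str.lower()` applies full Unicode case mapping. This sketch assumes, for illustration, a tokenizer rule that lowercases only the ASCII letters A-Z and leaves every other character untouched. The helper name `ascii_lower` is hypothetical.

```python
# Map the code points of 'A'..'Z' (U+0041..U+005A) to 'a'..'z'; nothing else changes.
_ASCII_UPPER_TO_LOWER = {c: c + 32 for c in range(ord("A"), ord("Z") + 1)}

def ascii_lower(s: str) -> str:
    """Lowercase ASCII letters only, leaving non-ASCII characters as they are."""
    return s.translate(_ASCII_UPPER_TO_LOWER)
```

The two approaches agree on `"DIV"` but diverge on `"ÉLÉMENT"`: `ascii_lower` gives `"ÉlÉment"`, whereas `str.lower()` gives `"élément"`. `str.lower()` can even change a name's length, since U+0130 (İ) lowercases to `"i"` followed by U+0307.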

> Once this stabilizes, I would then plan to look at having the UFP take
> advantage of this library, if it is installed/available.  I'd also  
> modify Venus, but such support would not need to be conditional there:  
> Venus could simply include html5lib.

That'd be cool! I read today that actual usage and support are important if
you want your library to be included in the default distribution.

Anne van Kesteren
