[Imps] Liberal XML parsing

Mon Jan 8 10:28:22 PST 2007

On Mon, 08 Jan 2007 18:23:40 +0100, Sam Ruby <rubys at intertwingly.net>  
wrote:
>>  Because there is no difference between them. See the HTML5  
>> specification.
>
> My point is that by "baking in" that behavior into the tokenizer, it  
> essentially limits that tokenizer to just supporting HTML5.  By  
> providing one extra "bit" of information, the potential for reuse is  
> increased.

Well, the next "bit" would probably be processing instructions. That's why  
it would be nice to have some formalization / standardization first to see  
how many changes are required exactly.

Currently html5lib maps rather well to the specification, which improves  
the readability of the code a lot (imho). I'd like to know how many  
changes we're looking at and how they would impact the code.

> From a maintenance point of view, that is suboptimal.  As  
> processSolidusInTag changes, that maintenance would need to occur in two  
> places.

Well, the method isn't that big :-)

>> Not sure how to do the .lower() stuff. I kind of guessed the reason you  
>> wanted to change that was because of a project like this :-)
>
> I've provided one way: by refactoring it so that all the lowercasing of  
> element names is done in exactly one place, and that the lowercasing of  
> attribute names is also done in exactly one place.  That class can be  
> subclassed to provide a different behavior.

Do you have this as a standalone patch somewhere? As mentioned before, I'd  
like to see how it deals with non-ASCII characters.
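
To make the non-ASCII question concrete, here is a minimal sketch of the kind of refactoring described above. All names in it (NameNormalizer, CasePreservingNormalizer, normalize_tag) are hypothetical and are not html5lib's actual API; it also assumes Python 3 strings. The only point is that case handling lives in one overridable place.

```python
import string

# Translation table that lowercases A-Z only and leaves every other
# character (including non-ASCII letters) untouched.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


class NameNormalizer:
    """Hypothetical helper: the single place where names are lowercased."""

    def element_name(self, name):
        return name.translate(_ASCII_LOWER)

    def attribute_name(self, name):
        return name.translate(_ASCII_LOWER)


class CasePreservingNormalizer(NameNormalizer):
    """Hypothetical subclass for case-sensitive (XML-like) input."""

    def element_name(self, name):
        return name

    def attribute_name(self, name):
        return name


def normalize_tag(name, attributes, normalizer):
    """Apply the normalizer to a tag name and to each attribute name."""
    return (
        normalizer.element_name(name),
        {normalizer.attribute_name(k): v for k, v in attributes.items()},
    )


print(normalize_tag("DIV", {"CLASS": "x"}, NameNormalizer()))
print(normalize_tag("DIV", {"CLASS": "x"}, CasePreservingNormalizer()))
```

With the ASCII-only table, "ÉCOLE" becomes "École", whereas Python's own "ÉCOLE".lower() gives "école". Which of the two is wanted depends on what the specification ends up requiring, and that is exactly the case I'd like to see handled.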

> Once this has stabilized, I would then plan to look at having the UFP take  
> advantage of this library, if it is installed/available.  I'd also  
> modify Venus, but such support would not need to be conditional there:  
> Venus could simply include html5lib.

That'd be cool! I read today that actual usage and support is important if  
you want your library to be included in the default distribution.

-- 
Anne van Kesteren
<http://annevankesteren.nl/>
<http://www.opera.com/>