[Imps] Liberal XML parsing
Anne van Kesteren
annevk at opera.com
Mon Jan 8 10:28:22 PST 2007
On Mon, 08 Jan 2007 18:23:40 +0100, Sam Ruby <rubys at intertwingly.net> wrote:
>> Because there is no difference between them. See the HTML5
> My point is that by "baking in" that behavior into the tokenizer, it
> essentially limits that tokenizer to just supporting HTML5. By
> providing one extra "bit" of information, the potential for reuse is
Well, the next "bit" would probably be processing instructions. That's why
it would be nice to have some formalization / standardization first to see
how many changes are required exactly.
Currently html5lib maps rather well to the specification, which improves
the readability of the code a lot (imho). I'd like to know how many
changes we're looking at and how they would impact the code.
> From a maintenance point of view, that is suboptimal. As
> processSolidusInTag changes, that maintenance would need to occur in two
Well, the method isn't that big :-)
>> Not sure how to do the .lower() stuff. I kind of guessed the reason you
>> wanted to change that was because of a project like this :-)
> I've provided one way: by refactoring it so that all the lowercasing of
> element names is done in exactly one place, and that the lowercasing of
> attribute names is also done in exactly one place. That class can be
> subclassed to provide a different behavior.
Do you have this as a standalone patch somewhere? As mentioned before, I'd
like to see how it deals with non-ASCII characters.
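To make the non-ASCII question concrete, here is a minimal sketch of the "one place for lowercasing" idea. The class and method names (NameNormalizer, CaseSensitiveNormalizer, element_name, attribute_name) are hypothetical and not taken from html5lib or from the patch under discussion. It also assumes that only A-Z should be folded, which is just one possible answer to the non-ASCII question.

```python
import string

# ASCII-only translation table: maps A-Z to a-z and leaves every other
# character (including non-ASCII letters) untouched.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


class NameNormalizer:
    """Hypothetical helper: the single place where names are case-folded."""

    def element_name(self, name):
        return name.translate(_ASCII_LOWER)

    def attribute_name(self, name):
        return name.translate(_ASCII_LOWER)


class CaseSensitiveNormalizer(NameNormalizer):
    """Hypothetical subclass for XML-like input: names keep their case."""

    def element_name(self, name):
        return name

    def attribute_name(self, name):
        return name


html_names = NameNormalizer()
xml_names = CaseSensitiveNormalizer()

print(html_names.element_name("DIV"))      # ASCII letters are folded
print(xml_names.element_name("DIV"))       # case is preserved
print(html_names.attribute_name("ÉCOLE"))  # "É" is left alone
print("ÉCOLE".lower())                     # str.lower() folds "É" as well
```

The last two lines show where the approaches differ: a plain .lower() call also folds non-ASCII letters, while the translation table only touches A-Z. Which of the two is wanted would still need to be checked against the specification.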
> Once this stabilized, I would then plan to look at having the UFP take
> advantage of this library, if it is installed/available. I'd also
> modify Venus, but such support would not need to be conditional there:
> Venus could simply include html5lib.
That'd be cool! I read today that actual usage and support is important if
you want your library to be included in the default distribution.
Anne van Kesteren