[Imps] Liberal XML parsing
Anne van Kesteren
annevk at opera.com
Mon Jan 8 14:44:40 PST 2007
On Mon, 08 Jan 2007 19:46:27 +0100, Sam Ruby <rubys at intertwingly.net>
wrote:
>> Well, the next "bit" would probably be processing instructions. That's
>> why it would be nice to have some formalization / standardization first
>> to see how many changes are required exactly.
>
> I have no interest in XML processing instructions at this time.
Fair enough. But if this is becoming the foundation of an (experimental)
liberal XML parser we'll have interest in due course I reckon. If only for
<?xbl?> and <?xml-stylesheet?>.
>> Currently html5lib maps rather well to the specification which
>> improves the readability of the code a lot (imho). I'd like to know at
>> how many changes we're looking and how that impacts the code.
>
> That's why I provided a comprehensive patch:
>
> http://intertwingly.net/stories/2007/01/08/xhtml5.diff
Instead of using string.ascii_uppercase you should use our internal
asciiUppercase. Also, instead of using a dict for translating can't you
just provide two strings? I'd think that would be faster.
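
To illustrate the dict-versus-two-strings suggestion, here is a minimal sketch in present-day Python 3 (not the Python of the original patch). The names ascii_uppercase, ascii_lowercase and lower_tag_name are assumptions for illustration; the real asciiUppercase lives in html5lib's constants.py and may be defined differently.

```python
import string

# Assumption: stand-ins for the project's own asciiUppercase / asciiLowercase
# constants; the real definitions are in constants.py.
ascii_uppercase = string.ascii_uppercase
ascii_lowercase = string.ascii_lowercase

# Variant 1: translation table written out as an explicit dict
# (code point -> code point).
dict_table = {ord(u): ord(l) for u, l in zip(ascii_uppercase, ascii_lowercase)}

# Variant 2: translation table built from two equal-length strings.
two_string_table = str.maketrans(ascii_uppercase, ascii_lowercase)


def lower_tag_name(name, table=two_string_table):
    """Lowercase only A-Z in a tag name (hypothetical helper)."""
    return name.translate(table)


print(lower_tag_name("TextArea"))
```

In Python 3 both variants end up as the same mapping, so whether one is faster than the other would have to be measured rather than assumed.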
The normalizeToken method should be inlined as you only want to do that
from a single place anyway. And EndTag should use the translate method and
not .lower().
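
The message does not say why translate is preferred over .lower() for EndTag. One possible reason, sketched below in present-day Python 3, is that an ASCII-only table and .lower() disagree on non-ASCII input. ASCII_LOWER_TABLE and normalize_end_tag_name are hypothetical names, not taken from the patch.

```python
import string

# Assumption: an ASCII-only lowercase table, similar in spirit to what the
# tokenizer would use for start tags.
ASCII_LOWER_TABLE = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def normalize_end_tag_name(name):
    """Lowercase A-Z only, leaving every other character untouched."""
    return name.translate(ASCII_LOWER_TABLE)


kelvin = "\u212a"  # KELVIN SIGN, a non-ASCII character that looks like "K"

# The ASCII-only table leaves the character alone ...
print(normalize_end_tag_name(kelvin) == kelvin)
# ... while str.lower() folds it to a plain ASCII "k".
print(kelvin.lower() == "k")
```

Using the same table for start and end tags keeps the two code paths consistent, so a start tag and its end tag always normalize to the same name.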
I suppose these changes also remove the need for asciiLowercase (not
asciiLower that you introduce) as defined in constants.py.
Anyway, with these nits (open for debate) I think I'm ok with doing this
assuming you will update the tests as well (or someone else will). I'd
like to have a liberal XML parser too one day and working on an
experimental implementation of one can't hurt I suppose :-)
If xhtml5parser.py is the only other file I would be fine with adding that
to src/ as liberalxmlparser.py. Bit of a lengthy name, but it more
accurately reflects what it is.
--
Anne van Kesteren
<http://annevankesteren.nl/>
<http://www.opera.com/>