[Imps] liberal XML and implied end tags
Sam Ruby
rubys at intertwingly.net
Sun Mar 11 21:55:39 PDT 2007
Ian Hickson wrote:
> On Sun, 11 Mar 2007, Sam Ruby wrote:
>> Inside src/liberalxmlparser.py, I see:
>>
>>> if node.name == name:
>>> #XXX Something is wrong here... The next (commented) line is
>>> #html-only
>>> #self.tree.generateImpliedEndTags()
>> Problem #1: if I uncomment that line, no tests fail. What's up with
>> that? If I need to make a fix that involves restoring that line, how
>> will I know what that breaks?
>
> That's why you really need a spec now. :-)
We have a saying where I come from:
Thanks for volunteering!
Seriously, specs are very important, but so are tests.
My plan is to integrate this code first with the feed parser and then
with venus, and I can already see a number of refactorings that will be
required. Those will affect what is ultimately specified. For
example, early on I was more naive than I am now and thought that if I
could build a valid DOM, I could then guarantee that that DOM could be
serialized as XML. I put a few checks in the html5lib code that plugged
a few holes, but now I see that the holes are many: XML has constraints
on the names of attributes and the characters that may occur in a
comment, just to name two. Instead of having html5lib make corrections,
the right approach is for the spec to list where html5lib's
responsibilities end, and what is left for the calling application to
deal with.
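
To make those two constraints concrete, here is a minimal sketch of the
kind of check a calling application might run before serializing as XML.
The function names are hypothetical and are not part of html5lib, and the
name pattern is a deliberately conservative ASCII-only approximation of
the XML 1.0 Name production, so it rejects some names that XML allows.

```python
import re

# ASCII-only approximation of the XML 1.0 "Name" production (an assumption
# made for brevity; real XML also permits many non-ASCII characters).
_ASCII_XML_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.-]*")


def is_xml_name(name):
    """Return True if name is usable as an XML attribute name (ASCII subset)."""
    return _ASCII_XML_NAME.fullmatch(name) is not None


def comment_is_xml_safe(text):
    """Return True if text can appear inside <!-- ... --> in XML.

    XML forbids "--" inside a comment, and the comment body may not end
    with "-". Control-character checks are omitted from this sketch.
    """
    return "--" not in text and not text.endswith("-")


if __name__ == "__main__":
    # Attribute names an HTML parser may accept but XML will not.
    for attr in ["href", "1abc", 'a"b']:
        print(repr(attr), is_xml_name(attr))
    # Comment bodies that are fine in a DOM but not in serialized XML.
    for body in ["ok", "a -- b", "trailing-"]:
        print(repr(body), comment_is_xml_safe(body))
```

Whether checks like these belong in the parser or in the caller is
exactly the boundary the spec would need to draw.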
Anyway, in the process of writing the code, I plan to produce quite a
few tests, all of which will be helpful in producing a better spec. And
vice versa: the process of writing the spec will refine the tests and
undoubtedly cause me to refactor some code some more.
- Sam Ruby