[Imps] liberal XML and implied end tags

Sun Mar 11 21:55:39 PDT 2007

Ian Hickson wrote:
> On Sun, 11 Mar 2007, Sam Ruby wrote:
>> Inside src/liberalxmlparser.py, I see:
>>
>>>              if node.name == name:
>>>                  #XXX Something is wrong here... The next (commented) line is
>>>                  #html-only
>>>                  #self.tree.generateImpliedEndTags()
>> Problem #1: if I uncomment that line, no tests fail.  What's up with 
>> that?  If I need to make a fix that involves restoring that line, how 
>> will I know what that breaks?
> 
> That's why you really need a spec now. :-)

We have a saying where I come from:

   Thanks for volunteering!

Seriously, specs are very important, but so are tests.

My plan is to integrate this code first with the feed parser and then 
with Venus, and I can already see a number of refactorings that will be 
required.  Those will affect what is ultimately specified.  For 
example, early on I was more naive than I am now and thought that if I 
could build a valid DOM, I could then guarantee that that DOM could be 
serialized as XML.  I put a few checks in the html5lib code that plugged 
a few holes, but now I see that the holes are many: XML has constraints 
on the names of attributes and the characters that may occur in a 
comment, just to name two.  Instead of having html5lib make corrections, 
the right approach is for the spec to list where html5lib's 
responsibilities end, and what is left for the calling application to 
deal with.
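
To make those two holes concrete, here is a minimal sketch of the kind of 
check a calling application might end up doing itself.  The function names 
are hypothetical (they are not part of html5lib), and the name pattern is 
a deliberately simplified, ASCII-only approximation of the XML 1.0 Name 
production; the comment rule does not check for characters outside the 
XML Char range.

```python
import re

# Assumption: ASCII-only approximation of the XML 1.0 Name production.
# The real production also admits a large set of non-ASCII characters.
_XML_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*\Z")


def is_xml_name(name):
    """Return True if name fits the simplified ASCII Name pattern above."""
    return _XML_NAME.match(name) is not None


def is_xml_comment_text(text):
    """Return True if text may appear between '<!--' and '-->' in XML.

    XML 1.0 forbids '--' inside a comment, and the text may not end
    with '-'.  Characters outside the XML Char range are not checked.
    """
    return "--" not in text and not text.endswith("-")


# An HTML parser will happily accept these; an XML serializer cannot.
for attr in ("href", "xml:lang", "1abc", "a=b"):
    print(attr, is_xml_name(attr))

for comment in (" fine ", "a -- b", "ends with -"):
    print(repr(comment), is_xml_comment_text(comment))
```

Whether a failing name or comment should be dropped, escaped, or reported 
is exactly the kind of decision that belongs to the caller rather than to 
the parser.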

Anyway, in the process of writing the code, I plan to produce quite a 
few tests, all of which will be helpful in producing a better spec.  And 
vice versa: the process of writing the spec will refine the tests and 
undoubtedly cause me to refactor some code some more.

- Sam Ruby