[whatwg] Test cases for parsing spec (Was: Re: Provding Better Tools)

Sam Ruby rubys at intertwingly.net
Wed Dec 6 06:13:26 PST 2006

James Graham wrote:
> Ian Hickson wrote:
>> On Tue, 5 Dec 2006, James Graham wrote:
>>> As someone in the process of implementing a HTML5 parser from the 
>>> spec, my _only_ complaint so far is that there aren't (yet) any 
>>> testcases.
>> If you could get together with the other people writing parsers and 
>> come up with a standard format for test cases, that would be really 
>> helpful. I have a few tests I could contribute, but I'd need a format 
>> to provide them in (they're currently not in a form that would be 
>> useful to you).
> Did you have a list for implementers somewhere? I think it would be a 
> very worthwhile effort to come up with a set of implementation 
> independent, self-describing (i.e. where the testcase itself contains 
> the expected parse tree in some form), testcases - but I think the 
> discussion should be on a separate list.

Count me in.  This is actually close to the reason I originally 
subscribed to this list.  If given a few tests, I could convert them 
into a useful form, and this form could serve as a model for future 
tests.

My original interest was to write a replacement for Python's sgmllib, 
i.e., one based not on the theoretical ideal of how SGML vocabularies 
work, but on the practical notion of how HTML is actually parsed.

My background: I originally wrote most of the back end for the feed 
validator, and continue to be its primary maintainer.  I also contribute 
to the universal feed parser.

The formats of the test cases for the validator and the parser are very 
similar: a standalone document with a structured comment.  The 
structured comment contains an assertion.  In the validator's case, it 
describes a message that is, or is not, expected to occur.  In the 
parser's case, it describes what amounts to an XPath expression.  I do 
believe that a similar approach could work here, not for 100% of the 
test cases, but close enough to handle the bulk of them.  The rest can 
be handled separately.

Additional things like mime type overrides could also be specified in 
this header.
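
To make the idea concrete, here is a rough sketch of what such a 
self-describing test might look like and how a harness could read it. 
The field names (Description, Content-Type, Expect) and the helper 
functions are made up for illustration and are not an agreed format.  
Python's xml.etree.ElementTree stands in for the HTML parser under test, 
so the sample document is kept well-formed and the assertion is limited 
to the small XPath subset that module supports.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical test document.  The header field names are assumptions
# chosen for this sketch, not part of any agreed test format.
SAMPLE = """<!--
Description: paragraph nested inside body
Content-Type: text/html
Expect: ./body/p
-->
<html><body><p>Hello</p></body></html>
"""

HEADER_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def read_header(document):
    """Return the 'Name: value' fields of the first comment as a dict."""
    match = HEADER_RE.search(document)
    if match is None:
        return {}
    fields = {}
    for line in match.group(1).splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip lines that have no 'Name: value' shape
            fields[name.strip()] = value.strip()
    return fields


def run_test(document, parse=ET.fromstring):
    """Parse the document and check that the Expect path matches a node.

    'parse' is a stand-in for the parser under test; it must return an
    object with an ElementTree-style find() method.
    """
    header = read_header(document)
    tree = parse(document)
    return tree.find(header["Expect"]) is not None


if __name__ == "__main__":
    print(read_header(SAMPLE))
    print("pass" if run_test(SAMPLE) else "fail")
```

Because the assertion travels with the document, the same file could be 
fed to any implementation that can expose its parse tree in a queryable 
form, and extra fields such as the MIME type override simply show up as 
further entries in the header dictionary.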



My goal would be to produce something that I could use within the 
feedparser (and therefore, planet).

- Sam Ruby
