[whatwg] Test cases for parsing spec (Was: Re: Providing Better Tools)
rubys at intertwingly.net
Wed Dec 6 06:13:26 PST 2006
James Graham wrote:
> Ian Hickson wrote:
>> On Tue, 5 Dec 2006, James Graham wrote:
>>> As someone in the process of implementing a HTML5 parser from the
>>> spec, my _only_ complaint so far is that there aren't (yet) any
>> If you could get together with the other people writing parsers and
>> come up with a standard format for test cases, that would be really
>> helpful. I have a few tests I could contribute, but I'd need a format
>> to provide them in (they're currently not in a form that would be
>> useful to you).
> Did you have a list for implementers somewhere? I think it would be a
> very worthwhile effort to come up with a set of implementation
> independent, self-describing (i.e. where the testcase itself contains
> the expected parse tree in some form), testcases - but I think the
> discussion should be on a separate list.
Count me in. This is actually closer to the reason I originally
subscribed to this list. If given a few tests, I could convert them
into a useful form, and this form could serve as a model for
My original interest was to write a replacement for Python's sgmllib:
one based not on the theoretical ideal of how SGML vocabularies work,
but on the practical reality of how HTML is actually parsed.
My background: I originally wrote most of the back end for the Feed
Validator, and I continue to be its primary maintainer. I also
contribute to the Universal Feed Parser.
The test cases for the validator and the parser use very similar
formats: a standalone document containing a structured comment, and in
that structured comment, an assertion. In the validator's case, the
assertion describes a message that is, or is not, expected to occur.
In the parser's case, it describes what amounts to an XPath expression.
I believe a similar approach could work here: not for 100% of the test
cases, but for enough of them to handle the bulk. The rest can be
handled separately.
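As a rough illustration of the structured-comment idea, here is a
minimal Python sketch. The header keys (Description:, Expect:), the
"path = count" assertion syntax, and the helper names are all
hypothetical. Assertions use the limited XPath subset (ElementPath)
that xml.etree.ElementTree supports, not full XPath, and ET.fromstring
merely stands in for a real HTML5 parser, so the sample document has to
be well-formed XML.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical header format: the first HTML comment holds "Key: value"
# lines. "Expect:" lines read "<ElementPath> = <expected match count>".
HEADER = re.compile(r"<!--(.*?)-->", re.S)

def parse_header(doc):
    """Return (description, [(path, count), ...]) from the first comment."""
    match = HEADER.search(doc)
    if match is None:
        raise ValueError("test case has no structured comment")
    description, expectations = "", []
    for line in match.group(1).splitlines():
        key, _, value = line.strip().partition(":")
        if key == "Description":
            description = value.strip()
        elif key == "Expect":
            # Split on the LAST "=" so predicates like [@id='x'] survive.
            path, _, count = value.rpartition("=")
            expectations.append((path.strip(), int(count)))
    return description, expectations

def run_case(doc, parse):
    """Parse doc with the supplied parser and return a list of failures."""
    description, expectations = parse_header(doc)
    tree = parse(doc)
    failures = []
    for path, count in expectations:
        actual = len(tree.findall(path))
        if actual != count:
            failures.append(
                f"{description}: {path} matched {actual}, expected {count}")
    return failures

# A self-describing test case: the expected tree shape travels with it.
CASE = """<!--
Description: two sibling paragraphs
Expect: .//p = 2
Expect: ./body = 1
-->
<html><body><p>one</p><p>two</p></body></html>"""

print(run_case(CASE, ET.fromstring))  # prints []
```

Because the parser is passed in as a plain callable, the same test files
could in principle drive several independent implementations; cases that
cannot be expressed as a match count would fall into the "handled
separately" bucket.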
Additional things, such as MIME type overrides, could also be specified
in the structured comment.
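A MIME type override could ride along in the same comment. The sketch
below assumes a hypothetical "Content-Type:" header line and a default
of text/html; a harness might use the result to decide whether to run
the HTML parser or an XML parser on that one case.

```python
import re

# Hypothetical: a "Content-Type:" line in the structured comment overrides
# the default MIME type for this one test case.
COMMENT = re.compile(r"<!--(.*?)-->", re.S)
DEFAULT_TYPE = "text/html"

def content_type(doc):
    """Return the test case's overridden MIME type, or the default."""
    match = COMMENT.search(doc)
    if match:
        for line in match.group(1).splitlines():
            key, sep, value = line.strip().partition(":")
            if sep and key.strip().lower() == "content-type":
                return value.strip()
    return DEFAULT_TYPE
```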
My goal would be to produce something that I could use within
feedparser (and therefore Planet).
- Sam Ruby