[whatwg] Tag Soup: Blocks-in-inlines

Ian Hickson ian at hixie.ch
Thu Jan 26 15:17:06 PST 2006

On Wed, 25 Jan 2006, Henri Sivonen wrote:
> Anyway, here's what I thought they were doing:
> There's a low-level parser [that] is kind of like a tag-level lexer and 
> emits a (non-well-formed) sequence of SAX-like events like startTag, 
> characters, endTag and comment (in my parser* HtmlParser.java).

That's the Tokenisation Stage in the spec now.
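
As a rough illustration of that kind of event stream (a minimal sketch, not
the spec's tokeniser and not HtmlParser.java), the Python below turns markup
into startTag / characters / endTag / comment events without checking
nesting. The regular expression, the function name tokenise and the sample
input are all assumptions made for this example; real tokenisation has many
more cases (quoted attributes containing ">", script data, and so on).

```python
import re

# Hypothetical, heavily simplified token pattern: comment, end tag, start tag.
TOKEN = re.compile(
    r"<!--(.*?)-->"                        # group 1: comment body
    r"|</([A-Za-z][A-Za-z0-9]*)\s*>"       # group 2: end tag name
    r"|<([A-Za-z][A-Za-z0-9]*)[^>]*>",     # group 3: start tag name
    re.S,
)

def tokenise(text):
    """Yield (event, data) pairs; no attempt is made to check nesting."""
    pos = 0
    for m in TOKEN.finditer(text):
        if m.start() > pos:
            # Anything between two recognised tokens is character data.
            yield ("characters", text[pos:m.start()])
        if m.group(1) is not None:
            yield ("comment", m.group(1))
        elif m.group(2) is not None:
            yield ("endTag", m.group(2).lower())
        else:
            yield ("startTag", m.group(3).lower())
        pos = m.end()
    if pos < len(text):
        yield ("characters", text[pos:])

# A made-up misnested input: the events come out in source order,
# so the stream is not well-formed.
for event in tokenise("<b><p>x</b></p>"):
    print(event)
```

The point is only that this stage reports what it sees; deciding what the
tree should look like is left to whatever consumes the events.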

> These events don't go to the DOM builder / content sink directly. 
> Instead, there's a filter layer that takes care of tag inference and 
> emits a well-formed event stream (TagInferenceFilter.java and 
> EmptyElementFilter.java in my parser). Additionally, there's a filter 
> (not present in my parser, which is designed for conformance checking; 
> this may need to be integrated into the tag inference filter) that 
> performs the "residual style" fixups.

That wouldn't work. You can't know whether something is well-formed or not 
until you get to the end of it. Consider these examples in light of what 
Mozilla and Safari do with them:




Incremental rendering means you have to add content to the DOM as you 
receive it; you can't wait to be sure.

(Mozilla does a "pre-parse", sort of like what you are suggesting, but 
only over the data it has received so far, which means that the DOM you 
get depends on packet boundaries and such. This results in 
non-deterministic parsing, which isn't really acceptable.)
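
To make the determinism requirement concrete, here is a minimal Python
sketch of the property wanted: the output depends only on the characters
received, never on how they were split into chunks. It covers tokenising
only, not tree construction, and it is not how Mozilla or any other browser
works; the class IncrementalTokeniser, the helper tokenise_in_chunks and the
sample input are hypothetical names and data invented for this example.

```python
import re

# Simplified tag pattern (an assumption; real HTML tags are more complex).
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^<>]*>")

class IncrementalTokeniser:
    def __init__(self):
        self.buffer = ""   # input received but not yet turned into events
        self.events = []

    def feed(self, chunk):
        self.buffer += chunk
        self._drain(final=False)

    def close(self):
        self._drain(final=True)
        return self.events

    def _emit_text(self, text):
        if not text:
            return
        # Merge adjacent text so chunk boundaries inside text leave no trace.
        if self.events and self.events[-1][0] == "characters":
            self.events[-1] = ("characters", self.events[-1][1] + text)
        else:
            self.events.append(("characters", text))

    def _drain(self, final):
        while self.buffer:
            lt = self.buffer.find("<")
            if lt == -1:
                self._emit_text(self.buffer)
                self.buffer = ""
                return
            if lt > 0:
                self._emit_text(self.buffer[:lt])
                self.buffer = self.buffer[lt:]
            # The buffer now starts with "<".
            m = TAG.match(self.buffer)
            if m:
                kind = "endTag" if m.group(1) else "startTag"
                self.events.append((kind, m.group(2).lower()))
                self.buffer = self.buffer[m.end():]
            elif (not final and "<" not in self.buffer[1:]
                  and ">" not in self.buffer):
                # Could still become a tag: wait for more input, don't guess.
                return
            else:
                # Can never become a tag: treat the "<" as text.
                self._emit_text("<")
                self.buffer = self.buffer[1:]

def tokenise_in_chunks(chunks):
    t = IncrementalTokeniser()
    for chunk in chunks:
        t.feed(chunk)
    return t.close()

doc = "<b>one<p>two</b>three</p>"   # made-up misnested input
whole = tokenise_in_chunks([doc])
# Every possible two-chunk split gives the same events as the whole input.
assert all(tokenise_in_chunks([doc[:i], doc[i:]]) == whole
           for i in range(len(doc) + 1))
```

The only thing the sketch ever defers is a "<" whose fate is genuinely
undecided; everything else is emitted as soon as it arrives, so nothing
depends on where the network happened to cut the stream.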

> Perhaps this model is a simple enough model to be deterministically 
> specified but still good enough an approximation of Gecko's and 
> WebCore's behavior. All decisions are local to the parse event being 
> observed and do not involve reshuffling the parts of the DOM that have 
> already been built.

If it doesn't handle the examples in this thread the way IE does (in terms 
of rendering), then it isn't good enough.

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
