[whatwg] Tag Soup: Blocks-in-inlines
Ian Hickson
ian at hixie.ch
Thu Jan 26 15:17:06 PST 2006
On Wed, 25 Jan 2006, Henri Sivonen wrote:
>
> Anyway, here's what I thought they were doing:
>
> There's a low-level parser [that] is kind of like a tag-level lexer and
> emits a (non-well-formed) sequence of SAX-like events like startTag,
> characters, endTag and comment (in my parser* HtmlParser.java).
That's the Tokenisation Stage in the spec now.
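To make the shape of that event stream concrete, here is a minimal sketch of a tag-level lexer. It is only an illustration: the function name tokenise, the regular expression and the (kind, value) tuples are assumptions made for this example, and it ignores attributes, doctypes, entities and every error case that a real tokeniser has to handle.

```python
import re

# Hypothetical tag-level lexer; names and event shapes are illustrative only.
# Alternatives: comment, end tag, start tag (attributes are matched but dropped).
TOKEN = re.compile(
    r"<!--(.*?)-->|</([A-Za-z][A-Za-z0-9]*)\s*>|<([A-Za-z][A-Za-z0-9]*)[^>]*>",
    re.S,
)

def tokenise(text):
    """Yield (kind, value) events in source order, without checking nesting."""
    pos = 0
    for m in TOKEN.finditer(text):
        if m.start() > pos:
            # Anything between two recognised tokens is character data.
            yield ("characters", text[pos:m.start()])
        if m.group(1) is not None:
            yield ("comment", m.group(1))
        elif m.group(2) is not None:
            yield ("endTag", m.group(2).lower())
        else:
            yield ("startTag", m.group(3).lower())
        pos = m.end()
    if pos < len(text):
        yield ("characters", text[pos:])

# Misnested input still tokenises fine; the stream is simply not well-formed.
events = list(tokenise("<em><strong>x</em></strong>"))
```

The lexer emits the end tags in the order they appear, so nothing at this level knows or cares that em is closed before strong.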
> These events don't go to the DOM builder / content sink directly.
> Instead, there's a filter layer that takes care of tag inference and
> emits a well-formed event stream (TagInferenceFilter.java and
> EmptyElementFilter.java in my parser). Additionally, there's a filter
> (not present in my parser, which is designed for conformance checking;
> this may need to be integrated into the tag inference filter) that
> performs the "residual style" fixups.
That wouldn't work. You can't know whether something is well-formed or not
until you get to the end of it. Consider these examples in light of what
Mozilla and Safari do with them:
<em>
<strong>
...2GB...
</em>
</strong>
Or:
<em>
...2GB...
<p>
...2GB...
</em>
</p>
Incremental rendering means you have to be adding stuff to the DOM as you
get it; you can't wait to be sure.
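As a rough illustration of why the decision can't be made early, here is a sketch of a nesting check over a startTag/characters/endTag event stream. The function name first_misnesting and the tuple format are assumptions for this example, and 1000 text events stand in for the 2GB of content. It only reports end tags that don't match the innermost open element; it does not report elements left open at the end.

```python
def first_misnesting(events):
    """Return the index of the first end tag that does not match the
    innermost open element, or None if no such end tag is seen."""
    open_elements = []
    for index, (kind, name) in enumerate(events):
        if kind == "startTag":
            open_elements.append(name)
        elif kind == "endTag":
            if not open_elements or open_elements[-1] != name:
                return index
            open_elements.pop()
    return None

# Stand-in for the "...2GB..." of content between the start and end tags.
filler = [("characters", "x")] * 1000

stream = (
    [("startTag", "em"), ("startTag", "strong")]
    + filler
    + [("endTag", "em"), ("endTag", "strong")]
)

# The problem only becomes visible at the </em>, after all of the filler.
problem_at = first_misnesting(stream)
```

Every event before that index looks exactly like the start of a well-formed document, so a filter that wants to emit only well-formed output would have to buffer all of it before emitting anything.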
(Mozilla does a "pre-parse" with what it has, sort of like what you are
suggesting, but it only does it with what it has, which means that the DOM
you get is dependent on packet boundaries and such. This results in
non-deterministic parsing, which isn't really acceptable.)
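The packet-boundary problem can be shown with a toy. To be clear, this is not Mozilla's algorithm; it is a made-up lookahead (the name preparse and the labels are assumptions for this example) whose only point is that any decision based on "what has arrived so far" changes when the same bytes arrive in different chunks.

```python
def preparse(chunks):
    """Toy lookahead: label each start tag by whether a matching end tag
    is visible later in the same chunk. Only received data is inspected."""
    decisions = []
    for chunk in chunks:
        for i, (kind, name) in enumerate(chunk):
            if kind == "startTag":
                closes_here = ("endTag", name) in chunk[i + 1:]
                label = "closed-in-view" if closes_here else "open-ended"
                decisions.append((name, label))
    return decisions

# The second example from this message, with short text in place of the 2GB.
events = [
    ("startTag", "em"),
    ("characters", "x"),
    ("startTag", "p"),
    ("characters", "y"),
    ("endTag", "em"),
    ("endTag", "p"),
]

# Same event stream, delivered once as a single packet and once as two.
one_packet = preparse([events])
two_packets = preparse([events[:3], events[3:]])
```

With one packet both elements are labelled "closed-in-view"; split after the <p> and both become "open-ended". Any tree built from those labels would differ between the two deliveries of the same document.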
> Perhaps this model is a simple enough model to be deterministically
> specified but still good enough an approximation of Gecko's and
> WebCore's behavior. All decisions are local to the parse event being
> observed and do not involve reshuffling the parts of the DOM that have
> already been built.
If it doesn't handle the examples in this thread the way IE does (as far
as the rendering goes), then it isn't good enough.
--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'