[whatwg] About adopting quirks mode parsing
Simon Pieters
zcorpan at hotmail.com
Mon Jul 17 07:03:20 PDT 2006
Hi,
From: Ian Hickson <ian at hixie.ch>
>On Sun, 18 Jun 2006, Simon Pieters wrote:
> >
> > The spec asks whether quirks mode parsing should be adopted[1]. I think
> > it would be good if parsing worked more or less the same in quirks and
> > standards mode. If we want to adopt quirks mode parsing, then here are
> > some remarks:
> >
> > > Comment parsing is different.
> >
> > I think the current parsing algorithm for comments should remain. I
> > don't think we should adopt IE's "overlapping" comments (<!--> being one
> > comment), because that isn't logical and isn't how they work in XML and
> > comments in other languages (such as /*/ in CSS isn't one comment).
>
>I agree. However, in quirks mode this is a requirement. So if we make the
>parsing quirks-compatible (as in, if we remove DOCTYPE-switching for
>parsing), we have no choice.
Ok. I could live with that.
> > > The following is considered one script block (!):
> > >
> > > <script><!-- document.write('</script>'); --></script>
> >
> > This one is common, I think, and applies to IE6, Safari and Opera even
> > in Standards Mode. Script parsing seems to work like this in Mozilla in
> > Quirks Mode:
> >
> > 1. If the parser hits the string "<!--" then set a flag to ignore
> >    </script> tags.
> > 2. If the parser then hits the string "-->" then reset the flag.
> > 3. The flag can only be set once.
> > 4. If the parser hits EOF, then reset the flag (if it is set) and
> >    reparse the script.
> >
> > Opera seems to do the same as Mozilla.
>
>Anything that depends on EOF is a bad idea for security reasons, so I
>would be reluctant to do that...
>
> > We would have to drop reparsing though.
>
>...which you seem to agree with. :-)
>
>
> > I've tried to figure out exactly what IE does, but I have failed. It
> > seems to do reparsing sometimes, and others not, and --> after the
> > </script> tag makes a difference, and also whether there are characters
> > after the --> (before EOF). The flag can also be set more than once.
> >
> > Safari seems to do pretty much what IE does.
>
>Can't spec what I can't describe! :-)
If we ignore reparsing, I think I know what Opera, Firefox, IE and Safari
do. See these test cases:
http://simon.html5.org/test/html/parsing/pseudo-comments/
How to interpret results: If there's nothing outside the tested element,
then the parser allows multiple pseudo-comments. If "a-->" is outside the
element in question, then the parser doesn't allow any pseudo-comments; for
"b-->" the parser allows one pseudo-comment.
Below are the results:

opera (standards mode and quirks mode)
    title, textarea, style, script, noscript,
    noembed (with plugins enabled), noframes:
        one pseudo-comment

firefox (standards mode)
    title, textarea:
        multiple pseudo-comments
    style, script, noscript, noembed, noframes:
        no pseudo-comments

firefox (quirks mode)
    title, textarea:
        multiple pseudo-comments
    style, noscript, noembed, noframes:
        no pseudo-comments
    script:
        one pseudo-comment

ie (standards mode and quirks mode)
    title, textarea, script, noscript, noembed, noframes:
        multiple pseudo-comments
    style:
        one pseudo-comment

safari (standards mode and quirks mode)
    title, textarea:
        no pseudo-comments
    style, script, noscript, noembed, noframes:
        multiple pseudo-comments

I'm not sure what's most sensible to do. I think this is needed for at least
<script> parsing. My proposal is to allow multiple pseudo-comments for all
RCDATA and CDATA elements.
As for an algorithm, I think one extra flag would be sufficient. The flag is
initially false. If the parser hits <!-- while in RCDATA or CDATA, the flag
is set to true. If the parser then hits -->, the flag is set back to false.
While the flag is true the element can't be closed.
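
The flag approach can be sketched roughly as below. This is only an illustration under stated assumptions, not spec text: the function name find_element_end and the max_comments parameter are made up for this example, the end tag is matched by a simple case-insensitive prefix test, and the four characters of <!-- are consumed before looking for -->, so <!--> does not count as a complete pseudo-comment. There is no reparsing at EOF.

```python
from typing import Optional


def find_element_end(text: str, tag: str,
                     max_comments: Optional[int] = None) -> int:
    """Return the index of the end tag that closes the element, or -1.

    `text` is the input following the element's start tag.
    `max_comments` (hypothetical knob) limits how often the flag may be
    set: None = multiple pseudo-comments, 1 = one, 0 = none.
    """
    close = "</" + tag.lower()
    in_pseudo_comment = False  # the flag; initially false
    opened = 0                 # how many times the flag has been set
    i = 0
    while i < len(text):
        if in_pseudo_comment:
            # While the flag is true the element can't be closed;
            # only "-->" is significant.
            if text.startswith("-->", i):
                in_pseudo_comment = False
                i += 3
                continue
        else:
            # Assumption: a prefix match on "</tag" is enough here.
            if text[i:i + len(close)].lower() == close:
                return i
            may_open = max_comments is None or opened < max_comments
            if may_open and text.startswith("<!--", i):
                in_pseudo_comment = True
                opened += 1
                i += 4
                continue
        i += 1
    return -1  # hit EOF without closing; no reparsing is attempted
```

With the script example quoted earlier, find_element_end("<!-- document.write('</script>'); --></script>", "script") returns the position of the last </script>, so the whole thing is one script block. Passing max_comments=0 makes the first </script> close the element instead.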
What's also interesting is that Firefox and IE don't replace entities inside
pseudo-comments for RCDATA elements (title and textarea), but Opera and
Safari do:
http://simon.html5.org/test/html/parsing/pseudo-comments/rcdata/
Results:

firefox and ie (standards mode and quirks mode)
    title, textarea:
        entities are not replaced

opera and safari (standards mode and quirks mode)
    title, textarea:
        entities are replaced

I guess we could follow IE on this one.
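
To make the difference concrete, here is a small sketch of the two behaviours for RCDATA content. The function name decode_rcdata and the replace_in_comments parameter are invented for this example. It uses Python's html.unescape as a stand-in for entity replacement, which follows current HTML rules and may not match any of the tested browsers exactly. An unterminated pseudo-comment is assumed to run to the end of the content.

```python
import html


def decode_rcdata(content: str, replace_in_comments: bool = False) -> str:
    """Replace entities in RCDATA content (title, textarea).

    replace_in_comments=False models the Firefox/IE result (entities
    inside pseudo-comments are left alone); True models Opera/Safari.
    """
    out = []
    i = 0
    while i < len(content):
        start = content.find("<!--", i)
        if start == -1:
            # No further pseudo-comment: decode the rest and stop.
            out.append(html.unescape(content[i:]))
            break
        # Text before the pseudo-comment is always decoded.
        out.append(html.unescape(content[i:start]))
        end = content.find("-->", start + 4)
        stop = len(content) if end == -1 else end + 3
        segment = content[start:stop]
        out.append(html.unescape(segment) if replace_in_comments else segment)
        i = stop
    return "".join(out)
```

For example, decode_rcdata("a &amp; <!-- b &amp; --> c &lt;") leaves the &amp; inside the pseudo-comment untouched and decodes the other two entities.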
> > > p can contain table
> >
> > I think this might be a good thing. I would also like p to be able to
> > contain other struct-inline elements, but perhaps that isn't really
> > possible.
>
>Indeed.
It might also be desirable for a valid HTML4 document to get a conforming
HTML4 DOM. If so, then a <p> shouldn't contain a <table>.
> > > Safari and IE have special parsing rules for <% ... %> (even in
> > > standards mode, though clearly this should be quirks-only).
> >
> > This wouldn't be a bogus comment, as bogus comments end with > (while
> > these end with %>), but I think it would be possible to add this if we
> > want to be more compatible with IE.
>
>Oh we could add anything to be compatible with IE... the questions are do
>we want to be, and do we need to be.
True.
>Like you, I don't know. :-) I want to do some research on this in due
>course, but I haven't been able to do it yet.
It would be interesting to see that research. :-)
Regards,
Simon Pieters