[whatwg] About adopting quirks mode parsing

Simon Pieters zcorpan at hotmail.com
Mon Jul 17 07:03:20 PDT 2006


Hi,

From: Ian Hickson <ian at hixie.ch>
>On Sun, 18 Jun 2006, Simon Pieters wrote:
> >
> > The spec asks whether quirks mode parsing should be adopted[1]. I think
> > it would be good if parsing worked more or less the same in quirks and
> > standards mode. If we want to adopt quirks mode parsing, then here are
> > some remarks:
> >
> > > Comment parsing is different.
> >
> > I think the current parsing algorithm for comments should remain. I
> > don't think we should adopt IE's "overlapping" comments (<!--> being one
> > comment), because that isn't logical and isn't how they work in XML and
> > comments in other languages (such as /*/ in CSS isn't one comment).
>
>I agree. However, in quirks mode this is a requirement. So if we make the
>parsing quirks-compatible (as in, if we remove DOCTYPE-switching for
>parsing), we have no choice.

Ok. I could live with that.

> > > The following is considered one script block (!):
> > >
> > >      <script><!-- document.write('</script>'); --></script>
> >
> > This one is common, I think, and applies to IE6, Safari and Opera even
> > in Standards Mode. Script parsing seems to work like this in Mozilla in
> > Quirks Mode:
> >
> > 1. If the parser hits the string "<!--" then set a flag to ignore
> > </script> tags.
> > 2. If the parser then hits the string "-->" then reset the flag.
> > 3. The flag can only be set once.
> > 4. If the parser hits EOF, then reset the flag (if it is set) and
> > reparse the script.
> >
> > Opera seems to do the same as Mozilla.
>
>Anything that depends on EOF is a bad idea for security reasons, so I
>would be reluctant to do that...
>
> > We would have to drop reparsing though.
>
>...which you seem to agree with. :-)
>
>
> > I've tried to figure out exactly what IE does, but I have failed. It
> > seems to do reparsing sometimes, and others not, and --> after the
> > </script> tag makes a difference, and also whether there are characters
> > after the --> (before EOF). The flag can also be set more than once.
> >
> > Safari seems to do pretty much what IE does.
>
>Can't spec what I can't describe! :-)

If we ignore reparsing, I think I know what Opera, Firefox, IE and Safari 
do. See these test cases:

   http://simon.html5.org/test/html/parsing/pseudo-comments/

How to interpret results: If there's nothing outside the tested element, 
then the parser allows multiple pseudo-comments. If "a-->" is outside the 
element in question, then the parser doesn't allow any pseudo-comments; for 
"b-->" the parser allows one pseudo-comment.

Below are the results:

opera
   standards mode
   quirks mode
      title
      textarea
      style
      script
      noscript
      noembed (with plugins enabled)
      noframes
         one pseudo-comment

firefox
   standards mode
      title
      textarea
         multiple pseudo-comments
      style
      script
      noscript
      noembed
      noframes
         no pseudo-comments
   quirks mode
      title
      textarea
         multiple pseudo-comments
      style
      noscript
      noembed
      noframes
         no pseudo-comments
      script
         one pseudo-comment

ie
   standards mode
   quirks mode
      title
      textarea
      script
      noscript
      noembed
      noframes
         multiple pseudo-comments
      style
         one pseudo-comment

safari
   standards mode
   quirks mode
      title
      textarea
         no pseudo-comments
      style
      script
      noscript
      noembed
      noframes
         multiple pseudo-comments

I'm not sure what's most sensible to do. I think this is needed for at least 
<script> parsing. My proposal is to allow multiple pseudo-comments for all 
RCDATA and CDATA elements.

As for an algorithm for how to do that, I think that an extra flag would be
sufficient. Initially the flag is false. If the parser hits <!-- while in
RCDATA or CDATA, the flag is set to true. Then, if the parser hits --> the
flag is set to false. While the flag is true the element can't be closed.
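
The flag algorithm above can be sketched roughly as follows. This is only an
illustration under some assumptions that the description doesn't settle: the
function name find_end_of_raw_text is made up, the end tag is matched
case-insensitively as the bare string "</tagname", "<!-->" does not count as
a complete pseudo-comment, and hitting EOF with the flag set simply means no
end tag is found (no reparsing).

```python
def find_end_of_raw_text(source: str, tag: str, start: int = 0) -> int:
    """Return the index of the end tag that closes an RCDATA/CDATA element.

    Hypothetical helper: scanning starts just after the element's start tag.
    Returns -1 if no end tag is found before EOF (no reparsing is done).
    """
    lowered = source.lower()
    close = "</" + tag.lower()   # simplification: ignores what follows the name
    in_pseudo_comment = False    # the extra flag; initially false
    i = start
    while i < len(lowered):
        if not in_pseudo_comment and lowered.startswith("<!--", i):
            in_pseudo_comment = True     # "<!--" sets the flag
            i += 4                       # assumption: "<!-->" is not complete
        elif in_pseudo_comment and lowered.startswith("-->", i):
            in_pseudo_comment = False    # "-->" resets the flag
            i += 3
        elif not in_pseudo_comment and lowered.startswith(close, i):
            return i                     # element may only close when flag is false
        else:
            i += 1
    return -1
```

Given the contents of the script block quoted earlier,
"<!-- document.write('</script>'); --></script>", this returns the index of
the second </script>, since the first one sits inside a pseudo-comment.
Because the flag can be set again after it has been reset, this corresponds
to the "multiple pseudo-comments" behaviour in the results above.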

What's also interesting is that Firefox and IE don't replace entities inside 
pseudo-comments for RCDATA elements (title and textarea), but Opera and 
Safari do:

   http://simon.html5.org/test/html/parsing/pseudo-comments/rcdata/

Results:

firefox
ie
   standards mode
   quirks mode
      title
      textarea
         entities are not replaced

opera
safari
   standards mode
   quirks mode
      title
      textarea
         entities are replaced

I guess we could follow IE on this one.
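
To illustrate what following IE (and Firefox) would mean for the text of an
RCDATA element, here is a small sketch. The function name rcdata_text is
hypothetical, Python's html.unescape is only a stand-in for whatever entity
replacement a browser actually performs, and treating an unterminated
pseudo-comment as running to the end of the text is an assumption.

```python
import html
import re

# One pseudo-comment: "<!--" up to and including the next "-->".
# Assumption: an unterminated pseudo-comment runs to the end of the text.
PSEUDO_COMMENT = re.compile(r"<!--.*?(?:-->|\Z)", re.DOTALL)


def rcdata_text(raw: str) -> str:
    """Replace entities outside pseudo-comments, leave them alone inside."""
    out = []
    pos = 0
    for match in PSEUDO_COMMENT.finditer(raw):
        out.append(html.unescape(raw[pos:match.start()]))  # outside: replaced
        out.append(match.group(0))                         # inside: kept as-is
        pos = match.end()
    out.append(html.unescape(raw[pos:]))                   # trailing text
    return "".join(out)
```

For the input "a &amp; b <!-- c &amp; d --> e &lt;" this gives
"a & b <!-- c &amp; d --> e <". The Opera and Safari behaviour would instead
amount to replacing entities in the whole string.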

> > > p can contain table
> >
> > I think this might be a good thing. I would also like p to be able to
> > contain other struct-inline elements, but perhaps that isn't really
> > possible.
>
>Indeed.

It might also be desirable for a valid HTML4 document to get a conforming
HTML4 DOM. If so, then <p> shouldn't be able to contain <table>.

> > > Safari and IE have special parsing rules for <% ... %> (even in
> > > standards mode, though clearly this should be quirks-only).
> >
> > This wouldn't be a bogus comment, as bogus comments end with > (while
> > these end with %>), but I think it would be possible to add this if we
> > want to be more compatible with IE.
>
>Oh we could add anything to be compatible with IE... the questions are do
>we want to be, and do we need to be.

True.

>Like you, I don't know. :-) I want to do some research on this in due
>course, but I haven't been able to do it yet.

It would be interesting to see such research. :-)

Regards,
Simon Pieters
