[whatwg] comment parsing

Ian Hickson ian at hixie.ch
Sun Jan 22 18:15:19 PST 2006


On Sat, 21 Jan 2006, Anne van Kesteren wrote:
>
> Given the new parsing rules for comments (all those internal discussions...) I
> was trying to write some testcases for how they are defined now.
> 
> # <p><!-- -- -->PASS<!--></p>
> 
> However, from the specification it is not entirely clear what should 
> happen with <!--></p>. Well, perhaps it is, but then I'd like that to be 
> changed. If we take the problematic snippet:
> 
> # <!--></p>
> 
> It seems that per 
> <http://whatwg.org/specs/web-apps/current-work/#marked> "<!--" starts 
> the comment. It seems that per 
> <http://whatwg.org/specs/web-apps/current-work/#comment> all characters 
> that follow and are not a dash have to become part of the comment. Is 
> that correct?

Yes. The </p> is part of the comment.
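
A minimal sketch of that reading, assuming a simplified model in which a 
comment opens at "<!--" and closes at the next "-->" that begins after the 
opener. The function name split_comments is hypothetical, and this is not 
the spec's actual tokenizer:

```python
def split_comments(source):
    """Split markup into (kind, text) pieces; kind is 'text' or 'comment'.

    Simplified model (an assumption, not the spec's tokenizer): a comment
    opens at "<!--" and closes at the next "-->" that starts after the
    opener. An unclosed comment runs to the end of the input.
    """
    pieces = []
    pos = 0
    while pos < len(source):
        start = source.find("<!--", pos)
        if start == -1:
            # No further comment opener: the rest is ordinary markup.
            pieces.append(("text", source[pos:]))
            break
        if start > pos:
            pieces.append(("text", source[pos:start]))
        end = source.find("-->", start + 4)
        if end == -1:
            # No closer: everything up to EOF is comment data.
            pieces.append(("comment", source[start + 4:]))
            break
        pieces.append(("comment", source[start + 4:end]))
        pos = end + 3
    return pieces


for kind, text in split_comments("<p><!-- -- -->PASS<!--></p>"):
    print(kind, repr(text))
```

Under this model the testcase splits into text "<p>", comment " -- ", text 
"PASS", and a final unclosed comment "></p>".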


> So if I would modify the testcase to say:
> 
> # <p><!-- -- -->PASS<!--></p>FAIL
> 
> And directly after "FAIL" it is EOF (or a few end tags later) it would never
> show up, right?

Correct.


> Given that most browsers show "FAIL" or "<!-->FAIL" for:
> 
> # <p><!-->FAIL</p>
> 
> A change might be in order. Or perhaps someone explaining to me what I 
> did wrong when reading the specification.

Your reading is correct.

The reason the spec doesn't say that you re-parse if you hit EOF with an 
open comment is that re-parsing would be a security risk.

Imagine that the page contains the following:

   ...
   <!--
     <script> hostileScript(); </script>
   -->
   ...

...where "hostileScript()" is some script that does something bad.

A DoS attack on the server could truncate the response, so that the 
transmitted text is:

   ...
   <!--
     <script> hostileScript(); </script>

...which, if we re-parse the content upon hitting EOF with an open 
comment, would cause the script to be executed.
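
The difference between the two EOF policies can be sketched as follows. 
This is an illustration under a simplified comment model ("<!--" up to the 
next "-->" after the opener), not the spec's tokenizer; the function name 
markup_outside_comments, the reparse_on_eof flag and the sample page are 
all hypothetical:

```python
def markup_outside_comments(source, reparse_on_eof=False):
    """Return the markup a parser would act on, with comments dropped.

    Simplified model (an assumption): a comment runs from "<!--" to the
    next "-->" after the opener. With reparse_on_eof=True, an unclosed
    "<!--" is skipped and whatever follows it is parsed as live markup,
    mimicking the re-parsing behaviour discussed above.
    """
    out = []
    pos = 0
    while pos < len(source):
        start = source.find("<!--", pos)
        if start == -1:
            out.append(source[pos:])
            break
        out.append(source[pos:start])
        end = source.find("-->", start + 4)
        if end == -1:
            if not reparse_on_eof:
                break  # the rest of the input stays inside the comment
            pos = start + 4  # re-parse: the would-be comment body goes live
            continue
        pos = end + 3
    return "".join(out)


page = "<p>intro</p><!-- <script> hostileScript(); </script> --><p>outro</p>"
truncated = page[:page.index("-->")]  # transmission cut before the closer

print("<script>" in markup_outside_comments(page))             # False
print("<script>" in markup_outside_comments(truncated))        # False
print("<script>" in markup_outside_comments(truncated, True))  # True
```

Only the re-parsing policy exposes the script element when the closing 
"-->" is lost in transit.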

This scenario could show itself any time that a blog entry system allows 
users to enter comments, for instance.

(Thanks to Jesse Ruderman for pointing this out.)

(I could be convinced that <!--> should be a full comment -- allowing the 
<!-- and --> parts to overlap -- if it could be shown that UAs implement 
this behaviour separately from their implementing <!--EOF as reparsing.)
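
As a rough sketch of what letting the two parts overlap could mean, 
assuming the same simplified model as the spec discussion above (this is 
not taken from any UA's implementation, and the name visible_markup and 
its allow_overlap flag are hypothetical):

```python
def visible_markup(source, allow_overlap=False):
    """Return the markup outside comments under a simplified comment model.

    Assumption: a comment opens at "<!--" and closes at the next "-->".
    With allow_overlap=True the closer may share its dashes with the
    opener, so "<!-->" is a complete, empty comment.
    """
    out = []
    pos = 0
    # Where the search for "-->" may begin, relative to the opener.
    offset = 2 if allow_overlap else 4
    while pos < len(source):
        start = source.find("<!--", pos)
        if start == -1:
            out.append(source[pos:])
            break
        out.append(source[pos:start])
        end = source.find("-->", start + offset)
        if end == -1:
            break  # unclosed comment swallows the rest; no re-parsing
        pos = end + 3
    return "".join(out)


print(visible_markup("<p><!-->FAIL</p>"))                      # <p>
print(visible_markup("<p><!-->FAIL</p>", allow_overlap=True))  # <p>FAIL</p>
```

In this sketch the overlap rule changes only how "<!-->" is handled; an 
unclosed "<!--" still swallows everything up to EOF, so the two behaviours 
stay independent.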

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


