[whatwg] Comment Syntax and Parsing

Lachlan Hunt lachlan.hunt at lachy.id.au
Sun Jan 22 20:24:37 PST 2006


Ian Hickson wrote:
> On Sun, 22 Jan 2006, Lachlan Hunt wrote:
>> Firstly, Ian, could you please clarify what exactly made you change your 
>> mind about this issue...?
> 
> What made me change my mind was a realisation that I was being stupid. The 
> person who originally implemented this feature in Mozilla, the current 
> maintainer of the HTML Parser in Mozilla, the person who implemented it in 
> Safari, the person who implemented it in Opera, the people who tested it 
> in Opera, the CTO of Opera Software, the president of Prince's YesLogic, 
> and one of YesLogic's board members, all came to me and told me that I was 
> being stupid, and it was only when a good portion of them all said it 
> within about two days of each other that I realised that they were right.

Well, for what it's worth, I still don't think you were being stupid. I 
think you were right all along, and had this been implemented by more 
than just Mozilla 7 years ago, the result might have been different. 
However, since interoperability and compatibility with the mistakes of 
the past are more important, and since all of those vendors have 
unanimously voted against implementing proper comment handling in favour 
of quirks-mode-style parsing, there really isn't a choice in the matter.

>> Secondly, what will now be defined as a conforming comment syntax for 
>> use in a document?
> 
> Probably the same as XML. Or maybe just "<!--" followed by zero or more 
> characters other than U+0000, followed by "-->".

I vote for keeping it very similar to XML; it'll be easier for authors 
if they only have to learn and remember one comment syntax.

>> Ignoring parsing requirements, is it safe to assume that HTML will 
>> borrow from the stricter XML comment syntax, which start with '<!--' and 
>> end with '-->' and does not contain '--' anywhere in between?
>>
>> In other words:
>> <!-- valid comment -->
>> <!-- invalid -- comment -->
>> <!-- invalid -- -- comment --> (though, valid in HTML4)
>>
>> That seems like the most backwards compatible method, it remains 
>> compatible with the HTML4 syntax and is actually the way most good 
>> tutorials teach authors to write comments.
> 
> Yeah. The question is do we really want to confuse people by telling them 
> that their comment is invalid when they write:
> 
>    <!----------------------------->

Yes, for backwards compatibility reasons.  Current versions of Gecko 
will (depending on the number of "--") either output that as content, 
due to the way they re-parse it, or, if another similar comment follows, 
comment out an entire section.

e.g. Consider this case:

<!-- Start Section 1 -- [description of section 1] -->
<h1>Foo</h1>
<p>...</p>

<!-- Start Section 2 -- [description of section 2] -->
<h1>Bar</h1>
<p>...</p>

In that case, in current versions of Mozilla (standards mode), section 1 
will be entirely commented out.  In newer versions that implement these 
new comment parsing requirements, it may show as intended by the author.
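
A minimal sketch of why section 1 disappears, assuming a simplified model 
of SGML-style comment parsing (each "--" after "<!" toggles comment state, 
and the declaration ends at the first ">" outside a comment). This is not 
Gecko's actual code; the function names are hypothetical, and the second 
function only approximates the newer "ends at the first -->" behaviour.

```python
def sgml_comment_end(text, start=0):
    """Index just past the '>' that closes the declaration at text[start:],
    under simplified SGML rules; None if it is never closed."""
    if not text.startswith("<!--", start):
        raise ValueError("no comment declaration at this position")
    i = start + 2              # point at the first "--"
    in_comment = False
    while i < len(text):
        if text.startswith("--", i):
            in_comment = not in_comment   # each "--" opens or closes a comment
            i += 2
        elif text[i] == ">" and not in_comment:
            return i + 1                  # '>' only counts outside a comment
        else:
            i += 1
    return None


def simple_comment_end(text, start=0):
    """Index just past the first '-->' after the opening '<!--', else None."""
    if not text.startswith("<!--", start):
        raise ValueError("no comment declaration at this position")
    j = text.find("-->", start + 4)
    return None if j == -1 else j + 3


doc = (
    "<!-- Start Section 1 -- [description of section 1] -->\n"
    "<h1>Foo</h1>\n<p>...</p>\n\n"
    "<!-- Start Section 2 -- [description of section 2] -->\n"
    "<h1>Bar</h1>\n<p>...</p>\n"
)

swallowed_sgml = doc[:sgml_comment_end(doc)]
swallowed_simple = doc[:simple_comment_end(doc)]
```

Under the SGML-style model, swallowed_sgml runs through the end of the 
second comment and so contains the Foo heading; under the simpler model, 
swallowed_simple is just the first comment line.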

Another question is, do we wish to continue allowing white space like this:
<!-- comment --   >

I believe it's supported by all browsers without any difficulty (so no 
backwards compatibility concerns) and regardless of whether it's a valid 
syntax or not, parsers will need to handle it anyway, so we may as well 
allow it.

I'd define comments to match this syntax (borrowed from XML, but with an 
optional S? added between the closing '--' and '>'):

Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '--' S? '>'
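
As a rough check of that production, here is a sketch that transcribes it 
into a Python regular expression. It assumes Char can be approximated as 
"any character" (XML's Char excludes some control characters) and that S 
is XML's white space set (space, tab, CR, LF); the name is_conforming is 
hypothetical.

```python
import re

# '<!--' ((Char - '-') | ('-' (Char - '-')))* '--' S? '>'
COMMENT = re.compile(r"<!--(?:[^-]|-[^-])*--[ \t\r\n]*>")


def is_conforming(comment):
    """True if the whole string matches the proposed Comment production."""
    return COMMENT.fullmatch(comment) is not None


valid = is_conforming("<!-- valid comment -->")
double_dash = is_conforming("<!-- invalid -- comment -->")
trailing_space = is_conforming("<!-- comment --   >")
all_dashes = is_conforming("<!----------------------------->")
```

With this pattern, the plain comment and the one with white space before 
'>' are accepted, while the comment containing an inner '--' and the row 
of dashes are rejected.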

-- 
Lachlan Hunt
http://lachy.id.au/



