[whatwg] Comment Syntax and Parsing

Lachlan Hunt lachlan.hunt at lachy.id.au
Sun Jan 22 22:50:57 PST 2006


Ian Hickson wrote:
> On Mon, 23 Jan 2006, Lachlan Hunt wrote:
>> Well, for what it's worth, I still don't think you were being stupid, I think
>> you were right all along and had this been implemented by more than just
>> Mozilla 7 years ago, the result may have been different.
> 
> Authors find the -- thing unbelievably confusing.

Oh, yes, absolutely.  I know, I've tried explaining it to some with 
varying degrees of success.

> Why does:
> 
>   <!-- Hello
>     -- World
>     -- How does <comment> work?
>     -- I don't know.
>     -- Do you?
>     -->
> 
> ...work,

Well, that depends on the implementation and on how SGML says such 
erroneous comments should be handled.  (Without a copy of ISO 8879 handy, 
it's difficult to check, so the following is based purely on observing 
the implementations.)

Mozilla will handle that entirely as a single comment, which is closed 
at the occurrence of --> at the end.

onsgmls, however, (which is more likely to be closer to the SGML spec) 
will encounter the 'W' in 'World', which is outside of the comment, 
treat it as an erroneous unclosed comment declaration and implicitly 
close it.  It will then drop the 'W' completely and continue on, 
treating <comment> as an unknown and unclosed element along the way 
(assuming an HTML doctype is used).

So, basically, none of those examples actually "work", they just appear 
to work in some implementations.
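
The rule onsgmls appears to be applying can be sketched roughly as 
follows.  This is only an approximation based on the behaviour described 
above, not on the text of ISO 8879, and the function name 
is_sgml_comment_declaration is made up for illustration: a declaration 
is "<!", then zero or more "-- ... --" comments separated by optional 
white space, then ">".

```python
def is_sgml_comment_declaration(text: str) -> bool:
    """Rough check: is `text` exactly one SGML-style comment declaration?

    Assumed grammar (an approximation, not taken from ISO 8879):
    "<!" followed by zero or more "--...--" comments, with white space
    allowed only after a comment, followed by ">".
    """
    if not text.startswith("<!"):
        return False
    pos = 2
    seen_comment = False
    while pos < len(text):
        if text.startswith("--", pos):
            # Inside a comment: skip ahead to the next "--" that closes it.
            end = text.find("--", pos + 2)
            if end == -1:
                return False  # comment never closed
            pos = end + 2
            seen_comment = True
        elif seen_comment and text[pos] in " \t\r\n":
            pos += 1  # white space between comments, or before ">"
        elif text[pos] == ">":
            return pos == len(text) - 1  # ">" must end the declaration
        else:
            # Any other character outside a comment (such as the 'W'
            # in 'World') is an error.
            return False
    return False


example = "<!-- Hello\n    -- World\n    -->"
print(is_sgml_comment_declaration(example))                # False
print(is_sgml_comment_declaration("<!-- comment --   >"))  # True
```

Under this reading, the second "--" closes the first comment rather 
than continuing it, which is why the 'W' ends up outside any comment.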

> (What HTML5 says isn't really quirks mode comment parsing, it's even 
> simpler.)

OK, well, then I don't have a clue how quirks mode parsing works; it's 
just too unpredictable.  I'm glad this is going to be simpler.  Do you 
know whether browsers will use this for both standards and quirks mode, 
or will they retain their existing quirks mode parsing and use this as 
the new standards mode parsing only?

>>> Probably the same as XML. Or maybe just "<!--" followed by zero or 
>>> more characters other than U+0000, followed by "-->".
>> I vote for keeping it very similar to XML, it'll be easier for authors 
>> only having to learn and remember one comment syntax.
> 
> Plus CSS's. Plus Javascript's. So three syntaxes, at least.

Yes, but authors don't mistake CSS and JavaScript for the same language 
as HTML nearly as often as they mistake HTML and XHTML for the same 
language.

> ...and this is assuming they'll ever use XML.

Well, many authors believe they're using XHTML, and many even believe 
they're using the correct XHTML MIME type (using <meta>), even though 
they're not.  So, regardless of whether they actually are or not, they're 
going to believe they are, and it's best not to confuse them more by saying:
    "<!--------> isn't well-formed XML"

and have them come back and say:
    "the validator says it's fine"

and then tell them:
   "that's because the document isn't XHTML".

only to hear:
   "Yes it is, look at the meta element and all these slashes (<br/>)"
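
For comparison, the XML rule is easy to state: a comment is "<!--", 
then any text that does not contain "--", then "-->".  A minimal sketch, 
assuming that simplified reading (it ignores XML's restrictions on which 
characters are legal at all, and the name is_xml_comment is made up for 
illustration):

```python
import re

# "<!--", then a run of characters in which no "-" is directly followed
# by another "-", then "-->".  Negated character classes also match
# newlines, so multi-line comments are covered.
XML_COMMENT = re.compile(r"<!--(?:[^-]|-[^-])*-->")


def is_xml_comment(text: str) -> bool:
    """Rough check: is `text` exactly one well-formed XML comment?"""
    return XML_COMMENT.fullmatch(text) is not None


eight_dashes = "<!" + "-" * 8 + ">"
print(is_xml_comment("<!-- comment -->"))     # True
print(is_xml_comment(eight_dashes))           # False: body contains "--"
print(is_xml_comment("<!-- comment --   >"))  # False: no literal "-->"
```

So a comment made of nothing but hyphens can pass an SGML-based 
validator as two empty comments and still fail the XML rule.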

>> Another question is, do we wish to continue allowing white space like this:
>> <!-- comment --   >
>>
>> I believe it's supported by all browsers without any difficulty
> 
> Actually, it isn't. In most browsers that I tested the above gets treated 
> as an unclosed comment which is then re-parsed in "close at first >" mode.

You're right, but IE was the only browser that I could find which (in 
standards mode) treated it like that.

> Since we're dropping the re-parse mode (see earlier mails), this goes away 
> with it.

OK.

-- 
Lachlan Hunt
http://lachy.id.au/



