[whatwg] Parsing: comment tokenization

Ian Hickson ian at hixie.ch
Tue Jun 19 01:41:59 PDT 2007

On Sat, 7 Apr 2007, Anne van Kesteren wrote:
> The tokenization section should also handle:
>  <!-->
>  <!--->
> as "correct" comments for compat with the web. This means that
>  <!-->-->
> shows "-->" and that
>  <!--->-->
> shows "-->".

These comments are now handled (though they are not conformant).
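
To make the two cases concrete, here is a minimal sketch of the behaviour
Anne describes. It is not the spec's tokenizer algorithm: the function name
split_comments is hypothetical, only comments are recognised (tags are left
as text), and treating an unterminated comment as running to the end of the
input is an assumption.

```python
def split_comments(html):
    """Split html into ("comment", data) and ("text", data) tokens.

    Illustrative only: handles comments and nothing else.
    """
    tokens = []
    pos = 0
    while pos < len(html):
        start = html.find("<!--", pos)
        if start == -1:
            # No further comment openers: the rest is plain text.
            tokens.append(("text", html[pos:]))
            break
        if start > pos:
            tokens.append(("text", html[pos:start]))
        body = start + 4  # index just past "<!--"
        # Compat cases from this thread: "<!-->" and "<!--->" close
        # immediately and yield an empty comment.
        if html.startswith(">", body):
            tokens.append(("comment", ""))
            pos = body + 1
            continue
        if html.startswith("->", body):
            tokens.append(("comment", ""))
            pos = body + 2
            continue
        end = html.find("-->", body)
        if end == -1:
            # Assumption: an unterminated comment swallows the rest.
            tokens.append(("comment", html[body:]))
            break
        tokens.append(("comment", html[body:end]))
        pos = end + 3
    return tokens


print(split_comments("<!-->-->"))
print(split_comments("<!--->-->"))
```

Under these assumptions both inputs give an empty comment followed by the
text "-->", which is why the page would show "-->" in each case.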

On Sat, 7 Apr 2007, Nicholas Shanks wrote:
> Why on earth is this a good idea?

IE7 does it. The assumption is that content therefore depends on it.

> AFAIK browsers and other HTML clients don't currently treat these as 
> comments

This seems to disagree with my research.

> [...] compelling them to do so will cause several problems:
> 1) Web developers currently expect things like <!-->5?--> to result in 
> the comment "greater than five?". Changing such expectations on a whim 
> is harmful.

It is not clear to me that this is indeed true.

> 2) A double HYPHEN-MINUS delimits comments within tags, this provides 
> compatibility with XML and SGML and changing this needlessly in HTML5 
> will just complicate conversion.

Keeping that compatibility is, unfortunately, impractical. (I say this 
despite having personally pushed for it for years.)

> 3) You claim "compat with the web" but don't provide any evidence to 
> support that. Are there huge numbers of sites expecting <!--> to 
> represent a comment without content? Can such sites not be fixed instead 
> of polluting HTML with additional rules? I'd rather have a handful of 
> broken sites that their authors will fix than saying to the other 99% of 
> authors "hey, you can now do this" and ending up with millions of broken 
> sites. (I say broken, because they will not be backwards compatible with 
> current or previous UAs)

It seems that they will in fact be compatible; but I agree, we shouldn't 
encourage it. The spec makes them non-conforming.

On Sat, 7 Apr 2007, Nicholas Shanks wrote:
> Even you must (begrudgingly?) admit that "comments" formatted as in your 
> original post are not backwards compatible, even if they do reflect the 
> state of modern UAs as you say.

How can both those statements be true?

> I don't believe I am 'pretending' anything. Just stating that diverging 
> further from SGML for No Good Reason is pointless. (And yes, supporting 
> a few odd websites that do this already counts as not a Good Reason, 
> websites can always be fixed!)

Sadly, Web sites can't always be fixed. Many sites have been long 
abandoned and are no longer updated.

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
