[whatwg] Minor clarification of <meta charset> sniffing

Wed Jun 20 01:10:42 PDT 2007

On Wed, 23 May 2007, Michael Day wrote:
> 
> A minor point relating to comment skipping in the charset sniffing 
> algorithm described in section 8.2.2 of HTML5. The existing text says:
> 
> "Advance the position pointer so that it points at the first 0x3E byte 
> which is preceded by two 0x2D bytes (i.e. at the end of an ASCII '-->' 
> sequence) and comes after the second 0x2D byte that was found. (The two 
> 0x2D bytes cannot be the same as those in the '<!--' sequence.) If 
> no such byte is found before the nth byte, abort this "two step" 
> algorithm."
> 
> This clearly says that '<!-->' is not a complete comment, as the second 
> pair of hyphens cannot be the same as the first. However, it doesn't 
> clearly say whether '<!--->' is a complete comment or not.
> 
> One option would be to say that the second two 0x2D bytes come after the 
> second 0x2D byte that was found, not just the 0x3E byte coming after the 
> second 0x2D byte that was found.

I changed it the other way, by allowing overlapped hyphens. This is 
consistent with what we've done with comments in the tokeniser.
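
For illustration, here is a rough Python sketch of the comment-skipping 
step under both readings. It is not spec text: the function name 
end_of_comment and the overlap flag are invented for this example, and 
the "before the nth byte" limit is left out for brevity. With 
overlap=True the hyphens of '<!--' may double as the hyphens of '-->'; 
with overlap=False the closing pair must come entirely after them, which 
is the stricter option suggested above.

```python
from typing import Optional


def end_of_comment(data: bytes, start: int, overlap: bool = True) -> Optional[int]:
    """Return the index of the 0x3E byte that ends the comment opening at
    `start`, or None if no such byte exists.

    `start` must point at the 0x3C byte of an ASCII '<!--' sequence.
    Hypothetical helper, for illustration only.
    """
    if data[start:start + 4] != b"<!--":
        raise ValueError("start does not point at '<!--'")
    # Overlap allowed: the search for '-->' may begin at the first hyphen
    # of '<!--'. Overlap disallowed: it begins after the second hyphen.
    search_from = start + 2 if overlap else start + 4
    i = data.find(b"-->", search_from)
    # The 0x3E byte is the third byte of the '-->' match.
    return None if i == -1 else i + 2
```

Under this sketch, '<!-->' and '<!--->' are both complete comments when 
overlap is allowed, and neither is complete when it is not; '<!---->' is 
complete either way.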

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'