[whatwg] Possible bug in the character encoding detection algorithm

Ian Hickson ian at hixie.ch
Fri Mar 2 17:28:33 PST 2007

On Fri, 2 Mar 2007, James Graham wrote:
> Given the following line of input:
> <a b='c'>
> 012345678  - byte numbers for reference
> Jump to step labeled "value"
> (Presumably at this point we want to advance to position 5; this is not
> mentioned)


> this seems to lead to an infinite loop (IIRC the same thing happens for
> unquoted values). html5lib currently sidesteps the issue by not moving the
> position back one after finding an attribute.

Yeah, that was an error in the spec. Fixed. Let me know if you still get 
an error when implementing the algorithm exactly as it is now written.

> This fails to locate the character encoding in e.g.:
> <meta http-equiv="Content-Type<meta charset="utf-8">
> Obviously one possibility is to get all attributes and then, if the
> current byte is an ASCII "<", move the position back one.

You shouldn't get the character encoding in that case.
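
One way to see why no encoding is found for that input is a minimal sketch 
of an attribute scanner. It is an illustration under assumptions, not the 
spec text: the names get_attribute and attributes_after_tag_name are 
hypothetical, and the steps are loosely modelled on an attribute-scanning 
step of this kind rather than copied from the draft under discussion. A 
quoted value runs to the next matching quote, so the second "<meta 
charset=" is swallowed into the http-equiv value and no charset attribute 
is ever produced.

```python
SPACE = b"\t\n\x0c\r "  # tab, LF, FF, CR, space


def get_attribute(data: bytes, pos: int):
    """Scan one attribute starting at pos (hypothetical helper).

    Returns ((name, value), new_pos), or (None, pos) when none is left.
    """
    n = len(data)
    # Skip whitespace and slashes in front of the attribute name.
    while pos < n and (data[pos] in SPACE or data[pos] == ord("/")):
        pos += 1
    if pos >= n or data[pos] == ord(">"):
        return None, pos

    name, value = bytearray(), bytearray()

    def found(new_pos):
        # ASCII-lowercase both parts before handing them back.
        attr = (name.lower().decode("latin-1"),
                value.lower().decode("latin-1"))
        return attr, new_pos

    # Attribute name: runs until "=", whitespace, "/" or ">".
    while pos < n:
        b = data[pos]
        if (b == ord("=") and name) or b in SPACE:
            break
        if b in b"/>":
            return found(pos)
        name.append(b)
        pos += 1

    while pos < n and data[pos] in SPACE:
        pos += 1
    if pos >= n or data[pos] != ord("="):
        return found(pos)
    pos += 1  # step past "=" so the value scan starts on the value itself
    while pos < n and data[pos] in SPACE:
        pos += 1
    if pos >= n:
        return found(pos)

    quote = data[pos]
    if quote in b"\"'":
        # Quoted value: everything up to the next matching quote.
        pos += 1
        while pos < n and data[pos] != quote:
            value.append(data[pos])
            pos += 1
        if pos >= n:
            return None, pos  # unterminated quote: give up (an assumption)
        return found(pos + 1)

    # Unquoted value: runs until whitespace or ">".
    while pos < n and data[pos] not in SPACE and data[pos] != ord(">"):
        value.append(data[pos])
        pos += 1
    return found(pos)


def attributes_after_tag_name(data: bytes, pos: int):
    """Collect attributes until ">" or end of input (hypothetical helper)."""
    attrs = []
    while True:
        attr, pos = get_attribute(data, pos)
        if attr is None:
            return attrs
        attrs.append(attr)


broken = b'<meta http-equiv="Content-Type<meta charset="utf-8">'
print(attributes_after_tag_name(broken, 5))
# [('http-equiv', 'content-type<meta charset='), ('utf-8"', '')]

print(attributes_after_tag_name(b"<a b='c'>", 2))
# [('b', 'c')]
```

In this sketch every call that returns an attribute consumes at least one 
byte, so the collecting loop always terminates, including on the 
<a b='c'> input from the first report.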

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
