[whatwg] Internal character encoding declaration
Henri Sivonen
hsivonen at iki.fi
Sat Mar 11 14:49:03 PST 2006
On Mar 11, 2006, at 17:10, Henri Sivonen wrote:
> Initialize a character decoder such that the bytes 0x20–0x7E (inclusive)
> as well as 0x09, 0x0A and 0x0D decode to the Unicode code points of
> the same (zero-extended) value, and all other bytes decode to U+FFFD
> and raise a REWIND flag.
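A minimal Python sketch of the provisional decoder quoted above. It assumes the decoder can be modeled as a class whose `decode` method consumes a chunk of bytes; the class name `ProvisionalDecoder` and the boolean attribute `rewind` (standing in for the REWIND flag) are hypothetical.

```python
# Bytes passed through unchanged: printable ASCII 0x20-0x7E plus
# tab (0x09), LF (0x0A) and CR (0x0D).
PASS_THROUGH = frozenset(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


class ProvisionalDecoder:
    """Hypothetical provisional decoder used before the encoding is known."""

    def __init__(self) -> None:
        # Stands in for the REWIND flag: set once any other byte is seen.
        self.rewind = False

    def decode(self, data: bytes) -> str:
        out = []
        for b in data:  # iterating over bytes yields ints
            if b in PASS_THROUGH:
                out.append(chr(b))  # zero-extend the byte to a code point
            else:
                out.append("\uFFFD")  # REPLACEMENT CHARACTER
                self.rewind = True
        return "".join(out)
```

For example, `ProvisionalDecoder().decode(b"caf\xc3\xa9")` yields `"caf"` followed by two U+FFFD characters and leaves `rewind` set, signalling that the bytes must be decoded again once the real encoding is known.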
On further reflection, it occurred to me that emitting the
Windows-1252 characters instead of U+FFFD would be a good
optimization for the common case where the encoding later turns out
to be Windows-1252 or ISO-8859-1. This would require more than one
bookkeeping flag, though.
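One possible reading of this optimization, sketched in Python: decode everything as Windows-1252 and keep two flags, one for any byte outside the pass-through set and one for bytes 0x80–0x9F, the only range where Windows-1252 and strict ISO-8859-1 disagree (printable characters versus C1 controls). The class `OptimisticDecoder`, both flag names and the `needs_rewind` method are hypothetical, and the sketch recognizes only the two labels shown.

```python
class OptimisticDecoder:
    """Hypothetical sketch: decode provisionally as Windows-1252 and keep
    two bookkeeping flags that say when the output must be discarded."""

    PASS_THROUGH = frozenset(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}

    def __init__(self) -> None:
        # Set when any byte outside the ASCII pass-through set is seen:
        # the output is then only trustworthy for Windows-1252-like labels.
        self.saw_non_pass_through = False
        # Set when a byte in 0x80-0x9F is seen: Windows-1252 and strict
        # ISO-8859-1 decode these differently.
        self.saw_c1_range = False

    def decode(self, data: bytes) -> str:
        for b in data:
            if b not in self.PASS_THROUGH:
                self.saw_non_pass_through = True
            if 0x80 <= b <= 0x9F:
                self.saw_c1_range = True
        # Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D
        # undefined; errors="replace" turns them into U+FFFD.
        return data.decode("cp1252", errors="replace")

    def needs_rewind(self, encoding: str) -> bool:
        enc = encoding.lower()
        if enc == "windows-1252":
            return False  # the provisional output is already correct
        if enc == "iso-8859-1":
            # Outside 0x80-0x9F the two encodings agree byte for byte.
            return self.saw_c1_range
        return self.saw_non_pass_through
```

With this split, a page declared as ISO-8859-1 that contains `\xe9` needs no rewind, while one containing `\x80` does, because Windows-1252 produced U+20AC where strict ISO-8859-1 would produce the C1 control U+0080.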
> If a start tag other than html or head is seen, emit an easy parse
> error.
Same with character data.
> Encoding errors are easy parse errors. (Emit U+FFFD on bogus data.)
The exception should be the ISO-8859-* family: for those encodings,
the easy error recovery should be to emit the characters according to
the corresponding Windows-* superset.
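A small Python sketch of this recovery policy, assuming Python's built-in codecs stand in for the parser's decoders. The `WINDOWS_SUPERSET` table and `decode_with_recovery` function are hypothetical, and the table lists only two ISO/Windows pairs for illustration; a complete table would need the other members of the family.

```python
# Hypothetical table: declared ISO-8859-* label -> Windows superset codec
# used for error recovery. Only two pairs are listed for illustration.
WINDOWS_SUPERSET = {
    "iso-8859-1": "cp1252",
    "iso-8859-9": "cp1254",
}


def decode_with_recovery(data: bytes, declared: str) -> str:
    """Decode data; bogus bytes become U+FFFD, except that the listed
    ISO-8859-* labels are decoded with their Windows superset instead."""
    label = declared.lower()
    codec = WINDOWS_SUPERSET.get(label, label)
    # errors="replace" emits U+FFFD for byte sequences the codec rejects.
    return data.decode(codec, errors="replace")
```

Under this policy, `\x93hi\x94` in a page labeled ISO-8859-1 comes out as curly quotes rather than C1 controls, while an invalid byte in UTF-8 still becomes U+FFFD. An unknown label raises `LookupError`, which the sketch does not handle.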
--
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/