[whatwg] Default encoding to UTF-8?

Henri Sivonen hsivonen at iki.fi
Tue Jan 3 00:50:26 PST 2012


On Tue, Jan 3, 2012 at 10:33 AM, Henri Sivonen <hsivonen at iki.fi> wrote:
> A solution that would border on reasonable would be decoding as
> US-ASCII up to the first non-ASCII byte and then deciding between
> UTF-8 and the locale-specific legacy encoding by examining the first
> non-ASCII byte and up to 3 bytes after it to see if they form a valid
> UTF-8 byte sequence. But trying to gain more statistical confidence
> about UTF-8ness than that would be bad for performance (either due to
> stalling stream processing or due to reloading).

And it's worth noting that the above paragraph states a "solution" to
one specific problem: "How to make it possible to use UTF-8 without
declaring it?"

Adding autodetection wouldn't actually force authors to use UTF-8, so
the problem Faruk stated at the start of the thread (authors not using
UTF-8 throughout systems that process user input) wouldn't be solved.
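
For concreteness, the heuristic from the quoted paragraph could be
sketched roughly as follows. This is only an illustration under stated
assumptions, not code from any browser: the function name
sniff_encoding, the windows-1252 default standing in for "the
locale-specific legacy encoding", and the choice to return the legacy
label for pure-ASCII input are all hypothetical. The sketch also assumes
the bytes right after the first non-ASCII byte are already buffered.

```python
def sniff_encoding(data: bytes, legacy: str = "windows-1252") -> str:
    """Pick "utf-8" or the legacy label from the first non-ASCII byte only.

    Hypothetical helper: the name, signature and default are assumptions.
    """
    # Everything before the first byte >= 0x80 is plain US-ASCII.
    first = next((i for i, b in enumerate(data) if b >= 0x80), None)
    if first is None:
        # Assumption: with no non-ASCII byte seen, keep the legacy default.
        return legacy

    # The lead byte determines how long a UTF-8 sequence would have to be.
    lead = data[first]
    if 0xC2 <= lead <= 0xDF:
        length = 2
    elif 0xE0 <= lead <= 0xEF:
        length = 3
    elif 0xF0 <= lead <= 0xF4:
        length = 4
    else:
        # Continuation byte or invalid lead byte: cannot be UTF-8.
        return legacy

    # Examine the first non-ASCII byte and up to 3 bytes after it.
    candidate = data[first:first + length]
    try:
        # The strict decoder rejects truncated, overlong and surrogate forms.
        candidate.decode("utf-8")
    except UnicodeDecodeError:
        return legacy
    return "utf-8"
```

Note that the decision rests on a single sequence: input such as
b"\xe9t\xc3\xa9" is classified as legacy because its first non-ASCII
byte does not start a valid UTF-8 sequence, even though valid UTF-8
follows later.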

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/


