[whatwg] Guessing the fallback encoding from the top-level domain name before trying to guess from the browser localization

Henri Sivonen hsivonen at hsivonen.fi
Wed Feb 26 03:07:25 PST 2014

On Sat, Feb 8, 2014 at 12:37 AM, Ian Hickson <ian at hixie.ch> wrote:
> What have you learnt so far?

I've learned that I misattributed the cause of the high frequency of
character encoding menu usage in the Traditional Chinese
localization. We had been shipping with the wrong fallback encoding
(UTF-8) even after the fallback encoding was supposedly fixed (to
Big5). That shows what a mess our previous mechanism for setting
the fallback encoding in a locale-dependent way was. The fallback
encoding for Traditional Chinese will change to Big5 for real in
Firefox 28.

I may have improved Firefox (hopefully; that remains to be seen) for
the wrong reason. Oops. :-)

Also, more baseline telemetry data (i.e. data without TLD-based
guessing) is now available. The last 3 weeks of Firefox 25 on the
release channel:
https://bug965707.bugzilla.mozilla.org/attachment.cgi?id=8381393 . The
last 3 weeks of Firefox 26 on the release channel:
https://bug965707.bugzilla.mozilla.org/attachment.cgi?id=8381394 . The
rows for locales with so little overall usage that even a couple of
sessions with encoding menu use pushes them up the list percentage-wise
are grayed out. In both cases, the top entries in black are Traditional
Chinese and Thai, both of which have the wrong fallback. Next come the
CJK locales, followed by the Cyrillic locales that have a detector on
by default (Russian and Ukrainian), which makes one wonder whether the
detectors are doing more harm than good. After those comes Arabic,
which also has the wrong fallback. (These wrong fallbacks are fixed in
Firefox 28. In Firefox 28, no locale falls back to UTF-8.)

Henri Sivonen
hsivonen at hsivonen.fi
