[whatwg] Default encoding to UTF-8?
bzbarsky at MIT.EDU
Mon Dec 5 13:49:45 PST 2011
On 12/5/11 12:42 PM, Leif Halvard Silli wrote:
> Last I checked, some of those locales defaulted to UTF-8. (And HTML5
> defines it the same.) So how is that possible?
Because the authors of the pages that users of those locales tend to
visit use UTF-8 more than anything else?
> Don't users of those locales travel as much as you do?
People on average travel less than David does, yes. In all locales.
But that's not the point. I think you completely misunderstood his
comments about travel and locales. Keep reading.
> What kind of trouble are you actually describing here? You are
> describing a problem with using UTF-8 for *your locale*.
No. He's describing a problem using UTF-8 to view pages that are not
written in English.
Now what language are the non-English pages you look at written in?
Well, it depends. In western Europe they tend to be in languages that
can be encoded in ISO-8859-1, so authors sometimes use that encoding
(without even realizing it). If you set your browser to default to
UTF-8, those pages will be broken.
In Japan, a number of pages are authored in Shift_JIS. Those will
similarly be broken in a browser defaulting to UTF-8.
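A minimal sketch of both failures, in Python. The sample strings are made up for illustration, and errors="replace" only approximates what a browser does with invalid UTF-8 (it substitutes U+FFFD rather than giving up):

```python
# Hypothetical sample text; only the byte-level behaviour matters here.
western = "Mus\u00e9e d'Orsay"        # "Musee" with an e-acute
japanese = "\u65e5\u672c\u8a9e"       # three kanji, "Japanese language"

# Bytes as a legacy-encoded page would serve them.
latin1_bytes = western.encode("iso-8859-1")
sjis_bytes = japanese.encode("shift_jis")


def decode_as_utf8(raw: bytes) -> str:
    """Decode the way a UTF-8-defaulting reader would, marking bad bytes."""
    return raw.decode("utf-8", errors="replace")


broken_latin1 = decode_as_utf8(latin1_bytes)
broken_sjis = decode_as_utf8(sjis_bytes)

# Both results contain U+FFFD replacement characters.
print(repr(broken_latin1))
print(repr(broken_sjis))

# Decoding with the encoding the author actually used round-trips cleanly.
print(latin1_bytes.decode("iso-8859-1") == western)
print(sjis_bytes.decode("shift_jis") == japanese)
```

The ASCII parts of the page survive either way, which is why mostly-English pages hide the problem until an accented letter shows up.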
> What is your locale?
Why does it matter? David's default locale is almost certainly en-US,
which defaults to ISO-8859-1 (or whatever Windows-??? encoding that
actually means on the web) in his browser. But again, he's changed the
default encoding from the locale default, so the locale is irrelevant.
> (Quite often it sounds as
> if some see Latin-1 - or Windows-1252 as we now should say - as a
> 'super default' rather than a locale default. If that is the case, that
> it is a super default, then we should also spec it like that! Until
> further, I'll treat Latin-1 as it is specced: As a default for certain
That's exactly what it is.
> Since it is a locale problem, we need to understand which locale you
> have - and/or which locale you - and other debaters - think they have.
Again, doesn't matter if you change your settings from the default.
> However, you also say that your problem is not so much related to pages
> written for *your* locale as it is related to pages written for users
> of *other* locales. So how many times per year do Dutch, Spanish or
> Norwegian - and other non-English - pages create trouble for
> you, as an English locale user? I am making an assumption: Almost never.
> You don't read those languages, do you?
Did you miss the "travel" part? Want to look up web pages for museums,
airports, etc in a non-English speaking country? There's a good chance
they're not in English!
> This is also an expectation thing: If you visit a Russian page in a
> legacy Cyrillic encoding, and get mojibake because your browser
> defaults to Latin-1, then what does it matter to you whether your
> browser defaults to Latin-1 or UTF-8? Answer: Nothing.
> I think we should 'attack' the dominating locale first: The English
> locale, in its different incarnations (Australian, American, UK). Thus,
> we should turn things on the head: English users should start to expect
> UTF-8 to be used. Because, as English users, you are more used to
> 'mojibake' than the rest of us are: Whenever you see it, you 'know'
> that it is because it is a foreign language you are reading.
Modulo smart quotes (and, recently, Unicode ellipsis characters). These
are actually pretty common in English text on the web nowadays, and have
a tendency to be in "ISO-8859-1".
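To make that concrete, here is a rough Python sketch. It assumes the bytes are really windows-1252 (Python's "cp1252"), which is how browsers generally treat pages labelled ISO-8859-1; the sample text is invented:

```python
# Hypothetical English snippet with curly quotes and an ellipsis character.
text = "\u201cHello\u201d\u2026"

# Assumption: the page is saved as windows-1252 (Python codec "cp1252").
raw = text.encode("cp1252")
print(raw)  # one byte each for the quotes and the ellipsis

# A UTF-8-defaulting reader sees three invalid bytes and marks each one.
as_utf8 = raw.decode("utf-8", errors="replace")
print(repr(as_utf8))

# Reading the bytes as windows-1252 restores the punctuation.
as_cp1252 = raw.decode("cp1252")
print(as_cp1252 == text)

# Strict ISO-8859-1 maps the same bytes to invisible C1 control codes,
# which is why "ISO-8859-1" is in quotes above.
as_latin1 = raw.decode("iso-8859-1")
print(repr(as_latin1))
```

So an otherwise pure-ASCII English page still breaks under a UTF-8 default as soon as it contains one of these punctuation bytes.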
> Or, please, explain to us when and where it
> is important that English language users living in their own, native
> lands so to speak, need that their browser default to Latin-1 so that
> they can correctly read English language pages?
> See? We would have a plan. Or what do you think?
Try it in your browser. When I set UTF-8 as my default, there were
broken quotation marks all over the web for me. And I'm talking pages in