[whatwg] Default encoding to UTF-8?

Boris Zbarsky bzbarsky at MIT.EDU
Mon Dec 5 13:49:45 PST 2011


On 12/5/11 12:42 PM, Leif Halvard Silli wrote:
> Last I checked, some of those locales defaulted to UTF-8. (And HTML5
> defines it the same.) So how is that possible?

Because authors of the pages that users of those locales tend to visit 
use UTF-8 more than anything else?

> Don't users of those locales travel as much as you do?

People on average travel less than David does, yes.  In all locales.

But that's not the point.  I think you completely misunderstood his 
comments about travel and locales.  Keep reading.

> What kind of trouble are you actually describing here? You are
> describing a problem with using UTF-8 for *your locale*.

No.  He's describing a problem using UTF-8 to view pages that are not 
written in English.

Now what language are the non-English pages you look at written in? 
Well, it depends.  In western Europe they tend to be in languages that 
can be encoded in ISO-8859-1, so authors sometimes use that encoding 
(without even realizing it).  If you set your browser to default to 
UTF-8, those pages will be broken.
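
A minimal sketch of that breakage, in Python rather than in a browser. The sample string is made up for illustration, and `errors="replace"` only approximates how a browser substitutes U+FFFD for bytes it cannot decode:

```python
# Hypothetical page text, saved by its author as ISO-8859-1 with no
# encoding declared.
text = "Musée d'Orsay"
raw = text.encode("iso-8859-1")

# A reader whose default is ISO-8859-1 sees the intended text.
as_latin1 = raw.decode("iso-8859-1")

# A reader whose default is UTF-8 does not: the lone byte 0xE9 is not a
# valid UTF-8 sequence, so it is replaced with U+FFFD.
as_utf8 = raw.decode("utf-8", errors="replace")

print(as_latin1)
print(as_utf8)
```

Pure-ASCII pages decode the same either way, which is why the problem mostly shows up on non-English pages.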

In Japan, a number of pages are authored in Shift_JIS.  Those will 
similarly be broken in a browser defaulting to UTF-8.
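
The same kind of sketch for Shift_JIS, again with a made-up sample string and Python's codecs standing in for a browser's decoder:

```python
# Hypothetical page text, saved as Shift_JIS with no encoding declared.
text = "日本語"
raw = text.encode("shift_jis")

# Decoded with the encoding the author actually used, the text survives.
as_sjis = raw.decode("shift_jis")

# The bytes are not valid UTF-8, so a strict decode fails outright ...
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# ... and a lenient decode yields replacement characters instead of kanji.
as_utf8 = raw.decode("utf-8", errors="replace")

print(as_sjis, strict_ok, as_utf8)
```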

> What is your locale?

Why does it matter?  David's default locale is almost certainly en-US, 
which defaults to ISO-8859-1 (or whatever Windows-??? encoding that 
actually means on the web) in his browser.  But again, he's changed the 
default encoding from the locale default, so the locale is irrelevant.

> (Quite often it sounds as
> if some see Latin-1 - or Windows-1251 as we now should say - as a
> 'super default' rather than a locale default. If that is the case, that
> it is a super default, then we should also spec it like that! Until
> further, I'll treat Latin-1 as it is specced: As a default for certain
> locales.)

That's exactly what it is.

> Since it is a locale problem, we need to understand which locale you
> have - and/or which locale you - and other debaters - think they have.

Again, doesn't matter if you change your settings from the default.

> However, you also say that your problem is not so much related to pages
> written for *your* locale as it is related for pages written for users
> of *other* locales. So how many times per year do Dutch, Spanish or
> Norwegian  - and other non-English pages - are creating troubles for
> you, as a English locale user? I am making an assumption: Almost never.
> You don't read those languages, do you?

Did you miss the "travel" part?  Want to look up web pages for museums, 
airports, etc. in a non-English-speaking country?  There's a good chance 
they're not in English!

> This is also an expectation thing: If you visit a Russian page in a
> legacy Cyrillic encoding, and gets mojibake because your browser
> defaults to Latin-1, then what does it matter to you whether your
> browser defaults to Latin-1 or UTF-8? Answer: Nothing.

Yes.  So?

> I think we should 'attack' the dominating locale first: The English
> locale, in its different incarnations (Australian, American, UK). Thus,
> we should turn things on the head: English users should start to expect
> UTF-8 to be used. Because, as English users, you are more used to
> 'mojibake' than the rest of us are: Whenever you see it, you 'know'
> that it is because it is a foreign language you are reading.

Modulo smart quotes (and recently Unicode ellipsis characters).  These 
are actually pretty common in English text on the web nowadays, and have 
a tendency to be in "ISO-8859-1".
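
To illustrate with a sketch: the byte string below is a made-up example of English text with curly quotes and an ellipsis as a Windows editor would typically save them, and it assumes the usual web behavior of decoding "ISO-8859-1" content as windows-1252 (`cp1252` in Python):

```python
# Hypothetical English page bytes: 0x93 and 0x94 are curly quotes and
# 0x85 is an ellipsis in windows-1252.
raw = b"He said \x93hello\x94\x85"

# What a browser defaulting to "ISO-8859-1" (really windows-1252) shows.
as_cp1252 = raw.decode("cp1252")

# What a browser defaulting to UTF-8 shows: each of those three bytes is
# a stray continuation byte and becomes U+FFFD.
as_utf8 = raw.decode("utf-8", errors="replace")

print(as_cp1252)
print(as_utf8)
```

The letters come through fine in both cases; only the punctuation breaks, which matches the symptom on otherwise plain English pages.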

> Or, please, explain to us when and where it
> is important that English language users living in their own, native
> lands so to speak, need that their browser default to Latin-1 so that
> they can correctly read English language pages?

See above.

> See? We would have a plan. Or what do you think?

Try it in your browser.  When I set UTF-8 as my default, there were 
broken quotation marks all over the web for me.  And I'm talking pages in 
English.

-Boris


