[whatwg] Default encoding to UTF-8?

Jukka K. Korpela jkorpela at cs.tut.fi
Wed Nov 30 15:47:03 PST 2011

2011-12-01 1:28, Faruk Ates wrote:

> My understanding is that all browsers* default to Western Latin (ISO-8859-1)
> encoding by default (for Western-world downloads/OSes) due to legacy
> content on the web.

Browsers default to various encodings, often windows-1252 (rather than 
ISO-8859-1). They may also investigate the actual data and make a guess 
based on it.
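
To illustrate why the distinction between windows-1252 and ISO-8859-1 matters, here is a minimal Python sketch. The sample bytes are made up for the illustration; the only point is that bytes in the range 0x80-0x9F decode to printable characters under windows-1252 but to C1 control characters under ISO-8859-1.

```python
# Hypothetical sample: curly quotes and a euro sign as a windows-1252
# authoring tool would typically write them.
raw = b"\x93quoted\x94 \x80 5"

# windows-1252 maps 0x93, 0x94 and 0x80 to printable characters.
as_cp1252 = raw.decode("windows-1252")

# ISO-8859-1 maps the same bytes to invisible C1 control characters.
as_latin1 = raw.decode("iso-8859-1")

print(repr(as_cp1252))
print(repr(as_latin1))
```

A page labeled ISO-8859-1 that contains such bytes displays as its author intended only if the browser treats the label as windows-1252.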

> I'm wondering if it might not be good to start encouraging defaulting to UTF-8,

It would not. There’s no reason to recommend any particular defaulting, 
especially not something that deviates from past practices.

It might be argued that browsers should do better error detection and 
reporting, so that they inform the user e.g. if the document’s encoding 
has not been declared at all and it cannot be inferred fairly reliably 
(e.g., from a BOM). But I’m afraid the general feeling is that browsers 
should avoid warning users, as that tends to contradict authors’ 
purposes – and, in fact, most things that are serious problems in 
principle aren’t that serious in practice.
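
As a rough sketch of what inferring the encoding from a BOM could look like, the following Python checks the first bytes of a document against the UTF-8 and UTF-16 byte order marks. The function name sniff_bom is invented for this example, and the sketch does not describe what any particular browser does; it deliberately ignores UTF-32.

```python
import codecs
from typing import Optional

# Byte order marks checked by this sketch, paired with an encoding label.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


print(sniff_bom(b"\xef\xbb\xbfhello"))  # document starting with a UTF-8 BOM
print(sniff_bom(b"hello"))              # no BOM: nothing can be inferred this way
```

When the function returns None and no encoding has been declared, the browser is left with defaulting or guessing from the content.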

> We like to think that “every web developer is surely building things in UTF-8 nowadays”
> but this is far from true.

There’s a large number of pages declared as UTF-8 but containing ASCII 
only, as well as pages mislabeled as UTF-8 but containing e.g. ISO-8859-1 
data.
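
Both cases can be shown with a short Python sketch. The two sample pages and the helper name is_valid_utf8 are invented for the illustration: ASCII-only bytes decode identically under UTF-8 and windows-1252, so the label makes no practical difference, whereas ISO-8859-1 bytes containing a non-ASCII letter are generally not valid UTF-8.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical page content that uses ASCII characters only.
ascii_page = b"<p>Hello, world</p>"

# Hypothetical page saved as ISO-8859-1 but labeled as UTF-8.
latin1_page = "<p>M\u00fcller</p>".encode("iso-8859-1")

print(is_valid_utf8(ascii_page))
print(ascii_page.decode("utf-8") == ascii_page.decode("windows-1252"))
print(is_valid_utf8(latin1_page))
```

A check like this can detect the second kind of page, but it cannot distinguish a genuine UTF-8 page from an ASCII-only one, since ASCII is a subset of UTF-8.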

> I still frequently break websites and webapps simply by entering my name (Faruk Ateş).

That’s because the server-side software (and possibly client-side 
software) cannot handle the letter “ş”. It would not help if the page 
were interpreted as UTF-8. If the author knows that a server-side form


