[whatwg] Default encoding to UTF-8?
Jukka K. Korpela
jkorpela at cs.tut.fi
Wed Nov 30 15:47:03 PST 2011
2011-12-01 1:28, Faruk Ates wrote:
> My understanding is that all browsers* default to Western Latin (ISO-8859-1)
> encoding by default (for Western-world downloads/OSes) due to legacy
> content on the web.
Browsers default to various encodings, often windows-1252 (rather than
ISO-8859-1). They may also investigate the actual data and make a guess
based on it.
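A small Python sketch (not from the original post) shows why windows-1252 and ISO-8859-1 differ in practice. They agree everywhere except bytes 0x80 to 0x9F. windows-1252 maps most of those to printable characters such as curly quotes, while ISO-8859-1 maps them to invisible C1 control characters. The function guess_decode is a hypothetical, deliberately naive guessing heuristic: try strict UTF-8, else fall back to windows-1252. Real browser detectors are considerably more elaborate.

```python
def guess_decode(data: bytes) -> tuple[str, str]:
    """Hypothetical heuristic: accept the bytes as UTF-8 if they decode
    cleanly, otherwise fall back to windows-1252 (Python codec 'cp1252')."""
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Python's cp1252 codec rejects the five bytes windows-1252 leaves
        # undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), so replace those.
        return data.decode("cp1252", errors="replace"), "windows-1252"

sample = b"\x93quoted\x94"  # 0x93/0x94 are curly double quotes in windows-1252
print(sample.decode("cp1252"))          # “quoted”
print(repr(sample.decode("latin-1")))   # '\x93quoted\x94' (C1 controls)
print(guess_decode(sample))             # ('“quoted”', 'windows-1252')
print(guess_decode("café".encode("utf-8")))  # ('café', 'utf-8')
```

Labeling such a page as ISO-8859-1 and honoring that label literally would turn its quotation marks into control characters, which is why browsers treat the ISO-8859-1 label as windows-1252.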
> I'm wondering if it might not be good to start encouraging defaulting to UTF-8,
It would not. There’s no reason to recommend any particular default,
especially not one that deviates from past practice.
It might be argued that browsers should do better error detection and
reporting, so that they inform the user, e.g., if the document’s encoding
has not been declared at all and cannot be inferred fairly reliably
(e.g., from a BOM). But I’m afraid the general feeling is that browsers
should avoid warning users, as that tends to contradict authors’
purposes; and, in fact, most things that are serious problems in
principle aren’t that serious in practice.
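Inferring the encoding from a BOM is the one fully reliable case, because the byte order mark is a fixed byte sequence at the very start of the data. A minimal Python sketch, assuming the three BOMs that the WHATWG Encoding Standard's BOM sniff recognizes; sniff_bom is a hypothetical helper name:

```python
from typing import Optional

# BOM byte sequences and the encodings they signal (UTF-8, UTF-16BE, UTF-16LE).
BOMS = (
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
)

def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding indicated by a leading byte order mark, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None

print(sniff_bom(b"\xef\xbb\xbf<!DOCTYPE html>"))  # UTF-8
print(sniff_bom(b"<!DOCTYPE html>"))              # None
```

When sniff_bom returns None, the encoding has to come from a declaration or from guessing, which is exactly the situation where a warning to the user might be considered.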
> We like to think that “every web developer is surely building things in UTF-8 nowadays”
> but this is far from true.
There are many pages declared as UTF-8 but containing only ASCII, as
well as pages mislabeled as UTF-8 that actually contain, e.g., ISO-8859-1.
> I still frequently break websites and webapps simply by entering my name (Faruk Ateş).
That’s because the server-side software (and possibly client-side
software) cannot handle the letter “ş”. It would not help if the page
were interpreted as UTF-8. If the author knows that a server-side form