[whatwg] Default encoding to UTF-8?
Leif Halvard Silli
xn--mlform-iua at xn--mlform-iua.no
Mon Dec 5 10:55:43 PST 2011
>> (And HTML5 defines it the same.)
> No. As far as I understand, HTML5 defines US-ASCII to be the default and
> requires that any other encoding is explicitly declared. I do like this
We are here discussing the default *user agent behaviour* - we are not
specifically discussing how web pages should be authored.
As for user agents, please be aware that HTML5 maintains a table of
'Suggested default encoding':
When you say 'requires': Of course, HTML5 recommends that you declare
the encoding (via HTTP/higher protocol, via the BOM 'sideshow' or via
<meta charset=UTF-8>). I just now also discovered that Validator.nu
issues an error message if it does not find any of those *and* the
document contains non-ASCII. (I don't know, however, whether this error
message is just something Henri added at his own discretion - it would
be nice to have it literally in the spec too.)
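
As a rough illustration of the check described above, here is a minimal
Python sketch. It is not Validator.nu's actual code: the function names
are hypothetical, the <meta> detection is a crude regular expression
rather than the HTML5 prescan algorithm, and the 1024-byte window is a
simplifying assumption.

```python
import re
from typing import Optional

# Byte order marks for UTF-8, UTF-16BE and UTF-16LE.
BOMS = (b"\xef\xbb\xbf", b"\xfe\xff", b"\xff\xfe")

# Crude stand-in for the HTML5 prescan: any <meta ...charset=...> tag.
META_CHARSET = re.compile(rb"<meta[^>]+charset\s*=", re.IGNORECASE)


def has_declared_encoding(body: bytes, http_charset: Optional[str] = None) -> bool:
    """True if the encoding is declared via HTTP, a BOM, or a <meta> tag."""
    if http_charset:
        return True
    if body.startswith(BOMS):
        return True
    # Assumption: only look at the first 1024 bytes of the document.
    return META_CHARSET.search(body[:1024]) is not None


def contains_non_ascii(body: bytes) -> bool:
    """True if any byte falls outside the US-ASCII range."""
    return any(byte > 0x7F for byte in body)


def should_report_error(body: bytes, http_charset: Optional[str] = None) -> bool:
    """Report only when non-ASCII content comes with no declaration at all."""
    return contains_non_ascii(body) and not has_declared_encoding(body, http_charset)
```

Under these assumptions, a pure US-ASCII page without any declaration
passes silently, while the same page gets flagged once a single
non-ASCII byte is added to it.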
(The problem is of course that many English pages expect the whole
"Unicode alphabet" even if they only contain US-ASCII from the start.)
HTML5 says that validators *may* issue a warning if UTF-8 is *not* the
encoding. But so far, validator.nu has not picked that up.
> We should also lobby for authoring tools (as recommended by HTML5) to
> default their output to UTF-8 and make sure the encoding is declared.
HTML5 already says: "Authoring tools should default to using UTF-8 for
newly-created documents. [RFC3629]"
> so many pages, supposedly (I have not researched this), use the incorrect
> encoding, it makes no sense to try to clean this mess by messing with
> existing defaults. It may fix some pages and break others. Browsers have
> the ability to override an incorrect encoding and this a reasonable
Do you use an English-locale computer? If you do, without being a native
English speaker, then you are some kind of geek ... Why can't you work
around the troubles, as you are used to doing anyway?
Starting a switch to UTF-8 as the default UA encoding for English
locale users should *only* affect how English locale users experience
languages which *both* need non-ASCII *and* historically have been
using Windows-1252 as the default encoding *and* which additionally do
not include any encoding declaration.
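
The conditions listed above can be sketched in Python as follows. This
is only an illustration under stated assumptions: the helper names are
hypothetical, the page is assumed to have been authored in Windows-1252,
and "affected" is taken to mean that the bytes no longer decode cleanly
once the default becomes UTF-8.

```python
from typing import Optional


def is_valid_utf8(body: bytes) -> bool:
    """True if the bytes decode as UTF-8 without errors."""
    try:
        body.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def affected_by_switch(body: bytes, declared: Optional[str] = None) -> bool:
    """Would a UTF-8 default change how this (assumed Windows-1252) page renders?"""
    if declared:
        # Any encoding declaration overrides the default, so nothing changes.
        return False
    if body.isascii():
        # US-ASCII bytes decode identically under both encodings.
        return False
    # Undeclared non-ASCII bytes: affected unless they happen to be valid UTF-8.
    return not is_valid_utf8(body)


# A Norwegian word saved as Windows-1252, with no declaration.
legacy_page = "blåbærsyltetøy".encode("windows-1252")
```

Here affected_by_switch(legacy_page) is true, while passing
declared="windows-1252" for the same bytes, or passing a pure US-ASCII
page, gives false.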
Leif Halvard Silli