[whatwg] Default encoding to UTF-8?
Jukka K. Korpela
jkorpela at cs.tut.fi
Tue Dec 6 22:48:17 PST 2011
2011-12-07 2:36, Leif Halvard Silli wrote:
> This entire thread started with a user problem.
As far as I can see, the problem presented was: “I still frequently
break websites and webapps simply by entering my name (Faruk Ateş).”
What we need to fix such issues is that sites and applications are
modified to _deal with_ any characters, and this means that they
minimally need to _parse_ input data as UTF-8 encoded. Of course their
authors need to specify that the form data is to be submitted as UTF-8
encoded, normally by making the page UTF-8 encoded and declaring it as
such. This is surely the most trivial side of the matter.
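To make the "parse as UTF-8" step concrete, here is a minimal sketch. It assumes a form handler written in Python that receives a URL-encoded body; the helper name parse_form and the field name "name" are made up for illustration and do not come from any particular site.

```python
from urllib.parse import parse_qs, urlencode

def parse_form(body: str, encoding: str = "utf-8") -> dict:
    # Hypothetical helper: percent-decode the body and interpret the
    # resulting bytes in the given encoding, failing loudly on bad input.
    return parse_qs(body, encoding=encoding, errors="strict")

# What a browser would submit from a page declared as UTF-8.
body = urlencode({"name": "Faruk Ateş"}, encoding="utf-8")
print(body)        # name=Faruk+Ate%C5%9F

form = parse_form(body)
print(form)        # {'name': ['Faruk Ateş']}
```

The point is only that both sides agree: the page says UTF-8, the browser submits UTF-8, and the handler decodes UTF-8.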
Pages that currently cannot handle the letter “ş” in input data would
not behave any better if browsers started treating them as UTF-8
encoded, which is what the proposed change would mean. On the contrary,
they would work worse. They probably currently work for some set of
characters outside ASCII, such as ISO-8859-1, and the change would stop
that, as letters like “â” would now be transmitted as UTF-8 encoded but
the form handler implies another encoding and sees the data as something else.
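The breakage can be illustrated with a small sketch, assuming a handler that decodes everything as ISO-8859-1; the function name handler_view is hypothetical and only models "bytes sent in one encoding, read in another".

```python
def handler_view(text: str, sent_as: str, handler_assumes: str) -> str:
    # Encode as the browser would, then decode as the form handler would.
    return text.encode(sent_as).decode(handler_assumes)

# Today: page treated as ISO-8859-1, handler assumes ISO-8859-1.
before = handler_view("â", "iso-8859-1", "iso-8859-1")
print(before)   # â

# After the proposed change: browser sends UTF-8, handler is unchanged.
after = handler_view("â", "utf-8", "iso-8859-1")
print(after)    # Ã¢

# The letter "ş" has no ISO-8859-1 representation at all.
try:
    "ş".encode("iso-8859-1")
    s_fits_latin1 = True
except UnicodeEncodeError:
    s_fits_latin1 = False
```

So the character that worked before stops working, and "ş" is no better off, since the handler still cannot interpret it.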
> But with the proposed change, then even users *outside* the locales that
> share the default encoding of the sloppy author's locale, would benefit.
Exactly how would _any_ user benefit from the proposed change? I have
shown that for the form data issue presented, the change would create
serious problems, not solve any—except in the rather theoretical case
where form data processing is based on UTF-8, the page is actually UTF-8
encoded but its encoding is not declared in any way (any examples of
such pages around?) and the user’s browser implies an encoding other
than UTF-8. In this theoretical case, the error correction principle
I’ve suggested (don’t just apply an encoding if it turns out that the
page cannot be in that encoding) would probably fix the problem if the
page contains non-ASCII characters.
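One possible reading of that principle, as a rough sketch only: treat an encoding as impossible for a page if the bytes do not decode strictly in it, or if decoding yields C1 control characters (U+0080 to U+009F), which should not occur in page text. The function names, the C1 test and the candidate list are my assumptions for illustration, not a description of what any browser does.

```python
def plausible(page_bytes: bytes, encoding: str) -> bool:
    """True if the bytes decode strictly and yield no C1 control characters."""
    try:
        text = page_bytes.decode(encoding, errors="strict")
    except UnicodeDecodeError:
        return False
    return not any(0x80 <= ord(ch) <= 0x9F for ch in text)

def choose_encoding(page_bytes: bytes, implied: str,
                    candidates=("utf-8", "windows-1252")) -> str:
    # Keep the implied encoding unless the page cannot be in it.
    if plausible(page_bytes, implied):
        return implied
    for enc in candidates:
        if plausible(page_bytes, enc):
            return enc
    return implied  # nothing better found; keep the original guess

# Undeclared UTF-8 page, browser implies ISO-8859-1.
utf8_page = "<p>Faruk Ateş</p>".encode("utf-8")
print(choose_encoding(utf8_page, "iso-8859-1"))    # utf-8

# Legacy ISO-8859-1 page, browser implies UTF-8.
legacy_page = "<p>â</p>".encode("iso-8859-1")
print(choose_encoding(legacy_page, "utf-8"))       # windows-1252
```

The check is not complete: a UTF-8 page whose only non-ASCII letter is "â" decodes without complaint as ISO-8859-1 (as "Ã¢"), so the implied encoding is kept. That is why the fix is only probable.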