[whatwg] Spec comments, sections 1-2
Aryeh Gregor
Simetrical+w3c at gmail.com
Wed Jul 29 09:34:55 PDT 2009
On Wed, Jul 29, 2009 at 4:39 AM, Ian Hickson <ian at hixie.ch> wrote:
> There is value in not changing them unless they are actually broken --
> when I edit the spec, there's always a risk I'll break something.
Okay, not a big deal then.
> I've required UAs to catch this case and added this example.
Okay, great.
> Which others are needed for compatibility?
I don't know, but there are certainly some. Otherwise, why would
browsers support so many? For instance, baidu.com is #9 on Alexa and
serves gb2312 as far as I can tell. So does qq.com, which is #14.
And sina.com.cn, #19. vkontakte.ru is #30 and serves Windows-1251.
tudou.com (#60) uses gbk. rakuten.co.jp (#68) serves EUC-JP.
This is just from a quick manual look at a few of the largest
non-English sites. I'd think it would be fairly easy for someone
(e.g., Google) to come up with a rough summary of character encoding
usage on the web by percentage, and for vendors to say which encodings
they support, so a useful common list could be worked out.
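As a rough sketch of what such a summary could look like, the snippet below tallies declared encodings by share of sites. The sample is only the six sites named above, so the percentages mean nothing beyond illustrating the calculation; the function name encoding_share is made up for this example, and a real survey would need a much larger crawl.

```python
from collections import Counter

# Declared encodings for the sites mentioned above (tiny, illustrative sample).
observed = {
    "baidu.com": "gb2312",
    "qq.com": "gb2312",
    "sina.com.cn": "gb2312",
    "vkontakte.ru": "windows-1251",
    "tudou.com": "gbk",
    "rakuten.co.jp": "euc-jp",
}


def encoding_share(observations):
    """Return {encoding: percentage of sites}, most common encoding first."""
    # Encoding labels are case-insensitive, so normalize before counting.
    counts = Counter(enc.lower() for enc in observations.values())
    total = sum(counts.values())
    return {enc: 100.0 * n / total for enc, n in counts.most_common()}


shares = encoding_share(observed)
for enc, pct in shares.items():
    print(f"{enc}: {pct:.1f}%")
```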
If browsers differ in which encodings they accept, that harms
interoperability, so I'd think it would be ideal if HTML 5 would
specify the exact list of encodings that must be supported and
prohibit support for any others. The union of encodings supported
by existing browsers would be a reasonable start, since supporting a
new encoding is presumably pretty cheap. Unless this is viewed as
outside the scope of HTML 5 -- e.g., if browsers tend to rely on the
operating system for encoding support.
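To illustrate the "union of supported encodings" idea, here is a minimal sketch. The browser names and their encoding sets are hypothetical placeholders, not real vendor data; the point is only that, given each vendor's list, the common target list and each browser's gaps fall out of simple set operations.

```python
# Hypothetical placeholder data: real per-browser lists would come from vendors.
supported = {
    "browser_a": {"utf-8", "windows-1252", "gb2312", "euc-jp"},
    "browser_b": {"utf-8", "windows-1252", "windows-1251", "gbk"},
}


def union_and_gaps(supported):
    """Return (sorted union of all encodings, {browser: encodings it lacks})."""
    union = set().union(*supported.values())
    gaps = {name: sorted(union - encs) for name, encs in supported.items()}
    return sorted(union), gaps


union, gaps = union_and_gaps(supported)
print("candidate required list:", union)
for name, missing in gaps.items():
    print(f"{name} would need to add: {missing}")
```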