[whatwg] Spec comments, sections 1-2
Anne van Kesteren
annevk at opera.com
Wed Aug 5 05:25:20 PDT 2009
On Wed, 05 Aug 2009 02:01:59 +0200, Ian Hickson <ian at hixie.ch> wrote:
> I'm pretty sure that character encoding support in browsers is more of a
> "collect them all" kind of thing than really based on content that
> requires it, to be honest.
Really? I think a lot of them are actually used. If you have any data on
this, I'd love to trim the number of encodings the Web needs to a smaller
list than what we currently ship. Ideally this becomes a fixed list across
all Web languages.
> If someone can provide a firm list of encodings that they are confident
> are required for a certain substantial percentage of the Web, I'm happy
> to add the list to the spec.
Can you not do a survey of your large dataset to find this out? I also
read somewhere that Adam Barth was able to add code to Google Chrome to
figure out a better algorithm for Content-Type sniffing. Maybe something
similar could be done here?
By the way, we've encountered problems with the Unicode encoding matching
algorithm, particularly on some Asian sites. I think we need to switch
HTML5 back to something more akin to what WebKit/Gecko/Trident do. I
realize this means more magic lists, but the current algorithm does not
seem to cut it. E.g. sites rely on the fact that EUC_JP is not a
recognized encoding but EUC-JP is.
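
To make the EUC_JP case concrete, here is a rough sketch contrasting the two
matching styles. It assumes the loose rule works the way I read the Unicode
charset alias matching rules (drop everything except ASCII letters and
digits, lowercase, ignore zeros not preceded by a digit) and that strict
matching is just a trimmed, case-insensitive lookup in a fixed list. The
label table and all function names are made up for illustration; real
browser lists are much longer and their lookup details differ.

```python
import re

# Hypothetical, deliberately tiny label table (assumption, not a real
# browser list). Keys are the exact labels a strict matcher accepts.
KNOWN_LABELS = {
    "euc-jp": "EUC-JP",
    "utf-8": "UTF-8",
    "iso-8859-1": "ISO-8859-1",
}


def loose_key(label):
    """Reduce a label to a loose comparison key (assumed loose rule)."""
    # Keep only ASCII letters and digits, then lowercase.
    key = re.sub(r"[^A-Za-z0-9]", "", label).lower()
    # Remove runs of zeros that are not preceded by a digit.
    return re.sub(r"(?<![0-9])0+", "", key)


# Same table, indexed by loose key instead of exact label.
LOOSE_TABLE = {loose_key(k): v for k, v in KNOWN_LABELS.items()}


def loose_match(label):
    """Look the label up after loose normalization; None if unknown."""
    return LOOSE_TABLE.get(loose_key(label))


def strict_match(label):
    """Look the label up after trimming whitespace and lowercasing only."""
    return KNOWN_LABELS.get(label.strip().lower())


if __name__ == "__main__":
    for label in ("EUC-JP", "EUC_JP", "u.t.f-008"):
        print(label, loose_match(label), strict_match(label))
```

Under these assumptions the loose matcher resolves EUC_JP to EUC-JP, while
the strict matcher rejects it. A page labelled EUC_JP would therefore be
decoded as EUC-JP under the loose rule, whereas a list-based matcher treats
the label as unknown and falls through to whatever the browser does next.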
--
Anne van Kesteren
http://annevankesteren.nl/