[whatwg] Spec comments, sections 1-2
ian at hixie.ch
Thu Aug 13 16:59:22 PDT 2009
On Wed, 5 Aug 2009, Anne van Kesteren wrote:
> On Wed, 05 Aug 2009 02:01:59 +0200, Ian Hickson <ian at hixie.ch> wrote:
> > I'm pretty sure that character encoding support in browsers is more of
> > a "collect them all" kind of thing than really based on content that
> > requires it, to be honest.
> Really? I think a lot of them are actually used.
I'm pretty sure not all of them are common.
> If you know anything I'd love to trim the number of encodings the Web
> needs to a smaller list than what we currently ship with. Ideally this
> becomes a fixed list across all Web languages.
That would be nice.
> > If someone can provide a firm list of encodings that they are
> > confident are required for a certain substantial percentage of the
> > Web, I'm happy to add the list to the spec.
> Can you not do a survey on your large dataset of data to find this out?
> I read somewhere also that Adam Barth was able to add code to Google
> Chrome to figure out a better algorithm for Content-Type sniffing. Maybe
> something similar could be done here?
For various reasons, my usual techniques for obtaining data aren't
suitable for encoding-related work. Could MAMA or Opera be instrumented
> We've encountered problems by the way with using the Unicode encoding
> matching algorithm. Particularly on some Asian sites. I think we need to
> switch HTML5 back to something more akin to WebKit/Gecko/Trident. I
> realize this means more magic lists, but the current algorithm does not
> seem to cut it. E.g. sites rely on the fact that EUC_JP is not a
> recognized encoding but EUC-JP is.
If you let me know what the algorithm should be, I can do that. Is it just
underscores that must not be ignored? Maybe we can just do a delta spec on
the Unicode algorithm? (i.e. say "do what Unicode says except...").
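
To make the "delta" idea concrete, here is a rough sketch. It assumes the
Unicode matching rules in question are the charset alias matching from
UTS #22 as I understand them: drop everything except ASCII letters and
digits, lowercase, then drop zeros that are not preceded by a digit. The
names loose_key, lookup, KNOWN and the "significant" parameter are made up
for illustration, and treating only the underscore as significant is just
one possible answer to the question above, not a proposal.

```python
import re

# Hypothetical table of recognized labels; a real list would be much longer.
KNOWN = ("EUC-JP", "UTF-8")

def loose_key(label, significant=""):
    """Reduce a label to a comparison key, UTS #22 style.

    `significant` is the assumed delta: characters that must NOT be
    ignored (for example "_"). With the default, every non-alphanumeric
    character is ignored.
    """
    keep = "A-Za-z0-9" + re.escape(significant)
    key = re.sub("[^" + keep + "]", "", label).lower()
    # Delete runs of zeros that are not preceded by a digit.
    return re.sub(r"(?<![0-9])0+", "", key)

def lookup(label, significant=""):
    """Return the known label that matches, or None if nothing matches."""
    key = loose_key(label, significant)
    for name in KNOWN:
        if loose_key(name, significant) == key:
            return name
    return None

# Plain loose matching: the underscore is ignored, so EUC_JP is recognized.
print(lookup("EUC_JP"))
# Delta variant: the underscore is significant, so EUC_JP is rejected
# while EUC-JP still matches.
print(lookup("EUC_JP", significant="_"))
print(lookup("euc-jp", significant="_"))
```

With the default, EUC_JP and EUC-JP reduce to the same key, which is the
behaviour the sites in question apparently do not expect. If more than
underscores turn out to matter, the same parameter could carry the extra
characters.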
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'