[whatwg] Web Encodings

Anne van Kesteren annevk at opera.com
Wed Aug 19 13:47:57 PDT 2009

Today every browser implements its own encoding label matching algorithm and supports its own list of encodings and its own list of encoding label aliases. Everything sort of works, but not really.

HTML5 solves part of this problem by defining exactly how to identify an encoding label in a text/html stream. It also specifies which encoding label matching algorithm to use, UTS #22, but we found out that this is incompatible with existing sites that declare EUC_JP at the HTTP level while a <meta> in the text/html stream declares UTF-8, and that actually need to be decoded as UTF-8. This works fine with a strict encoding label matching algorithm: EUC_JP is not a recognized label, so the HTTP charset is ignored and the <meta> declaration is used instead. With UTS #22, however, EUC_JP and EUC-JP become the same thing, even though only the latter is the actual encoding label, so the HTTP charset takes precedence and the page is decoded as EUC-JP.
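To make the EUC_JP case concrete, here is a minimal sketch contrasting the two kinds of matching. It assumes a tiny hypothetical label table (not any browser's real list), a simplified version of the UTS #22 loose matching (keep only ASCII letters and digits, then lowercase; the standard's extra rule about zeros is left out), and a simplified decision in which a recognized HTTP charset wins over <meta>, ignoring every other step of encoding detection. The names `strict_match`, `loose_match` and `resolve` are made up for illustration.

```python
import re

# Hypothetical label table for illustration only; not a real browser's list.
KNOWN_LABELS = {
    "utf-8": "UTF-8",
    "euc-jp": "EUC-JP",
    "shift_jis": "Shift_JIS",
}


def strict_match(label):
    """Strict matching: trim whitespace, ASCII-lowercase, then exact lookup."""
    return KNOWN_LABELS.get(label.strip().lower())


def loose_key(label):
    """Simplified UTS #22-style key: keep ASCII letters/digits, lowercase them.
    (The UTS #22 rule for dropping certain zeros is omitted in this sketch.)"""
    return re.sub(r"[^A-Za-z0-9]", "", label).lower()


# Index the same table by loose key, so "EUC_JP", "euc-jp", "EUCJP" all collide.
LOOSE_INDEX = {loose_key(name): enc for name, enc in KNOWN_LABELS.items()}


def loose_match(label):
    """Loose matching: compare labels by their simplified UTS #22 key."""
    return LOOSE_INDEX.get(loose_key(label))


def resolve(http_label, meta_label, match):
    """Hypothetical, simplified precedence: a recognized HTTP charset wins;
    otherwise fall back to the <meta> declaration (None if neither matches)."""
    return match(http_label) or match(meta_label)


print(resolve("EUC_JP", "utf-8", strict_match))  # UTF-8
print(resolve("EUC_JP", "utf-8", loose_match))   # EUC-JP
```

With strict matching the bogus HTTP label is simply not recognized and the <meta> declaration decides; with loose matching the same bogus label is silently accepted as EUC-JP, which is the breakage described above.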

Another problem HTML5 does not solve is providing a definitive list of the encodings clients have to implement to be compatible with the large body of existing Web content. This means new clients will have to reverse-engineer that list from existing clients, which I think is bad.

Anne van Kesteren
