[whatwg] Encodings and the web

Anne van Kesteren annevk at opera.com
Tue Dec 20 03:01:15 PST 2011


While researching encodings as implemented by popular user agents, I
found the current standards lacking. In particular:

    * The registry contains more encodings than the web needs
    * Error handling for encodings is undefined (this can lead to XSS
      exploits and also causes interoperability problems)
    * Encodings are often implemented differently from the standard
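
To illustrate why undefined error handling matters, here is a minimal
Python sketch. It is not taken from any user agent: Python's "ignore" and
"replace" error handlers merely stand in for two decoders that disagree on
what to do with an invalid octet, and the function names are hypothetical.

```python
# Input with one octet (0xFF) that can never appear in valid UTF-8.
raw = b"<scr\xffipt>"


def decode_dropping(data: bytes) -> str:
    """Hypothetical decoder A: silently drops invalid octets."""
    return data.decode("utf-8", errors="ignore")


def decode_replacing(data: bytes) -> str:
    """Hypothetical decoder B: emits U+FFFD for each invalid octet."""
    return data.decode("utf-8", errors="replace")


# The two decoders produce different strings from the same bytes.
print(decode_dropping(raw) == decode_replacing(raw))
print("script" in decode_dropping(raw))
print("script" in decode_replacing(raw))
```

A filter that inspects the bytes, or the output of decoder B, sees no
"script" element, while a consumer behaving like decoder A does. Defining
error handling once removes that disagreement.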

A year ago I did some research into encodings[1], and in more detail into
single-octet encodings[2]. I have now taken that further and started to
define a standard[3] for encodings as they are to be implemented by user
agents. The current scope is roughly to define the encodings, their
labels and name, and how a label is matched.
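
As a rough sketch of what matching a label could look like, the Python
below maps labels to encoding names. Everything in it is an assumption
made for illustration: the table is a tiny hypothetical subset, and the
matching rule (strip ASCII whitespace, then compare ASCII
case-insensitively) is one plausible choice, not a statement of what the
draft requires.

```python
from typing import Optional

# Hypothetical label table: each label maps to one encoding name.
LABELS = {
    "utf-8": "utf-8",
    "utf8": "utf-8",
    "windows-1252": "windows-1252",
    "latin1": "windows-1252",
    "iso-8859-1": "windows-1252",
}

# Assumed set of characters to trim from both ends of a label.
ASCII_WHITESPACE = "\t\n\x0c\r "


def ascii_lower(text: str) -> str:
    """Lowercase only A-Z, leaving non-ASCII characters untouched."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch for ch in text
    )


def get_encoding(label: str) -> Optional[str]:
    """Return the encoding name for a label, or None if it is unknown."""
    key = ascii_lower(label.strip(ASCII_WHITESPACE))
    return LABELS.get(key)


print(get_encoding(" UTF8 "))
print(get_encoding("Latin1"))
print(get_encoding("bogus"))
```

Lowercasing only A-Z keeps non-ASCII lookalikes (for example the Kelvin
sign, which full Unicode lowercasing maps to "k") from matching an ASCII
label.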

The goal is to unify encoding handling across user agents for the web so
legacy pages can be interpreted "correctly" (i.e. as expected by users).

If you are interested in helping to test (and reverse engineer)
multi-octet encodings, please let me know. Any other input is much
appreciated as well.

(I emailed this separately to ietf-charsets.)

Kind regards,


Anne van Kesteren
