bzbarsky at MIT.EDU
Fri Sep 28 11:04:58 PDT 2012
On 9/28/12 1:30 PM, Anne van Kesteren wrote:
> Well that is interesting. So the document encoding is not solely a
> query component affair?
At least not for Gecko, no.
> cannot get this to work for data URLs.
If the document encoding is not UTF-8, it looks like Gecko will do the following:
1) Take the given string (which by this point is a byte array,
actually; if it started off as Unicode it got converted to UTF-8
to produce this byte array).
2) Unescape non-ASCII escapes (that is, escapes whose hex value is not
in the ASCII range).
3) If the result is not valid UTF-8 bytes and the document encoding
is some variant of utf-16, or is utf-7, or is
x-imap4-modified-utf7 (whatever that is), just byte-inflate to
Unicode. There's a comment here about encodings that are not
4) Otherwise, if the byte array looks like valid UTF-8, convert
from UTF-8 to Unicode.
5) Otherwise, convert to Unicode using the document encoding.
6) Convert the resulting Unicode string to UTF-8.
7) Escape non-ASCII bytes.
I have no idea how much of this is needed in practice...
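For illustration only, here is a rough Python sketch of the procedure described above. It is not Gecko's code: the function name normalize_url_bytes, the helper names, the use of Latin-1 decoding to stand in for "byte-inflate", and the errors="replace" fallback are all assumptions made for the sketch, and it only models the case where the document encoding is not UTF-8.

```python
import re

# Assumption: document encodings (besides UTF-16 variants) that get byte-inflated.
_INFLATE_ENCODINGS = {"utf-7", "x-imap4-modified-utf7"}

# Matches %XX escapes whose value is 0x80 or above (first hex digit 8-F).
_NON_ASCII_ESCAPE = re.compile(rb"%([89A-Fa-f][0-9A-Fa-f])")


def _is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def normalize_url_bytes(url, document_encoding: str) -> str:
    """Hypothetical helper mirroring the steps listed in the message."""
    # Start from a byte array; a Unicode string is first converted to UTF-8.
    raw = url.encode("utf-8") if isinstance(url, str) else bytes(url)

    # Unescape only the escapes outside the ASCII range.
    raw = _NON_ASCII_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)

    enc = document_encoding.lower()
    valid = _is_valid_utf8(raw)
    if not valid and (enc.startswith("utf-16") or enc in _INFLATE_ENCODINGS):
        # "Byte-inflate": each byte becomes the code point of the same value.
        text = raw.decode("latin-1")
    elif valid:
        text = raw.decode("utf-8")
    else:
        # Assumption: undecodable bytes are replaced rather than raising.
        text = raw.decode(enc, errors="replace")

    # Convert back to UTF-8 and escape every non-ASCII byte.
    out = text.encode("utf-8")
    return "".join(chr(b) if b < 0x80 else "%%%02X" % b for b in out)


if __name__ == "__main__":
    print(normalize_url_bytes("/caf%E9", "windows-1252"))
    print(normalize_url_bytes("/caf%C3%A9", "windows-1252"))
```

Under these assumptions, a lone %E9 in a windows-1252 document and an already-UTF-8 %C3%A9 both come out as the same UTF-8 escape sequence, while ASCII escapes such as %20 pass through untouched.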