[whatwg] Re: form charset
Peter Karlsson
peter at opera.com
Wed Apr 20 00:01:19 PDT 2005
Olav Junker Kjær on 2005-04-20:
> However, is it really the right thing to allow arbitrary encodings of GET
> queries in the first place? The official Right Way to encode URLs is to
> use Utf8, and it seems strange to allow a different encoding after the
> question mark.
Strange as it may seem, that's the way it is currently done. HTML 4.01 says
that the character encoding of any form data should be the document character
encoding, unless there is an accept-charset attribute on the form stating
otherwise. This means that you do need to handle the part of the URL after
the first question mark differently from the part before it (but then
again, you also need to handle the domain name differently from the path
component, so this segmentation isn't that unexpected).
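To illustrate the rule described above, here is a minimal Python sketch, not any browser's actual implementation. The helper name form_query and its parameters are made up for this example, and accept_charset is assumed to hold a single charset name rather than a list.

```python
from urllib.parse import urlencode

def form_query(fields, document_charset, accept_charset=None):
    """Build a GET query string for a form submission (hypothetical helper).

    Assumption: accept_charset, if given, is a single charset name;
    otherwise the document's own charset is used.
    """
    charset = accept_charset or document_charset
    # urlencode percent-encodes each value using the chosen charset.
    return urlencode(fields, encoding=charset)

fields = {"q": "\u00e9"}  # the letter e with an acute accent
latin1_query = form_query(fields, "iso-8859-1")
utf8_query = form_query(fields, "iso-8859-1", accept_charset="utf-8")
print(latin1_query)  # q=%E9
print(utf8_query)    # q=%C3%A9
```

The same form field yields different bytes after the question mark depending on which charset applies, which is why a server cannot decode the query without knowing the page encoding.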
This is usually not a problem until you find something like this embedded in
a search page (where "{chinese}" is the Chinese search text you just entered
in the search field):
<a href="/search?q={chinese}">Next ></a>
And yes, this very much does exist in the wild.
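As a rough sketch of why such a link is ambiguous, the Python below escapes the same Chinese text under two page encodings. The helper name escape_query_value and the choice of GB2312 as the legacy page encoding are assumptions made for illustration only.

```python
from urllib.parse import quote

def escape_query_value(text, page_charset):
    """Percent-encode text as a client might if it used the page's charset
    (hypothetical helper, for illustration only)."""
    return quote(text, safe="", encoding=page_charset)

chinese = "\u4e2d\u6587"  # two Chinese characters standing in for {chinese}
as_utf8 = escape_query_value(chinese, "utf-8")
as_gb2312 = escape_query_value(chinese, "gb2312")

print("/search?q=" + as_utf8)    # /search?q=%E4%B8%AD%E6%96%87
print("/search?q=" + as_gb2312)  # /search?q=%D6%D0%CE%C4
```

A server that expects the second form will not find the search text if the client sends the first, and vice versa.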
> Of course we cannot just mandate utf8 always, since there is the issue of
> backwards compatibility. If I'm not mistaken, browsers usually urlencode
> forms using the same charset as the page.
Correct.
> However, the only legal value in accept-charset should be utf8 when the
> method is GET.
UTF-8 and US-ASCII, probably.
--
\\//
Peter, software engineer, Opera Software
The opinions expressed are my own, and not those of my employer.
Please reply only by follow-ups on the mailing list.