[whatwg] [URL] Starting work on a URL spec
Boris Zbarsky
bzbarsky at MIT.EDU
Fri Jul 23 23:02:39 PDT 2010
On 7/24/10 1:50 AM, Brett Zamir wrote:
>> I would be particularly interested in data on this last, across
>> different browsers, operating systems, and locales... There seem to be
>> servers out there expecting their URIs in UTF-8 and others expecting
>> them in ISO-8859-1, and it's not clear to me how to make things work
>> with them all.
>
> Seems to me that if they are not in UTF-8, they should be treated as
> bugs, even if that is not a de jure standard.
Treated as bugs by whom?
The scenario is that a user types some non-ASCII text in the URL bar.
This needs to be URL-encoded to actually go on the wire, which raises
the question of which encoding to use. If the user is using IRIs, the answer is
UTF-8. A number of servers barf if you do this, especially because some
server-side scripting languages (PHP, e.g., last I checked) default to
URI-unescaping via something other than UTF-8.
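
To make the mismatch concrete, here is a minimal sketch using Python's standard urllib.parse. The sample string "café" and the variable names are illustrative assumptions, not taken from any particular browser or server; the point is only that the same text percent-encodes to different bytes depending on the encoding, and that unescaping with the wrong one garbles it.

```python
from urllib.parse import quote, unquote

text = "café"  # illustrative input; any non-ASCII text shows the same effect

# The same characters produce different bytes on the wire.
utf8_form = quote(text, encoding="utf-8")          # é becomes %C3%A9
latin1_form = quote(text, encoding="iso-8859-1")   # é becomes %E9

# A server that unescapes as ISO-8859-1 but receives UTF-8 bytes
# sees two characters where the user typed one.
misread = unquote(utf8_form, encoding="iso-8859-1")

# A server that unescapes as UTF-8 but receives ISO-8859-1 bytes
# gets an invalid sequence, shown here as U+FFFD.
invalid = unquote(latin1_form, encoding="utf-8", errors="replace")

print(utf8_form, latin1_form, misread, invalid)
```

Neither side can tell from the escaped bytes alone which encoding the other one used, which is why the choice has to be guessed or agreed on out of band.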
So some browsers encode the non-query part of the URI as UTF-8 and the
query part as ... something (user's default filesystem encoding, say,
for lack of a better guess). Others always use UTF-8 (and end up with
some servers not usable). Others... I have no idea. That's why I want
data. ;) In particular, while the "just use UTF-8, and if the user
can't access the site, sucks to be the user" approach has a certain
theoretical-purity appeal, it doesn't seem like something I want to do
to my friends and family (always a good criterion for things you'd like
to do to users).
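
A rough sketch of the split strategy described above, again in Python. The function name encode_for_wire, its query_encoding parameter, and the ISO-8859-1 default are hypothetical and stand in for whatever guess a given browser actually makes; the host is assumed to be ASCII, so IDN handling is left out entirely.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def encode_for_wire(url, query_encoding="iso-8859-1"):
    """Percent-encode the path as UTF-8 and the query in query_encoding.

    Hypothetical helper: query_encoding stands in for the browser's guess
    (for example, the user's default encoding).
    """
    parts = urlsplit(url)
    # Path: always UTF-8. Existing "%" escapes and "/" are left alone.
    path = quote(parts.path, safe="/%", encoding="utf-8")
    # Query: the guessed encoding. Characters it cannot represent are
    # replaced rather than raising an error.
    query = quote(parts.query, safe="=&%+", encoding=query_encoding,
                  errors="replace")
    return urlunsplit((parts.scheme, parts.netloc, path, query,
                       parts.fragment))

typed = "http://example.com/café?q=café"
split_result = encode_for_wire(typed)            # path UTF-8, query ISO-8859-1
utf8_result = encode_for_wire(typed, "utf-8")    # the "always UTF-8" approach
print(split_result)
print(utf8_result)
```

The two results differ only in the query part, which is exactly the part where servers disagree about what they expect.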
-Boris