[whatwg] IRIs vs. URIs

Bjoern Hoehrmann derhoermi at gmx.net
Wed Mar 14 12:16:47 PDT 2007


* L. David Baron wrote:
>If we say they're IRIs then the encoding step is always encoding to
>UTF-8.  But the traditional behavior for URIs has been to encode
>based on the encoding of the document, which requires tracking, for
>every URI, what the encoding of the document, style sheet, or script
>that contained it was.  (I don't think Mozilla does this for
>scripts, but we do for style sheets and documents.)

The traditional behavior of Internet Explorer on Western versions of
Windows, for Western web sites using Western encodings, has been (since
the release of IE5b2, I think) to encode the path using UTF-8 and the
query string using the document encoding, depending on the
send-urls-as-utf-8 setting. For example,

  <a href='Björn.html'>...</a>

in an ISO-8859-1 encoded document would result in a request for

  ... Bj%C3%B6rn.html ...

Opera, at least for a considerable amount of time, used UTF-8 for
the whole reference, I think independently of encodings, domains,
and other environment variables. Mozilla was incompatible with that
for a long time, always using the document encoding even for the
path. I understand this is no longer the case for trunk builds. So
the description quoted above is a major oversimplification of the
issue.
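
To make the three behaviors concrete, here is a rough sketch in Python.
It is only an illustration of the percent-encoding differences, not how
any of these browsers is implemented. The helper encode_reference and
the query string "q=Björn" are made up for this example; only the
Björn.html path comes from the example above.

```python
from urllib.parse import quote


def encode_reference(path, query, path_encoding, query_encoding):
    """Percent-encode a relative reference, using one character
    encoding for the path and (possibly) another for the query.

    Hypothetical helper for illustration only.
    """
    encoded = quote(path, safe="/", encoding=path_encoding)
    if query:
        # Keep "=" and "&" literal so the query structure survives.
        encoded += "?" + quote(query, safe="=&", encoding=query_encoding)
    return encoded


# Assumption: the containing document is ISO-8859-1, as in the example.
DOCUMENT_ENCODING = "iso-8859-1"

# IE-like (send-urls-as-utf-8 on): UTF-8 path, document-encoded query.
ie_like = encode_reference(
    "Björn.html", "q=Björn", "utf-8", DOCUMENT_ENCODING)

# Opera-like: UTF-8 for the whole reference.
opera_like = encode_reference(
    "Björn.html", "q=Björn", "utf-8", "utf-8")

# Older-Mozilla-like: document encoding everywhere, even for the path.
old_mozilla_like = encode_reference(
    "Björn.html", "q=Björn", DOCUMENT_ENCODING, DOCUMENT_ENCODING)

print(ie_like)           # Bj%C3%B6rn.html?q=Bj%F6rn
print(opera_like)        # Bj%C3%B6rn.html?q=Bj%C3%B6rn
print(old_mozilla_like)  # Bj%F6rn.html?q=Bj%F6rn
```

The same link therefore produces three different requests depending on
which encoding is applied to which component.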

(I included the many "Western" qualifications because the default
setting for send-urls-as-utf-8 differs by region and because I
encountered some strange edge cases that result in odd behavior,
like sending raw non-ASCII octets in the request; details are in a
Mozilla Bugzilla comment of mine and in various list archives.)
-- 
Björn Höhrmann · mailto:bjoern at hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


