[whatwg] IRIs vs. URIs

Wed Mar 14 07:20:44 PDT 2007

L. David Baron on 2007-03-13:

> I tend to think it would be good that new uses of URIs/IRIs document that 
> they are really IRIs and therefore this reverse-encoding behavior should 
> not be used, but instead encoding should be done as UTF-8.

You cannot have UTF-8 encoding just for the URIs/IRIs and another encoding 
for the rest of the source text. To parse a URI/IRI in the source document 
properly, you must first convert the bytes of the resource into a stream of 
Unicode characters, using the document's own encoding.
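A minimal Python sketch of that ordering, assuming a hypothetical ISO-8859-1 document and a toy regex in place of a real HTML tokenizer (the helper names extract_href and iri_to_uri are illustrative only). The whole resource is decoded first; only the resulting characters are then treated as an IRI and percent-encoded as UTF-8. Decoding the same bytes as UTF-8 fails outright.

```python
import re
from urllib.parse import quote

def extract_href(doc_bytes: bytes, doc_encoding: str) -> str:
    """Decode the whole resource first, then pull out the first href value."""
    # Step 1: turn the bytes into Unicode characters using the document's encoding.
    text = doc_bytes.decode(doc_encoding)
    # Step 2: find the reference in the character stream.
    # (A toy regex for illustration, not a real HTML tokenizer.)
    match = re.search(r'href="([^"]*)"', text)
    if match is None:
        raise ValueError("no href attribute found")
    return match.group(1)

def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI by percent-encoding non-ASCII characters as UTF-8."""
    # Reserved characters and existing %-escapes are left untouched.
    return quote(iri, safe="/:?#[]@!$&'()*+,;=%", encoding="utf-8")

# Hypothetical ISO-8859-1 document: "ö" is stored as the single byte 0xF6.
doc = '<a href="/s\u00f6k">search</a>'.encode("iso-8859-1")

href = extract_href(doc, "iso-8859-1")
print(iri_to_uri(href))  # /s%C3%B6k

# Decoding the same bytes as UTF-8 does not work: 0xF6 followed by "k"
# is not a valid UTF-8 sequence, so the URL cannot be read as UTF-8 in isolation.
try:
    extract_href(doc, "utf-8")
except UnicodeDecodeError:
    print("document bytes are not valid UTF-8")
```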

> (In Mozilla's codebase such distinctions are easy to implement since
> we have to pass along the encoding of the document every time we
> create a URI in order to get this backwards-compatible behavior.

Of course, you will need to take special care with query data that is 
stored as plain non-ASCII bytes in the source document, so you will still 
need to pass the document encoding around.
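To illustrate why the encoding has to travel with the URL, here is a hedged Python sketch of one possible backwards-compatible mapping: path and fragment are percent-encoded as UTF-8 (IRI style), while the query is percent-encoded in the document's encoding. The function to_uri, the reserved-character set and the example URL are assumptions for illustration, not Opera's or Mozilla's actual code; it also assumes an ASCII host name, and characters the document encoding cannot represent simply raise UnicodeEncodeError here.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left as-is when percent-encoding (assumed set for this sketch).
RESERVED = "!$&'()*+,;=:@/?%"

def to_uri(iri: str, doc_encoding: str) -> str:
    """Backwards-compatible IRI-to-URI mapping (assumed behaviour)."""
    parts = urlsplit(iri)
    # Path and fragment: non-ASCII characters become UTF-8 %-escapes.
    path = quote(parts.path, safe=RESERVED, encoding="utf-8")
    fragment = quote(parts.fragment, safe=RESERVED, encoding="utf-8")
    # Query: non-ASCII characters become %-escapes of the document's own bytes.
    query = quote(parts.query, safe=RESERVED, encoding=doc_encoding)
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))

link = "http://example.com/s\u00f6k?q=\u00f6l"
print(to_uri(link, "iso-8859-1"))  # http://example.com/s%C3%B6k?q=%F6l
print(to_uri(link, "utf-8"))       # http://example.com/s%C3%B6k?q=%C3%B6l
```

The same link yields two different query strings depending on the document encoding, which is exactly the information that is lost if only the decoded IRI is passed along.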

> It would probably be good if the spec documented how the encoding
> issues in URIs are actually handled.

Indeed. Considering the number of partly contradictory bug reports we have 
here at Opera on the issue, it would be nice to have it clearly spelled out, 
so that everyone does the same thing and does what the user expects.

-- 
\\//
Peter, software engineer, Opera Software

  The opinions expressed are my own, and not those of my employer.
  Please reply only by follow-ups on the mailing list.