[whatwg] Web Addresses vs Legacy Extended IRI (again)

Sun Mar 29 06:06:35 PDT 2009

On Sun, 29 Mar 2009 15:01:51 +0200, Giovanni Campagna  
<scampa.giovanni at gmail.com> wrote:
> 2009/3/29 Anne van Kesteren <annevk at opera.com>:
>> I'm not sure if you're correct about those differences, but even if you
>> are they are not the only differences. E.g. LEIRIs perform
>> normalization if the input encoding is non-Unicode. URLs do not. URLs
>> can encode their query component per the input encoding (and do so for
>> HTML and some APIs). LEIRIs cannot.
>
> What is the problem with normalization? Is there a standard for
> conversion from non-Unicode to Unicode?
> I guess no, so normalization (which should always be done) is perfectly  
> legal.

It's about Unicode Normalization. (And it should not always be done.)
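
To illustrate why Unicode Normalization is not a harmless step, here is a minimal Python sketch. It assumes NFC as the normalization form and UTF-8 percent-encoding; the helper name `percent_encode` is made up for this example and is not taken from any specification. The same visible string yields different octets depending on whether it is normalized first, so a processor that normalizes and one that does not can end up requesting different resources.

```python
import unicodedata
from urllib.parse import quote


def percent_encode(text: str, normalize: bool) -> str:
    """Percent-encode text as UTF-8, optionally applying NFC first.

    Hypothetical helper for illustration only.
    """
    if normalize:
        text = unicodedata.normalize("NFC", text)
    # safe="" so that every non-unreserved character is escaped
    return quote(text, safe="")


# "e" followed by U+0301 COMBINING ACUTE ACCENT (decomposed form)
decomposed = "cafe\u0301"

print(percent_encode(decomposed, normalize=False))  # cafe%CC%81
print(percent_encode(decomposed, normalize=True))   # caf%C3%A9
```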

> In addition, IRIs are defined as a sequence of Unicode codepoints. It
> does not matter how those codepoints are stored (ASCII, ISO-8859-1,
> UTF-8), only the Unicode version of them.

Please read the IRI specification again. Specifically section 3.1.

> This is the same as URL5s, by the way, because none of them is defined
> on octets and both use the RFC3986 method for percent-encoding (using
> UTF-8)

No, it's not always using UTF-8.
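
A rough sketch of the point, using Python's `urllib.parse` as a stand-in for what a user agent does with the query component. The function name `encode_query_value` and the `document_encoding` parameter are assumptions made for this example, and the handling of characters that the encoding cannot represent is deliberately left out.

```python
from urllib.parse import quote, urlencode


def encode_query_value(value: str, document_encoding: str) -> str:
    """Percent-encode a query value using the given character encoding.

    Hypothetical helper; raises UnicodeEncodeError if the value cannot
    be represented in document_encoding.
    """
    return quote(value, safe="", encoding=document_encoding, errors="strict")


value = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

print(encode_query_value(value, "utf-8"))       # %C3%A9
print(encode_query_value(value, "iso-8859-1"))  # %E9

# urlencode accepts the same encoding argument for whole query strings
print(urlencode({"q": value}, encoding="windows-1252"))  # q=%E9
```

The same character therefore ends up as different octets in the query, depending on the encoding of the document the URL came from.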

-- 
Anne van Kesteren
http://annevankesteren.nl/