[whatwg] Web Addresses vs Legacy Extended IRI (again)

Giovanni Campagna scampa.giovanni at gmail.com
Sun Mar 29 06:01:51 PDT 2009


2009/3/29 Anne van Kesteren <annevk at opera.com>:
> On Sun, 29 Mar 2009 14:37:19 +0200, Giovanni Campagna
> <scampa.giovanni at gmail.com> wrote:
>>
>> Summing up, the differences between URL5 and LEIRI are only about the
>> percent sign and its uses for delimiters.
>
> I'm not sure if you're correct about those differences, but even if you are
> they are not the only differences. E.g. LEIRIs perform normalization if the
> input encoding is non-Unicode. URLs do not. URLs can encode their query
> component per the input encoding (and do so for HTML and some APIs). LEIRIs
> cannot.

What is the problem with normalization? Is there a standard for
converting non-Unicode text to Unicode?
I guess not, so normalization (which should always be done) is perfectly legal.

In addition, IRIs are defined as a sequence of Unicode codepoints. It
does not matter how those codepoints are stored (ASCII, ISO-8859-1,
UTF-8), only the Unicode version of them.
This is the same for URL5s, by the way, because neither of them is
defined on octets and both use the RFC 3986 method for percent-encoding
(using UTF-8).
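
A minimal sketch of that point, in Python. The sample string and the
helper name percent_encode_utf8 are only illustrative assumptions, not
taken from either draft. The same code points are recovered from two
different storage encodings and then percent-encoded through UTF-8. The
last line shows, for contrast, what encoding the query per a legacy
input encoding would produce instead.

```python
from urllib.parse import quote


def percent_encode_utf8(text: str) -> str:
    """Percent-encode every non-unreserved character via its UTF-8 octets."""
    # safe="" so that delimiters such as "/" are escaped too (illustration only).
    return quote(text, safe="", encoding="utf-8")


# The same four code points (c, a, f, U+00E9) stored in two encodings.
latin1_bytes = b"caf\xe9"       # ISO-8859-1 storage
utf8_bytes = b"caf\xc3\xa9"     # UTF-8 storage

from_latin1 = latin1_bytes.decode("iso-8859-1")
from_utf8 = utf8_bytes.decode("utf-8")

# Once decoded, the storage encoding no longer matters.
same_code_points = from_latin1 == from_utf8

# Percent-encoding through UTF-8, independent of how the text was stored.
encoded = percent_encode_utf8(from_latin1)

# Contrast: escaping the octets of a legacy input encoding instead.
legacy_query = quote(from_latin1, safe="", encoding="iso-8859-1")
```

Under these assumptions the two decoded strings compare equal and share
one UTF-8 percent-encoded form, while the legacy-encoded form differs.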

> (Also, I'm not sure if the WHATWG list is the right place to discuss this as
> the editor of the new draft might not read this list at all.)
>

Unfortunately, I cannot join the public-html list. I could cross-post
this to www-html or www-archive, but that would split the thread across
archives and make it difficult to follow.

> --
> Anne van Kesteren
> http://annevankesteren.nl/
>

Giovanni


