[whatwg] Web Addresses vs Legacy Extended IRI (again)
Anne van Kesteren
annevk at opera.com
Sun Mar 29 06:06:35 PDT 2009
On Sun, 29 Mar 2009 15:01:51 +0200, Giovanni Campagna
<scampa.giovanni at gmail.com> wrote:
> 2009/3/29 Anne van Kesteren <annevk at opera.com>:
>> I'm not sure if you're correct about those differences, but even if you
>> are, they are not the only differences. E.g. LEIRIs perform
>> normalization if the input encoding is non-Unicode. URLs do not. URLs
>> can encode their query component per the input encoding (and do so for
>> HTML and some APIs). LEIRIs cannot.
>
> What is the problem with normalization? Is there a standard for
> conversion from non-Unicode to Unicode?
> I guess not, so normalization (which should always be done) is perfectly
> legal.
It's about Unicode Normalization. (And it should not always be done.)
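
To make the distinction concrete, here is a minimal sketch of what Unicode Normalization does, as opposed to converting between character encodings. It assumes NFC as the normalization form and uses Python's standard unicodedata module; the variable names are illustrative only.

```python
import unicodedata

# Two ways to write "e with acute accent" as Unicode code points.
decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
precomposed = "\u00e9"  # single precomposed code point

# The strings render alike but are different code point sequences.
print(decomposed == precomposed)

# NFC normalization composes the pair into the single code point.
composed = unicodedata.normalize("NFC", decomposed)
print(composed == precomposed)
```

An address that is normalized this way can end up with different code points (and therefore different percent-encoded octets) than the one originally typed, which is why applying it unconditionally changes behavior.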
> In addition, IRIs are defined as a sequence of Unicode codepoints. It
> does not matter how those codepoints are stored (ASCII, ISO-8859-1,
> UTF-8), only the Unicode version of them.
Please read the IRI specification again. Specifically section 3.1.
> This is the same as URLs, by the way, because neither of them is defined
> on octets and both use the RFC3986 method for percent-encoding (using
> UTF-8).
No, it's not always using UTF-8.
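
As a rough illustration of why the encoding matters, the sketch below percent-encodes the same query value once as UTF-8 and once as ISO-8859-1. It uses Python's urllib.parse.quote only to show the resulting octets; it is not how any browser implements this, and the sample value and the choice of ISO-8859-1 as the "input encoding" are assumptions made for the example.

```python
from urllib.parse import quote

# Hypothetical query value containing one non-ASCII character.
query_value = "caf\u00e9"

# Percent-encoding the UTF-8 octets of the value.
as_utf8 = quote(query_value, encoding="utf-8")

# Percent-encoding the octets of the same value in ISO-8859-1,
# standing in for a page whose encoding is not UTF-8.
as_latin1 = quote(query_value, encoding="iso-8859-1")

print(as_utf8)    # caf%C3%A9
print(as_latin1)  # caf%E9
```

The same characters yield different octets on the wire, so a server expecting one encoding will misread a query component produced with the other.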
--
Anne van Kesteren
http://annevankesteren.nl/