[whatwg] Web Addresses vs Legacy Extended IRI (again)

Giovanni Campagna scampa.giovanni at gmail.com
Mon Mar 30 07:27:06 PDT 2009

2009/3/29 Kristof Zelechovski <giecrilj at stegny.2a.pl>:
> It is not clear that the server will be able to correctly support various
> representations of characters in the path component, e.g. identify accented
> characters with their decompositions using combining diacritical marks.  The
> peculiarities can depend on the underlying file system conventions.
> Therefore, if all representations are considered equally appropriate,
> various resources may suddenly become unavailable, depending on the encoding
> decisions taken by the user agent.
> Chris

It is not clear to me that the server will be able to support the
composed form of à or ø. Where is the conversion from ISO-8859-1 to
UCS specified? Nowhere.
If a server knows it cannot deal with Unicode Normalization, it should
either use an encoding form of Unicode (UTF-8, UTF-16), implement a
technology that uses IRIs directly (because Normalization is
introduced only when converting to a URI), or generate IRIs with
binary path data in opaque form (i.e. percent-encoded).
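
To make the composed/decomposed difference concrete, here is a rough
Python sketch. It only illustrates the point and is not taken from any
spec: the helper names percent_encode_path and canonical_path are made
up for this example, and the choice of UTF-8 and NFC on the server side
is an assumption.

```python
import unicodedata
from urllib.parse import quote, unquote


def percent_encode_path(path: str) -> str:
    """Percent-encode a path as UTF-8 octets, leaving '/' untouched."""
    return quote(path, safe="/")


def canonical_path(raw: str) -> str:
    """Hypothetical server-side step: decode the octets as UTF-8 and
    fold the result to NFC, so both spellings name the same resource."""
    text = unquote(raw, encoding="utf-8", errors="strict")
    return unicodedata.normalize("NFC", text)


# The same visible character "à" in its two Unicode spellings.
composed = "/\u00e0"      # U+00E0 LATIN SMALL LETTER A WITH GRAVE
decomposed = "/a\u0300"   # "a" followed by U+0300 COMBINING GRAVE ACCENT

print(percent_encode_path(composed))    # /%C3%A0
print(percent_encode_path(decomposed))  # /a%CC%80
```

The two requests differ on the wire, so a server that compares raw
bytes sees two paths. Passing both through canonical_path gives the
same string, which is one way a server could accept either form.
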
By the way, the server should be able to deal with both composed and
decomposed forms of accented characters (or use none of them), because
I may type the path directly in my address bar (do you know what IME I

