[whatwg] URL standard: Query string parsing; host parsing

poccil14 at gmail.com
Wed Mar 20 16:52:13 PDT 2013


After rechecking the URI specification in RFC 3986, I want to withdraw my question on query
string parsing. I had relied on the older RFC 2396 URI syntax (through the java.net.URI
documentation) and naively assumed that query string parsing had remained unchanged since then.
However, I still have a question about host parsing. For convenience, I repeat it here:

-- Host parsing and Unicode characters --

Rule 2 of the host parser says "Let host be the result of running utf-8's decoder on the percent decoding of input." However, the percent decoding algorithm is defined only for ASCII strings; its behavior on input containing non-ASCII characters is undefined. As written, this would seem to preclude Unicode characters in host names, including internationalized domain names (IDNA), which is probably not the intent. Should this rule, the percent decoding algorithm, or both be redefined to accept Unicode characters here? (A related question is whether the URL standard should simply adopt Unicode Technical Standard #46 for IDNA, but that need not be answered now.)
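To make the question concrete, here is a minimal Python sketch of one possible redefinition. It is an assumption for illustration, not text from the standard: UTF-8 encode the input string first, percent-decode the resulting bytes, and only then run a UTF-8 decoder. The function names `percent_decode_bytes` and `decode_host_input` are hypothetical, and Python's `errors="replace"` stands in for the standard's utf-8 decoder by substituting U+FFFD for invalid sequences.

```python
HEX = b"0123456789ABCDEFabcdef"


def percent_decode_bytes(data: bytes) -> bytes:
    """Percent-decode a byte sequence (hypothetical helper).

    Each '%' followed by two hex digits becomes the byte with that value;
    every other byte, including a malformed '%' sequence, is copied as-is.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b == 0x25 and i + 2 < len(data) and data[i + 1] in HEX and data[i + 2] in HEX:
            # int() accepts a bytes object such as b"C3" together with a base.
            out.append(int(data[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(b)
            i += 1
    return bytes(out)


def decode_host_input(host_input: str) -> str:
    """Assumed redefinition of rule 2 of the host parser.

    Because the input is UTF-8 encoded before percent decoding, non-ASCII
    characters pass through as ordinary bytes instead of hitting the
    undefined case. (str.encode raises on lone surrogates; not handled here.)
    """
    raw = percent_decode_bytes(host_input.encode("utf-8"))
    # Invalid UTF-8 (e.g. a decoded 0xFF byte) becomes U+FFFD.
    return raw.decode("utf-8", errors="replace")
```

With this ordering, a raw "ü" and its percent-encoded form "%C3%BC" yield the same host string, so Unicode host names are not excluded before IDNA processing.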


