[whatwg] New URL Standard

David Sheets kosmo.zb at gmail.com
Tue Sep 25 11:20:15 PDT 2012


On Tue, Sep 25, 2012 at 8:03 AM, Anne van Kesteren <annevk at annevk.nl> wrote:
> On Tue, Sep 25, 2012 at 6:18 AM, Ian Hickson <ian at hixie.ch> wrote:
>> Not necessarily, but that's certainly possible. Personally I would
>> recommend that we not change the definition of what is conforming from the
>> current RFC3986/RFC3987 rules, except to the extent that the character
>> encoding affects it (as per the HTML standard today).
>>
>>    http://whatwg.org/html#valid-url
>
> FWIW, given that browsers happily do requests to servers with
> characters in the URL that are "invalid" per the RFC (they are not URL
> escaped) and servers handle them fine, I think we should make the
> syntax more lenient. E.g., allowing [ and ] in the path and query
> components is fine, I think.

I believe this would introduce ambiguity for parsing URI references.
Is "[::1]" an authority reference or a path segment reference?
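
To make the question concrete, here is a minimal sketch that splits
three spellings of the reference with the non-validating regular
expression from RFC 3986, Appendix B. The helper name split_reference
is mine and purely illustrative; the regular expression only separates
components and does not check them against the grammar.

```python
import re

# Component-splitting regular expression from RFC 3986, Appendix B.
# It separates components but does not validate them.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_reference(ref):
    """Hypothetical helper: return (scheme, authority, path) for a reference."""
    m = URI_RE.match(ref)
    return m.group(2), m.group(4), m.group(5)


for ref in ("//[::1]", "./[::1]", "[::1]"):
    print(ref, split_reference(ref))
# //[::1] (None, '[::1]', '')
# ./[::1] (None, None, './[::1]')
# [::1] ('[', None, ':1]')
```

Only the "//" form yields an authority and only the "./" form yields
the whole string as a path. The bare form is split at its first colon,
so it is read as neither, which is the kind of confusion I would expect
if brackets became legal in path segments.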

> As for the question about why not build this on top of RFC 3986. That
> does not handle non-ASCII code points. RFC 3987 does, but is not a
> suitable start either. As shown in http://url.spec.whatwg.org/ it is
> quite trivial to combine parsing, resolving, and canonicalizing into a
> single algorithm (and deal with URI/IRI, now URL, as one).

Composition is often trivial but unenlightening. There is necessarily
less information in a partially evaluated function composition than in
the functions in isolation.

Defining a formal language accurately and in a broadly understandable
manner is nontrivial. Your task is nontrivial.

> Trying to
> somehow patch the language in RFC 3987 to deal with the encoding
> problems for the query component, to deal with parsing
> http:example.org when there is a base URL with the same scheme versus
> when there isn't, etc. is way more of a hassle I think, though I am
> happy to be proven wrong.

I believe the encoding problems are handled by a normalization
algorithm and parsing relative references is handled by the base
scheme module.
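
For the http:example.org case specifically, RFC 3986 section 5.2.2
already lets a non-strict parser ignore a reference's scheme when it is
identical to the base URI's scheme. The following is a rough sketch of
that one step, assuming a hypothetical helper named resolve that
handles only relative-path references and skips dot-segment removal.
It is not a full resolver.

```python
import re

# Component-splitting regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split(ref):
    m = URI_RE.match(ref)
    return {"scheme": m.group(2), "authority": m.group(4), "path": m.group(5),
            "query": m.group(7), "fragment": m.group(9)}


def resolve(base, ref, strict):
    """Hypothetical, partial take on RFC 3986 section 5.2.2.

    Covers the scheme check and the path merge only; dot-segment
    removal and the other reference forms are left out.
    """
    b, r = split(base), split(ref)
    # A non-strict parser may ignore a scheme identical to the base's.
    if not strict and r["scheme"] == b["scheme"]:
        r["scheme"] = None
    if r["scheme"] is not None:
        return ref  # treated as absolute, returned unchanged in this sketch
    if r["authority"] is not None or not r["path"] or r["path"].startswith("/"):
        raise NotImplementedError("sketch covers relative-path references only")
    # Merge (section 5.2.3): keep the base path up to its last "/".
    if b["authority"] is not None and b["path"] == "":
        path = "/" + r["path"]
    else:
        path = b["path"][: b["path"].rfind("/") + 1] + r["path"]
    out = b["scheme"] + ":"
    if b["authority"] is not None:
        out += "//" + b["authority"]
    out += path
    if r["query"] is not None:
        out += "?" + r["query"]
    if r["fragment"] is not None:
        out += "#" + r["fragment"]
    return out


print(resolve("http://example.net/a/b", "http:example.org", strict=True))
# http:example.org
print(resolve("http://example.net/a/b", "http:example.org", strict=False))
# http://example.net/a/example.org
print(resolve("ftp://example.net/a/b", "http:example.org", strict=False))
# http:example.org
```

The base URLs above are made up for illustration. The point is only
that the same-scheme behaviour is a single flag on the existing
resolution algorithm.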

What is the acceptable trade-off between (y)our hassle and the time of
technologists in the coming decades? Will you make it easier or harder
for them to reconcile WHATWG-URL and Internet Standard 66 (RFC 3986)?

> --
> http://annevankesteren.nl/


