[whatwg] New URL Standard
Ian Hickson
ian at hixie.ch
Mon Sep 24 21:18:03 PDT 2012
This is Anne's spec, so I'll let him give more canonical answers, but:
On Mon, 24 Sep 2012, David Sheets wrote:
>
> Your conforming WHATWG-URL syntax will have production rule alphabets
> which are supersets of the alphabets in RFC3986.
Not necessarily, but that's certainly possible. Personally I would
recommend that we not change the definition of what is conforming from the
current RFC3986/RFC3987 rules, except to the extent that the character
encoding affects it (as per the HTML standard today).
http://whatwg.org/html#valid-url
> This is what I propose you define and it does not necessarily have to be
> in BNF (though a production rule language of some sort probably isn't a
> bad idea).
We should definitely define what is a conforming URL, yes (either
directly, or by reference to the RFCs, as HTML does now). Whether prose or
a structured language is the better way to go depends on what the
conformance rules are -- HTML is a good example here: it has parts that
are defined in terms of prose (e.g. the HTML syntax as a whole), and other
parts that are defined in terms of BNF (e.g. constraints on the contents
of <script> elements in certain situations). It's up to Anne.
> Error recovery and extended syntax for conforming representations are
> orthogonal.
Indeed.
> How will WHATWG-URLs which use the syntax extended from RFC3986 map into
> RFC3986 URI references for systems that only support those?
The same way that those systems handle invalid URLs today, I would assume.
Do you have any concrete systems in mind here? It would be good to add
them to the list of systems that we test. (For what it's worth, in
practice, I've never found software that exactly followed RFC3986 and
also rejected any non-conforming strings. There are just too many invalid
URLs out there for that to be a viable implementation strategy.)
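(For readers wondering how a non-conforming URL can be mapped to an RFC3986-valid URI reference in practice, one common strategy -- and this is just an illustrative sketch, not anything the spec mandates -- is to UTF-8 percent-encode the characters that fall outside the RFC3986 alphabets. Python's standard library happens to make this easy:

```python
# Sketch: mapping a non-conforming path to RFC3986-valid form by
# percent-encoding disallowed characters as UTF-8 octets.
# This is one possible strategy, not the WHATWG-defined algorithm.
from urllib.parse import quote

path = "/caf\u00e9 path"        # contains a space and a non-ASCII char,
                                # both invalid per RFC3986
encoded = quote(path, safe="/") # '/' is kept; everything else unsafe
                                # is percent-encoded
print(encoded)                  # -> /caf%C3%A9%20path
```

A strict RFC3986-only consumer can then handle the encoded form, while the original string would have been rejected or mangled.)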
I remember, when I was testing this years ago during the first pass at
fixing this, finding that some less widely tested software, e.g. wget(1),
did not handle URLs in the same manner as more widely tested software,
e.g. IE, with the result that Web pages were not handled interoperably
between these two classes of software. This is the
kind of thing we want to stop, by providing a single way to parse all
input strings, valid or invalid, as URLs.
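(As a small illustration of the lenient behaviour that real software already exhibits -- again just a sketch, not the spec's parsing algorithm -- Python's urllib.parse will happily decompose a string that is invalid per RFC3986 rather than reject it:

```python
# Sketch: a lenient parser accepts an invalid URL (raw spaces in the
# path and query) and still extracts its components, rather than
# raising an error. This mirrors what most deployed software does.
from urllib.parse import urlsplit

parts = urlsplit("http://example.com/a b?q=x y")
print(parts.scheme)  # -> http
print(parts.netloc)  # -> example.com
print(parts.path)    # -> /a b
print(parts.query)   # -> q=x y
```

Whether two implementations agree on *which* components such a string decomposes into is exactly the interoperability question the single parsing algorithm is meant to settle.)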
--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'