[whatwg] URL decomposition on HTMLAnchorElement interface

Kartikaya Gupta lists.whatwg at stakface.com
Fri Mar 27 13:12:07 PDT 2009


On Fri, 27 Mar 2009 14:14:35 -0400, Boris Zbarsky <bzbarsky at MIT.EDU> wrote:
> 
> This case is more fun.  It's an unknown scheme, so it's assumed to be a 
> no-authority non-hierarchical scheme and the URI is parsed that way. 
> This does cause issues, since RFC 3986 says that if there is no authority 
> then the path cannot begin with two slashes (so if "scheme" is a 
> non-authority protocol then the URI is invalid, in fact).  But deciding 
> whether this is an invalid URI or not involves knowing something about 
> the "scheme" protocol, which is rather hard in this case, since you just 
> made it up.  ;)

For unknown schemes, if the part after the scheme starts with "//", doesn't it make sense to assume that the scheme allows an authority? I would assume that for an unknown scheme, the generic URI syntax in RFC 3986 should be followed, which would interpret everything between the "//" and the next "/" (or "?", "#", or the end of the URI) as the authority.
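
To make the generic interpretation concrete, here is a minimal Python sketch that splits a URI using the regular expression given in RFC 3986, Appendix B. The function name split_generic and the dictionary keys are made up for illustration; only the regular expression comes from the RFC.

```python
import re

# Regular expression from RFC 3986, Appendix B ("Parsing a URI Reference
# with a Regular Expression"). It makes no assumptions about the scheme.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_generic(uri):
    """Split a URI into the five generic components (hypothetical helper).

    An absent component is returned as None; a component that is present
    but empty is returned as "".
    """
    m = URI_RE.match(uri)  # every group is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


# With a leading "//", the text up to the next "/" is the authority.
with_authority = split_generic("scheme://host:80/a/b?q#f")
# Without it, there is no authority and everything is path.
without_authority = split_generic("scheme:/a/b")
```

Under this reading the parser never needs to know anything about "scheme" itself: the presence of "//" alone decides whether there is an authority.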

On Fri, 27 Mar 2009 10:49:41 -0700, Jonas Sicking <jonas at sicking.cc> wrote:
> What would you suggest should happen instead?
> 
> I don't see a reason why we wouldn't be ok with changing how firefox
> behaves here, but discussions about better ways of doing it are a lot
> more productive than discussions about how bad the current behavior
> is.
> 

Agreed. How about the following:

- Attempts to set "protocol" to null, the empty string, or anything containing invalid characters (i.e. not in the "scheme" production of RFC3986) should throw. Setting it to anything else should be allowed and should update the scheme component of the underlying URI.
- Attempts to set "host" to null for a scheme known to require an authority should throw. For all other schemes (i.e. ones that do not require an authority, or unknown schemes) setting "host" to null should remove the authority component of the underlying URI. For all schemes, setting the host to anything else should be allowed (invalid characters are escaped) and should update the authority component of the underlying URI.
- Attempts to set "hostname" should behave the same as setting "host", except that in cases where the authority is updated with a new value (this excludes the case where the authority is being removed), the old port (if any) should be preserved.
- Any attempt to set "port" when the "host" is null (i.e. there is no authority component in the underlying URI) should throw. If there is a non-null "host", then: (1) setting "port" to null should remove the port subcomponent from the underlying URI if there is one, (2) setting "port" to the empty string or to a string containing invalid characters should throw, and (3) setting "port" to a valid port string should update the port subcomponent of the underlying URI.
- Attempts to set "pathname" to null should throw, since the path is a required component of a URI. Setting "pathname" to anything else should be allowed and should update the path component of the underlying URI (invalid characters are escaped).
- Attempts to set "search" to null should remove the query component from the underlying URI; setting it to anything else is allowed and should update the query component of the underlying URI (invalid characters are escaped).
- Attempts to set "hash" to null should remove the fragment component from the underlying URI; setting it to anything else is allowed and should update the fragment component of the underlying URI (invalid characters are escaped).
- In all cases, undefined should be treated as null. (i.e. [Undefined=Null, Null=Null] in WebIDL)
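
As a rough illustration of the rules above, here is a minimal Python sketch, not any browser's implementation. Everything in it is an assumption made for the example: the class name URIParts, the NEEDS_AUTHORITY list of schemes, ValueError standing in for "throw", None standing in for both null and undefined, urllib.parse.quote standing in for "invalid characters are escaped" (it does not detect existing percent-escapes), and "search"/"hash" values being passed without their "?" or "#" delimiter. IP literals in the host are not handled.

```python
import re
from urllib.parse import quote

SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")  # "scheme" production, RFC 3986
PORT_RE = re.compile(r"[0-9]+")
NEEDS_AUTHORITY = {"http", "https", "ftp"}  # assumed list, for illustration only


class URIParts:
    """Hypothetical model of the proposed setters; None means 'component absent'."""

    def __init__(self, scheme, host=None, port=None, path="", query=None, fragment=None):
        self.scheme, self.host, self.port = scheme, host, port
        self.path, self.query, self.fragment = path, query, fragment

    def set_protocol(self, value):
        # None, "" and anything outside the scheme production all throw.
        if not value or not SCHEME_RE.fullmatch(value):
            raise ValueError("invalid scheme")
        self.scheme = value

    def set_hostname(self, value):
        if value is None:
            if self.scheme in NEEDS_AUTHORITY:
                raise ValueError("scheme requires an authority")
            self.host = self.port = None  # remove the whole authority
        else:
            self.host = quote(value, safe="")  # the old port, if any, is kept

    def set_host(self, value):
        if value is None:
            self.set_hostname(None)
            return
        name, sep, port = value.rpartition(":")
        if sep and PORT_RE.fullmatch(port):
            self.set_hostname(name)
            self.port = port
        else:
            self.set_hostname(value)
            self.port = None  # "host" replaces the authority, so the old port goes

    def set_port(self, value):
        if self.host is None:
            raise ValueError("no authority to attach a port to")
        if value is None:
            self.port = None
        elif not PORT_RE.fullmatch(value):  # also rejects the empty string
            raise ValueError("invalid port")
        else:
            self.port = value

    def set_pathname(self, value):
        if value is None:
            raise ValueError("the path is a required component")
        self.path = quote(value, safe="/")

    def set_search(self, value):
        self.query = None if value is None else quote(value, safe="/?&=")

    def set_hash(self, value):
        self.fragment = None if value is None else quote(value, safe="/?")

    def __str__(self):
        # Recompose the components in the order used by RFC 3986.
        out = self.scheme + ":"
        if self.host is not None:
            out += "//" + self.host
            if self.port is not None:
                out += ":" + self.port
        out += self.path
        if self.query is not None:
            out += "?" + self.query
        if self.fragment is not None:
            out += "#" + self.fragment
        return out


u = URIParts("scheme", host="host", port="80", path="/path")
u.set_hostname("example.org")  # port 80 is preserved
u.set_host(None)               # unknown scheme, so the authority is removed
```

The sketch deliberately leaves out one case the rules above do not address: removing the authority from a URI whose path begins with "//".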

Notes:
- In general I made every invalid action throw rather than ignoring the attempt because I personally don't like it when things fail silently.
- I think that null should not be stringified to "null" because for some of the components setting to null makes sense, and I prefer that all components be consistent with respect to stringification.
- In cases where the scheme is unknown I think the behavior should be such that it follows the generic URI syntax in RFC3986 as much as possible. Specifically, if it doesn't recognize the scheme, it shouldn't arbitrarily disallow behavior like removing or adding an authority component.

Thoughts/comments?

Cheers,
kats

