[whatwg] URL decomposition on HTMLAnchorElement interface

Boris Zbarsky bzbarsky at MIT.EDU
Fri Mar 27 14:40:08 PDT 2009


Kartikaya Gupta wrote:
> For unknown schemes, if the authority starts with "//", doesn't it make sense to assume that the scheme allows an authority? I would assume that for an unknown scheme, the generic URI syntax in RFC3986 should be followed, which would interpret the stuff between "//" and the following "/" as the authority.
> 

This is an option, but it's not obviously correct, just as it's not 
obviously correct (and in fact would break pages) to parse 
"http:foo.com/" without an authority.

I'm reluctant to change any behavior here unless there's a spec, along 
with some data indicating the reasons for that spec and its impact on 
website compat.
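
To make the parsing question above concrete, here is a minimal sketch of
what "follow the generic syntax" would mean, using the regular expression
from RFC 3986, Appendix B. The names `splitGeneric` and `GenericParts` are
made up for this illustration, and this is not how any browser actually
parses URLs.

```typescript
// Generic URI splitter built on the regular expression in RFC 3986,
// Appendix B. `GenericParts` and `splitGeneric` are hypothetical names.
interface GenericParts {
  scheme: string | null;
  authority: string | null;
  path: string;
  query: string | null;
  fragment: string | null;
}

const RFC3986_SPLIT =
  /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

function splitGeneric(uri: string): GenericParts {
  const m = RFC3986_SPLIT.exec(uri);
  if (m === null) {
    // Every group is optional, so the pattern matches any input.
    throw new Error("unreachable");
  }
  return {
    scheme: m[2] ?? null,
    authority: m[4] ?? null, // null when there is no "//" after the scheme
    path: m[5] ?? "",
    query: m[7] ?? null,
    fragment: m[9] ?? null,
  };
}

// Unknown scheme with "//": the text up to the next "/" is the authority.
console.log(splitGeneric("foo://bar/baz"));
// Known scheme without "//": the generic syntax sees no authority at all.
console.log(splitGeneric("http:foo.com/"));
```

Under the generic syntax, "foo://bar/baz" gets the authority "bar", while
"http:foo.com/" gets no authority and the path "foo.com/", which is the
reading that would break pages.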

> - Attempts to set "protocol" to null, the empty string, or anything containing invalid characters (i.e. not in the "scheme" production of RFC3986) should throw. Setting it to anything else should be allowed and should update the scheme component of the underlying URI.
> - Attempts to set "host" to null for a scheme known to require an authority should throw. For all other schemes (i.e. ones that do not require an authority, or unknown schemes) setting "host" to null should remove the authority component of the underlying URI. For all schemes, setting the host to anything else should be allowed (invalid characters are escaped) and should update the authority component of the underlying URI.
> - Attempts to set "hostname" should behave the same as setting "host", except that in cases where the authority is updated with a new value (this excludes the case where the authority is being removed), the old port (if any) should be preserved.
> - Any attempt to set "port" when the "host" is null (i.e. there is no authority component in the underlying URI) should throw. If there is a non-null "host", then: (1) setting "port" to null should remove the port subcomponent from the underlying URI if there is one, (2) setting "port" to the empty string or invalid characters should throw, and (3) setting "port" to a valid port string should update the port subcomponent of the underlying URI.
> - Attempts to set "pathname" to null should throw, since the path is a required component of a URI. Setting "pathname" to anything else should be allowed and should update the path component of the underlying URI (invalid characters are escaped).
> - Attempts to set "search" to null should remove the query component from the underlying URI, setting it to anything else is allowed and should update the query component of the underlying URI (invalid characters are escaped).
> - Attempts to set "hash" to null should remove the fragment component from the underlying URI, setting it to anything else is allowed and should update the fragment component of the underlying URI (invalid characters are escaped).
> - In all cases, undefined should be treated as null. (i.e. [Undefined=Null, Null=Null] in WebIDL)

These are all more or less unacceptable.  For example, setting 
"pathname" to the empty string should work just fine, imo; setting that 
on "http://foo.com/bar/" should result in "http://foo.com/".

There are big scary comments in the Gecko code for these setters saying 
that they must never ever throw.  I suspect that making them throw would 
be a serious web compat issue.
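
A minimal sketch of the "pathname" case above. It assumes the standard
URL class found in current browsers and Node.js as a stand-in for the
anchor element's attributes; it is not the Gecko setter code referred to
here, and `clearPath` is a hypothetical helper name.

```typescript
// Assumption: the standard URL class stands in for the anchor element's
// URL decomposition attributes. `clearPath` is a hypothetical helper.
function clearPath(href: string): string {
  const url = new URL(href);
  url.pathname = ""; // does not throw; the path is reset to the root
  return url.href;
}

console.log(clearPath("http://foo.com/bar/")); // http://foo.com/
```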

Changing from an authority to a non-authority URI or the other way 
around doesn't seem desirable to me.  At best it would presumably only 
work for unknown schemes anyway; it's better if it just never works.

> - In general I made every invalid action throw rather than ignoring the attempt because I personally don't like it when things fail silently.

That's nice, but I suspect web sites rely on the silent-failure 
behavior here.
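
As a rough illustration of silent failure, the sketch below feeds invalid
values to a few setters and checks that nothing throws and nothing
changes. It assumes the standard URL class in current browsers and
Node.js, which may not match what any 2009 browser did; `trySet` is a
hypothetical helper name.

```typescript
// Assumption: the standard URL class stands in for the anchor element's
// setters. `trySet` is a hypothetical helper that applies one assignment
// and reports whether it threw and what the URL looks like afterwards.
function trySet(
  href: string,
  apply: (u: URL) => void,
): { threw: boolean; href: string } {
  const u = new URL(href);
  let threw = false;
  try {
    apply(u);
  } catch {
    threw = true;
  }
  return { threw, href: u.href };
}

const original = "http://foo.com:8080/bar/";

// Each of these values is invalid for the component being set.
const badPort = trySet(original, (u) => { u.port = "not-a-port"; });
const badScheme = trySet(original, (u) => { u.protocol = "1nvalid"; });
const emptyHost = trySet(original, (u) => { u.host = ""; });

for (const r of [badPort, badScheme, emptyHost]) {
  console.log(r.threw, r.href === original);
}
```

With that class, each assignment is ignored: no exception is raised and
the serialized URL stays as it was.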

> - In cases where the scheme is unknown I think the behavior should be such that it follows the generic URI syntax in RFC3986 as much as possible. Specifically, if it doesn't recognize the scheme, it shouldn't arbitrarily disallow behavior like removing or adding an authority component.

Since for any given scheme the component is either allowed or not, it 
doesn't make sense to do that, to me...

-Boris

