[whatwg] Registering protocol handlers

Fri Apr 21 14:26:11 PDT 2006

On Fri, 21 Apr 2006, Christian Biesinger wrote:
> > 
> > Per the spec, the methods do not check the syntactic validity of their 
> > arguments except for two things: The URI not having a %s, and the 
> > scheme or content types being "privileged" (http:, text/html, etc).
> 
> That's really of no help at all, since at some point the validity must 
> be checked.

The arguments can't be checked at the registration point, because the URI might 
become valid before it is used, and because the networking library might 
not be able to tell if the URI is valid without fetching it. (It's also 
not really clear where you draw the line of an "invalid" URI -- is 
http://192.0.2.812/ an invalid URI?)
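
To illustrate where the line gets blurry, here is a rough sketch using 
Python's standard library (my choice for illustration; nothing in the spec 
refers to it). A generic URI parser accepts the example above, while a 
stricter IPv4 check rejects its host. The helper name is_ipv4_literal is 
made up for this example.

```python
import ipaddress
from urllib.parse import urlsplit

# Generic URI splitting accepts the string and just reports the host as text.
host = urlsplit("http://192.0.2.812/").hostname

def is_ipv4_literal(text):
    """Return True only if text is a well-formed dotted-quad IPv4 address."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        # Raised for out-of-range octets such as 812.
        return False

print(host)                   # the host as the parser saw it
print(is_ipv4_literal(host))  # whether it is a usable IPv4 literal
```

Whether the URI counts as "invalid" depends on which of the two checks you 
treat as authoritative, and the second may only fail once something tries 
to resolve the host.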

> But I assume this means that syntactic invalidity means that the call to 
> register*Handler returns normally but does nothing.

The caller cannot determine that the URI is invalid from calling this API, 
correct. What the UA actually does is up to the UA.
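
As a minimal sketch of what the registration-time checks amount to, 
assuming only the two checks mentioned above: the function name, the 
reason strings, and the contents of PRIVILEGED_SCHEMES are hypothetical, 
and how a UA reports a failed check is not something this sketch tries to 
model.

```python
# Hypothetical set; "http:" is the only privileged scheme named in this thread.
PRIVILEGED_SCHEMES = {"http"}

def registration_problem(scheme, uri_template):
    """Return a reason string if a registration-time check fails, else None."""
    # Check 1: the handler URI must contain the %s placeholder.
    if "%s" not in uri_template:
        return "no %s placeholder"
    # Check 2: the scheme being registered must not be privileged.
    if scheme.lower().rstrip(":") in PRIVILEGED_SCHEMES:
        return "privileged scheme"
    # Deliberately no further syntactic validation of uri_template.
    return None
```

Note that a template such as "http://192.0.2.812/?u=%s" passes both checks 
here; whether it is usable is only discovered later, if at all.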

> BTW... shouldn't sites have the possibility to unregister themselves? I 
> as a user would expect a site that has a "register me" button to also 
> have an "unregister me" button.

I would presume the UA would provide this option, not the page. 
(Similarly, you don't have a Web API to remove a search engine, only to 
add one.)

> Looks fine. I want to note that this prescribes using uppercase 
> characters for the escaping, which is more strict than RFC 2396. Are 
> there any browsers that currently use lowercase characters for escaping?

There are no browsers that do anything for this API so far. I specified 
uppercase so that servers would not be surprised one day if they hardcoded 
uppercase and later found a client using lowercase. Anything we can do to 
make the server side easier is a win.
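
A small sketch of why the case matters, assuming the handler URI is built 
by substituting the escaped target URI for %s. The function name and the 
example URIs are made up, and escaping every reserved character (safe="") 
is an assumption of this sketch, not a statement of what the spec requires.

```python
from urllib.parse import quote

def expand_handler(template, target_uri):
    """Substitute the percent-escaped target URI for the first %s."""
    # quote() emits uppercase hex digits; safe="" also escapes ":" and "/".
    return template.replace("%s", quote(target_uri, safe=""), 1)

url = expand_handler("http://reader.example/add?uri=%s",
                     "http://example.org/feed a")

# A server that hardcoded the uppercase form finds its prefix...
print("uri=http%3A%2F%2F" in url)
# ...but would miss the same prefix from a client escaping in lowercase.
print("uri=http%3A%2F%2F" in url.replace("%3A", "%3a").replace("%2F", "%2f"))
```

Both spellings decode to the same thing, but a server doing a naive string 
comparison would only recognise one of them.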

> It turns out that I'd be happier if you moved the normative parts into 
> the section before and made this one informative :) Most of it is 
> informative, anyway.

All but one of the paragraphs contain at least one normative conformance 
criterion, so most of it is not informative. I'm not sure how to address 
your concern here.

> Certainly not something that amounts to "now you can feel free to forget
> everything I said above". For a start, you could say something like:
> 
>   This specification does not require a particular UI, or how
>   a browser should select a handler for a content type or protocol.
>   Therefore, the exact handling of calls to these functions is not
>   specified. However, if the browser ends up using the handler
>   registered by the website, it must follow the rules described
>   above.
> 
> With that, you could probably strike the "This section does not define 
> how the pages registered by these methods are used." sentence too.

This seems equivalent to what is currently in the spec -- the spec already 
says that when the UA uses the given URI, it must do so in a particular 
way. I've added some minor text to reference that more explicitly in the 
paragraph you mention.

> There are, in my opinion, a lot of issues with registering a handler for 
> a content type, as opposed to a protocol.
> 
> Some of them being:
> - The request that led to this content may be not idempotent
> - The request may require POST data

Yes. I've added a paragraph saying that for non-GET requests you shouldn't 
use this API.

> - The request may have required certain cookies
> - The request may have required certain authentication headers
> - The request may only be possible from certain IP ranges

In other words, the content might be privileged -- in which case you 
definitely don't want to send it to a remote site!

> - Obviously also the leaking intranet URI issue you mention

Leaking intranet URIs is a lot less dangerous than leaking intranet 
_content_, though.

> - The browser already has partial content, or maybe the full content 
> (for a short file), by the time it sees the content type (e.g. network 
> I/O on a background thread, content dispatch on the main thread, and a 
> short file). It requires the browser to throw away the data it has, 
> which is sort of ugly, especially if it is impossible to get it again 
> (see earlier points)

No, it doesn't require anything. The spec doesn't say when you use this -- 
in particular, it doesn't say you should use these options for the result 
of non-GET or authenticated requests. It even says that maybe you 
_shouldn't_ use it for authenticated requests, and I've now added a 
sentence that says you mustn't use it for non-GET resources.
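
Roughly, a UA could gate its use of a registered content handler along 
these lines. This is only a sketch of the two conditions described above; 
the Fetch record, the function name, and the allow_authenticated switch 
are all hypothetical, and the actual decision remains up to the UA.

```python
from dataclasses import dataclass

@dataclass
class Fetch:
    method: str          # HTTP method that produced the content
    authenticated: bool  # True if cookies or auth headers were needed

def may_offer_handler(fetch, allow_authenticated=False):
    """Decide whether a registered content handler may be offered."""
    # Never hand off the result of a non-GET request.
    if fetch.method.upper() != "GET":
        return False
    # Content behind authentication may be privileged; skip it by default.
    if fetch.authenticated and not allow_authenticated:
        return False
    return True
```

With this shape the browser never needs to discard and refetch data for a 
POST result, because the handler is simply not offered in that case.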

> The browser can not know for all of them if that is the case. The user 
> may not know either. This leads to a kind of ugly situation that it may 
> be impossible for the user to actually view the content. The spec does 
> not address these issues at all.

It addresses a number of them in the security section.

> It turns out that most of these issues could be addressed by implementing the
> content handler by a file upload to the registered URI. What do you think
> about that?

It seems very dangerous.

I think if you want to upload a file to the remote site, you should have 
to explicitly do so. I don't think we should ever automatically prompt to 
do an upload of a particular file, because users will almost certainly 
click right through it, with potentially disastrous results.

The main use cases I see here are for feeds, and for those you definitely 
want to send the URI. Same with, e.g., an iCalendar feed. I could also see 
this being used for proprietary data formats, but for those I would much 
rather the data only be accessible to the remote host if either the data 
was already available (without authentication), or if the user explicitly 
uploaded it.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'