[whatwg] Fixing two security vulnerabilities in registerProtocolHandler

Fri Apr 6 14:35:22 PDT 2012

On Fri, 6 Apr 2012, Tyler Close wrote:
> On Mon, Apr 2, 2012 at 4:39 PM, Ian Hickson <ian at hixie.ch> wrote:
> > On Mon, 26 Sep 2011, Tyler Close wrote:
> >>
> >> I was recently experimenting with the registerProtocolHandler (RPH) 
> >> API and came across a couple of security gotchas that make it hard to 
> >> safely use the API. One of these is already known, but AFAICT, hasn't 
> >> been fixed yet. I haven't seen the other discussed yet.
> >>
> >> The Mozilla blog post that introduces the registerProtocolHandler API 
> >> makes use of window.parent.postMessage to send a response from the 
> >> RPH handler back to the client page.
> >
> > I presume it uses this in conjunction with an <a href=""> link with a 
> > target="" attribute to load the handler in an iframe.
> 
> The client page loads the handler page using an iframe or a 
> window.open(). Either can work.
> 
> >> In the example code, the targetOrigin for this postMessage invocation 
> >> is '*', while also noting that this is not secure. AFAICT, there is 
> >> no API that the intent handler can reliably use to determine the 
> >> correct targetOrigin for this postMessage invocation.
> >
> > How can the origin be anything other than the origin of the page that 
> > triggered the link?
> 
> Exactly, but we need a way for the handler page to find out what that 
> origin is.
> 
> A client page on origin A causes a navigation to an RPH URL (iframe or 
> window.open). The browser loads the user-chosen RPH handler, which is 
> another web page from origin B. After the handler page loads, it wants 
> to send a return value back to the client page. How does the handler 
> page know the client page's origin is A? It needs to know this origin 
> string so that it can securely use postMessage to send the return value 
> back. AFAICT, there is no existing API in the browser that lets the 
> handler page determine the client page's origin.

Well, if it's an iframe, the parent can't be anything but the original 
origin, as far as I can tell.

But in general, there's not expected to be any talking back. If you want 
something where the handler talks back to the page that provided the data, 
then you should use Web Intents. registerProtocolHandler() and 
registerContentHandler() are intended for things like mail clients 
(mailto:) or PDF viewers, which do not talk back. Indeed in the common use 
case, you just click the link and the entire browsing context gets 
replaced, so there's nothing to talk back _to_.

> Currently, the handler page can only specify "*" in the postMessage 
> invocation that sends the return value. If the client page is navigated 
> by an attacker, before the postMessage is done, the attacker can 
> intercept the return value. It's the same rationale used every time we 
> advise programmers against using '*' as the targetOrigin for a 
> postMessage() invocation.

That rationale only applies when you're going from window to window, not 
when you're going from iframe to parent.
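
To make the targetOrigin point concrete, here is a minimal sketch of a 
handler-side reply helper that refuses the '*' wildcard. The name 
replyToClient is hypothetical, and the sketch simply assumes the handler 
already knows clientOrigin somehow; as the quoted text notes, RPH itself 
offers the handler no API for learning that origin.

```javascript
// Hypothetical helper, not part of any browser API.
// `target` is anything with a postMessage(message, targetOrigin) method,
// e.g. window.parent or window.opener when run in a browser.
function replyToClient(target, result, clientOrigin) {
  if (typeof clientOrigin !== "string" || clientOrigin === "*") {
    throw new Error("refusing to post without an explicit target origin");
  }
  // new URL(...).origin reduces the input to scheme://host[:port];
  // it throws a TypeError if clientOrigin is not a valid absolute URL.
  const origin = new URL(clientOrigin).origin;
  target.postMessage(result, origin);
  return origin;
}
```

With an explicit origin, the browser drops the message if the target 
window has been navigated to a different origin in the meantime, which is 
the interception case described in the quoted paragraph.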

> >> The second problem with RPH is that the handler page doesn't have a 
> >> way of reliably getting the URL of the content to be handled from the 
> >> browser. In order to work in offline scenarios, the RPH handler must 
> >> put the %s placeholder in the fragment of its handler's URL.
> >
> > It's not clear to me that it makes sense to have an offline protocol 
> > handler. What kind of protocol do you have in mind?
> 
> For example, consider an offline web mail program. I click on a mailto: 
> link and want to compose a message in my web mail editor, queuing it to 
> be sent next time I'm online.
> 
> RPH is a way for a web page to send data to a user-determined 
> application. There will surely be many scenarios where offline 
> functionality is desirable.

For such an example, you can just use a fallback section in the appcache 
manifest. (Or a fragment identifier, indeed.)
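
As an illustration of the fragment-identifier option, the sketch below 
shows how a handler page might recover the handled URL. It assumes the 
handler was registered with a URL of the form 
"https://example.org/compose#%s" (a made-up path), and the function name 
handledUrlFromFragment is hypothetical.

```javascript
// Hypothetical handler-side helper. Assumes the browser substituted the
// escaped link URL for %s in the fragment of the registered handler URL.
function handledUrlFromFragment(handlerUrl) {
  const hash = new URL(handlerUrl).hash; // "" or a string starting with "#"
  if (hash.length <= 1) {
    return null; // no fragment, so nothing was passed this way
  }
  return decodeURIComponent(hash.slice(1));
}

// In a browser the handler would pass location.href; a literal is used here.
const handled = handledUrlFromFragment(
  "https://example.org/compose#mailto%3Abob%40example.com"
);
```

The fragment is never sent to the server, so a cached page can read it 
while offline. It is also the part of the URL that the quoted concern 
says other content could rewrite before the handler reads it.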

> >> Unfortunately, this means that other content in the browser could 
> >> modify the content URL before the handler reads it.
> >
> > Well, any content can load any URL, so it doesn't matter whether the 
> > URL is in the fragment identifier or the path or anything else, 
> > surely.
> 
> It matters if the handler page assumes that the URL came from its parent 
> or opener. The parent and opener then engage in a postMessage 
> conversation where the parent knows it said one thing, but the handler 
> heard it saying something different, something chosen by the attacker.

Why would a mail client talk back to its opener?

> >> The intent handler sees a request coming from the victim page, but 
> >> with a content URL specified by the attacker. A related problem is 
> >> that the intent handler has no way to distinguish whether its URL was 
> >> loaded via the browser's RPH handling, or whether the client page 
> >> directly navigated to the intent handler's URL. Both of these 
> >> problems could be fixed by adding another readonly DOMString to the 
> >> API that contains the %s data for the RPH invocation.
> >
> > I don't understand why it matters how the URL was invoked.
> 
> If the URL was invoked via RPH, then the handler page knows that the 
> user selected it for this action. The handler page also knows that any 
> arguments in the handler's URL (not in the RPH URL) were set by the 
> handler's origin and were not tampered with by the client page.
> 
> For example, a web mail program might have two registered RPH handlers 
> for mailto: links, "https://example.org/?from=me@company&q=%s" and 
> "https://example.org/?from=me@personal&q=%s". The user has configured 
> their browser to send mailto links to their personal email editor. A 
> malicious client page could directly open the URL for the company email 
> editor. The web mail editor needs a way to detect when a client page is 
> trying to subvert the user's chosen preferences. So, an RPH handler 
> needs a way to know that it was loaded via the RPH dispatch. Once it 
> knows this, it can also trust that the arguments in the URL, such as 
> "from" in this case, were not tampered with by the client page.

I don't understand the attack scenario. Sure, a Web page can open another 
Web page with arbitrary arguments. Why does it matter here?
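
For reference, the scenario in the quoted example can be sketched as 
follows. The parser name parseHandlerUrl and the bob@example.com address 
are assumptions for illustration; the two handler URLs are the ones from 
the quoted text with %s filled in.

```javascript
// Hypothetical parser for the handler URLs in the quoted example.
function parseHandlerUrl(handlerUrl) {
  const params = new URL(handlerUrl).searchParams;
  return { from: params.get("from"), target: params.get("q") };
}

// What the browser would load when dispatching to the personal handler...
const viaRph = parseHandlerUrl(
  "https://example.org/?from=me@personal&q=mailto%3Abob%40example.com"
);
// ...and what any page could open directly, picking "from" itself.
const direct = parseHandlerUrl(
  "https://example.org/?from=me@company&q=mailto%3Abob%40example.com"
);
```

Both URLs parse the same way, and nothing in either one tells the handler 
whether it arrived through RPH dispatch or through a direct navigation, 
which is the distinction the quoted text is asking for.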

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'