[whatwg] Security restriction allows content thievery

Ryosuke Niwa rniwa at webkit.org
Sun Jul 15 16:20:06 PDT 2012


On Sun, Jul 15, 2012 at 4:02 PM, Robert Eisele <robert at xarg.org> wrote:

> 2012/7/16 Tab Atkins Jr. <jackalmage at gmail.com>
> > On Sun, Jul 15, 2012 at 3:22 PM, Robert Eisele <robert at xarg.org> wrote:
> > > Browsers are very restrictive when one tries to access the contents of
> > > different domains (including the scheme) embedded via framesets. This is
> > > normally good practice, but I'd suggest weakening this restriction for
> > > the data: URI scheme.
> > >
> > > I'm currently building an analysis system like Google Analytics, which gets
> > > embedded into a website via a small JavaScript snippet. When I analyzed the
> > > data, I came across a very interesting trick: I got a lot of requests (with
> > > the data from location.href) where the entire website was embedded into a
> > > data:text/html URI, except that all ads on the page were replaced.
> > > Fortunately, my tracking code had been left unmodified.
> > >
> > > But the scary thing is that this way you can monetize foreign content by
> > > simply embedding it somewhere you can direct traffic to. That's pretty
> > > clever, because the original site owner doesn't notice this abuse, since
> > > top.location.href isn't readable. Worse, he would never notice it at all
> > > if he didn't sniff the URI with JavaScript, because image requests would
> > > carry no referrer.
> > >
> > > My final approach to convicting the abuser is based on the fact that the
> > > JavaScript was dynamically loaded from my server and that I can write to
> > > location.href. So I added this piece of code:
> > >
> > > // Redirect the top-level page if it was loaded from a data: URI.
> > > if (top.location.protocol === 'data:') {
> > >     top.location.href = 'http://example.com/trap/';
> > > }
> > >
> > > But even then the referrer is not passed to the server. So my proposal is
> > > that the data: URI scheme get an exception to this security behavior.
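For readers who want to experiment with the redirect check described in the message above, here is a minimal sketch that factors it into two small JavaScript functions. The names `isDataUriEmbed` and `redirectIfDataUri` are hypothetical, and the trap URL is the same placeholder used in the quoted snippet. It assumes the WHATWG `URL` class is available (modern browsers and Node.js) and guards the read of `top.location`, which throws when the top-level document is cross-origin. As the thread points out, this only detects and redirects; it does not recover a referrer.

```javascript
// Hypothetical helper: returns true when a page URL uses the data: scheme,
// i.e. the page was loaded from a data: URI instead of the site's own origin.
function isDataUriEmbed(href) {
  try {
    return new URL(href).protocol === 'data:';
  } catch (e) {
    // Strings that do not parse as absolute URLs are not data: embeds.
    return false;
  }
}

// Hypothetical wrapper around the redirect check quoted in the thread.
// `win` is the script's window object; `trapUrl` is a placeholder.
function redirectIfDataUri(win, trapUrl) {
  let href;
  try {
    // Reading top.location throws a SecurityError when the top-level
    // document is cross-origin, so the read is guarded.
    href = win.top.location.href;
  } catch (e) {
    return false;
  }
  if (!isDataUriEmbed(href)) {
    return false;
  }
  // Navigate the top-level page away from the data: document.
  win.top.location.href = trapUrl;
  return true;
}
```

In a page this would be called as `redirectIfDataUri(window, 'http://example.com/trap/')`; passing the window in as a parameter keeps the logic testable outside a browser.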
> >
> > The problem you outline is not directly tied to the solution you
> > present.  You can scrape a site and display it as your own without any
> > fancy tricks, just by downloading all the resources and hosting them
> > yourself.  This merely consumes a little more bandwidth for the
> > attacker, since they're hosting the images/etc themselves.
> >
>
> But you would get a valid referrer if the tracking code wasn't removed. The
> data: URI protects the abuser unnecessarily. But you're absolutely right
> that the solution I present isn't entirely tied to the problem.
>

The embedder can easily remove the tracking code. Better yet, the embedder
can host the content on his server and disallow access to all external
resources to cripple your tracking code.

> > The correct solution to this kind of problem is legal: this is simple
> > copyright violation.
>
> But if you have no way to get information about the attacker, you can't
> sue him. I had the strange idea of using a prompt to ask the user for the
> original URL in his address bar. But as I said, that's strange.
>

That sounds like a problem we can't solve.

- Ryosuke

