[whatwg] Security restriction allows content thievery

Robert Eisele robert at xarg.org
Sun Jul 15 16:02:40 PDT 2012

2012/7/16 Tab Atkins Jr. <jackalmage at gmail.com>

> On Sun, Jul 15, 2012 at 3:22 PM, Robert Eisele <robert at xarg.org> wrote:
> > Browsers are very restrictive when one tries to access the contents of
> > different domains (including the scheme), embedded via framesets. This is
> > normally a good practice, but I'd suggest weakening this restriction for
> > the data: URI scheme.
> >
> > I'm currently building an analysis system like Google Analytics, which
> > gets embedded into a website via a small JavaScript snippet. When I
> > analyzed the data, I came across a very interesting trick because I got a
> > lot of requests (with the data from location.href) where the entire
> > website was embedded into a data:text/html URI - except that all ads of
> > the page were replaced. Fortunately, my tracking code has been left
> > without modifications.
> >
> > But the scary thing is that this way you can monetize foreign content by
> > simply embedding it somewhere you can direct traffic to. That's pretty
> > clever, because the original site owner doesn't notice this abuse, since
> > top.location.href isn't readable. Or even worse, he would never notice it
> > at all if he doesn't sniff the URI with JavaScript, because image
> > requests would carry no referrer.
> >
> > My final approach to expose the abuser is based on the fact that the
> > JavaScript was dynamically loaded from my server and that I can write to
> > location.href. So I added this piece of code:
> >
> > if (top.location.protocol === 'data:') {
> >     top.location.href = 'http://example.com/trap/';
> > }
> >
> > But even then the referrer will not be passed to the server. So my
> > proposal is that the data: URI scheme gets an exception to this security
> > behavior.
>
> The problem you outline is not directly tied to the solution you
> present.  You can scrape a site and display it as your own without any
> fancy tricks, just by downloading all the resources and hosting them
> yourself.  This merely consumes a little more bandwidth for the
> attacker, since they're hosting the images/etc themselves.

But you would get a valid referrer if the tracking code wasn't removed. The
data: URI protects the abuser in an unnecessary way. But you're absolutely
right that the solution I present isn't entirely tied to the problem.
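
To make the referrer point concrete, here is a minimal sketch of the kind of
check a tracking snippet could run on the values it already collects. The
function name classifyPageView and the fields it returns are hypothetical,
chosen only for illustration; they are not taken from any real tracker.

```javascript
// Hypothetical helper: classify a page view from the values the tracking
// snippet can read. `href` stands for location.href and `referrer` for
// document.referrer; both are passed in as plain strings.
function classifyPageView(href, referrer) {
  // Everything before the first ':' is the URI scheme, e.g. "data" or "http".
  var scheme = String(href).split(':', 1)[0].toLowerCase();
  return {
    scheme: scheme,
    // A page served from a data: URI is the case described above.
    suspicious: scheme === 'data',
    // An empty referrer means the server learns nothing about the embedder.
    hasReferrer: typeof referrer === 'string' && referrer.length > 0
  };
}
```

In a browser this would be called as classifyPageView(location.href,
document.referrer), and the result could be sent along with the usual
tracking request.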

> The correct solution to this kind of problem is legal - this is simple
> copyright violation.

But if you don't have a chance to get information about the attacker, you
can't sue him. I had the strange idea to use a prompt to ask the user for
the original URL in his address bar. But as I said, that's strange.

> I'm not sure about the merits of your suggestion otherwise.  It's
> reasonable to make data: pages same-origin with their parent when
> they're contained within something, but it seems dodgy to make them
> same-origin with their *contained* pages as well.  If not done
> carefully, that could allow contained pages access to the data: page's
> parent as well, or other cross-origin pages that the data: page is
> containing.

That's an intuitive thought. One could assume that data: pages are
same-origin, or better, that embedded data: pages are part of the current
page. That way, a contained page would have no chance to break out of the
sandbox and access the parent. In what situation could treating them as
same-origin be dangerous?
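
For reference, a small sketch of how the origin comparison under discussion
can be expressed with the standard URL API. It assumes an environment where
the URL constructor is available, and the helper names originOf and
sameOrigin are made up for this example. It shows only how a data: URI
compares when it gets no origin of its own, not what any particular browser
enforces.

```javascript
// Hypothetical helpers built on the standard URL API.
function originOf(href) {
  // For schemes without a host-based origin (such as data:), the URL API
  // reports the string "null".
  return new URL(href).origin;
}

function sameOrigin(a, b) {
  var oa = originOf(a);
  var ob = originOf(b);
  // Treat "null" as matching nothing, not even another "null".
  return oa !== 'null' && oa === ob;
}
```

With these helpers, two pages on http://example.com compare as same-origin,
while a data: page compares as different from every page, including an
identical data: page.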

> ~TJ
