[whatwg] Security restriction allows content thievery

Fred Andrews fredandw at live.com
Thu Sep 6 17:14:59 PDT 2012

> > I'm currently building an analysis system like Google Analytics, which 
> > gets embedded into a website via a small JavaScript snippet. When I 
> > analyzed the data, I came across a very interesting trick because I got 
> > a lot of requests (with the data from location.href) where the entire 
> > website was embedded into a data:text/html URI - except that all ads of 
> > the page were replaced. Fortunately, my tracking code has been left 
> > without modifications.
> Weird.

Perhaps the concern is that content has been copied into a data: URL in violation of copyright and used to obtain ad revenue. However, the content could very well be used with permission. Ads are dynamic and do change on otherwise static content pages, so this could well be an honest use of the technology. It would be interesting to know whether search engines actually look at content in data: URLs; if not, the 'copied' content would seem to bring little advantage.

Or perhaps the concern is just that it thwarts efforts to track the referrer.
> > But the scary thing is that this way you can monetize foreign content by 
> > simply embedding it somewhere you can direct traffic to. That's pretty 
> > clever, because the original site owner doesn't notice this abuse due to 
> > the fact that top.location.href isn't readable. Or even worse, he would 
> > never notice it at all when he doesn't sniff the URI with JavaScript, 
> > because image files would have no referrer.
> > 
> > My final approach to convict the abuser is based on the fact, that the 
> > JavaScript was dynamically loaded from my server and that I can write to 
> > location.href. So I added this piece of code:
> > 
> > if (top.location.protocol === 'data:') {
> >     top.location.href = 'http://example.com/trap/';
> > }
> > 
> > But even then the referrer will not be passed to the server. So my 
> > proposal is that the data URI schema gets an exception on this security 
> > behavior.
> I don't understand. What referrer are you trying to set? To what?

I think the aim is to have the URL of the page that includes these data: URLs sent to the tracking server?
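
To make concrete what such a tracking script can and cannot observe, here is a rough sketch, not the original poster's code. The function name describeEmbedding and the field names of the object it returns are hypothetical, and it takes a window-like object as a parameter only so that it can be tried outside a browser. It assumes that reading top.location.href throws when the top-level document is cross-origin, and it treats an empty document.referrer as "no referrer available".

```javascript
// Hypothetical helper: summarise what a tracking script can observe about
// how the page it runs in was loaded. `win` is a window-like object
// (in a browser you would pass `window`).
function describeEmbedding(win) {
  const report = {
    protocol: win.location.protocol,                  // e.g. 'http:' or 'data:'
    servedFromDataUrl: win.location.protocol === 'data:',
    framed: win.top !== win,                          // true when running inside a frame
    topHref: null,                                    // stays null if it cannot be read
    referrer: win.document.referrer || null,          // empty string becomes null
  };
  try {
    // Reading the location of a cross-origin top-level document throws,
    // so the attempt is wrapped and the failure is recorded as null.
    report.topHref = win.top.location.href;
  } catch (e) {
    report.topHref = null;
  }
  return report;
}
```

In a browser this would be called as describeEmbedding(window) and the result sent along with the tracking request. When the copy is itself the top-level data: document, the script can see the data: URL but, as described in the quoted message, no referrer pointing back to the page that linked to it.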

I can't see any technical issues raised here.

Some think trackers themselves are 'scary' and consider user privacy and safety more important. They would prefer not to send a referrer at all, and even to have such JavaScript sandboxed so that it can't leak private information.


