[whatwg] Caching of identical files from different URLs using checksums
julian.reschke at gmx.de
Fri Feb 17 10:05:17 PST 2012
On 2012-02-17 09:42, Sven Neuhaus wrote:
> Google's. The benefits are:
> the CDN and not from the site that uses them
> * If enough sites refer to the same external file, the browser will cache the file and even if
> There are however some drawbacks to this approach:
> * Security: The site operator is trusting an external site. If the CDN serves a malicious file
> it will directly lead to code execution in browsers under the domain settings of the site
> including it (a form of cross site scripting).
> * Availability: The site depends on the CDN to be available. If the CDN is down the site may not
> be available at all.
> * Privacy: The CDN will see requests for the file with HTTP Referer headers for every visitor
> of the site.
> * Extra DNS lookup if file is not already cached
> * Extra HTTP connection (can't use persistent connection because it's a different site) if file is not cached
> I am proposing a solution that solves all these problems, keeps the benefits, and offers
> some extra advantages:
> 1. The site stores a copy of the library file(s) on its own site.
> 2. The web page includes the library from the site itself instead of from the CDN
> 3. The script tag specifies a checksum calculated using a cryptographic hash function.
> With this solution, whenever a browser downloads a file and stores it in the local cache, it calculates
> its checksum. The browser can check its cache for an (identical) file with the same checksum
> (no matter what URL it was retrieved from) and use it instead of downloading the file again.
> This suggestion has previously been discussed here
> ( http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2006-November/thread.html#7825 ),
> but for a different purpose (file integrity rather than caching identical files from
> different sites), and I don't feel the points raised back then apply.
> If a library is popular, chances are that many sites are including the identical file and it will
> already be in the browser's cache. No network access is necessary to use it, improving the users'
> privacy. It doesn't matter if the sites store the library file at a different URL. It will always
> be identified by its checksum. The cached file can be used more often.
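
[Illustration of the lookup described in the quoted text above. This is only a
minimal sketch, not anything a browser actually implements. The names
ChecksumCache, load and fetch are hypothetical, SHA-256 is an assumed choice of
hash function (the proposal only says "a cryptographic hash function"), and
what should happen when a downloaded file does not match the declared checksum
is not covered in this thread, so the sketch simply caches the file under the
digest it actually has.]

```python
import hashlib
from typing import Callable, Dict, Optional


class ChecksumCache:
    """Toy cache keyed by content digest instead of by URL (hypothetical)."""

    def __init__(self) -> None:
        self._by_digest: Dict[str, bytes] = {}

    @staticmethod
    def digest(body: bytes) -> str:
        # Assumption: SHA-256, hex-encoded. The proposal names no algorithm.
        return hashlib.sha256(body).hexdigest()

    def store(self, body: bytes) -> str:
        # Compute the checksum at download time and index the body by it.
        d = self.digest(body)
        self._by_digest[d] = body
        return d

    def lookup(self, expected_digest: str) -> Optional[bytes]:
        # The URL plays no part in the lookup.
        return self._by_digest.get(expected_digest)


def load(
    cache: ChecksumCache,
    url: str,
    expected_digest: str,
    fetch: Callable[[str], bytes],
) -> bytes:
    """Return the resource, touching the network only on a cache miss."""
    cached = cache.lookup(expected_digest)
    if cached is not None:
        return cached  # Same content, possibly first seen at another URL.
    body = fetch(url)
    # Stored under its real digest; mismatch handling is left open here.
    cache.store(body)
    return body
```

[With this sketch, a second call to load() for a different URL but the same
digest returns the cached bytes without calling fetch again.]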
> The checksum is specified using the fragment identifier component of a URI
> (RFC 3986 section 3.5).
Stop here. That's not what the fragment identifier is for.
Instead, you could specify the hash as a separate attribute on the
Best regards, Julian