[whatwg] Browser Bundled Javascript Repository

Mon Jun 15 10:55:31 PDT 2009

Hey Guys,

This is my first time on the list, I searched the Archives but I  
didn't see anything like this so I apologize if I missed any earlier  
discussion on something like this.

A while back I came across this two paragraph blog post titled  
"Browsers Should Bundle JS Libraries:"
http://fukamachi.org/wp/2009/03/30/browsers-should-bundle-js-libraries/

The premise is basically that browsers are repeatedly downloading the  
same javascript frameworks from different domains over and over every  
day.  In the author's own words:
"All popular, stable Javascript libraries, all open source. All  
downloaded tens of millions of times a day, identical code each time."

Below is a summary and expansion of my comments/ideas from the  
discussion on the above blog article.

A typical solution to the problem, and one that works right now in  
browsers, is that if you require a javascript library on your website  
you can point to a "publicly available" version of that library.  If  
enough sites use this public URI then the browser will continually be  
using that URI and it will be cached and reused by the browser.  This  
is the idea behind Google's Hosted Libraries:
http://code.google.com/apis/ajaxlibs/

There are some arguments against using Google's Hosted Libraries:
http://www.derekallard.com/blog/post/jquery-hosted-on-google-and-some-implications-for-developers/

However, I think the author makes a good point. Bundling the JS  
Libraries in the Browser seems like it would require very little  
space, could even be stored in a more efficient representation  
(compiled bytecode for example), and would prevent an extra HTTP  
Request.  The problem then becomes how does a browser know  
example.com's jquery.js is the same as other.com's jquery.js.  The  
developer should opt-in to telling the browser it wants to use a  
certain JS Library version that the browser may already know about.

The way I thought about it was by adding an attribute to the <script>  
tag.  In my comments, I used the "rel" attribute because of  
developer's familiarity with it in other tags, but it could (and  
probably should) be an entirely new attribute.  The value inside of  
this attribute would need to be a unique identifier for a possible  
script available in the browser's repository.  The "src" attribute  
should still point to a hosted version of the script in case this  
attribute is unsupported (ignored) or the script is not found in the  
repository (not-bundled).

For Example:

     <!-- SHA1 hash as identifier for jquery-1.2.3 -->
     <script rel="A56F2CED6..." src="..." />

     <!-- Canonical name as an identifier for a JS lib and version -->
     <script rel="jquery-1.2.3" src="..." />

Here the "rel" attribute's value is a standard identifier for a  
particular version of the JQuery JS Library.  The browser could check  
its Repository to see if it has it.  If found, no request is needed  
and it can load its local version.  If not found it can proceed like  
normal using the "src" attribute to download the script.

----

Pros:

- Future-Proof: Adding a new attribute, or using a currently ignored  
attribute, on the script tag would make this a safe addition that  
works fine in older browsers (backwards compatible) and works  
instantly in supported browsers.
- Developer Opt-In: Developers that choose not to use this feature  
could just ignore it.
- Pre-Compiled: By bundling known JS Libraries with the browser, the  
browser could store a more efficient representation of the file.  For  
instance pre-compiled into Bytecode or something else browser specific.
- Less HTTP Requests / Cache Checks: If a library is in the repository  
no request is needed. Cache checks don't need to be performed.  Also,  
for the 100 sites you visit that all send you the equivalent jquery.js  
you now would send 0 requests.  I think this would be enticing to  
mobile browsers which would benefit from this Space vs. Time tradeoff.
- No 3rd Party is Gathering Statistics: One of the arguments against  
using Google's Hosted Libraries is that you send them some data if you  
are indeed using their scripts and a client downloads from them  
(Referrer, etc.).  Here there is no 3rd party, its just between the  
client browser and domain.
- Standardizing Identifier For Libraries: Providing a common  
identifier for libraries would be open for discussion.  The best idea  
I've had would be to provide the SHA1 Hash of the Desired Release of a  
Javascript Library.  This would ensure a common identifier for the  
same source file across browsers that support the feature. This would  
be useful for developers as well.  A debug tool can indicate to a  
developer that the script they are using is available in the Browser  
Repository with a certain identifier.
- Repository Can Grow Dynamically - Assuming this is a desirable  
feature that shows some promise, the browser repository can grow  
dynamically.  Browsers can count the number of times they have seen  
equivalent source files (SHA1 hash values), or seen identifiers they  
didn't have in their repository and can grow (or shrink) accordingly.   
Likewise official sources can distribute new script/identifiers like  
browsers currently distribute lists of unsafe websites.  This may not  
even need to be the browser's responsibility, in the original article  
I envisioned a Firefox extension, with access to the repository via an  
API, that would handle such dynamic updates.

Cons:

- May Not Grow Fast Enough: If JS Libraries change too quickly the  
repository won't get used enough.
- May Not Scale: Are there too many JS Libraries, versions, etc making  
this unrealistic?  Would storage become too large?

Verifying any Realistic Improvements:

- Implement a Repository.
- Fill the repository with the popular JS Libraries used in the top  
100 websites (subjective) or web applications.  Alter the static HTML  
to contain the standard identifiers on the script tags to take  
advantage of the repository.
- Run a benchmark of cold and hot (cached) loads of these 100 pages  
with and without the repository.
- Compare times, memory, requests, etc.

-----

I hope this is relevant to HTML5 and WHATWG.  Unfortunately, I don't  
have the experience or knowledge to know if such a repository would  
provide browsers with any noticeable performance improvements.  My  
hope is that it would.  Maybe someone on the list can offer their  
opinion on wether or not they think this would even be worth  
implementing.

Thank you for taking the time to read this!  I look forward to hearing  
feedback.
- Joe