[whatwg] Web Archives

Tyler Keating tylerkeating at mac.com
Wed Apr 11 15:59:41 PDT 2007


On 11-Apr-07, at 4:17 PM, Michael A. Puls II wrote:

> On 4/11/07, Tyler Keating <tylerkeating at mac.com> wrote:
>> Hi,
>> I apologize if I've missed this in the specification or mailing
>> archives, but I have a suggestion related to standardizing web
>> "archives" in HTML5.  Currently, I know that Firefox uses Mozilla
>> Archive Format (.maf), Internet Explorer and Opera use MIME HTML
>> (.mht) and Safari uses its own format (.webarchive) for saving a web
>> page and all of its resources into a single file.  So clearly a
>> standard would be beneficial in ensuring "archive" compatibility
>> between browsers and I think it's suitable for that standard to
>> reside in HTML5.
>
> There's also the case of creating an .html file where all the
> resources are specified as data URIs.
>
> It's a really good way to archive, but IE won't handle it and most
> plug-ins don't accept data URIs, so there are problems with that
> use case (unless browsers can help with that in a secure way).
>
> I made a suggestion about this on the Opera forums a while ago when
> Opera didn't even support .mht.
> <http://my.opera.com/community/forums/topic.dml?id=72718>
> (The actual working example links are broken, but the idea was..)
>
> In short, you have an index.ext along with all the files it needs. You
> (or the browser if you're saving the page) zip them up and change the
> extension to file.owp (was OperaWebPage archive at the time).
>
> The browser would read the zip file, extract it to a temp directory
> (or in memory or to the browser's cache etc.) and load the index file.
>
> The idea is really simple and this way, all the files stay intact
> (unlike .mht which changes the markup).  However, the Mozilla Archive
> format already does this. It just uses index.rdf to specify what page
> to load instead of looking for index.ext.
>
> Not sure if HTML5 is the spot for this, but either way, it'd be neat
> to have a standard method of putting files in an archive where the
> files are kept separate and unmodified. (I might want to create an
> HTML-based FAQ archive, with multiple web pages, pics, etc., for
> example.)
>
> --  
> Michael
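
For reference, the data URI approach mentioned in the quoted message could look roughly like the Python sketch below. The helper names (to_data_uri, inline_resources) and the sample page are made up for illustration, and the plain string replacement assumes resources are referenced by an exact double-quoted name; a real tool would parse the markup instead.

```python
import base64


def to_data_uri(payload: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URI."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def inline_resources(html: str, resources: dict) -> str:
    """Replace each double-quoted reference to a known resource with a data URI.

    `resources` maps a file name to a (bytes, MIME type) pair. This is a
    naive text substitution, not an HTML parser.
    """
    for name, (payload, mime_type) in resources.items():
        html = html.replace(f'"{name}"', f'"{to_data_uri(payload, mime_type)}"')
    return html


# Hypothetical page and resource, only to show the shape of the result.
page = '<img src="logo.png" alt="logo">'
single_file = inline_resources(page, {"logo.png": (b"\x89PNG", "image/png")})
```

The result is one .html file with no external references, which is also why it runs into the IE and plug-in limitations described above.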

Yes, I think it is a simple idea and there are many uses for it, since
not every multimedia document needs to be "served" and passing
directories around is not user-friendly.  My question to everyone is:
does it belong in HTML5 and, if not, where does it belong?  How else
would we get multiple browsers and web-page editors to recognize such
an archive?
- Tyler
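
To make the zip-based idea from the quoted message concrete, here is a minimal Python sketch of both halves: packing a directory into a single file, then extracting it to a temp directory and locating the index. The index name "index.html" is an assumption (the thread only says "index.ext"), and pack_archive and open_archive are hypothetical names, not part of any browser or existing format.

```python
import tempfile
import zipfile
from pathlib import Path

# Assumption: the thread only says "index.ext"; "index.html" is picked here.
INDEX_NAME = "index.html"


def pack_archive(source_dir: Path, archive_path: Path) -> None:
    """Zip every file under source_dir, keeping relative paths and contents unchanged.

    archive_path should live outside source_dir so the archive does not
    try to include itself.
    """
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                archive.write(path, path.relative_to(source_dir).as_posix())


def open_archive(archive_path: Path) -> Path:
    """Extract the archive to a fresh temp directory and return the index path.

    The caller is responsible for removing the temp directory afterwards.
    """
    with zipfile.ZipFile(archive_path) as archive:
        if INDEX_NAME not in archive.namelist():
            raise ValueError(f"{archive_path} does not contain {INDEX_NAME}")
        target_dir = Path(tempfile.mkdtemp(prefix="webarchive-"))
        archive.extractall(target_dir)
    return target_dir / INDEX_NAME
```

A browser would then load the returned path as it would any local page. Since the files are stored as-is, the markup is not rewritten the way it is in .mht.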


