[whatwg] Save a web page

voracity subs at voracity.org
Fri Jul 2 06:57:39 PDT 2004


Lachlan Hunt wrote:

 >   Yes, but I'm more concerned that the author will be given control
 > over something that is entirely the responsibility of the user agent.

What specific 'something' is the responsibility of the user agent? _Why_ is it 
the responsibility of the user agent?

 >
 >>> Also, would modifications only affect the saved document, or would it
 >>> affect the document as they're viewing it?
 >>
 >>
 >> The document as they're viewing it.
 >
 >
 >   Ok, so in the example I gave earlier, after they saved the document,
 > they would only see that one paragraph that said "Sorry, you cannot save
 > this document" (assuming I didn't make any mistakes in the script)

Yep. Except if the author wrote that, it'd be technically incorrect, because the 
user can always save the _document_. The only thing the author could prevent is 
the saving of the document in its current state. (And not even that, with a 
knowledgeable user willing to break copyright law. That's not as bad as it 
sounds: user agent style sheets that block ads very likely break copyright law.)

 >
 >> Note that my original thought ... was that the UA would serialise the
 >> DOM tree out to markup, AND it would store the value of all script
 >> variables...
 >
 >
 >   In what file format?  Would that be (X)HTML?

I'm not sure it matters, but xhtml would probably be easiest. Unless you meant a 
language other than (x)html? In which case, probably not. (I say probably, 
because maybe there's a case for saving in binary.)

 >  If so, how do you
 > intend it to store the state of all the script variables?

Speaking naively, it would produce a string of script that, when run, recreates 
the state of script variables (this is something I've done before in a couple of 
toy applications with javascript). Now, when the user goes to 'save state', the 
UA could do several things, depending on how robust the process should be.
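
A minimal sketch of what I mean by "a string of script", assuming the state is 
plain data (strings, numbers, arrays, simple objects) that JSON.stringify can 
handle. The helper name serialiseState is made up for illustration, and the 
new Function call is only there to show the round trip:

```javascript
// Hypothetical helper: turn an object of variable names and values into
// script source that, when run, assigns those values again.
function serialiseState(state) {
  return Object.keys(state)
    .map(function (name) {
      return name + " = " + JSON.stringify(state[name]) + ";";
    })
    .join("\n");
}

var script = serialiseState({ a: "words only", b: 3, c: 5 });

// Run the generated script in a scope that declares the variables,
// then hand the values back so they can be inspected.
var restored = new Function(
  "var a, b, c;\n" + script + "\nreturn { a: a, b: b, c: c };"
)();
```

Functions, cyclic structures and DOM references would need more work than 
this; the sketch ignores them.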

One way is that it (the UA) could simply put the string of script containing the 
variables' state in a 'restore state' function that is tacked to the end of the 
file in its own <script> element. This function is then called from onload 
(added to the end of whatever listeners are set for onload, or simply at the end 
of the onload text string). Of course, if you do multiple saves, then more and 
more 'restore state' functions would be saved, blowing out file size. To prevent 
this, the 'restore state' function name would be a special name that the UA 
scans for when the document is loaded. It would remove the function definition 
(from the memory that stores the scripts) and the call from onload (in the DOM 
tree). The function name would obviously have to be chosen so that it is rarely 
--- preferably never --- used in other contexts. A guid-like string might do (if 
meaning is not an issue).
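
Roughly, the bookkeeping could look like the sketch below. To keep it short it 
works on the saved markup as a string and strips the old function at save time 
rather than at load time as described above. The reserved name and both helper 
names are made up, the onload call is left out, and it assumes the state script 
never contains a closing script tag:

```javascript
// Hypothetical reserved name; a guid-like suffix keeps collisions unlikely.
var RESTORE_NAME = "__restoreState_3f2a9c1e";

// Build the <script> element that gets tacked on to the end of the file.
function buildRestoreBlock(stateScript) {
  return "<script>function " + RESTORE_NAME + "() {\n" +
    stateScript + "\n}<\/script>";
}

// Drop any restore block left by an earlier save, then append the new one,
// so repeated saves do not pile up restore functions.
function saveWithState(markup, stateScript) {
  var start = markup.indexOf("<script>function " + RESTORE_NAME + "()");
  if (start !== -1) {
    var end = markup.indexOf("<\/script>", start) + "<\/script>".length;
    markup = markup.slice(0, start) + markup.slice(end);
  }
  return markup + buildRestoreBlock(stateScript);
}

var once = saveWithState("<p>page</p>", "b = 3;");
var twice = saveWithState(once, "b = 4;");
```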

That still leaves your earlier concern about the author blocking saving. For 
example, the author could redefine the special 'restore state' function from 
'onload'. (Of course, knowledgeable users could just delete that redefinition 
from the source that they have on their disks, though that would probably 
interfere with copyright.) The UA could protect against this by disallowing 
dynamic redefinitions (i.e. within functions, events, etc.) of that one special 
'restore state' function.

I haven't considered if there are other ways that the author might be able to 
prevent state saving in this example --- I wouldn't be surprised if there were.

If you object to using a special function for this, I might be able to appease 
you. Presumably you don't object to built-in functions. Think of the 'restore 
state' function as a built-in function, except that it gets dynamically 
generated by the system at each save and is only called internally.

(Incidentally, I had had ambitions to write it as an extension for moz. But I'm 
guessing it would be far from easy and --- usual story --- I don't have that 
kind of time.)

hmmm, this got a little off-topic for this list . . .

 >  Or would it
 > be some kind of binary representation that stored everything in memory,
 > including the DOM tree, script variables, etc. into a file stream?

This is certainly another possibility. Advantages might be smaller file size 
(which might be countered with compression in the text case), faster loading, 
etc. Maybe other advantages . . . I haven't really considered it.

The disadvantage would be loss of control for the user.

 >> However an onSave would mean the document author could optimise the
 >> save function to only save the required script variables,
 >
 >
 >   How exactly would this be possible?  Scripts don't have access to the
 > user's file system, and can't work with, nor have any control over a
 > file as it's being saved, so how do you expect a script to determine
 > which variables are serialized, and written to the file?

I think I can describe this best by (crude) example:


<html>
<head><script>
var a = "words only";
var b = 3;
var c = 5;
var d = b+c; ///Variable that doesn't need saving

window.onSave = function() {
	document.getElementById("saveEverythingHere").value
		= a+";"+b+";"+c;
}

function restoreState() {
	var seh = document.getElementById("saveEverythingHere");
	if (seh.value!="") {
		///If there's something to restore
		var list = seh.value.split(/;/);
		a = list[0];
		///Convert back to numbers, otherwise b+c would concatenate strings
		b = Number(list[1]);
		c = Number(list[2]);
		d = b+c;
	}
}

</script></head>
<body onload="restoreState();">
<h1>Mind-blowing application #1</h1>
<input type="hidden" id="saveEverythingHere" value="">
</body>
</html>

I've omitted error checking, etc. to keep things simple.


 >> ... so that file size doesn't blow out. The downside is that the author
 >> has to work out what to save and how.
 >
 >
 >   How would it possibly blow out?  It's a web page, so obviously it's
 > going to be quite small.  Most web pages, especially standards compliant
 > pages are no bigger than about 100k (including markup, css and script,
 > but excluding images), often much smaller, so what real benefit would
 > any optimization have?

Any situation in which a large data structure in memory can be recreated 
algorithmically. One example would be where you have a hash list in both 
directions. You'd only need to store the list in one direction (key => value), 
and then recreate it in the other (value => key).
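
As a sketch of the hash example, assuming the values are unique strings (the 
data and the name invert are made up for illustration), only the forward map 
would be saved and the reverse map rebuilt on load:

```javascript
// Only this direction (key => value) needs to go into the saved file.
var forward = { apple: "red", banana: "yellow", lime: "green" };

// Rebuild the other direction (value => key) from the saved one.
function invert(map) {
  var inverse = {};
  Object.keys(map).forEach(function (key) {
    inverse[map[key]] = key;
  });
  return inverse;
}

var backward = invert(forward);
```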

A contrived example that involves the DOM tree might be a <select> list of 
square numbers given the domain of 1 to 100 in steps of .01. That'd produce a 
difficult-to-compress list of roughly 10,000 numbers (9,901, to be exact) 
requiring, say, 10 characters each + strlen(<option></option>)=17 giving a 
total of 27 for each number. This would add about 270k to a file when it could 
just be recreated (quite quickly) on load each time.
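
A sketch of recreating that list on load instead of saving it. It builds the 
option markup as strings so it can be run outside a browser; the function name 
is made up, and in a page you would create the option elements and append them 
to the <select> instead:

```javascript
// Regenerate the <option> markup for the squares of 1 to 100 in steps of .01.
function buildSquareOptions() {
  var options = [];
  // Count in whole hundredths and divide, so floating-point error
  // does not accumulate across the loop.
  for (var i = 100; i <= 10000; i++) {
    var x = i / 100;
    options.push("<option>" + (x * x) + "</option>");
  }
  return options;
}

var options = buildSquareOptions();
// Size of the markup that would otherwise have to be written to the file.
var totalChars = options.join("").length;
```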

Admittedly, those aren't very good examples, but I hope you can see how 
optimisation _might_ be important.


