[whatwg] Offline Web Apps

Tue Sep 25 00:27:38 PDT 2007

On Sep 24, 2007, at 10:45 PM, Robert O'Callahan wrote:

> On 9/23/07, Maciej Stachowiak <mjs at apple.com> wrote:
> > Obviously, if the way to get the contents as text requires providing
> > the encoding, then it has to be a method. My comment was about the
> > no-argument methods. But you have a point that reading from disk is
> > not a simple get operation. Probably the methods should have names
> > based on read or the like (read(), readAsText(), etc.) to indicate
> > this. Also, they should arguably be asynchronous since reading from
> > the disk can be slow, especially for large files, and it is
> > undesirable to block the main thread.
>
> For small files, synchronous reading is OK. Perhaps there should be  
> a separate whiz-bang asynchronous API ... it could support partial  
> reads too.

What kind of file is small enough is a matter of judgment and depends  
on device performance characteristics. I tried the following  
experiment to estimate how much time could be taken by synchronous  
cold reads of a moderate number of files (assuming multi-file support  
in <input type="file"> and naive use of the synchronous read API):

$ time cat ~/Pictures/*.jpg > /dev/null

real	0m1.135s
user	0m0.007s
sys	0m0.076s

This is on a pretty fast machine with a local filesystem. I have  
76 .jpg files totaling about 19M in size. 1.13 seconds seems like an  
unacceptable length of time to block the UI, and it could easily be  
much worse for, say, a batch photo upload or an upload of a moderately  
large video file.

So I suspect that, much like synchronous XMLHttpRequest, synchronous  
file reads will lead to excessive UI lockups in bad circumstances  
unanticipated by the app author.
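
To make the alternative concrete, here is a minimal sketch of the asynchronous shape, written in Python purely for illustration. It is not a proposal for the actual web API: read_async and the worker pool are hypothetical names, and a real design would presumably deliver the data through a callback or event on the main thread.

```python
import tempfile
from concurrent.futures import Future, ThreadPoolExecutor
from pathlib import Path

# Hypothetical helper, not part of any proposed web API: it only
# illustrates handing a slow disk read to a worker so that the
# caller is not blocked while the bytes arrive.
_pool = ThreadPoolExecutor(max_workers=2)


def read_async(path: Path) -> Future:
    """Start reading `path` on a worker thread and return immediately."""
    return _pool.submit(path.read_bytes)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        photo = Path(tmp) / "photo.jpg"
        photo.write_bytes(b"\xff\xd8" + b"\x00" * 1024)

        pending = read_async(photo)
        # The caller (standing in for the UI thread) is free to do
        # other work here; result() waits only once the data is needed.
        data = pending.result()
        print(len(data))
```

The point of the shape is that the slow part happens off the calling thread, so a cold read of many files costs the UI nothing beyond the time to queue the requests.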

> > Also, I'm not sure how a web app can be expected to know the encoding
> > of a text file on disk.
>
> The same way that any other app does --- guess based on the  
> extension and expected usage? --- now that we've all standardized on  
> meta-data-less file systems :-(. I suppose an app could examine the  
> first chunk of the file and then re-read the file with a better guess.

The OS and the UA can often make a better guess, so I think the option  
to let the UA decide the encoding should at least be provided. Here  
are some sources of info that the UA has but the web app doesn't (at  
least without doing a separate binary read of the file first and  
possibly significant computation):

1) OS-level metadata, as for example in Mac OS X:
$ xattr -l plan.txt
com.apple.TextEncoding: UTF-8;134217984

2) Checking for a BOM.

3) Heuristics for specific file types, like looking for <meta charset>  
in HTML files or the encoding pseudo-attribute in an XML declaration.

4) General character set autodetection algorithms through statistical  
methods or similar.

5) Knowledge of the user's locale (useful for some legacy systems  
where default text encoding is determined by locale).

6) Knowledge of platform encoding conventions.
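
As a rough sketch of how a few of these sources might be combined, here is some illustrative Python. It is not how any particular UA does it: guess_encoding is a hypothetical name, the regular expressions are deliberately loose, a plain UTF-8 validity check stands in for real statistical autodetection (4), and OS-level metadata (1) and platform conventions (6) are left out entirely.

```python
import codecs
import locale
import re

# BOMs, longest first so that UTF-32 LE is not mistaken for UTF-16 LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Loose, illustrative patterns; a real parser would be stricter.
_XML_DECL = re.compile(rb"""<\?xml[^>]*\bencoding\s*=\s*["']([A-Za-z0-9._-]+)["']""")
_META = re.compile(rb"""<meta[^>]*\bcharset\s*=\s*["']?([A-Za-z0-9._-]+)""", re.I)


def guess_encoding(head: bytes) -> str:
    """Guess an encoding label from the first chunk of a file (hypothetical helper)."""
    # (2) Check for a BOM.
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    # (3) File-type heuristics: XML declaration, then <meta charset>.
    for pattern in (_XML_DECL, _META):
        match = pattern.search(head)
        if match:
            return match.group(1).decode("ascii").lower()
    # (4) Crude stand-in for autodetection: accept UTF-8 if the chunk is
    # valid UTF-8. final=False tolerates a sequence cut off at the end.
    try:
        codecs.getincrementaldecoder("utf-8")().decode(head, final=False)
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # (5) Fall back to the encoding implied by the user's locale.
    return locale.getpreferredencoding(False)


if __name__ == "__main__":
    print(guess_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><plan/>'))
    print(guess_encoding(codecs.BOM_UTF8 + b"plan"))
```

Even this much requires a binary read of the first chunk plus some parsing, which is the work a web app would have to do for itself if the UA did not offer to pick the encoding.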

Regards,
Maciej
