[whatwg] Offline Web Apps
Maciej Stachowiak
mjs at apple.com
Tue Sep 25 00:27:38 PDT 2007
On Sep 24, 2007, at 10:45 PM, Robert O'Callahan wrote:
> On 9/23/07, Maciej Stachowiak <mjs at apple.com> wrote:
> > Obviously, if the way to get the contents as text requires providing
> > the encoding, then it has to be a method. My comment was about the no-
> > argument methods. But you have a point that reading from disk is not a
> > simple get operation. Probably the methods should have names based on
> > read or the like (read(), readAsText(), etc) to indicate this. Also,
> > they should arguably be asynchronous since reading from the disk can
> > be slow, especially for large files, and it is undesirable to block
> > the main thread.
>
> For small files, synchronous reading is OK. Perhaps there should be
> a separate whiz-bang asynchronous API ... it could support partial
> reads too.
What kind of file is small enough is a matter of judgment and depends
on device performance characteristics. I tried the following
experiment to estimate how much time could be taken by synchronous
cold reads of a moderate number of files (assuming multi-file support
in <input type="file"> and naive use of the synchronous read API):
$ time cat ~/Pictures/*.jpg > /dev/null
real 0m1.135s
user 0m0.007s
sys 0m0.076s
This is on a pretty fast machine with a local filesystem. I have
76 .jpg files totaling about 19 MB in size. 1.13 seconds seems like an
unacceptable length of time to block the UI, and it could easily be
much worse for, say, a batch photo upload or an upload of a moderately
large video file.
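
For anyone who wants to repeat the measurement without a shell, here is
a rough Python equivalent of the "time cat" experiment. It is only a
sketch: the function name time_sync_reads is made up for illustration,
and the number it reports depends heavily on whether the files are
already in the OS cache (a warm cache will look far faster than the
cold reads measured above).

```python
import time
from pathlib import Path


def time_sync_reads(paths):
    """Read every file fully, one after another.

    Returns (elapsed_seconds, total_bytes). Hypothetical helper, meant to
    mimic what a naive synchronous read API would make the caller wait for.
    """
    start = time.perf_counter()
    total = 0
    for p in paths:
        # read_bytes() blocks the calling thread until the whole file is read
        total += len(Path(p).read_bytes())
    return time.perf_counter() - start, total
```

It could be called with something like
time_sync_reads(sorted(Path.home().glob("Pictures/*.jpg"))), where the
directory is just an example.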
So I suspect that, much like synchronous XMLHttpRequest, synchronous
file reads will lead to excessive UI lockups in bad circumstances
unanticipated by the app author.
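
To make the asynchronous alternative concrete, here is a minimal sketch
in Python rather than in any proposed browser API (none of the names
below come from the proposal). It assumes Python 3.9+ for
asyncio.to_thread. The blocking read runs on a worker thread while a
"heartbeat" task, standing in for UI work, keeps running on the event
loop.

```python
import asyncio
from pathlib import Path


async def read_async(path):
    """Read a file on a worker thread so the event loop is not blocked."""
    # asyncio.to_thread requires Python 3.9 or later.
    return await asyncio.to_thread(Path(path).read_bytes)


async def main(path):
    ticks = 0

    async def heartbeat():
        # Stand-in for UI work that must keep running during the read.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    beat = asyncio.create_task(heartbeat())
    data = await read_async(path)  # the loop keeps servicing heartbeat()
    beat.cancel()
    try:
        await beat
    except asyncio.CancelledError:
        pass
    return len(data), ticks
```

Running asyncio.run(main(some_path)) on a large file should show the
tick count growing with the read time, which is exactly the time a
synchronous read would have spent blocking.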
> > Also, I'm not sure how a web app can be expected to know the encoding
> > of a text file on disk.
>
> The same way that any other app does --- guess based on the
> extension and expected usage? --- now that we've all standardized on
> meta-data-less file systems :-(. I suppose an app could examine the
> first chunk of the file and then re-read the file with a better guess.
The OS and the UA can often make a better guess, so I think the option
to let the UA decide the encoding should at least be provided. Here
are some sources of info that the UA has but the web app doesn't (at
least without doing a separate binary read of the file first and
possibly significant computation):
1) OS-level metadata, as for example in Mac OS X:
$ xattr -l plan.txt
com.apple.TextEncoding: UTF-8;134217984
2) Checking for a BOM.
3) Heuristics for specific file types, like looking for <meta charset>
in HTML files or the encoding pseudo-attribute in an XML declaration.
4) General character set autodetection algorithms through statistical
methods or similar.
5) Knowledge of the user's locale (useful for some legacy systems
where default text encoding is determined by locale).
6) Knowledge of platform encoding conventions.
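
As an illustration of how a few of these sources might be combined,
here is a small Python sketch. It covers only the BOM check (2), a
caller-supplied hint standing in for OS metadata such as the xattr
above (1), and the locale default (5). The function names, the hint
parameter, and the order of precedence are all assumptions made for
the example, not a proposal for what a UA should do.

```python
import codecs
import locale

# Check longer BOMs first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(head, hint=None):
    """Guess an encoding from the first bytes of a file.

    head: the first few bytes of the file.
    hint: hypothetical stand-in for OS-level metadata, if any is available.
    """
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    if hint:
        return hint
    # Fall back to the locale's preferred encoding.
    return locale.getpreferredencoding(False)


def decode_text(data, hint=None):
    """Decode bytes using the guessed encoding; these codecs strip the BOM."""
    return data.decode(guess_encoding(data[:4], hint))
```

Even this much requires looking at the raw bytes before decoding,
which is the extra binary read a web app would otherwise have to do
for itself.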
Regards,
Maciej