<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">
<br><div><div>On Sep 24, 2007, at 10:45 PM, Robert O'Callahan wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">On 9/23/07, <b class="gmail_sendername">Maciej Stachowiak</b> &lt;<a href="mailto:mjs@apple.com">mjs@apple.com</a>&gt; wrote:<div><span class="gmail_quote"></span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> Obviously, if the way to get the contents as text requires providing<br>the encoding, then it has to be a method. My comment was about the no-<br>argument methods. But you have a point that reading from disk is not a<br>simple get operation. Probably the methods should have names based on <br>read or the like (read(), readAsText(), etc) to indicate this. Also,<br>they should arguably be asynchronous since reading from the disk can<br>be slow, especially for large files, and it is undesirable to block<br>the main thread. </blockquote><div><br>For small files, synchronous reading is OK. Perhaps there should be a separate whiz-bang asynchronous API ... it could support partial reads too.</div></div></blockquote><div><br class="webkit-block-placeholder"></div><div>What kind of file is small enough is a matter of judgment and depends on device performance characteristics. 
I tried the following experiment to estimate how much time could be taken by synchronous cold reads of a moderate number of files (assuming multi-file support in &lt;input type="file"&gt; and naive use of the synchronous read API):</div><div><br class="webkit-block-placeholder"></div><div><div>$ time cat ~/Pictures/*.jpg &gt; /dev/null</div><div><br class="webkit-block-placeholder"></div><div>real<span class="Apple-tab-span" style="white-space:pre">	</span>0m1.135s</div><div>user<span class="Apple-tab-span" style="white-space:pre">	</span>0m0.007s</div><div>sys<span class="Apple-tab-span" style="white-space:pre">	</span>0m0.076s</div><div><br class="webkit-block-placeholder"></div><div>This is on a pretty fast machine with a local filesystem. I have 76 .jpg files totaling about 19M in size. 1.13 seconds seems like an unacceptable length of time to block the UI, and it could easily be much worse for, say, a batch photo upload or an upload of a moderately large video file.</div><div><br class="webkit-block-placeholder"></div><div>So I suspect that, much like synchronous XMLHttpRequest, synchronous file reads will lead to excessive UI lockups in bad circumstances unanticipated by the app author.</div></div><div><br></div><blockquote type="cite"><div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Also, I'm not sure how a web app can be expected to know the encoding<br>of a text file on disk.</blockquote><div><br>The same way that any other app does --- guess based on the extension and expected usage? --- now that we've all standardized on meta-data-less file systems :-(. I suppose an app could examine the first chunk of the file and then re-read the file with a better guess. <br></div></div></blockquote></div><br><div>The OS and the UA can often make a better guess, so I think the option to let the UA decide the encoding should at least be provided. 
Here are some sources of info that the UA has but the web app doesn't (at least without doing a separate binary read of the file first and possibly significant computation):</div><div><br class="webkit-block-placeholder"></div><div>1) OS-level metadata, as for example in Mac OS X:</div><div>$ xattr -l plan.txt </div><div><div>com.apple.TextEncoding: UTF-8;134217984</div><div><br class="webkit-block-placeholder"></div><div>2) Checking for a BOM.</div><div><br class="webkit-block-placeholder"></div><div>3) Heuristics for specific file types, like looking for &lt;meta charset&gt; in HTML files or the encoding pseudo-attribute in an XML declaration.</div><div><br class="webkit-block-placeholder"></div><div>4) General character set autodetection algorithms through statistical methods or similar.</div><div><br class="webkit-block-placeholder"></div><div>5) Knowledge of the user's locale (useful for some legacy systems where default text encoding is determined by locale).</div><div><br class="webkit-block-placeholder"></div><div>6) Knowledge of platform encoding conventions.</div><div><br class="webkit-block-placeholder"></div><div>Regards,</div><div>Maciej</div><div><br class="webkit-block-placeholder"></div></div></body></html>