[whatwg] Spellchecking proposal #2

Sat Jun 24 07:02:45 PDT 2006

L. David Baron wrote:
> The problem is that heuristics are only heuristics when they operate 
> on input written without knowledge of the heuristics.  When the input 
> was written with knowledge of the heuristics, they become de facto 
> standards.
> 
> Authors will learn what triggers spellchecking (or not) in Mozilla, 
> and write whatever markup, however inappropriate, gives the choice of 
> spellchecking that they want.  Then other browsers will be forced to 
> copy whatever Mozilla did.

Theoretically, if the heuristics are written well enough, such that 
authors providing accurate information end up with the best usability by 
default, that shouldn't happen.  If the heuristics are so bad that 
authors are left with little choice but to lie to improve the usability, 
then, yes, we'd end up with exactly that problem.

However, in reality, I'd have to admit that such good heuristics are 
going to take a long time to research and develop well; and, especially 
in the early stages, probably won't be accurate enough for authors to 
rely on all the time.

> So if we're going to end up with a standard anyway, why not admit it 
> and figure out what it should be rather than ending up there 
> accidentally?

Yes, I'd rather come up with a less harmful solution now, regardless of 
semantic purity, than repeat the mistakes of the past and end up with a 
more harmful de facto standard.

The main problem with providing an explicit spell checking switch to the 
author is the potential for abuse.  History has shown that authors will 
attempt to disable anything they don't like for any reason whatsoever, 
regardless of the usability benefits such features provide for users. 
We've seen that already with all of the following:

* IE's smart tags: <meta name="MSSmartTagsPreventParsing" content="True">
* Google AutoLink (some scripts were developed to work around this)
* IE's image toolbar: <meta http-equiv="imagetoolbar" content="no"> and 
<img galleryimg="no">
* AutoComplete (autocomplete="off")
* Context menus (JavaScripts intercepting right click)
* Showing link URLs in status bar (using window.status)
* Removing browser chrome (in popups)
* View Source (includes attempts to obfuscate source code with JS, 
disabling context menus, etc.)
* Disabling printing (Some JS, works in IE only)
* Disabling Save As... (Some JS, works in IE only)
* Disabling caching
* And anything else they can get their grubby little hands on!

I could easily imagine authors wanting to disable spell checking simply 
because the squiggly red underlines clash with their site's colour scheme.

However, the proposed spellcheck attribute has one major advantage over 
all of those: it's being designed so that the user can easily override 
it.  I'd expect the result to be that authors won't bother disabling 
spell checking unless it really isn't suitable for the expected input 
and it's an edge case where browser heuristics typically guess wrongly.
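
As a rough illustration of that precedence, here is a minimal Python 
sketch.  It assumes the user's explicit choice always wins, then the 
author's attribute, then the browser's own guess.  The function and 
parameter names are hypothetical, and the exact order is an assumption 
rather than something settled in this thread:

```python
def spellcheck_enabled(user_choice=None, author_attr=None, heuristic_guess=True):
    """Decide whether to spell check a field.

    Each argument is True, False, or None (meaning "no opinion").
    Assumed precedence: user override > author attribute > browser heuristic.
    """
    if user_choice is not None:
        # The user explicitly toggled spell checking for this field.
        return user_choice
    if author_attr is not None:
        # The author supplied a spellcheck hint in the markup.
        return author_attr
    # Nobody expressed a preference, so fall back to the browser's guess.
    return heuristic_guess
```

With this order, an author who disables spell checking for cosmetic 
reasons cannot stop a user who turns it back on.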

I'd like to see some research done to find out exactly what kinds of 
input authors use <input type="text">, <textarea> and contenteditable 
for, beyond those already mentioned earlier in the thread.  I'd also 
like to see research into the <label>s, name="", id="" and other 
identifying information, commonly given to such fields, which can be 
used for developing heuristics.
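
To make the idea concrete, a heuristic of that kind might look something 
like the Python sketch below.  The hint words are invented for 
illustration only; a real list would have to come out of the research 
described above, and the function names are hypothetical:

```python
import re

# Hypothetical hint words, NOT the result of any actual research.
NO_SPELLCHECK_HINTS = {"user", "username", "login", "email", "url",
                       "password", "code", "zip", "postcode", "phone"}
SPELLCHECK_HINTS = {"comment", "comments", "message", "body",
                    "description", "subject", "title"}


def tokens(*values):
    """Split name/id/label strings into lowercase words."""
    words = set()
    for value in values:
        if value:
            # Break camelCase apart ("userEmail" -> "user Email").
            spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", value)
            words.update(w for w in re.split(r"[^a-z]+", spaced.lower()) if w)
    return words


def guess_spellcheck(name=None, field_id=None, label=None, multiline=False):
    """Guess whether a text field should be spell checked."""
    found = tokens(name, field_id, label)
    if found & NO_SPELLCHECK_HINTS:
        return False
    if found & SPELLCHECK_HINTS:
        return True
    # No hint either way: assume prose is more likely in multi-line fields.
    return multiline
```

For example, guess_spellcheck(name="userEmail") returns False, while 
guess_spellcheck(label="Your message:") returns True.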

Although accept="" is unlikely to be commonly used for textual input 
these days, it would be useful to see research into the kind of 
text-based content commonly entered (for which MIME types exist) that 
browsers could use to improve their spell checking logic (e.g. ignoring 
elements and attributes in textareas accepting text/html or XML).
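
As a sketch of what "ignoring elements and attributes" could mean for 
text/html input, the Python below pulls out only the character data and 
leaves tag names and attribute values alone.  It uses the standard 
library's html.parser; the class and function names are hypothetical, 
and skipping the contents of <script> and <style> is an extra assumption 
not mentioned above:

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect character data, ignoring tags and their attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def words_to_check(markup):
    """Return the words in an HTML fragment that a spell checker should see."""
    parser = _TextOnly()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.chunks).split()
```

Given '<p class="intro">Helo <a href="http://example.com/">wrld</a></p>', 
only "Helo" and "wrld" would be handed to the spell checker; "intro" and 
the URL would not.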

-- 
Lachlan Hunt
http://lachy.id.au/