[whatwg] Spellchecking mark III
Mikko Rantalainen
mikko.rantalainen at peda.net
Fri Jun 30 03:23:49 PDT 2006
The more I think about this, the more I believe that the correct
choice would be to describe the expected content more accurately.
The UA can then decide for itself whether to turn spellchecking on
or off.
The problem is that the lang attribute allows only values defined in
RFC 3066, which seems to support only ISO 639 defined language tags.
That is, the expressible languages are limited to *spoken* languages.
Ian Hickson wrote:
> On Sun, 11 Jun 2006, Alexey Feldgendler wrote:
>> > Information like "this input field should have autoindent" is
>> > presentational.
>
> Yeah, but you'd have to say "auto-indent this like C++", which isn't.
> IMHO.
Perhaps instead of using the |spellcheck| attribute as a toggle, allow
a whitespace-separated list of expected input languages. If the user
is expected to enter C++ code with English comments, then the author
should use markup such as
<textarea lang="zzz" spellcheck="c++ en">
for "no linguistic content" with spell checking for c++ and English.
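To make the list idea concrete, here is a minimal sketch of how a UA
might turn such an attribute value into a list of language tokens. The
function name parse_spellcheck is hypothetical, and lowercasing and
dropping duplicates are my assumptions, not anything a spec requires.

```python
def parse_spellcheck(value):
    """Split a hypothetical spellcheck="..." value into language tokens."""
    tokens = []
    # str.split() with no arguments splits on any run of white space
    # and ignores leading and trailing white space.
    for token in value.split():
        token = token.lower()  # assumption: tokens are case-insensitive
        if token not in tokens:  # assumption: duplicates are dropped
            tokens.append(token)
    return tokens


print(parse_spellcheck("c++ en"))  # ['c++', 'en']
```

An empty value gives an empty list, which a UA could treat as "no
spellchecking requested".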
Another option would be to expand the lang attribute to allow
languages other than human languages. This has the added bonus that
the lang attribute could also describe other content more accurately.
RFC 3066 reserves language codes starting with "x-" for private use
and that could be used to aid spellchecking, too. Unfortunately only
A-Z and 0-9 are allowed in subtags (so "c++" is out), so perhaps
something like
<textarea lang="x-cpp-en">
for the private language cpp-en, i.e. "C++ with English comments". Or,
if the lang attribute were extended to allow a list of languages,
one could write
<textarea lang="en x-cpp">
for English text mixed with C++ code (which is less accurate than
the x-cpp-en above).
The GMail "To:" input field could be expressed as
<textarea lang="x-mail-to">
and UAs that don't recognize the language "x-mail-to" should turn off
spellchecking.
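The fallback rule could be sketched roughly like this. The set of
installed dictionaries and the name dictionary_for are hypothetical,
and falling back from "en-GB" to "en" is an assumption on my part; the
only point is that an unrecognized tag yields "no dictionary", i.e.
spellchecking off.

```python
# Hypothetical set of languages this UA has dictionaries for.
KNOWN_DICTIONARIES = {"en", "fi", "sv"}


def dictionary_for(lang):
    """Return the dictionary to use for a lang value, or None to disable."""
    tag = lang.strip().lower()
    if tag in KNOWN_DICTIONARIES:
        return tag
    # Assumption: fall back to the primary subtag, e.g. "en-GB" -> "en".
    primary = tag.split("-", 1)[0]
    if primary != "x" and primary in KNOWN_DICTIONARIES:
        return primary
    # Unrecognized tag (including private "x-..." tags): no spellchecking.
    return None


print(dictionary_for("x-mail-to"))  # None
print(dictionary_for("en-GB"))      # en
```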
A typical blog input field could be encoded as
<textarea lang="x-html-fragment-en">
Here one sees more need for multiple language tags inside the "lang"
attribute. It would make more sense to use lang="x-html-fragment en"
or there would be need for *very* many private languages starting
with "x-html-fragment-" including "x-html-fragment-sv-fi".
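As a rough sketch of the multiple-tag variant, a UA could separate
private "x-" format tags from ordinary language tags, so that one
"x-html-fragment" tag combines with any natural language. The name
split_lang_list is hypothetical, and a lang attribute holding a list is
of course the proposed extension, not current behaviour.

```python
def split_lang_list(value):
    """Split a hypothetical multi-valued lang into (formats, languages)."""
    formats, languages = [], []
    for tag in value.split():
        tag = tag.lower()
        if tag.startswith("x-"):
            formats.append(tag)    # private-use tag, e.g. a markup format
        else:
            languages.append(tag)  # ordinary language tag
    return formats, languages


print(split_lang_list("x-html-fragment en"))
# (['x-html-fragment'], ['en'])
```

With F formats and N languages this needs only F + N tags, instead of
the F * N combined private tags.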
> On Fri, 23 Jun 2006, Sander Tekelenburg wrote:
>> [AUTHOR REQUIREMENTS]
>>
>>> Authors should set the document's language information, to enable user
>>> agents to accurately determine which dictionary to use when checking
>>> the spelling or grammar of user input.
>> IMO this "should" should be a "must".
>
> What about if the author doesn't know the language?
ISO 639 Part 2 includes "und" for "undetermined language". A sane
default for the UA in that case is to disable spell checking, or to
use some heuristic of its own to guess the language.
> On Sat, 24 Jun 2006, Alexey Feldgendler wrote:
>> Even worse: when entering text in textarea, the user actually has a
>> choice which language to write in. I think the user agent should
>> provide, besides just the control to turn spellchecking on and off, a
>> choice of languages.
>
> Agreed.
If a form expects English text to be entered, it would be wise to
mark text written in any other language as incorrectly spelled. If
the author expects any language, then he should specify lang="mul"
for "multiple languages" (again, defined by ISO 639 part 2).
Again, a list of acceptable languages would be nice here.
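Something along these lines, as a toy sketch only: the word lists are
made-up stand-ins for real dictionaries, the function name misspelled is
hypothetical, and treating "mul" as "accept every installed dictionary"
is my assumption about what a UA would do.

```python
# Toy word lists standing in for real dictionaries (hypothetical data).
DICTIONARIES = {
    "en": {"the", "cat", "sat"},
    "fi": {"kissa", "istui"},
}


def misspelled(words, lang):
    """Return the words not found in any dictionary accepted by lang."""
    tags = lang.lower().split()
    if not tags or "und" in tags:
        return []  # undetermined language: checking disabled
    if "mul" in tags:
        tags = list(DICTIONARIES)  # assumption: any installed language
    accepted = set()
    for tag in tags:
        accepted |= DICTIONARIES.get(tag, set())
    if not accepted:
        return []  # no usable dictionary: checking disabled
    return [w for w in words if w.lower() not in accepted]


print(misspelled(["the", "kissa", "sat"], "en"))     # ['kissa']
print(misspelled(["the", "kissa", "sat"], "en fi"))  # []
```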
--
Mikko