[whatwg] finding a number...
Charles McCathieNevile
chaals at opera.com
Wed Dec 13 06:58:09 PST 2006
On Wed, 13 Dec 2006 19:43:10 +0530, Mikko Rantalainen
<mikko.rantalainen at peda.net> wrote:
> Charles McCathieNevile wrote:
>> On Wed, 13 Dec 2006 13:17:14 +0530, Henri Sivonen <hsivonen at iki.fi>
>> wrote:
>>> On Dec 13, 2006, at 08:32, Charles McCathieNevile wrote:
>>>> possible *and no simpler* - this is too simple. Maybe assuming you
>>>> can parse numbers out of text is just a dumb idea as a normative
>>>> part of a spec.
>>> The attributes always work for any language. For English, the
>>> textContent works as a *bonus*. It isn't that the spec fails to work
>>> for non-English. It is just that a particular *redundant* bonus
>>> feature doesn't work for non-English.
>> The problem with this is that it means copying code the natural way
>> doesn't work for some non-English speakers, and they have to read the
>> spec or guess why. [...]
>
> I think that "they have to read the spec" is a bonus, too.
Yeah, except it turns out to be wishful thinking of the kind WHATWG tries
strenuously to avoid :( And the problem is that, among people who
habitually use other conventions for numbers, it turns out that many
don't really read English documents or mailing lists either...
> Perhaps the parser could be specified as follows:
>
> regexp for "numeric value" is [0-9 ,.]
> scan the numeric value backwards from end
> first character matching regexp [,.] is the decimal separator
>
> This would correctly interpret numbers such as
>
> 1,251,152.124
> 634.46
> 453.436.346,235
This last is the important use case that the existing method fails.
> 23 236 435 123,121
>
> It would fail for numbers such as
>
> 1,234,456.789,012
> 1.234.456,789.012
>
> but are such formats used in any locale?
Not that I know of. Formats I know of use ".", "," or " " as separators
for integer amounts, and "," or "." for decimal separators. The only
separators I know of inside the decimal part are "-", "e" and "E". I can
imagine someone using that notation for web content in a meter, but I am
not sure that it is likely.
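
For concreteness, the backwards-scan rule quoted above can be sketched in a
few lines of Python. This is only an illustration, not proposed spec text:
the name parse_number, the use of Decimal, and raising ValueError for
anything outside [0-9 ,.] are all assumptions made for the example.

```python
from decimal import Decimal

# Characters allowed in a "numeric value" under the quoted proposal: [0-9 ,.]
ALLOWED = set("0123456789 ,.")


def parse_number(text):
    """Hypothetical helper: the last ',' or '.' is taken as the decimal separator."""
    s = text.strip()
    if not s or not set(s) <= ALLOWED:
        raise ValueError(f"not a numeric value: {text!r}")

    # Scanning backwards from the end, the first ',' or '.' found is the
    # decimal separator; that is the rightmost of the two characters.
    cut = max(s.rfind(","), s.rfind("."))
    if cut == -1:
        whole, frac = s, ""
    else:
        whole, frac = s[:cut], s[cut + 1:]

    # Everything else (spaces, other ',' or '.') is treated as grouping
    # and dropped.
    whole_digits = "".join(c for c in whole if c.isdigit())
    frac_digits = "".join(c for c in frac if c.isdigit())
    if not whole_digits and not frac_digits:
        raise ValueError(f"no digits in: {text!r}")

    return Decimal(f"{whole_digits or '0'}.{frac_digits or '0'}")
```

Under this rule a bare "1,251" comes out as 1.251, and the two failing
formats above are silently misread rather than rejected.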
Of course there are a handful of other types of numbers. One thing that is
helpful is that in Hebrew and Arabic, numbers are written LTR even though
the rest of the text isn't. I am not sure about other RTL languages -
apparently there are a couple of Indic ones. On the other hand, since I am
going to meet a handful of people this weekend who specialise in
publishing for the Indian government, in at least their 22
constitutionally official languages, I will try to remember to ask. One
thing that is unhelpful is that in some languages numbers are written
using ordinary letters, although I suspect this use is very rare on the
web, as I believe it is pretty much archaic in the relevant languages.
This is, of course, going down the path of specifying internationalised
number picking - something that some people are just dead against.
cheers
Chaals
--
Charles McCathieNevile, Opera Software: Standards Group
hablo español - je parle français - jeg lærer norsk
chaals at opera.com Try Opera 9 now! http://opera.com