[whatwg] Can the maximum allowed value length be changed to restrict the number of characters?

Jukka K. Korpela jkorpela at cs.tut.fi
Tue Aug 20 07:21:35 PDT 2013

2013-08-20 17:09, Anne van Kesteren wrote:

> On Tue, Aug 20, 2013 at 12:30 AM, Ryosuke Niwa <rniwa at apple.com> wrote:
>> Can the specification be changed to use the number of composed character sequences instead of the code-unit length?
> In a way I guess that's nice, but it also seems confusing that given
> data:text/html,<input type=text maxlength=1>
> pasting in U+0041 U+030A would give a string that's longer than 1 from
> JavaScript's perspective.

Oh, right, this is an issue different from the non-BMP issue I discussed 
in my reply. This is even clearer in my opinion, since U+0041 U+030A is 
clearly two Unicode characters, not one, even though it is expected to 
be rendered as “Å” and even though U+00C5 is canonically equivalent to 
U+0041 U+030A.
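
As an illustration, here is a minimal JavaScript sketch of how the two forms are measured. The variable names are made up for this example, and it assumes an engine that supports String.prototype.normalize and string iteration by code point:

```javascript
// "A" followed by COMBINING RING ABOVE: two code points, two UTF-16 code units.
const decomposed = "\u0041\u030A";
// Precomposed LATIN CAPITAL LETTER A WITH RING ABOVE: one code point.
const precomposed = "\u00C5";

console.log(decomposed.length);       // 2 (UTF-16 code units)
console.log([...decomposed].length);  // 2 (code points)
console.log(precomposed.length);      // 1

// The strings are canonically equivalent but not identical as code unit sequences.
console.log(decomposed === precomposed);                   // false
console.log(decomposed.normalize("NFC") === precomposed);  // true
```

So a script reading the value of a maxlength=1 field would see a length of 2 for the decomposed form, whichever way the limit itself is counted.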

> I don't think there's any place in the
> platform where we measure string length other than by number of code
> units at the moment.

Besides, if “character” means something other than a Unicode character 
(a Unicode code point assigned to a character) or, as a different concept, 
a Unicode code unit, then the question arises what it does mean. For 
example, would a letter followed by 42 combining marks still be one 
character? (Such monstrosities are actually used, in an attempt to 
create “funny” effects.)
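
To make the question concrete, the following JavaScript sketch counts such a string three ways. It assumes an engine that provides Intl.Segmenter, whose default grapheme segmentation follows Unicode extended grapheme clusters. The helper name countGraphemes and the choice of U+0301 as the combining mark are illustrative only:

```javascript
// One base letter followed by 42 copies of COMBINING ACUTE ACCENT (U+0301).
const stacked = "a" + "\u0301".repeat(42);

// Hypothetical helper: counts grapheme clusters as reported by Intl.Segmenter.
function countGraphemes(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  let count = 0;
  for (const _segment of segmenter.segment(text)) {
    count++;
  }
  return count;
}

console.log(stacked.length);          // 43 (UTF-16 code units)
console.log([...stacked].length);     // 43 (code points)
console.log(countGraphemes(stacked)); // 1 (grapheme cluster)
```

Under that definition the whole sequence counts as a single "character", so a limit expressed in such units would put no bound on the number of code units stored.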


