[whatwg] Can the maximum allowed value length be changed to restrict the number of characters?

Jukka K. Korpela jkorpela at cs.tut.fi
Tue Aug 20 06:49:16 PDT 2013


2013-08-20 2:40, Ryosuke Niwa wrote:

>> http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#maximum-allowed-value-length
>>
>> Why is the maxlength attribute of the input element specified to
>> restrict the length of the value by the code-unit length?

Apparently because in the DOM, "character" effectively means "code 
unit". In particular, the .value.length property gives the length in 
code units.
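
A minimal sketch of the difference, in plain JavaScript outside any 
form control (the variable names are only illustrative, and U+10400 is 
the same non-BMP character used in the test further down):

```javascript
// U+10400 (DESERET CAPITAL LETTER LONG I) lies outside the BMP,
// so it is stored as a surrogate pair of two UTF-16 code units.
const value = "\u{10400}";

// .length counts code units, just as .value.length does in the DOM.
const codeUnits = value.length;

// Array.from iterates a string by code points, not by code units.
const codePoints = Array.from(value).length;

console.log(codeUnits, codePoints); // 2 1
```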

>> This is counter intuitive for users and authors who typically
>> intend to restrict the length by the number of composed character
>> sequences.

That is true. We should not expect end users to know whether a character 
they enter occupies one code unit or two, i.e. whether it is a BMP 
character or not. Then again, I don't expect most users to enter non-BMP 
characters, though this might be changing as e.g. emoticons become more 
popular.

>> In fact, this is the current shipping behavior of
>> Safari and Chrome.

And IE, but not Firefox. Here's a simple test:

<input maxlength=2 value="&#x10400;">

On Firefox, you cannot add a character to the value, since the length is 
already 2. On Chrome and IE, you can add even a second non-BMP 
character, even though the length then becomes 4. I don't see this as 
particularly logical, though I'm looking at this from the programming 
point of view, not the end user's.
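
The two behaviors can be modeled roughly as follows. This is only a 
sketch: fitsByCodeUnits and fitsByCodePoints are hypothetical helpers, 
not browser internals, and counting code points is my assumption about 
what the more permissive browsers do (they may well count composed 
character sequences instead).

```javascript
// Hypothetical helpers: each decides whether `addition` may be
// appended to `current` under a given maxlength.

// Strict reading: count UTF-16 code units.
function fitsByCodeUnits(current, addition, maxLength) {
  return (current + addition).length <= maxLength;
}

// Permissive reading (an assumption): count code points.
function fitsByCodePoints(current, addition, maxLength) {
  return Array.from(current + addition).length <= maxLength;
}

const initial = "\u{10400}"; // the initial value from the test above
const extra = "\u{10401}";   // a second non-BMP character

console.log(fitsByCodeUnits(initial, extra, 2));  // false
console.log(fitsByCodePoints(initial, extra, 2)); // true
console.log((initial + extra).length);            // 4
```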

>> Can the specification be changed to use the number of composed
>> character sequences instead of the code-unit length?

In contexts where you want to set maxlength in the first place, your 
reasons might well be related to limitations that apply to the code unit 
length. It's a different thing if the intent is to limit the number of 
visible characters.

Interestingly, an attempt like
<input pattern=.{0,42}>
to limit the number of *characters* to at most 42 seems to fail. 
(Browsers won't prevent the user from typing more, but the control 
starts matching the :invalid selector if you enter characters that 
correspond to more than 42 code units.) The reason is apparently that 
"." means "any character" in the sense "any code unit", counting a 
non-BMP character as two.
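
The same effect can be reproduced with a plain JavaScript regular 
expression. The sketch below assumes the pattern is anchored to the 
whole value, as the pattern attribute requires, and compares it with 
the ECMAScript "u" flag, under which "." consumes a whole code point. 
Whether a given browser applies that flag to the pattern attribute is 
not something this example shows.

```javascript
// The pattern attribute must match the entire value, so anchor it.
const byCodeUnit = /^(?:.{0,42})$/;   // "." consumes one code unit
const byCodePoint = /^(?:.{0,42})$/u; // "." consumes one code point

// 22 non-BMP characters: 22 code points but 44 code units.
const value = "\u{10400}".repeat(22);

console.log(value.length);            // 44
console.log(byCodeUnit.test(value));  // false
console.log(byCodePoint.test(value)); // true
```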

> Also,
> http://www.whatwg.org/specs/web-apps/current-work/multipage/common-input-element-attributes.html#the-maxlength-attribute
> says "if the input element has a maximum allowed value length, then
> the code-unit length of the value of the element's value attribute
> must be equal to or less than the element's maximum allowed value
> length."
>
> This doesn't seem to match the behaviors of existing Web browsers or
> http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#maximum-allowed-value-length
> unless I'm misreading something.  Namely, the value attribute set in
> the markup or by script isn't automatically truncated at the
> element's maximum allowed value length.

There seems to be a conflict here indeed. It is different from the 
character vs. code unit issue, however.

Definitions in 4.10.21.1 clearly imply that the length of the value of a 
control may exceed the limit set by maxlength. The "Constraints" part 
deals with the question of what happens then (in form submission).

Yucca


