[whatwg] input element's value should not be sanitized during parsing

Ian Hickson ian at hixie.ch
Tue Dec 28 23:46:33 PST 2010

On Mon, 20 Sep 2010, Mounir Lamouri wrote:
> With the current specification, these two elements will not have the
> same value:
> <input value="foo
> bar" type='hidden'>
> <input type='hidden' value="foo
> bar">

Yes they will. The attribute order has no effect. Elements are created 
by the parser with their attributes already set:

# When the steps below require the UA to create an element for a token in 
# a particular namespace, the UA must create a node implementing the interface 
# appropriate for the element type corresponding to the tag name of the 
# token in the given namespace (as given in the specification that defines 
# that element, e.g. for an a element in the HTML namespace, this 
# specification defines it to be the HTMLAnchorElement interface), with 
# the tag name being the name of that element, with the node being in the 
# given namespace, and with the attributes on the node being those given 
# in the given token.
 -- http://www.whatwg.org/specs/web-apps/current-work/complete.html#create-an-element-for-the-token

> Depending on how the attributes are read, value will be set before or
> after type, thus, changing the value sanitization algorithm.

No, the value sanitization algorithm is invoked separately after the 
element is first created:

# When an input element is first created, the element's rendering and 
# behavior must be set to the rendering and behavior defined for the type 
# attribute's state, and the value sanitization algorithm, if one is 
# defined for the type attribute's state, must be invoked.
 -- http://www.whatwg.org/specs/web-apps/current-work/complete.html#the-input-element
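
To make the two quoted requirements concrete, here is a minimal sketch of 
creation as a single step followed by one sanitization pass. It is a toy 
model, not a real DOM: createElementForToken, SANITIZERS and the token 
shape are hypothetical names, and it assumes the element's value starts out 
as the value attribute, that the text state strips line breaks, and that 
the hidden state defines no sanitization algorithm.

```javascript
// Toy model only: every name here is hypothetical, not a browser API.
const SANITIZERS = new Map([
  // Assumed text-state behaviour: strip line breaks from the value.
  ["text", (v) => v.replace(/[\r\n]/g, "")],
  // No entry for "hidden": no sanitization algorithm is assumed for it.
]);

function createElementForToken(token) {
  // The node comes into existence with every attribute from the token set.
  const attributes = new Map(token.attributes);
  const type = attributes.get("type") ?? "text";
  const element = {
    tagName: token.tagName,
    attributes,
    type,
    // Assumption: the value starts out as the value attribute.
    value: attributes.get("value") ?? "",
  };
  // "First created": sanitize once, using the type state the element has.
  const sanitize = SANITIZERS.get(type);
  if (sanitize) element.value = sanitize(element.value);
  return element;
}

const valueFirst = createElementForToken({
  tagName: "input",
  attributes: [["value", "foo\nbar"], ["type", "hidden"]],
});
const typeFirst = createElementForToken({
  tagName: "input",
  attributes: [["type", "hidden"], ["value", "foo\nbar"]],
});

console.log(valueFirst.value === typeFirst.value); // true
```

Because the sanitizer is chosen only after all attributes are in place, the 
order in which they appear in the token cannot affect the result in this 
model.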

> The following change would fix that bug:
> - The specification should add that the value sanitization algorithm
> should not be used during parsing/as long as the element hasn't been
> created.

I don't understand how it could be run before the element has been 
created. It runs on the element! :-)

> OR
> - The specification should add in the set value content attribute
> paragraph that the value sanitization algorithm should not be run during
> parsing/if the element hasn't been created.

The set value content attribute paragraph doesn't apply until after the 
element has been created, with the attribute already set.

> The specifications already require that the value sanitization algorithm
> should be run when the element is "first created".
> So, with this change, the element's value will be un-sanitized during
> parsing, and as soon as parsing is done, the element's value
> will be sanitized.

I don't really understand what that means.

> By the way, "first created" could probably be changed to a concept from 
> the specifications. We can guess what that means but there is no strong 
> notion behind these words AFAIK.

At some point the element is created. How is this ambiguous?

On Tue, 21 Sep 2010, James Graham wrote:
> The concept of "Creating an Element" already exists [1] and is atomic, 
> that is the element is created with all its attributes in a single 
> operation. Therefore it is not clear to me how attribute order can make 
> a difference per spec. Am I missing your point?
> [1] http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#creating-and-inserting-elements


On Tue, 21 Sep 2010, Boris Zbarsky wrote:
> Where does it say that it's atomic?  I don't see that anywhere (and in 
> fact, the "create an element" code in the Gecko parser is most decidedly 
> non-atomic).  Now maybe the spec intends this to be an atomic operation; 
> if so it needs to say that.

The operation it describes is a single operation: create a node. It 
describes various constraints on that operation, one of which is that the 
node have the various tokenised attributes set. I don't understand how 
creating a node could be anything other than atomic -- either it exists or 
it does not.

On Tue, 21 Sep 2010, Boris Zbarsky wrote:
> That doesn't work if your parser and DOM aren't very very _very_ tightly 
> coupled, since there are no DOM APIs to "atomically" set a bunch of 
> attributes.

The HTML spec in general assumes that the implementation of the parser is 
the implementation of the DOM and that you wouldn't use the DOM Core API 
to implement the DOM or the parser.

> So yes, if the spec implies that this is what's supposed to happen here 
> then it needs to be _very_ explicit about that.

It's not clear to me how I can be more explicit. Could you elaborate on 
what you would like it to say?

On Tue, 21 Sep 2010, Jonas Sicking wrote:
> Also, it would mean that the following two pieces of code behave differently:
> inp = document.createElement("input");
> inp.setAttribute("value", "foo\nbar");
> inp.setAttribute("type", "hidden");
> and
> inp = document.createElement("input");
> inp.setAttribute("type", "hidden");
> inp.setAttribute("value", "foo\nbar");
> This does not seem desirable.

I can't argue that it's desirable, but it's how the Web works, as I 
understand it.
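
For illustration, here is a toy model of the difference Jonas describes. 
It is a sketch under stated assumptions, not a claim about what any browser 
returns from inp.value: ToyInput and sanitizerFor are hypothetical names, 
the text state is assumed to strip line breaks, the hidden state is assumed 
to have no sanitization algorithm, and sanitization is assumed to run for 
the current type state after each attribute change.

```javascript
// Toy model only: ToyInput and sanitizerFor are hypothetical names.
const sanitizerFor = new Map([
  // Assumed text-state behaviour: strip line breaks from the value.
  ["text", (v) => v.replace(/[\r\n]/g, "")],
  // No entry for "hidden": no sanitization algorithm is assumed for it.
]);

class ToyInput {
  constructor() {
    // A script-created input starts out in the text state.
    this.type = "text";
    this.attributes = new Map();
    this.value = "";
  }

  setAttribute(name, attrValue) {
    this.attributes.set(name, attrValue);
    if (name === "type") this.type = attrValue;
    if (name === "value") this.value = attrValue;
    // Assumption: sanitize for whatever type state is current right now.
    const sanitize = sanitizerFor.get(this.type);
    if (sanitize) this.value = sanitize(this.value);
  }
}

const valueThenType = new ToyInput();
valueThenType.setAttribute("value", "foo\nbar"); // still text: newline stripped
valueThenType.setAttribute("type", "hidden");    // nothing restores the newline

const typeThenValue = new ToyInput();
typeThenValue.setAttribute("type", "hidden");
typeThenValue.setAttribute("value", "foo\nbar"); // hidden: value kept as-is

console.log(JSON.stringify(valueThenType.value)); // "foobar"
console.log(JSON.stringify(typeThenValue.value)); // "foo\nbar"
```

In this model the content attribute is "foo\nbar" in both cases; only the 
element's value differs, because it was sanitized while the element was 
still in the text state.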

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
