[whatwg] [wf2] Note about XML attribute value handling
mikko.rantalainen at peda.net
Tue May 10 22:48:32 PDT 2005
Anne van Kesteren wrote:
> Mikko Rantalainen wrote:
>><input type="hidden"> works just fine with existing UAs even when one
>>uses application/xhtml+xml provided that all meaningful whitespace has
>>been converted to entities.
>>&#13; for CR and so on. PHP,
>>for example, provides function htmlentities() exactly for this purpose.
> Really? Why would 'htmlentities' be useful in an XML environment? Also,
> getting HTML *entities* in your XML document doesn't seem like a good thing.
Yes, you're right that in some cases that results in problems. That's
because htmlentities() returns entity references like "&quot;"; if it
*only* returned numeric character references like "&#34;" it would work
just fine despite the fact that it's designed for HTML. Those numbers
refer to Unicode code points. Also note that XML normalization rules
state that "For a character reference, append the referenced character
to the normalized value" but "For an entity reference, recursively apply
step 3 of this algorithm to the replacement text of the entity." 
Real world user agent behavior might differ and that's what I'm really
That said, I'm using my own function to correctly encode special
characters to numerical character references, but the point remains the
same. It doesn't matter if you put stuff in the contents of an element
or inside an attribute value; you *have to* encode the string to hide
special characters. If you put the string inside an attribute, the
encoding must also include all whitespace and quotation characters in
addition to characters like "<", "&" and ">".
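
As a rough illustration of the attribute case, here is a minimal Python sketch. The helper name encode_attr and the sample strings are made up for this example, and it is not the function mentioned above. It assumes Python's standard xml.etree.ElementTree parser and simply turns every special character into a numeric character reference:

```python
import xml.etree.ElementTree as ET

# Characters that are unsafe to write literally inside a double- or
# single-quoted attribute value, plus the whitespace that attribute-value
# normalization would otherwise turn into plain spaces.
_SPECIAL = '&<>"\'\t\n\r'

def encode_attr(value: str) -> str:
    """Hypothetical helper: replace special characters with numeric
    character references such as &#34; (never named entities)."""
    return "".join(f"&#{ord(ch)};" if ch in _SPECIAL else ch for ch in value)

raw = 'line 1\r\nline "2" & <more>'
doc = f'<input type="hidden" value="{encode_attr(raw)}"/>'

# The character references are resolved after normalization, so the
# original string, including CR and LF, comes back unchanged.
parsed = ET.fromstring(doc).get("value")

# A literal line break inside an attribute is normalized to a space.
literal = ET.fromstring('<input value="a\nb"/>').get("value")
```

Under these assumptions, parsed equals raw, while the literal line break in the second document is lost to normalization.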
You cannot put a random string between <textarea> and </textarea> tags
either and expect to get a valid XML fragment as a result.
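
The element-content case can be sketched the same way. Again, encode_text and the sample string are hypothetical, and Python's xml.etree.ElementTree stands in for whatever XML parser the user agent actually uses:

```python
import xml.etree.ElementTree as ET

def encode_text(value: str) -> str:
    """Hypothetical helper: encode the characters that are special
    in element content as numeric character references."""
    return "".join(f"&#{ord(ch)};" if ch in "&<>" else ch for ch in value)

raw = "if (a < b && b > c) { ... }"

# Pasting the raw string between the tags is not well-formed XML.
try:
    ET.fromstring(f"<textarea>{raw}</textarea>")
    raw_is_well_formed = True
except ET.ParseError:
    raw_is_well_formed = False

# After encoding, the fragment parses and the text round-trips.
restored = ET.fromstring(f"<textarea>{encode_text(raw)}</textarea>").text
```

Here the raw fragment is rejected by the parser, and the encoded one yields the original string back.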