[whatwg] Allowing ">" in attribute values
skrol29forum+whatwg at gmail.com
Tue Jun 29 02:56:46 PDT 2010
> It seems like what you want here is for browsers to parse as they do now, but a particular
> subset of browser-accepted syntax to be enshrined so that when defining your restrictions
> over content you control you can just say "follow the spec" instead of "follow the spec and
> don't put '>' in attribute values", right?
That is not the idea. I'll try to explain in more depth. The problem has its source in XML:
XML attribute values may contain any character except "<", "&", and the delimiter that quotes the value, which can be " or '.
- Why is "<" forbidden? So that whenever a parser meets this character, it can be sure it has reached the beginning of an element tag. But notice that parsing would still be possible if "<" were allowed in attribute values. For example, <entity1 att1="<ok>"> could be parsed without error. So "<" is forbidden not to make parsing possible, but to make the parsing process simpler and more secure.
- A literal "&" is forbidden because it is the escape character that introduces references to special characters.
- The delimiter is forbidden because an unescaped one would end the value early; it must be escaped for parsing to be correct.
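To illustrate the point about "<", here is a minimal sketch of a quote-aware scanner that finds the ">" closing a start tag. The function name `find_tag_end` is hypothetical, and the sketch ignores comments, CDATA sections and processing instructions. It handles "<" and ">" inside a quoted value without trouble, but only because it tracks the quote state through every attribute on the way.

```python
def find_tag_end(text: str, start: int) -> int:
    """Return the index of the '>' that closes the tag starting at text[start].

    Quote-aware: '<' and '>' inside a quoted attribute value are skipped,
    so every attribute must be scanned to locate the real end of the tag.
    """
    if text[start] != "<":
        raise ValueError("expected '<' at start")
    quote = None  # the quote character of the value being read, if any
    for i in range(start + 1, len(text)):
        ch = text[i]
        if quote:
            if ch == quote:
                quote = None  # matching delimiter closes the value
        elif ch in ("\"", "'"):
            quote = ch  # an opening delimiter starts a value
        elif ch == ">":
            return i  # a '>' outside any value ends the tag
    raise ValueError("unterminated tag")


doc = '<entity1 att1="<ok>">text</entity1>'
end = find_tag_end(doc, 0)
print(doc[: end + 1])  # prints <entity1 att1="<ok>">
```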
We can see that the characters forbidden in XML were chosen to guarantee a certain quality of parsing. Nevertheless, a parser still has to read through all the attributes of every element it meets, just to find where the tag ends. This could be avoided if ">" were forbidden in attribute values. Browsers may not care, because they want to parse every attribute anyway. But nowadays HTML content is parsed for many purposes other than display, and parsing could be faster and more secure for all of them (not only for browsers) if ">" were forbidden and had to be replaced with "&gt;".
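A rough sketch of the kind of shortcut a non-browser tool could take under this proposal: jump from each "<" straight to the next ">" without tokenizing attributes at all. The function `strip_tags_fast` is hypothetical and deliberately ignores comments, CDATA and `<script>` content; it is only correct if ">" never appears literally inside an attribute value.

```python
def strip_tags_fast(markup: str) -> str:
    """Remove tags by jumping from each '<' to the next '>'.

    Attributes are never tokenized, so this is only safe if '>' cannot
    appear literally in attribute values (the proposed restriction).
    """
    out = []
    pos = 0
    while True:
        lt = markup.find("<", pos)
        if lt == -1:
            out.append(markup[pos:])  # trailing text after the last tag
            return "".join(out)
        out.append(markup[pos:lt])  # text before the tag
        gt = markup.find(">", lt)  # assumed to be the end of the tag
        if gt == -1:
            raise ValueError("unterminated tag")
        pos = gt + 1


escaped = '<a title="x &gt; y">link</a>'
raw = '<a title="x > y">link</a>'
print(strip_tags_fast(escaped))  # prints link
print(strip_tags_fast(raw))      # prints  y">link  (the raw '>' ends the tag too early)
```

With "&gt;" in the value the shortcut gives the right text; with a raw ">" it silently produces garbage, which is why such a shortcut is only safe if the restriction is part of the spec.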
This is mostly about XML, but what about HTML? Replacing ">" with "&gt;" is already good practice in both XML and HTML. Some HTML attributes already forbid it (it is allowed in CDATA attributes but forbidden in %Text attributes). Since XML 2 has been stopped, I think this is an opportunity for HTML to turn that good practice into a new restriction, and at the same time to lighten parsing for tools that are not browsers.
Why change the HTML spec instead of just adding our own restriction when we want ">" to be forbidden? Because I think we should all want ">" to be forbidden. Using it directly in HTML attribute values is already fairly deprecated. We can always write "&gt;" instead of ">", just as we already write "&lt;" instead of "<".
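Producing attribute values that already satisfy the proposed rule is easy with standard tools. A minimal sketch using Python's `html.escape`, which with `quote=True` turns `&`, `<`, `>`, `"` and `'` into character references; the helper `start_tag` is hypothetical.

```python
from html import escape


def start_tag(name: str, attrs: dict) -> str:
    """Build a start tag whose attribute values contain no literal '<' or '>'."""
    parts = [name]
    for key, value in attrs.items():
        # escape(..., quote=True) replaces & < > " ' with character references
        parts.append(f'{key}="{escape(value, quote=True)}"')
    return "<" + " ".join(parts) + ">"


tag = start_tag("a", {"title": "x > y", "data-cmp": "a<b"})
print(tag)  # prints <a title="x &gt; y" data-cmp="a&lt;b">
```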
I understand that browser developers don't feel concerned by this, because parsing already works well for them as it is.
And I admit that the problem I've explained comes more from XML than from HTML.