[whatwg] Make quoted attributes a conformance criteria

Ian Hickson ian at hixie.ch
Sun Aug 2 19:45:45 PDT 2009


On Thu, 23 Jul 2009, Keryx Web wrote:
> 
> I'd say it is safe to say that using quotation marks for attribute 
> values, always, except perhaps for collapsed, boolean attributes, has 
> been regarded as best practice for a long time now. Speaking as an 
> instructor for newbies, enforcing quotation marks has proven its value 
> countless times for me and my students. I'd say that all of my 
> colleagues in WaSP EduTF would agree on that. [...]

For the WHATWG, it doesn't matter how many people agree on something, as 
we base the spec's text on reasoned debate and research. :-)


> With this in mind I suggest that the spec would be improved in the 
> following ways, and that we open a discussion about requiring 
> quotation marks for all non-boolean attributes as a conformance 
> criterion.
> 
> Suggested spec edits (some written in a diff-ish way, not all a true 
> diff, though):
> 
> Section 1.9
> 
> Keep:
> Attributes are placed inside the start tag, and consist of a name and a 
> value, separated by an "=" character. The attribute value can be left 
> unquoted if it doesn't contain any special characters. Otherwise, it has 
> to be quoted using either single or double quotes. The value, along with 
> the "=" character, can be omitted altogether if the value is the empty 
> string.
> 
> Add:
> In order to avoid errors and increase readability, using quotes is 
> highly recommended for all non-omitted attribute values.
> 
> [edit a lot of examples to include quotes]
> 
> 9.1.2.3
> 
> No suggested text, but a rewrite will be necessary if requiring quotation
> marks becomes a conformance criterion.

Instead of preventing anyone from not using quote marks, I would instead 
recommend asking your validator vendor to offer you an option to require 
quote marks and warn you when you have forgotten them.

This would address your use case, as far as I can tell, without preventing 
anyone who _likes_ omitting quote marks from doing so.

In practice, parsing omitted quote marks is pretty reliably implemented, 
and it's been valid before and _widely_ used, so it's not an area we can 
really use to extend the language. Therefore the usual reasons we have to 
ban things don't really apply here, and I'd rather continue to allow 
quotes to be omitted.

Requiring quotes would also make a large number of pages invalid for more 
or less stylistic reasons, which would make it harder for people to 
transition to HTML5, and may annoy them ("Why do I have to add these 
quotes, they don't really add anything -- bah! I hate html5").


On Thu, 23 Jul 2009, Kornel wrote:
> 
> I wouldn't mind much if specification used more quotes in examples, 
> however I'm afraid that taking this to the extreme could give false 
> impression that unquoted attributes are an error, and spec would fail to 
> illustrate when quotes are necessary and when they're perfectly safe to 
> omit.

The spec intentionally uses a variety of markup styles (including many I 
find quite ugly) in order to show that they are all valid, and to not 
mislead the reader into thinking there are unstated rules.


On Thu, 23 Jul 2009, Keryx Web wrote:
> 
> As for conformance criteria only being about unambiguous parsing: If 
> that is the case we do not need them at all any more, since HTML 5 
> defines how to handle badly written markup.

We also want to make things non-conforming if their parsing behaviour is 
highly non-intuitive, or if it might be a future extension point, or if 
there is some harm that might come from using the feature, etc.


> And speaking directly to Ian H, a few years ago you said on this list 
> that you'd love for the spec to help teachers as much as possible 
> (within the limits of being a spec). My suggested example markup changes 
> are definitely such a help.

Unfortunately such benefits have to be balanced against the costs on more 
experienced authors, many of whom like the flexibility of omitting quotes. 
However, I do think that a conformance checker could support a "stricter" 
or "teaching" mode in which it requires that attributes be quoted and 
optional tags not be omitted.


On Thu, 23 Jul 2009, Eduard Pascual wrote:
> [<p class=foo bar>]
> Furthermore, with the previous example, what'd happen if HTML6 defines a 
> new empty "bar" attribute that alters the rendering and/or semantics of 
> elements?

If HTML6 is written like HTML5, then we wouldn't use "bar" if it was 
commonly subject to this mistake.


> The part on readability is indeed a matter of style; but the part of 
> avoiding errors is quite valid. Maybe a more to-the-point wording would 
> work better; how about something like this?:
>
> "Quoting attribute values is always allowed, but only sometimes 
> required. In case of doubt, the safest choice is to quote the value."

In the introduction section, it now says:

# The attribute value can be left unquoted if it doesn't contain any 
# special characters. Otherwise, it has to be quoted using either single 
# or double quotes. 
 -- http://www.whatwg.org/specs/web-apps/current-work/#a-quick-introduction-to-html


On Thu, 23 Jul 2009, Eduard Pascual wrote:
> 
> While I don't consider a hard requirement would be appropriate, there is 
> an audience sector this discussion seems to be ignoring: Authoring 
> Tools' developers. IMO, it would be highly desirable to have some 
> guidelines for these tools to determine when they *should* quote 
> attribute values.

Is this not clear enough?

   http://www.whatwg.org/specs/web-apps/current-work/#attributes

It says, under "unquoted attribute value syntax", that the value "must not 
contain any literal space characters, any U+0022 QUOTATION MARK (") 
characters, U+0027 APOSTROPHE (') characters, U+003D EQUALS SIGN (=) 
characters, U+003C LESS-THAN SIGN (<) characters, or U+003E GREATER-THAN 
SIGN (>) characters, and must not be the empty string".

(Tools, of course, can just quote everything. There's no reason other than 
user preference for the authoring tool to not quote values, as far as I 
can tell.)
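
The rule quoted above is mechanical enough to sketch in code. The following 
Python is only an illustration, not anything taken from the spec or from an 
existing tool: the names can_be_unquoted and serialize_attribute are 
hypothetical, and "space characters" is assumed to mean space, tab, LF, FF 
and CR. It quotes a value only when the quoted rule forbids leaving it bare.

```python
# Characters the quoted spec text forbids in an unquoted attribute value.
# Assumption: "space characters" means space, tab, LF, FF and CR.
FORBIDDEN = set(' \t\n\f\r"\'=<>')


def can_be_unquoted(value: str) -> bool:
    """Return True if value satisfies the unquoted-attribute rule quoted above."""
    return value != "" and not any(ch in FORBIDDEN for ch in value)


def serialize_attribute(name: str, value: str) -> str:
    """Hypothetical helper: emit name=value, quoting only when required."""
    # Escape ampersands in both branches so they are not read as the start
    # of a character reference.
    escaped = value.replace("&", "&amp;")
    if can_be_unquoted(value):
        return f"{name}={escaped}"
    # Double-quoted form: the only other character to escape is the quote.
    return f'{name}="{escaped.replace(chr(34), "&quot;")}"'
```

A tool that prefers to quote everything can skip the can_be_unquoted check 
and always take the second branch.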


On Sat, 25 Jul 2009, Keryx Web wrote:
> 
> Consider this PHP template:
> 
> <input type=text value=$login name=login>
> 
> Value is the suggested text: if no user data is available it says "login".
> Otherwise it's the user's login name (no spaces allowed). All is well.
> 
> One day a developer decides that "login name" is a better value, and hard
> codes it into the PHP business logic, producing this HTML:
> 
> <input type=text value=login name name=login>
> 
> All of a sudden you *effectively* have produced this:
> 
> <input type=text value=login name="">
> 
> And it stops working.

I agree that this is an issue, and I would strongly recommend that people 
who write templates not make assumptions about the values they are 
inserting. (What if the login name contains a ">" character? Or an 
ampersand? Even without the change that introduces spaces, there are 
already bugs here.)
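
To see why the example breaks, one can tokenize the generated tag. This is 
a rough sketch using Python's standard html.parser module, which is not an 
HTML5 parser; the "first occurrence of a duplicate attribute wins" step is 
added by hand to mimic what HTML5 parsing does with repeated attribute 
names. AttrCollector and effective_attributes are made-up names.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records the attributes of the first start tag."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        if self.attrs is None:
            self.attrs = attrs


def effective_attributes(markup: str) -> dict:
    """Return the attributes as an HTML5-style parser would keep them."""
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    effective = {}
    for name, value in collector.attrs or []:
        # Keep the first occurrence of each name; a bare attribute
        # (reported as None by html.parser) counts as the empty string.
        effective.setdefault(name, value if value is not None else "")
    return effective
```

Fed the broken tag from the example, the word "name" after "login" is read 
as a bare attribute, so the later name=login is the duplicate that gets 
dropped. Quoting the value keeps "login name" together as one value.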


> Now, what would have been easier to avoid this? Url-encoding hard coded 
> variable data, or adding two quotation marks to the template?

Adding two quotation marks and escaping the contents of the value would be 
wise in general, in this case. However, that doesn't mean that the "name" 
attribute in your example should also have quotes.
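
A minimal sketch of that advice, assuming a Python template rather than the 
PHP one in the example; render_login_input is a hypothetical name. The 
standard library's html.escape escapes &, < and >, and with quote=True both 
kinds of quote mark as well, so the inserted value cannot terminate the 
attribute or the tag. The name attribute is left unquoted, since its value 
is a fixed keyword.

```python
from html import escape


def render_login_input(login: str) -> str:
    """Hypothetical template: quote and escape only the variable value."""
    # quote=True also escapes " and ', so the value cannot end the
    # attribute early whatever the caller passes in.
    return f'<input type=text value="{escape(login, quote=True)}" name=login>'
```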


> Bottom line:
> 
> I think my suggestion is totally analogous to e.g. semi-colon insertion 
> in ECMAScript. JSLint demands that those should be present, and I've yet 
> to hear anyone say "it's a matter of style". Omitting semi-colons is a 
> known cause of trouble in ECMAScript. Omitting quotation marks is a 
> known cause of trouble in HTML.
> 
> Choosing between robustness and saving a few bytes, one should always 
> opt for the former.

I think this overstates the trouble that lack of quotation marks causes.


On Sun, 26 Jul 2009, Mike Shaver wrote:
> 
> And yet, tons of inline event handler attribute values on the web omit 
> their trailing semicolons...as a matter of style.

Indeed.


On Sun, 26 Jul 2009, Michael Kozakewich wrote:
>
> The root of the problem is this: Requiring quotes, especially after all 
> these people have learned about HTML and have learned to code without 
> quotes, isn't backwards-compatible. Browsers already use their resources 
> to parse bad code, and so it's also too late to try forcing 
> well-formedness on those. At the same time, quotes -- if the writers 
> learn to always quote without thought -- decrease errors and also 
> normalize the language. The only answer, then, is to deprecate 
> not-quoting: Add quotes to the spec examples, state that quotes aren't 
> needed but are best-practice, add 'unquoted' warnings to the validator, 
> and teach new web developers to always quote attributes.

People can already always use quotes, so nothing is preventing that 
behaviour. Requiring quotes only makes a difference if people use 
validators. But if people use validators, then omitting the quotes is no 
problem either, as far as I can tell, since those errors will be caught 
too. So I'm not really convinced it's as clear cut as you describe.


On Sun, 26 Jul 2009, Keryx Web wrote:
> 
> Three kinds of attribute values have been identified:
> - Those that can have multiple words, e.g. class, alt, title, value...
> - Those that can have just one word or an integer, e.g. width, length...
> - Boolean attributes, that can be shortened in HTML.
> 
> Today teachers like me use (false) XHTML to enforce quotation marks for 
> all three cases, because we've seen the pedagogic benefit (and frankly 
> grown tired of looking over the shoulders of our students and saying for 
> the millionth time "you've forgotten to quote that alt attribute 
> value").
> 
> I actually thought that having a tool that could enforce XHTML-ish rules 
> for the first (and perhaps second) category above, while still leaving 
> boolean attributes alone, would be seen as a benefit, not as a burden.

I agree that such a tool would be useful. I don't think we need to force 
its existence using the spec, though.


On Sat, 25 Jul 2009, Eduard Pascual wrote:
>
> I can't speak for others, but I did read your post. And still I am 
> convinced that a hard requirement to quote all values is not the best 
> solution. There are some values that MUST be quoted, some that SHOULD be 
> quoted, and even some that SHOULD NOT be quoted. Those that must be 
> quoted are already covered by the spec, and validators will yield the 
> relevant error message when encountering such values unquoted. For those 
> values that *should* be quoted (those that improve in readability when 
> quoted, or those that could lead to errors when they are later changed 
> if unquoted), a warning from the validator should be enough. Finally, 
> there are some values that are better unquoted, such as those attributes 
> that can only take a number (there is no risk of errors, and the quotes 
> would normally hurt readability more than they help it). Even in the 
> case of @type for <input>, quotes seem quite an overkill: AFAIK, there 
> is no valid value for this attribute that will make them strictly 
> needed; so there is no risk of the author changing the value into 
> something that requires quotes and forget to add them (unless, of 
> course, s/he changes it to something invalid, which will already bring 
> problems of its own). Since <input> elements tend to be relatively 
> short, and often given in a single line of source, adding boilerplate to 
> them for no purpose doesn't seem to be a good idea.

I agree.


On Sat, 25 Jul 2009, Michael Kozakewich wrote:
> 
> Values better unquoted are those that would make problems if they were quoted.
> I've never run into any value that couldn't be quoted; effectively, the cases
> you mention fall under "don't need to be quoted."
> Your argument lies on the fact that quotes make things less readable. Any
> change from what you're used to will appear less readable, but I'm not
> convinced that quotes inherently make code less readable. (Though I'll grant
> that Boolean attributes don't need quotes, unless they're XHTML.)

Personally I find that my programming background has led me to dislike 
quotes around numeric literals and keywords, because I consider quotes to 
be indicative of strings, and so quoting numbers and keywords makes me 
feel like I'm giving the wrong data type. It's irrational, of course.


> I see a disconnect here between the validator and the spec. The 
> validator would base everything on the spec, and so the spec itself 
> should recommend quotes for "potentially unsafe attributes" at the very 
> least, and back up that view in all the code examples.

Validators are allowed to warn for things that the spec doesn't mention, 
so long as they don't say that those are conformance errors.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

