[whatwg] Make quoted attributes a conformance criteria
Henri Sivonen
hsivonen at iki.fi
Tue Aug 4 04:59:00 PDT 2009
On Aug 3, 2009, at 05:45, Ian Hickson wrote:
> On Thu, 23 Jul 2009, Keryx Web wrote:
>> No suggested text, but a rewrite will be necessary if quotation
>> marks becomes
>> a conformance criterion.
>
> Instead of preventing anyone from not using quote marks, I would
> instead
> recommend asking your validator vendor to offer you an option to
> require
> quote marks and warn you when you have forgotten them.
There's a usability cost and a QA cost to adding optional features to
a validator, which is why I try to resist requests to add more
configuration and optional features to Validator.nu.
I've gotten requests to add checks inspired by XHTML. These requests
generally aren't about true polyglot checking (since few people know
how long the actual polyglot checking corner case list is). Also,
these requests aren't about code style in general. (Well, actually,
Anne asked for indent style checking on Sam's blog comments and
another commenter thought Anne was making fun of the quote issue...)
Since the requests happen to be about the most prominent syntactic
features of XML as opposed to being across the board about code style,
I suspect part of the requests is about unease about letting go of
extra requirements taught as part of XHTML-as-text/html evangelism.
When adding optional warnings to Validator.nu, I'd like to tell apart
actual problems from unease of letting go of XHTML-as-text/html before
proceeding. (I expect "actual problems" to be with us always, but I
expect the unease to pass with a little time.)
The top 4 requests are:
* Flagging unquoted attributes.
* Flagging implied tags.
* Flagging non-lowercase element and attribute names.
* Flagging inconsistent use of /> on void elements.
I think the implied end tags are different from the rest, and I think
an option to flag implied tags would be a useful feature to have. I
want to implement it, but I have some higher-priority Gecko items on
my plate first.
Implied tags is different from the rest, because tag inference doesn't
necessarily work like authors expect, so automatically generating the
tags might not do the right thing.
OTOH, the other cases can be safely fixed automatically. (Except some
quoting issues; more on that later.) In fact, indent style is also
something that could be made consistent automatically by an HTML-aware
text editor. When issues are code style issues in nature and don't
need human intervention to change to a particular style, I think it's
more useful to have an editor that simply reformats code than to have
a validator that flags failure to comply to code style without
performing the reformatting. Consider Eclipse JDT: If you have bad
indents, you don't get warning or error markers in the margin.
Instead, you can ask Eclipse to reformat code according to a wide
variety of settings.
This creates a mild issue: If different people collaborate and have
different code formatter settings, having another person's editor
reformat code creates some source control issues. However, you don't
get error messages, so you are still avoiding the problem that I
raised earlier on public-html and the Maciej mentioned here about tool
interop (so that you can swap tools without getting a huge bunch of
errors).
Also note that we can't really eliminate this source control re-
indenting issue on the spec level. There's no way we could get
everyone to agree on One True HTML indent style. And as long as there
isn't One True indent style consistently applied everywhere, I think
it doesn't matter much if other syntax is used consistently.
Now, people are going to say that it's good to use /> consistently,
because it helps you see which elements are void elements. It doesn't
work that way, though. Because /> has no effect on HTML elements, you
still need to *know* which elements are void elements, and pretending
that /> means something poisons the mental model people have and is
actually bad for teaching. (People write <div class="foo"/> or <script
src="..."/> having heard that /> closes the element.) Unfortunately,
we need to keep /> conforming to make it easy to upgrade XHTML-as-text/
html-emitting systems to HTML5.
I think it would be rather arbitrary to add a feature for checking the
*consistent* use of <br> vs. <br/>. Why not <br/> vs. <br />?
foo='bar' vs. foo="bar"? Or indent style?
As for lower-case names, I don't think people *really* want lower-case
names. I think, as a matter of code style, they want *canonical-case*
names, which aren't all lower-case for SVG-in-text/html and MathML-in-
text/html (definitionURL). I think adding checking for this would have
disproportionate ill impact on the parser code base compared to
benefit. In a reasonable general-purpose HTML parser implementation,
the case information is lost before it is decided if a tag belongs to
an HTML, SVG or MathML element and maintaining a special-purpose
parser for validation wouldn't be good. On the benefit side, I don't
think accidentally holding down the shift key when typing a name is a
notable practical authoring problem. Due to this disparity in benefit
and code complexity badness, I'm not planning on implementing this
check.
Back to the unquoted attributes request. I think it's the hardest one
of the four to decide whether to implement or not. I think it is easy
to decide unquoted attributes shouldn't be errors, and it's easy to
decide that if the feature were available, it should be optional.
(Making it mandatory would annoy people updating existing sites using
quote omission and people who know just fine that they can omit quotes
on stuff like type=radio and don't want to type extra.)
There's one case that clearly needs an unconditional warning, though:
<foo bar=baz/> when the format of the value of bar doesn't exclude a
trailing slash. In this case, the /> feature interacts badly with the
quote omission feature.
I think a good way to proceed here is to write more complex code for
detecting <foo bar=baz/> first and seeing if it together with more
precise datatyping than in old DTD-based validators is enough to catch
actual problems without introducing more UI options.
If after that change users of Validator.nu still face uncaught
problems due to quote omission (e.g. class or alt eating up whatever
follows and somehow managing not to generate any subsequent error), I
think exploring a feature for optionally warning about unquoted
attributes would make sense.
> This would address your use case, as far as I can tell, without
> preventing
> anyone who _likes_ omitting quote marks from doing so.
[...]
> Omitting quotes would also make a large number of pages invalid for
> more
> or less stylistic reasons, which would make it harder for people to
> transition to HTML5, and may annoy them ("Why do I have to add these
> quotes, they don't really add anything -- bah! I hate html5").
I think that quote omission should stay conforming for these reasons.
> (Tools, of course, can just quote everything. There's no reason
> other than
> user preference for the authoring tool to not quote values, as far
> as I
> can tell.)
I encourage anyone who is writing an HTML serializer to use double
quotes for attributes unconditionally, unless there's a specific need
to optimize file size to the point of counting bytes. (Single quotes
are worse, because developers are tempted to escape ' as ' in
attribute value, but ' has compat issues with IE versions still
out there.)
> On Sat, 25 Jul 2009, Keryx Web wrote:
>>
>> Consider this PHP template:
>>
>> <input type=text value=$login name=login>
>>
>> Value is the suggested text, if no user data is available it says
>> "login".
>> Otherwise its the users login name (no spaces allowed). All is well.
>>
>> One day a developer decides that "login name" is a better value,
>> and hard
>> codes it into the PHP business logic, producing this HTML:
>>
>> <input type=text value=login name name=login>
>>
>> All of a sudden you *effectively* have produced this:
>>
>> <input type=text value=login name="">
>>
>> And it stops working.
>
> I agree that this is an issue, and I would strongly recommend that
> people
> who write templates not make assumptions about the values they are
> inserting.
If you aren't manually typing both the attribute name and the value in
a text editor, you should always use double quotes for generated
values to avoid trouble.
--
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/
More information about the whatwg
mailing list