[whatwg] A call for tighter validation standards

Sun Oct 25 15:40:20 PDT 2009

On 10/23/09, Curtiss Grymala <curtiss at ten-321.com> wrote:
> My main concerns about the current version of the HTML5 specs were laid
> out pretty well by Henri in the mailing list message from August 2009 I
> linked above. However, I wanted to reiterate the ones that concern me
> and add my thoughts.
>
>      1. Unquoted attributes

>      2. Implied tags (such as leaving out a closing paragraph tag)

I understand your point, but compare
<title>[page title]</title>
<p>[page text]
<p>[more page text]
against
<html>
 <head>
  <title>[page title]</title>
 </head>
 <body>
  <p>[page text]</p>
  <p>[more page text]</p>
 </body>
</html>

The former example is very compact, and quite clear, at least if you
know HTML (why else would you be reading HTML sources, anyway?).
And even if you don't, the structure of the code can give you some
hints. Less typing means less to transfer, and possibly more typing
left for content :) Note that I left out every element I could
(while keeping it valid HTML 4). You may leave out some elements (such
as <body> and <head>) but keep others (such as <html> and <p>) if you
think that's more readable. The worst case is one-liners with no
structure whatsoever, but they're unreadable anyway.

The second one is very clear, but I really don't care whether the
title element is in <head> or <body>. That's the UA's business.
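To make "the UA's business" concrete, here is a toy sketch of how a parser can supply the end tags that the compact version leaves out. It uses Python's standard html.parser module for tokenizing. The class ImpliedEndTagDemo and its event names are hypothetical, and the class applies only one real rule (a <p> start tag closes an open p element), plus closing everything that is still open at end of input. The actual HTML tree-construction algorithm has many more rules, including the implied <html>, <head> and <body>.

```python
from html.parser import HTMLParser


class ImpliedEndTagDemo(HTMLParser):
    """Hypothetical toy builder applying ONE rule: <p> closes an open <p>.

    Heavily simplified: no implied <html>/<head>/<body>, no scope checks,
    no special handling of void elements such as <br>.
    """

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements, innermost last
        self.events = []  # ("start" | "end" | "implied-end", tag) in order

    def _pop_until(self, tag, explicit):
        # Pop open elements up to and including `tag`; anything popped on
        # the way (and `tag` itself, if not explicit) counts as implied.
        while self.stack:
            open_tag = self.stack.pop()
            kind = "end" if (explicit and open_tag == tag) else "implied-end"
            self.events.append((kind, open_tag))
            if open_tag == tag:
                break

    def handle_starttag(self, tag, attrs):
        if tag == "p" and "p" in self.stack:
            # The author omitted </p>; act as if it had been written.
            self._pop_until("p", explicit=False)
        self.stack.append(tag)
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        if tag in self.stack:
            self._pop_until(tag, explicit=True)
        # An end tag with no matching open element is simply ignored here.

    def close(self):
        super().close()
        # End of input implicitly closes whatever is still open.
        while self.stack:
            self.events.append(("implied-end", self.stack.pop()))


demo = ImpliedEndTagDemo()
demo.feed("<title>[page title]</title>\n<p>[page text]\n<p>[more page text]\n")
demo.close()
print(demo.events)
```

The "implied-end" events are exactly the </p> tags the author never typed; the sketch recovers two separate paragraphs without them.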

>      3. Inconsistent use of the closing slash on empty elements

That, of course, is bad, wrong, and based on the misconception that
XHTML simply supersedes (updates) HTML with a stricter syntax. It's
named _X_HTML because it's incompatible with HTML! Closing slashes on
empty elements should be invalid in all cases (in HTML).

> My concern with all three of these points is that they are relying on
> the browser to interpret the coder's intent when rendering the elements.
> The unquoted attributes and implied tags are subject to wide
> interpretation by the browsers. Regarding the unquoted attributes, I
> fear that new coders might not understand when and why attributes should
> be quoted and will spend a lot of time wondering why their pages are not
> rendering properly. For instance, imagine a new coder trying to declare
> inline style definitions without quoting the style attributes. How will
> a browser interpret something like:

><p style=width: 500px; height: 100px;>

This is simply a P element with the attributes style (with the value
"width:"), 500px;, height: and 100px;. There is no confusion for UAs.
Humans, though, might read it differently, so attribute values SHOULD
always be quoted.
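The split described above can be checked with Python's standard html.parser module. That module is not a full HTML5 parser, so treat this as an illustration of tokenizing unquoted attribute values rather than a statement about any particular browser. AttributeRecorder is just a hypothetical helper that records what the tokenizer hands over.

```python
from html.parser import HTMLParser


class AttributeRecorder(HTMLParser):
    """Hypothetical helper: records (tag, attrs) for every start tag."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None when the
        # attribute had no "=" at all.
        self.seen.append((tag, attrs))


recorder = AttributeRecorder()
recorder.feed("<p style=width: 500px; height: 100px;>")
recorder.close()
print(recorder.seen)
# → [('p', [('style', 'width:'), ('500px;', None), ('height:', None), ('100px;', None)])]
```

An unquoted value stops at the first whitespace, so everything after "width:" becomes a separate, value-less attribute.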

>It becomes even more dangerous in that height and width are actually attributes of most HTML
>elements, so the browser will have to do quite a bit of work to figure out what to do with these
>types of definitions.

Actually, I can't see why it would require more work to look up
attribute names that are similar to valid attribute names than to look
up completely nonsensical ones.
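A hypothetical checker illustrates the point. If the known attribute names are kept in a hash set, "height:" and "xyzzy" both go through the same single membership test, and both come back as unknown. The allowlist below is a made-up, deliberately incomplete subset for illustration only, not the spec's list for <p>.

```python
# Hypothetical, deliberately incomplete allowlist for <p>; a real
# validator would take its attribute lists from the specification.
KNOWN_P_ATTRIBUTES = frozenset({"id", "class", "style", "title", "lang", "dir"})


def unknown_attributes(names):
    """Return the attribute names not found in the allowlist.

    Each name goes through one hash-set membership test. A near miss such
    as "height:" takes exactly the same path as outright nonsense such as
    "xyzzy"; the checker never needs to know how "close" a name was.
    """
    return [name for name in names if name not in KNOWN_P_ATTRIBUTES]


print(unknown_attributes(["style", "500px;", "height:", "100px;", "xyzzy"]))
# → ['500px;', 'height:', '100px;', 'xyzzy']
```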

> I would like to reiterate that I am not asking the WHATWG to recommend
> browsers dropping support for any older HTML specs (in fact, I am very
> much in support of the browsers continuing to support all older HTML
> specs and, to the best of their ability, supporting HTML from before
> there were specs and recommendations).

Hear, hear! (I hope that's conforming English; where is the English validator?)

> However, what I am asking is that the WHATWG consider writing the specs
> so that those older, less rigid styles of coding do not validate
> according to the standard. Coders will still be free to write the code
> with implied closing tags, unquoted attributes and inconsistent use of
> the closing slash, but I don't believe that type of code should
> validate, as it does not conform to an actual standard, rather it
> conforms to exceptions to standards.

If they're allowed to write the code, it should validate. Any other
behavior would be inconsistent. A validator may raise a warning, but
that's another matter.
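As a sketch of that "warning, not error" distinction, a lint pass can report the habits discussed in this thread without rejecting the document. StyleLinter is hypothetical. Its unquoted-value check is a rough regular-expression heuristic on the raw start tag (quoted values are blanked out first, so an "=" inside quotes is not flagged), and its omitted-</p> check tracks only a single open p.

```python
import re
from html.parser import HTMLParser

# Quoted attribute values ("..." or '...'); adjacent literals concatenate.
_QUOTED = re.compile(r'"[^"]*"' r"|'[^']*'")
# An "=" followed (after optional whitespace) by something other than a quote.
_UNQUOTED_VALUE = re.compile(r"""=\s*[^\s"'>]""")


class StyleLinter(HTMLParser):
    """Hypothetical lint pass: collects warnings, never rejects the input."""

    def __init__(self):
        super().__init__()
        self.warnings = []
        self.p_open = False

    def handle_starttag(self, tag, attrs):
        raw = self.get_starttag_text() or ""
        # Blank out quoted values first so "a=b" inside quotes is ignored.
        if _UNQUOTED_VALUE.search(_QUOTED.sub('""', raw)):
            self.warnings.append(f"unquoted attribute value in <{tag}>")
        if tag == "p":
            if self.p_open:
                self.warnings.append("</p> omitted before new <p>")
            self.p_open = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.p_open = False


linter = StyleLinter()
linter.feed('<p style=width:500px>one\n<p class="a=b">two</p>\n')
linter.close()
for warning in linter.warnings:
    print("warning:", warning)
```

The document is still accepted in full; the author only gets told about the two habits, which matches a "SHOULD NOT" rather than a "MUST NOT".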

> In my blog post, I likened the looser standards of HTML5 to removing
> laws against driving while intoxicated. Sure, abolishing those laws
> would not force anyone to drive drunk, but without any legal
> ramifications for doing so, it would be much more prevalent. As with
> that example, if the standards for HTML are loosened as compared to
> XHTML 1 (served as text/html), coders will not feel the need to code
> neatly or consistently, and we will begin to slip back into the
> spaghetti code we experienced throughout the 1990s.

Actually, by now I do agree that implied closing tags should not be
used much. I think, though, that implied closing tags are better
likened to acronyms than to drunk driving. Acronyms are OK, but
shouldn't be used in formal contexts where machines are meant to mine
the text. But if machines are required to understand implied closing
(and opening) tags anyway, I don't see much reason to forbid everyone
from using them. A mere "SHOULD NOT" should be enough.

> I appreciate you taking the time to read this message, and I hope I can
> have at least a little bit of an impact on the formation of these
> standards. Thank you.

Well, thank you too.