[whatwg] Allow trailing slash in always-empty HTML5 elements?
hsivonen at iki.fi
Wed Nov 29 08:05:04 PST 2006
On Nov 28, 2006, at 23:20, Sam Ruby wrote:
> In HTML5, there are a number of elements with a content model of
> empty: area, base, br, col, command, embed, hr, img, link, meta,
> and param.
> If HTML5 were changed so that these elements -- and these elements
> alone -- permitted an optional trailing slash character, what
> percentage of the web would be parsed differently?
Obviously, 0% with parsers that opt to implement the HTML5 parsing
algorithm with error recovery as opposed to Draconian error handling--
except for the question of whether error-reporting parsers report an error
or not. (In theory, this is an issue for non-browser UAs that opt to
implement Draconian error handling. In practice, even my mostly
Draconian parser treats this particular error as non-fatal, because
it is so common and so easily recoverable.)
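To make the "0% parsed differently" point concrete, here is a rough sketch of a checker that behaves as described above. The trailing slash does not change which start tags reach the tree builder. On a void element it only produces a non-fatal warning, and on a non-void element it is reported as an error because an HTML parser ignores the slash and leaves the element open. It uses Python's standard-library `html.parser`, which is not an implementation of the HTML5 parsing algorithm. The class name `SlashChecker`, the `check` helper, and the warning/error split are hypothetical choices made for illustration.

```python
from html.parser import HTMLParser

# Elements with an empty content model, as listed earlier in this thread.
VOID_ELEMENTS = {"area", "base", "br", "col", "command", "embed",
                 "hr", "img", "link", "meta", "param"}


class SlashChecker(HTMLParser):
    """Hypothetical checker: records start tags and classifies trailing slashes."""

    def __init__(self):
        super().__init__()
        self.tags = []      # start tags in document order
        self.warnings = []  # non-fatal: "<br/>" on a void element
        self.errors = []    # "<div/>": slash ignored, element stays open

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Called for "<tag/>" and "<tag />". The slash is ignored, so the
        # same start tag is recorded as for the slash-free form.
        self.tags.append(tag)
        if tag in VOID_ELEMENTS:
            self.warnings.append(f"trailing slash on void element <{tag}>")
        else:
            self.errors.append(f"self-closing syntax on non-void element <{tag}>")


def check(markup):
    """Return (tags, warnings, errors) for a markup string."""
    parser = SlashChecker()
    parser.feed(markup)
    parser.close()
    return parser.tags, parser.warnings, parser.errors


plain = check('<p>a<br>b<img src="x.png"></p>')
slashed = check('<p>a<br/>b<img src="x.png" /></p>')
print(plain[0] == slashed[0])  # same start tags either way
print(slashed[1])              # two non-fatal warnings, no errors
```

The tag sequences match; only the diagnostics differ, which is the distinction drawn above between parsing and error reporting.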
> The basis for my question is the observation that the web browsers
> that I am familiar with apparently already operate in this fashion,
> this usage seems to have crept into quite a number of diverse
> places, and all this is coupled with Lachlan's observations on
> what it would take to change the popular WordPress application to
> produce HTML5 compliant output.
WordPress is a soup-in-soup-out system that shouldn't be trying to
produce the XML syntax in the first place. But now that WP is using
it, the question becomes which is more costly: asking the WP
developers to change their system, or adjusting the definition of
conformance so that WP output conforms more easily.
Anyway, as Lachlan already pointed out, whether or not the useless
slash should be allowed on elements whose content model is empty is
not a matter of technical damage to parsing interoperability but of
damage to the mental model of confused authors. So the cost to
consider is the cost of the confusion.
> As a side benefit of this change, I believe that I could modify my
> weblog to be simultaneously both HTML5 and XHTML5 compliant, modulo
> the embedded SVG content, something that would need to be
> discussed separately.
I am against blurring the distinction between the XML serialization
and the HTML serialization. The infamous Appendix C didn't bring
about good things.
Having a text/html serialization that is also parseable as XML
doesn't work from the UA point of view, because reality requires UAs
to parse text/html using an HTML parser. Now, since UAs can't use an
XML parser for parsing text/html anyway, it becomes useless for
content providers to ensure that their text/html content is also
parseable as XML.
Restricting the XML syntactic sugar, such as the use of CDATA
sections or <foo/> vs. <foo></foo>, on the application/xhtml+xml side
would be wrong in principle, because it is wrong for a higher-layer
spec to micromanage lower-layer syntactic sugar or, worse, give
differences in syntactic sugar a difference in meaning. In practice,
limiting XML details of the application/xhtml+xml serialization would
be useless, because it is processed using XML processors which are
required to support full syntactic sugar anyway.
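As a small illustration of why restricting this sugar would be pointless, here is a sketch using Python's standard-library `xml.etree.ElementTree`: an XML parser reduces an empty-element tag and a start/end pair, or a CDATA section and escaped character data, to the same tree. The helper name `same_tree` is hypothetical, and comparing ElementTree re-serializations is only a rough proxy for infoset equality (it ignores details such as comments and namespace prefixes).

```python
import xml.etree.ElementTree as ET


def same_tree(a, b):
    """Hypothetical helper: True if two XML strings re-serialize identically."""
    return ET.tostring(ET.fromstring(a)) == ET.tostring(ET.fromstring(b))


# <foo/> vs. <foo></foo>: both yield an element with no text or children.
print(same_tree("<p><br/></p>", "<p><br></br></p>"))

# A CDATA section vs. escaped character data: both yield the text "a < b".
print(same_tree("<s><![CDATA[a < b]]></s>", "<s>a &lt; b</s>"))
```

Because every conforming XML processor must accept both forms anyway, a higher-layer spec gains nothing by forbidding one of them.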
I think that your blog system is a special case. Considering that I
have seen the Yellow Screen of Death on your blog, it appears that
you aren't using an isolated serializer that could be swapped.
However, the reason your site works is that it is built far
more competently than other systems that don't use an isolated
serializer, *and* that, because you are both the developer and the
deployer and you care about these issues, you can and do fix bugs quickly.
That just doesn't work with systems that aren't constantly managed by
So no offense intended, but I think that what would work for you (or
Jacques Distler) isn't generalizable. Rather, a warning to the effect
of "professional driver on closed road" would be appropriate. :-)
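For contrast, the "isolated serializer that could be swapped" mentioned above can be sketched as follows: the application builds a tree and only the final serialization step decides between the HTML and the XML syntax, so switching output syntax touches one function instead of every template. This is a minimal hypothetical sketch; the `Element` class and the `serialize` function with its `xml` flag are illustrative names, not part of WordPress or any existing library, and it omits comments, raw-text elements such as script, and namespaces.

```python
from dataclasses import dataclass, field
from html import escape

# Elements with an empty content model, as listed earlier in this thread.
VOID_ELEMENTS = {"area", "base", "br", "col", "command", "embed",
                 "hr", "img", "link", "meta", "param"}


@dataclass
class Element:
    """Hypothetical tree node: children are Elements or text strings."""
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def serialize(node, xml=False):
    """Serialize a tree as HTML (default) or as XML when xml=True."""
    if isinstance(node, str):
        return escape(node, quote=False)  # escapes &, <, >
    attrs = "".join(f' {k}="{escape(v)}"' for k, v in node.attrs.items())
    if node.tag in VOID_ELEMENTS:
        # The only place where the two syntaxes differ in this sketch.
        return f"<{node.tag}{attrs}/>" if xml else f"<{node.tag}{attrs}>"
    inner = "".join(serialize(child, xml) for child in node.children)
    return f"<{node.tag}{attrs}>{inner}</{node.tag}>"


doc = Element("p", children=["a", Element("br"), "b & c"])
print(serialize(doc))            # text/html output
print(serialize(doc, xml=True))  # application/xhtml+xml output
```

Since escaping and element syntax live in one place, a bug in either output mode is fixed once rather than hunted down across hand-written markup.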
hsivonen at iki.fi