[html5] 4.01 vs XHTML

Jukka K. Korpela jukka.k.korpela at kolumbus.fi
Mon Oct 15 10:18:30 PDT 2012

2012-10-15 15:20, Prof. T.D. Wilson wrote:
> I'm not at all clear why you would choose to use html 4.01 instead of 
> html5?

My notes about using HTML 4.01 doctype declaration were related to 
situations where your markup is in fact closer to HTML 4.01 Transitional 
than HTML5. To deal with validator error messages, we should minimize 
irrelevant messages, like messages about elements and attributes that we 
use on purpose, even if they don't meet someone's ideas of what HTML 
should be. Unfortunately, setting up an HTML5 DTD that one could tweak 
to allow what you use intentionally does not seem to be easy at all, as 
my attempt at HTML5 DTD (http://www.cs.tut.fi/~jkorpela/html5-dtd.html) 
suggests. If I were about 40 years younger, I would probably want to 
write a real HTML linter that is meant to help authors, not suffocate them.

>  True, the latter is not yet fully confirmed as a standard, 

It is nowhere near becoming a standard. But it is on the W3C equivalent
of a "standards track", i.e. it is aimed at eventually becoming a W3C
Recommendation. The WHATWG approach is different, though at the 
practical level, the differences aren't as big as one might expect.

> but most new versions of browsers appear to be able to cope with it, 
> so why stick with 4.01?

In fact, IE 9 is rather far from supporting HTML5 in any real sense, and
many useful HTML5 novelties have rather limited support in browsers.

> The main virtue of xhtml to my mind is the discipline it creates in 
> the use of tags - requiring end tags, for example, and in the formal 
> nesting of tags - if your site or page validates as xhtml then you can 
> be pretty sure it is going to be readable by anything - and of course, 
> you can retain that discipline in switching to html5, although my 
> understanding is that you do not need to do so.

Requiring end tags is easy: you just need to specify a DTD that requires
them. Whether that is progress is debatable. Omission of redundant tags
is/was one of the strengths of SGML (as well as classic HTML). XML (and 
hence XHTML) requires them to make parsing essentially simpler: XML can 
be parsed without the faintest clue of the intended structure (a DTD), 
not to mention semantics. This technical aspect can be turned into a
virtue, but it would be a mistake to think that it affects cross-browser
functionality. And validating as XHTML, whatever version, is a purely
formal thing and does not imply readability, functionality, or browser
independence (contrary to the validation advocacy promulgated by the W3C).
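To illustrate the point that XML can be parsed with no DTD at all, while omitted end tags simply break it, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`, which is a non-validating parser and never consults a DTD. The helper `parses_as_xml` and the sample markup are hypothetical, made up for this illustration.

```python
import xml.etree.ElementTree as ET

def parses_as_xml(markup: str) -> bool:
    """Return True if markup is well-formed XML.

    Hypothetical helper: ElementTree does not read or need a DTD,
    so this checks well-formedness only, not validity.
    """
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# All end tags written out: the parser builds a tree without knowing
# anything about what a "ul" or "li" is supposed to contain.
explicit = "<ul><li>one</li><li>two</li></ul>"

# End tags omitted, as the HTML 4.01 DTD permits for li. Without a DTD
# telling it that a new li implicitly closes the previous one, an XML
# parser sees mismatched tags and gives up.
omitted = "<ul><li>one<li>two</ul>"

ok_explicit = parses_as_xml(explicit)
ok_omitted = parses_as_xml(omitted)
tags = [el.tag for el in ET.fromstring(explicit).iter()]

print(ok_explicit, ok_omitted)  # True False
print(tags)                     # ['ul', 'li', 'li']
```

Note that neither result says anything about whether the markup works in browsers; it only shows what an XML parser can and cannot infer on its own.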

> Some of html5 is clumsy for some purposes - but I guess that is always 
> going to be the case. For example, the syntax of <article> and 
> <section> will vary depending upon the nature of the page.

The real issue here is that such elements have been defined so vaguely 
that authors who try to use HTML5 will endlessly analyze and discuss 
their page structure, to no avail. It is comparable to scholastic work, 
though this might be somewhat unfair (after all, scholasticism has made 
real achievements). It would be much better if the implied ideas about 
search engines and content harvesters were made explicit. They could 
then be compared with the actual behavior of such software.

> So the lack of formal syntax here is actually an advantage, given the 
> wide range of uses to which html is going to be put.

I don't think there is any lack of formal syntax here. There is a lack of
rigorous semantic definitions.

> Another point to be wary about in html5 is that it is touted as having 
> 'semantic' tags - it doesn't.  'Semantic' has to do with representing 
> meaning (i.e., what the enclosed text etc. is /about/) and the 
> so-called semantic tags say nothing about the meaning of what they 
> enclose but only about the kind of information one might find there. 
> So, 'footer' is not semantic - it only tells you the location of the 
> information, but not what kind of information is there, because you 
> can place in the footer whatever you wish; similarly, the 'header' and 
> 'nav' tags are only location indicators - you could use the 'nav' tag, 
> for example, to put any kind of information in, rather than navigation 
> pointers.

Well, perhaps not just location. But it's about structure, not meaning -
people have simply lost sight of what meaning is when they call things
"semantic" that are really structural (usually with strong presentational

Yucca, http://www.cs.tut.fi/~jkorpela/