[whatwg] HTML 5 vs. XHTML 2.0

Sun Nov 14 01:13:46 PST 2004

On 13 Nov, 2004, at 12:52 PM, Laurens Holst wrote:
> ...
> Be that as it may be, you're forgetting something - HTML is not just  
> for the web, it is also a document markup language for many (which can  
> and is of course often used and actually specifically aimed at the  
> web). At my job we currently create the documentation files for our  
> product with a transformation of our documents, which use a 'custom'
> ...

What Henri said. If long-term fidelity is important, HTML should be  
something you convert to, not your native format.  
<http://diveintomark.org/archives/2003/01/13/semantic_obsolescence>

> ...
> I don't think this is a spec just for 'the ignorant mass'. A spec  
> aimed at them can hardly be taken seriously, because it will take a  
> lot to make them learn.

There are more of them than there are of you and me, and we benefit  
from their documents on the Web. (You might reminisce about the days  
before GeoCities and Xanga and Time Cube, but those were also the days  
before Google and eBay and Wikipedia.)

> ...
> Anyway, let me stress again that for 'HTML 5' I am highly in favour of  
> adopting XHTML 2.0 with the unused HTML 4.01 tags marked 'deprecated'  
> (this is an important difference from XHTML 2.0 which removes them  
> altogether), and perhaps some additions. Because XHTML 2.0 is  
> definitely a more serious markup language.

The Web is not, and since about 1995 has not been, a serious medium. It  
is much more often used for selling books than for publishing them, for  
simulating sex than for discussing it, and for posting opinions than  
for posting facts. For the Web's pockets of seriousness you might use  
XHTML 2.0, but XHTML 2.0 is rather primitive; why not use TEI P4  
instead?

> And I'd say HTML 5 being compatible with XHTML 2.0 is a great merit  
> for both.

Maybe, but backward compatibility is expressly a design goal of HTML 5  
<http://www.whatwg.org/charter#back-compat>, while it is expressly not  
a design goal of XHTML 2.0  
<http://www.w3.org/TR/2004/WD-xhtml2-20040722/introduction.html#backCompat>.
Such divergent processes are unlikely to produce the same result.

> I'm getting the impression that we are here discussing much that has  
> already been gone through thoroughly in the XHTML 2.0 working group.

Probably, though for the compatibility reason given above, our  
conclusions may often be different.

> For example the quotes thing - in XHTML there's no <q> anymore but  
> there's <quote>, a choice very likely made because of the exact same  
> concerns raised over here (being inconsistency between <q>  
> functionality, which a new tag would solve). Or removing <acronym>,  
> <big> and <small>. The accesskey functional choice they made sounds  
> pretty decent, from what I hear here. <var> is used (just maybe not by  
> you, but I have several times),

I use <var> whenever appropriate, which is about once a year, but I  
recognize that it is unlikely ever to have any semantic usefulness  
(because variable names aren't unique enough). I use <q> much more  
often, and I will weep hot tears if/when it is abolished, but I  
recognize it is a poorly-supported, backward-incompatibly-confusing  
element, with hardly any semantic usefulness, and an uneasy  
relationship with English punctuation (except in en-GB-hixie and  
similar dialects).

> and the functionality of <cite> is greatly enhanced, making it a much  
> more useful tool,

My list of deprecable items included cite=, not <cite>. cite= is mostly  
useless for three reasons. First, it's invisible, so authors don't use  
it, so it can't be relied on or aggregated usefully. Second, it expects  
a URI, but the cited material isn't necessarily represented online  
<http://lists.w3.org/Archives/Public/www-html/2003May/0214.html>.  
Third, it doesn't relieve authors from having to provide text citations  
before/after the quote as well (if they didn't, the text would be  
nonsensical in hypertext-less media such as printouts or telephone  
conversations).
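
A minimal sketch of the first and third points, assuming made-up markup: the quotation text, the essay title, and the example.org URI are all hypothetical, and `split_visible_and_hidden` is a helper name invented for this illustration. It separates what a reader of a printout would see from what sits only in cite=.

```python
from html.parser import HTMLParser

# Hypothetical markup: the wording, the title, and the URI are invented.
SAMPLE = (
    '<p>As Jane Doe wrote in <cite>An Example Essay</cite>:</p>'
    '<blockquote cite="http://example.org/essay">'
    '<p>Invisible metadata tends to rot.</p>'
    '</blockquote>'
)


class VisibleText(HTMLParser):
    """Collects text nodes and, separately, any cite= attribute values."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.cite_uris = []

    def handle_starttag(self, tag, attrs):
        # cite= lives on the start tag and is never rendered as text.
        for name, value in attrs:
            if name == "cite" and value:
                self.cite_uris.append(value)

    def handle_data(self, data):
        # Only character data would reach a printout or a listener.
        self.text.append(data)


def split_visible_and_hidden(markup):
    """Return (visible text, list of cite= URIs) for an HTML fragment."""
    parser = VisibleText()
    parser.feed(markup)
    parser.close()
    return "".join(parser.text), parser.cite_uris


if __name__ == "__main__":
    visible, uris = split_visible_and_hidden(SAMPLE)
    print("visible:", visible)
    print("cite= URIs:", uris)
```

The visible text still carries the author's prose citation (the name and the title), while the URI turns up only in the separate list. That is why the attribute cannot replace the text citation, and why authors who never see it have little reason to fill it in.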

> which can also be employed for data mining (I've seen a similar thing  
> on dive into mark's blog once iirc).
> ...

You mean posts by citation  
<http://diveintomark.org/archives/2002/12/27/pushing_the_envelope>. I  
hope "Hixie said I was using [<cite>] correctly"  
<http://diveintomark.org/archives/2003/01/19/influences> was an  
over-broad interpretation of Ian's words, because (a) Ian has mentioned  
"'clarifying' the definition of <cite>"  
<http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2004-November/002329.html>,
and (b) while Mark's uses of <cite> matched the example  
given in the HTML 4.01 spec  
<http://www.w3.org/TR/REC-html40/struct/text.html#edef-CITE>, they did  
not match the default presentation in all visual UAs, nor the resultant  
use by most Web authors.

(Specifically, I think the most coherent and backward-compatible  
"clarification" would be to restrict <cite> to titles of works, because  
inviting authors to use it for names of people as suggested in the HTML  
4.01 example would require authors to override <cite>'s italic-ness  
frequently, making them more likely to abandon the element completely.)

-- 
Matthew Thomas
http://mpt.net.nz/