[whatwg] Joe Clark's Criticisms of the WHATWG and HTML 5
hsivonen at iki.fi
Sun Oct 29 07:33:07 PST 2006
(Sent both to the WHAT WG list and to Joe Clark himself, because I
assume he doesn't subscribe to the list.)
On Oct 29, 2006, at 06:33, Lachlan Hunt wrote:
> I thought Joe Clark's opinions and criticisms of the WHATWG and
> HTML5 might be of interest to people here.
Quotes hereafter from the blog post.
> Tim Berners-Lee decides to actually do something for a change.
> Unfortunately, it’s the wrong thing.
I am very interested in what the W3C is really up to. I am unable to
figure out if they are doing the right thing (endorsing the WHAT WG
work) or the wrong thing (doing their own competing and conflicting
> Due in no small part to WHAT WG’s leadership by a strict standardista
Well, the leadership applies different kinds of strictness to the
tokenizer/DOM level and to semantics. Personally, I'd like the
tokenizer/DOM part to be a tad stricter and the semantics part to be
> * HTML has samp, var, and kbd. I use all of them and I am
> pretty much the only one who does.
FWIW, I think <samp> and <kbd> don't deserve to be in HTML and I am
not convinced that the use cases for <var> could not be satisfied by
> * “HTML5” has meter (for measurements) and t for time notation.
The meter element is not for marking up measurements like "1.6 km".
It is for displaying a gauge widget that visualizes a semi-static
fraction (for example usage of Gmail storage quota) as opposed to a
progress bar. It is for applications--not prose documents.
I can't remember seeing any use case-based rationale for the <t>
element. I'm not convinced that having it is a good idea.
> But, true to member biases, “HTML5” bans the use of dl–dt/dd for
> dialogue, a usage permitted by the HTML spec and in wide use by
> intelligent developers like me who have to mark up documents
> unrelated to computer science.
This issue came up on the list recently.
> (They’d prefer you use a thicket of blockquotes and cites. And,
> presumably, nullify all the indention and italicization that
> browsers will do by default.)
I'm inclined to think that the <cite> element is useless. <i> could
be used for marking up titles of works and <b> could be used for
magazine and newspaper-style marking up of the first instance of a
personal name. I have yet to see a markup consumption use case that
would work on the public Web and would use <cite>.
Also, I was unable to explain to my mother why she should use <cite>
instead of whatever command-i does in Dreamweaver. (Apparently,
command-i applied <i> in Dreamweaver 4 but applies <em> in Dreamweaver
MX, which should indicate to semanticists that <em> and <strong> are a
lost cause and really are only aliases for <i> and <b>.)
> Tantek told me (E-mail, 2006.05.16):
> [T]he “print publications” crowd… cares much more about pixel
> precision, etc. They don’t (typically) bother to even mark up their
> headings using h1…h6. Now, if there were a bunch of Web pages today
> which were adaptations of print publications where they tried to
> use semantic markup as much as they could and then started using
> <div class=""> for semantics that HTML didn’t capture, then that
> would be evidence.
> Of course there weren’t “a bunch of Web pages today” that did so;
> the elements didn’t exist. And that proved the elements didn’t need
> to be invented.
> This attitude – still present in WHAT WG, though it is separate and
> was formed later – can be summed up as “Until we decide you are
> using our computer-science tags adequately, we won’t even consider
> the semantic needs of your documents.”
Well, that's not the whole story in the Tantek quote above. The same
<div class='…'> appearing over and over on different sites does
indicate demand for an element.
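As a rough sketch of how such demand could be measured, one could tally
which class names on <div> show up across several sites. Everything
below is an assumption made for illustration: the names
DivClassCollector and class_demand are hypothetical, and the sample
pages are invented, not taken from any real site.

```python
from collections import Counter
from html.parser import HTMLParser


class DivClassCollector(HTMLParser):
    """Collects the class names used on <div> elements in one document."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        for name, value in attrs:
            if name == "class" and value:
                # A class attribute may hold several space-separated names.
                self.classes.update(value.lower().split())


def class_demand(pages):
    """Map each div class name to the number of distinct sites using it.

    `pages` maps a site name to its HTML source. Counting sites rather
    than occurrences keeps one chatty site from dominating the tally.
    """
    demand = Counter()
    for site, html in pages.items():
        collector = DivClassCollector()
        collector.feed(html)
        collector.close()
        demand.update(collector.classes)
    return demand


# Hypothetical sample input, invented for illustration only.
sample_pages = {
    "site-a.example": '<div class="footnote">1</div><div class="footnote">2</div>',
    "site-b.example": '<div class="footnote sidebar">x</div>',
    "site-c.example": "<div class='caption'>y</div>",
}

demand = class_demand(sample_pages)
print(demand.most_common(1))
```

A class name that ranks high across many unrelated sites would be a
candidate worth discussing; a name that appears on one site only would
not be.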
> For “HTML5” and the new HTML variants, why can’t we just adopt
> what’s already been done in other namespaces, like the Text
> Encoding Initiative and tagged PDF?
I think learning from tagged PDF is a good idea.
However, the last time I properly read the PDF spec was in 2002 and I
haven't done much testing. (I try to stay away from Adobe Reader. I
use Preview.app and PDF readers derived from xpdf. In the environment
where I'd benefit from tagged PDF, the reader doesn't support it.) I
understand how Adobe Reader uses paragraph, heading and table markup,
but I don't know about the rest. How are the other tags actually put
> * annotation
How would that work?
> * note and reference for footnotes, endnotes, and sidenotes
> (not aside in “HTML5”)
Yes, this is an area where document and converter authors currently
need to come up with their own class-based hacks. Ideally a
continuous media user agent could show footnotes in context so that
they don't become de facto endnotes.
> * caption generically applicable to tables and figures
There have been suggestions about image captions on the mailing list.
> * bibliographies, tables of contents, and indices (some in
One of the issues here is the tension between HTML as an authoring
format and HTML as a delivery format. That is, do we really want the browser
to do the stuff BibTeX does? OTOH, if the browser just displays
output from a bibliography generator, what level of semantic encoding
is actually useful for the consumers of the document? PDF doesn't
attempt to go further than identifying what blocks are bibliography
entries. Is that useful enough to bother? If the markup is very
detailed so that Google Scholar (or whatever) could analyze cross-
references in scientific papers, wouldn't that veer back into
focusing on computer science papers?
I, for one, am writing about HTML5 in LaTeX. One of the reasons was
BibTeX even though I have to hack a .bst of my own.
> * nonstruct for generic groupings
How is that different from <div>?
> * formula
How does that work? Is the rendered rectangle of the formula
preserved when reflowing the visual presentation? Surely aural
rendering is not possible?
> I would even be fine signing on to “HTML5” if WHAT WG made some
I realize that people have limits on how many mailing lists they can
usefully participate in, but it would be really useful if you could
raise any remaining issues you have on the WHAT WG list so that
they'd get registered in the issue tracker (i.e. Hixie's mailbox :-).
> * Ban tables for layout.
As long as graphic designers want to use grid-based layouts, telling
them to fake them with floats or, worse, positioning is jumping from
the frying pan into the fire. (And telling them to use display:
table; doesn't work if IE doesn't support it.)
A personal real-world case:
I use the Presto-based browser that ships with the Nokia 770. Due to the
physical characteristics of the screen of the 770, I cannot
*comfortably* read text at pixel sizes that are typical for desktop
usage scenarios. The pixels on the 770 screen are more densely packed
than on a typical desktop screen. To read book-length prose on public
transport, I need to have the font set to 26 px sans-serif (16 px
serif by default). Yes, 26 px--and I have set the minimum font size
to 18 px (7 px by default).
I need to make these part of the 100% zoom level settings instead of
zooming when I read long passages. Presto's zoom is useless for my
needs on the device, because it zooms the view port width given as
input to the CSS layout algorithm (and the "Optimized View" has a
fatal bug and is too hard to activate). (Sorry, Opera guys. I really
could use a Gecko/Tasman/WebKit-style font-only zoom, though.)
This arrangement actually works pretty well with table-based layouts.
I can, for example, read news from the BBC, CNN or Helsingin Sanomat.
However, the situation is not so fortunate with float or positioning-
based sites. I can't read Slashdot or Sam Ruby's blog
(intertwingly.net) *at all* due to the way they use floats and make
assumptions about the proportion of em to the width of the view port.
The sidebar gets so wide that the main content gets zero width. Some
supposedly standardista XHTML-1.1-as-text/html-plus-positioning
layouts break spectacularly, too. (And due to the way the Presto zoom
works, zooming out is completely useless.)
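The arithmetic behind the zero-width column can be sketched with a
deliberately simplified model: a sidebar sized in em next to a main
column that gets whatever is left of the view port. The function name
and all the numbers below (an 800 px view port, a 32 em sidebar) are
assumptions chosen for illustration, not measurements of the sites
mentioned, and real float layout is more involved than this.

```python
def main_column_width(viewport_px, sidebar_em, font_px):
    """Width left for the main column, in pixels (never negative).

    The sidebar is sized in em, so it grows with the font size, while
    the view port width stays fixed.
    """
    sidebar_px = sidebar_em * font_px
    return max(0, viewport_px - sidebar_px)


# Hypothetical figures, chosen only to illustrate the effect.
VIEWPORT_PX = 800
SIDEBAR_EM = 32

for font_px in (16, 26):
    width = main_column_width(VIEWPORT_PX, SIDEBAR_EM, font_px)
    print(f"{font_px} px font: main column {width} px wide")
```

Under these assumed figures, the layout is fine at a 16 px font and
leaves nothing for the main column at 26 px, because the em-sized
sidebar alone is wider than the view port.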
In practice, as far as mobile accessibility for me is concerned with
today's shipping software, traditional single-column HTML works
great, table layouts are a bit annoying but mostly harmless and the
supposedly accessible cool CSS stuff can make content unreadable.
> * Allow fragment identifiers to start with any ASCII character,
> not just a letter. Suddenly hundreds of millions of Blogger comment
> URLs become valid.
Yeah, SGML and XML Name tokens are pretty arbitrary. I'd be happy
with IDs consisting of one or more XML characters except whitespace
as defined in XML 1.0.
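A minimal sketch of that rule, assuming it means "one or more
characters matching the XML 1.0 Char production, excluding the four
whitespace characters of the S production". The function name
is_valid_id and the sample values are hypothetical.

```python
import re

# XML 1.0 Char is #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
# | [#x10000-#x10FFFF]. Removing the whitespace characters
# (#x20, #x9, #xD, #xA) leaves the three ranges below.
_ID_RE = re.compile(r"[\u0021-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]+")


def is_valid_id(value):
    """True if value is one or more non-whitespace XML 1.0 characters."""
    return _ID_RE.fullmatch(value) is not None


# Hypothetical sample values for illustration.
for candidate in ("intro", "116151892", "a b", ""):
    print(repr(candidate), is_valid_id(candidate))
```

Under this rule an ID that starts with a digit is accepted, while an
empty ID or one containing a space is not.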
> * Allow multiple uses of the same id/label in a form and
> suddenly it becomes possible to mark up multiple-choice
> questionnaires accessibly.
I am not sure I understand what kind of markup you are advocating here.
> * Give us actual rowgroups (not just tbody) along with
> colgroups in tables and maybe browsers will begin to support both
> of them. (Table headers also badly need fixing.)
What's wrong with using tbody for grouping rows? At least it is
> * Let us nest certain block-level elements in certain other
> ones right away, à la XHTML 2. A p really should be able to contain
> an ol.
Fixed in XHTML5.
> (And a dt really should be able to contain a p.)
> * Make embed legal. Give it up, people: object doesn’t work and
> never will.
Indications are that it will be legal in HTML5.
> Why are measurements or other new things more important than, say,
> Web accessibility?
Are improving HTML for applications and improving accessibility
mutually exclusive? Do you mean HTML should be frozen and
accessibility work should focus on UAs and authoring practices using
the frozen HTML?
hsivonen at iki.fi