[whatwg] several messages about HTML5

Tue Feb 20 14:37:04 PST 2007

Vlad Alexander (xhtml.com) wrote:

(NB I'm just another correspondent, not an official WHATWG voice or
anything.)

> Why not put an end to "tag soup" by requiring user-agents to only
> accept markup written to specification?

Problem 1: Even if HTML5 were /not/ intended to be backwards compatible,
would browser developers implement such a specification, even were the
specification precise enough that algorithms could distinguish "markup
written to specification" from (say) valid, well-formed gibberish?

Problem 2: HTML5 is intended to define a parsing model that can cope
with the majority of existing text/html content. Very little of that
conforms to any specification; a great deal of it is "tag soup".
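
To make "cope with tag soup" concrete, here is a minimal sketch using the lenient tokenizer in Python's standard library (html.parser). It is not the HTML5 parsing algorithm and it builds no tree; it only shows a parser reporting misnested tags as they arrive instead of rejecting the document. The names TagLogger, tokenize, and SOUP are invented for this illustration.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record every start tag, end tag, and text run, in source order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


def tokenize(markup):
    """Feed markup to the lenient tokenizer and return the recorded events."""
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.events


# Misnested <b>/<i> and an unclosed <p>: typical "tag soup".
SOUP = "<p><b>bold<i>both</b>italic</i>"

if __name__ == "__main__":
    for event in tokenize(SOUP):
        print(event)
```

The tokenizer accepts the input without raising. Deciding what tree the misnested tags should produce is the part a precisely defined parsing model would have to pin down.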

> 5. X/HTML 5 has a construct for adding additional semantics to
> existing elements using predefined class names. Predefined class names
> could be the most controversial part of X/HTML 5, because the
> implementation overloads the class attribute. XHTML 2 provides similar
> functionality using the role attribute. Which approach is better and
> why?

I keep trying to wrestle with this question. Neither is great. Both
roles and microformats promise to be extension mechanisms, but neither
(AFAICT) includes a reliable mechanism for defining default behaviour or
style. When you mark up a heading with <hX> you can reliably expect user
agents to expose that information ("this is a heading") to end users. No
such expectation is possible with microformats or roles. They are thus
architecturally best suited for machine processing and as style hooks,
not to be considered as genuine additions to the standard set of
elements and attributes (compare the HTML 4.01 spec's discussion of the
class attribute).
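
As a rough illustration of the "machine processing and style hooks" point, the sketch below pulls class and role tokens out of a fragment using Python's html.parser. The sample markup, the token "note", and the names HookCollector, collect_hooks, and SAMPLE are assumptions made up for the example, not taken from either specification.

```python
from html.parser import HTMLParser


class HookCollector(HTMLParser):
    """Collect (tag, mechanism, token) for each class or role token seen."""

    def __init__(self):
        super().__init__()
        self.hooks = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Both attributes hold space-separated tokens.
            if name in ("class", "role") and value:
                for token in value.split():
                    self.hooks.append((tag, name, token))


def collect_hooks(markup):
    """Return every class/role token found in the markup, in source order."""
    collector = HookCollector()
    collector.feed(markup)
    collector.close()
    return collector.hooks


# Hypothetical fragment: one heading, one class-based and one role-based hook.
SAMPLE = '<h2>Heading</h2><p class="note">A</p><p role="note">B</p>'

if __name__ == "__main__":
    print(collect_hooks(SAMPLE))
```

Either mechanism is equally easy for a script to extract. The <h2> yields no token at all, because its meaning comes from the element name itself, which is exactly what user agents can be expected to expose. The tokens stay opaque strings until some specification or consumer assigns them behaviour.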

A major exception to this rule will be any microformats or roles
referenced from the Web Applications 1.0 or XHTML2 specifications
themselves, because there we should be able to count on user agent
support. In the case of XHTML2, this includes not only the rather vague
roles defined in the role module but more importantly the WAI roles that
are actually being implemented by Mozilla (and related AT such as
Window-Eyes, JAWS, and NVDA).

I suppose whether you think microformats are better than roles depends
on (among other factors):

1) Whether you think a central repository of human-readable definitions
(currently the WHATWG wiki) is a better way of defining the new
semantics than human-readable definitions supplemented by RDF, available
from decentralized repositories that are not necessarily referenced from
documents.

2) Whether you think being able to use the semantics in text/html is
important. (You can express some roles in text/html, but you cannot
combine them in certain ways.)

3) Whether you think any putative benefits of roles as an extension
mechanism (rather than an accessibility enhancement) justify replacing
microformats, an existing system that (kind of) works.

> 6. The font element is a terrible construct, primarily because content
> creators using authoring tools use the font element instead of
> semantic markup. The X/HTML 5 spec supports the font element when
> content is authored using WYSIWYG editors. What is the rationale for
> this?

I think <font> should be jettisoned myself, but it does help prevent
misuse of semantic elements for presentational purposes.

> Why would WYSIWYG editors get an exemption?

Because anybody generating HTML /should/ know better.

> And is this exemption going to make the Web less accessible?

Mixing in presentational information with semantic information /tends/
to decrease accessibility, but not as much as semantic misinformation.
So it's hard to say. At least with <font> it's still text. What we need
is new authoring tools that dump the broken WYSIWYG model, more than new
specifications.

> 7. The XHTML 5 spec says that "generally speaking, authors are
> discouraged from trying to use XML on the Web". Why write an XML spec
> like XHTML 5 and then discourage authors from using it? 

Well, one reason would be that popular and old browsers, and hence AT,
don't support XHTML, and newer/better browsers don't support it as well
as they do text/html. Of course, this reason will be half neutralized if
the backwards compatibility of XHTML5 turns out to be mostly phantom.
I've always found the idea (occasionally floated on this list) of
defining a reliable subset of HTML 4.01 with a better defined parsing
model rather attractive for this reason. It could then be used as a
target for transformations.

> Why not just drop support for XML (XHTML 5)?

Well, a key motivation is that some authors prefer using XHTML.

--
Benjamin Hawkes-Lewis