[whatwg] The real issue with HTML5's sectioning model (was: "Headings and sections, role of H2-H6" and "Should default styles for h1-h6 match the outlining algorithm?")

Fri Apr 30 11:57:42 PDT 2010

I think I already mentioned this before, but seeing how the issues are
surfacing again, maybe it's worth revisiting the real *roots* of the
problem.

Basically, most of the issues with headings boil down to a single
fact: the sectioning model is (probably needlessly) bloated. Some
people will hate me for what I'm going to do, but I have to: I'm going
to compare (again) HTML5 and XHTML2, and I'll even add HTML4/XHTML1 to
the mix:

XHTML2's approach was clean and simple: <section>, <h>, and @role do
everything. Period. Perhaps even too simple: @role was defined as a
"common" attribute (i.e. available to any element), and its definition
was, at best, non-trivial (I vaguely remember reading somewhere that
it meddled with RDF, but I never really knew what that attribute was
exactly supposed to represent).

In legacy X/HTML (HTML up to 4 and XHTML up to 1.1), it was all about
<h1> through <h6>, with all the issues we already know.

And now, in HTML5, not only have <h1-6> been kept, but a plethora of
new elements has been added: <section>, <nav>, <aside>, <article>,
<hgroup>, <header>, <footer>; and it even messes with <address>. The
justifications for <h1-6> (backwards compatibility, better transition,
etc) are quite sound; but the 7 new elements more than double the
mess. Actually, if we try to "implement" the outlining algorithm in
the form of selectors that match each level of headings, we have:
In the case of the <h1>-only approach, selecting each level of
heading requires a list whose length is the number of sectioning
elements raised to the n-th power, where n is the heading level minus
one. In other words: the top level heading can be selected with "h1",
but the next level would require "section h1, nav h1, aside h1,
article h1, ...", then for the third level we go nuts (the example is
limited to <section> and <article> elements, including all of them
would yield a list of too many selectors): "section section h1,
section article h1, article section h1, article article h1". A fourth
level would translate into 64 selectors or more (already quite insane
to author), and if we ever reach the fifth and further levels, we'll
be dealing with hundreds or thousands of selectors. If this isn't
insane enough, keep in mind that this is an over-simplification. Sure,
there are combinations that will never happen, but if we also have to
select sub-headings inside an <hgroup>, things get pretty funny.
In the case of a mixed approach, it is *absolutely* impossible to get
the headings properly matched with current selector technology. Even
with jQuery's :has() (many variants of which have been proposed
several times on the CSS discussion lists), things would be extremely
hard, if possible at all.
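
To make the growth in the <h1>-only case concrete, here is a rough
Python sketch that enumerates the selector list for a given heading
level. The function names are made up for illustration, and the
element list is an assumption (only the four elements from the
second-level example, ignoring <hgroup> and friends), so the real
numbers would be at least this bad:

```python
from itertools import product

# Assumption: only these four sectioning elements are considered.
SECTIONING = ["section", "nav", "aside", "article"]


def heading_selectors(level, elements=SECTIONING):
    """Return one descendant selector per chain of (level - 1)
    sectioning ancestors drawn from `elements`, each ending in h1."""
    depth = level - 1
    return [" ".join(chain + ("h1",))
            for chain in product(elements, repeat=depth)]


def selector_list(level, elements=SECTIONING):
    """Join the selectors into a single comma-separated selector list."""
    return ", ".join(heading_selectors(level, elements))


if __name__ == "__main__":
    # Number of selectors needed per heading level.
    for level in range(1, 6):
        print(level, len(heading_selectors(level)))
    # The reduced third-level example, limited to <section> and <article>.
    print(selector_list(3, ["section", "article"]))
```

The list length is len(elements) ** (level - 1), so adding a single
new sectioning element makes every level beyond the first more
expensive to select.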

So, that's enough of a problem statement (at least for now). My
suggestion is to clean things up a bit: consolidate the sectioning
model into a single element+attribute pair, like this:
<section> stays as is.
<nav> becomes <section kind="nav">
<aside> becomes <section kind="aside">
<article> becomes <section kind="article">
<address> becomes <section kind="address"> (and the former is defined
in the compatibility section as equivalent to the latter, because it is
the only element of the sectioning model that already exists in
previous versions of HTML).
I'm not sure about what should be done with <header>, <footer>, and
<hgroup>, but I hope this is a good place to discuss it ;-)

Any UA would have exactly the same amount of information within the
element, so the outlining algorithm could be perfectly implemented.
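
As a rough illustration of that claim, here is a small Python sketch
that derives heading levels from nothing but <section> nesting and
reads the kind from the placeholder attribute. It is a deliberate
simplification and NOT the HTML5 outlining algorithm: it only looks at
<h1>, ignores <hgroup>, <header> and <footer>, and the class and
function names are invented for this example:

```python
from html.parser import HTMLParser


class SectionDepthOutliner(HTMLParser):
    """Record (level, kind, text) for every <h1>, where level is the
    number of open <section> ancestors plus one."""

    def __init__(self):
        super().__init__()
        self.open_sections = []  # kind value of each currently open <section>
        self.in_h1 = False
        self.text = []
        self.outline = []

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            # "kind" is the placeholder attribute name from the proposal;
            # a missing or valueless attribute is stored as "".
            self.open_sections.append(dict(attrs).get("kind") or "")
        elif tag == "h1":
            self.in_h1 = True
            self.text = []

    def handle_endtag(self, tag):
        if tag == "section" and self.open_sections:
            self.open_sections.pop()
        elif tag == "h1" and self.in_h1:
            self.in_h1 = False
            level = len(self.open_sections) + 1
            kind = self.open_sections[-1] if self.open_sections else ""
            self.outline.append((level, kind, "".join(self.text).strip()))

    def handle_data(self, data):
        if self.in_h1:
            self.text.append(data)


def outline(markup):
    """Parse `markup` and return the list of (level, kind, text) tuples."""
    parser = SectionDepthOutliner()
    parser.feed(markup)
    parser.close()
    return parser.outline


if __name__ == "__main__":
    doc = ('<h1>Site</h1>'
           '<section kind="article"><h1>Post</h1>'
           '<section kind="aside nav"><h1>See also</h1></section>'
           '</section>')
    for entry in outline(doc):
        print(entry)
```

The point is only that the kind travels with the section as data, so
the depth computation never needs to know the list of kinds.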

This yields several advantages:
1) The styling issue improves drastically: any pre-HTML5 browser will
understand this out of the box (IE would require a bit of javascript
anyway):
h1 { styling for top-level }
section h1 { styling for second-level }
section section h1 { styling for third-level }
and so on, for as many levels as you need.

2) All of a sudden, something like <section kind="aside nav"><h1>See
also</h1> some indirectly related links here...</section> becomes
possible, plus easy to style, and works happily with the outlining
algorithm.

3) Future needs will become easier to solve in future versions of the
specification, and at a significantly smaller cost: for example,
let's assume a new sectioning element such as <attachment> becomes a
widespread need (it would already make sense on sites like web-mail
services, discussion boards, bug-trackers, and some others...). That
would be a new crouton in the soup, one that would be treated much
like a generic <div> by pre-HTML6 (or 7, or whatever) browsers. Now,
with the <section>+attribute approach, we'd get something like
<section kind="attachment">: that would still work with the outlining
algorithm (it could be treated as a generic section), its styling
would work smoothly, etc.
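
The three points above can be sketched in a few lines of Python. This
is only an illustration under assumptions: the helper names are
hypothetical, "kind" is still just the placeholder name, and treating
its value as a whitespace-separated token list (which is what would
let the standard [attr~="value"] selector match "aside nav") is my
reading of point 2, not something already specified anywhere:

```python
# Kinds taken from the list in the proposal; anything else is "unknown".
KNOWN_KINDS = {"nav", "aside", "article", "address"}


def parse_kinds(value):
    """Split a kind value into (known, unknown) token lists. Unknown
    tokens, e.g. a future "attachment", simply leave the element as a
    generic section."""
    tokens = value.split()
    known = [t for t in tokens if t in KNOWN_KINDS]
    unknown = [t for t in tokens if t not in KNOWN_KINDS]
    return known, unknown


def level_selector(level, kind=None):
    """Return the single selector for headings at `level`, optionally
    restricted to an innermost section carrying the `kind` token."""
    ancestors = ["section"] * (level - 1)
    if kind is not None:
        if not ancestors:
            raise ValueError("a top-level heading has no enclosing section")
        ancestors[-1] += f'[kind~="{kind}"]'
    return " ".join(ancestors + ["h1"])


if __name__ == "__main__":
    for level in range(1, 5):
        print(level_selector(level))
    print(level_selector(2, "nav"))
    print(parse_kinds("aside nav"))
    print(parse_kinds("attachment"))
```

One selector per level, regardless of how many kinds exist now or get
added later.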

At the risk of stating the obvious, I'll say that the name "kind" has
been used here as a placeholder for the attribute, and is definitely
not set in stone. I think either @role (despite its resemblance to
XHTML2's) or @type (the way it's used for <input>) could be quite good
names, but maybe someone has a better idea.

I'll try to post back later with a more formal and elaborate proposal
around this idea; but I think it'd be good for contributors to share
their opinions in the meantime ;-)

Regards,
Eduard Pascual