[whatwg] Re: <section> and headings

Sun Nov 14 17:06:58 PST 2004

James Graham wrote:
>>> Well, <h> wouldn't be backwards compatible at all. At least <h1> would
>>> look like a heading of sorts.
>>
>> I give you one abbreviation: CSS. 
> 
> Sure one can make anything look like a heading. But no HTML4 UA would 
> recognise <h> as a heading, whereas <h1> would at least be considered to 
> be a heading element.

Is that important? I mean, for the user browsing the page it's about how 
the page looks, isn't it? And obviously newer browsers will understand 
<h> just fine. In addition to that, if the page author finds backwards 
compatibility a concern, he can just use the <h1>...<h6> tags; they're 
not even deprecated.

> Put another way, postulating CSS as a solution to a problem of semantics 
> is about as useful as reintroducing <font> to HTML.

I don't see how. The semantics are there; it's just that the <h> tag is a 
new one. What I am saying is that you can use CSS as a means to retain 
(both visual and aural) backwards compatibility.

>>>>  <h1>Foo</h1>
>>>>  <section>
>>>>   <h3>Bar</h3>
>>>>   <h6>Quuz</h6>
>>>>  </section>
>>>>
>>>> Would be the same as H1, H2, H2, right?
>>>
>>> Yes.
>>
>> Arbitrary heading elements (1 out of 6) are incredibly verbose to 
>> express in CSS, and you'd have to place h1|h2|h3|h4|h5|h6 in any XPath 
>> expression as well. So in practice, I don't think this is a good option. 
> 
> Backwards compatibility means that these elements have to stay whatever 
> we do. The fact that they are a pain to work with programmatically is 
> true but, unfortunately, unavoidable.

I am not saying they shouldn't. XHTML 2.0 *HAS THEM*, and even if it 
didn't, they would still be in the spec, though perhaps deprecated.

>>> And if we don't redefine <h1> (and <h2> to <h6>), then you end up with
>>> the weird situation of having six elements which could easily be used
>>> but end up with meaningless semantics. (And they would be inline
>>> elements in legacy UAs, which is even worse.)
>>
>> XHTML 2.0 does this. Probably for well-discussed reasons, amongst 
>> others a number of concerns you raised (like the search engine thing). 
>> I don't see why it shouldn't. 
> 
> XHTML 2 has entirely different design goals to "HTML5". Specifically 
> backward compatibility is not one of these design goals. Given the 
> lengths to which many successful software products go to maintain 
> backward compatibility, there is some evidence that the XHTML2 path is a 
> mistake.

If you look at the facts instead of the supposed design goals: they have 
kept <h1>...<h6>. They have adopted <quote> in favour of <q>, which was 
suggested here as a proper solution to 'the <q> problem'.

I think it is worth investigating the possibility of adopting XHTML 2.0, 
rather than discarding it beforehand based on 'design goals' without even 
looking into it.

More than that, I think quoting XHTML 2.0's 'design goals' is a pretty 
bad argument. A better argument would be to point out an actual clash 
between XHTML 2.0 and deprecated HTML 4.01 markup that cannot be resolved 
some other way (e.g. by a little tweak with minor impact).

About XHTML letting go of its roots, I don't think it is a mistake. Some 
parts of HTML are just utterly broken, <h1>...<h6> being a good example 
of that, although ironically they didn't remove them in XHTML 2.0 :). 
<acronym> and <q> are other valid examples, though. However, I think they 
should have deprecated the XHTML 1.0 Strict tags that are not in XHTML 2.0 
instead of removing them altogether, and only removed the presentational 
stuff already deprecated in HTML 4, like font tags.

>>> e.g. at the moment, this:
>>>
>>>    <body>
>>>     <h1>A</h1>
>>>     <section>
>>>      <h2>A.1</h2>
>>>      <section>
>>>       <h3>A.1.1</h3>
>>>      </section>
>>>     </section>
>>>    </body>
>>>
>>> ...makes sense, but if we say you have to use a new element for
>>> headers, then the above is now meaningless and trying to make an
>>> outline from it would not do anything useful.
>>
>> That's just not true, or I'm missing your point.
>>
>> Try making a tree view of a document based on h1...h6 headings.
>> CSS: euh...
>> XSLT: euh... 
> 
> That can be done without too much trouble (n.b. I'm not sure what CSS 
> has to do with making a tree view). In fact tools already exist to do 
> exactly that.

CSS is an easy method to get a different 'view' of a document (e.g. 
display: none-ing all non-heading/section elements and creating a nice 
style for the heading).
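Since this sub-thread is about whether a tree view can be built from <h1>...<h6>, here is a minimal sketch of the usual stack-based approach, in Python with the standard library's html.parser (the class and function names are made up for illustration). It assumes each heading belongs under the nearest preceding heading of a higher level, which is a convention rather than something HTML 4.01 defines.

```python
from html.parser import HTMLParser

# Rank of each numbered heading element: h1 -> 1 ... h6 -> 6.
HEADING_RANKS = {f"h{n}": n for n in range(1, 7)}


class HeadingCollector(HTMLParser):
    """Collects (rank, text) pairs for every <h1>...<h6>, in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._rank = None  # rank of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_RANKS:
            self._rank = HEADING_RANKS[tag]
            self._text = []

    def handle_endtag(self, tag):
        if tag in HEADING_RANKS and self._rank is not None:
            self.headings.append((self._rank, "".join(self._text).strip()))
            self._rank = None

    def handle_data(self, data):
        if self._rank is not None:
            self._text.append(data)


def build_tree(headings):
    """Nest flat (rank, text) pairs: each heading becomes a child of the
    nearest preceding heading with a smaller rank number."""
    root = {"rank": 0, "text": None, "children": []}
    stack = [root]
    for rank, text in headings:
        node = {"rank": rank, "text": text, "children": []}
        # Pop until the top of the stack is a higher-level heading (or the root).
        while stack[-1]["rank"] >= rank:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root


def render(node, depth=0):
    """Flatten the tree into indented lines, one per heading."""
    lines = []
    for child in node["children"]:
        lines.append("  " * depth + child["text"])
        lines.extend(render(child, depth + 1))
    return lines


parser = HeadingCollector()
parser.feed("<h1>A</h1><p>text</p><h2>A.1</h2><h3>A.1.1</h3><h2>A.2</h2><h1>B</h1>")
parser.close()
print("\n".join(render(build_tree(parser.headings))))
```

With this rule a skipped level (an <h3> straight after an <h1>) simply nests one step down; that is exactly the kind of case HTML 4.01 leaves open.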

>> Now try making a tree view based on h headings.
> 
> Well it's impossible unless you explicitly support HTML5 i.e. not 
> backwards compatible.

Where is the HTML 4 support for a 'tree view' then?

I have never seen a 'create TOC here' tag in HTML.

Backwards compatibility is no argument here, if you ask me. There is no 
backwards to be compatible with (except perhaps for some Firefox 
extension which few people use, and which can be changed to support <h> 
tags in no time).

>> CSS: section { padding: 2em; border-left: 2px solid red; } 
> 
> That would work with the markup above, no?

Yes. I was not considering the sections around it :). Oops :).

>>> Basically I want three things:
>>>
>>>  1. It has to be possible to take existing markup (which correctly
>>>     uses <h1>-<h6>) and wrap the sections up with <section> (and the
>>>     other new section elements) and have it be correct markup.
>>>     Basically, allowing authors to replace <div class="section"> with
>>>     <section>, <div class="post"> with <article>, etc.
>>
>> Aside from that I don't see why when you're changing the markup anyway 
>> you would still want to retain the old headings, the XHTML 2.0 
>> solution allows for this just fine.
> 
> Because you accept that you still have to deal with UAs that don't 
> support the new markup. In this case the transformation <div> -> 
> <section> is unlikely to be problematic (a non-semantic element replaced 
> with an unsupported element) whereas <hn> -> <h> is a problem (a semantic 
> element replaced by a non-semantic one).

<h> is not non-semantic; that's nonsense. If you want to be consistent, 
call it unsupported. But I don't see the problem here, especially since 
changing <hn> into <h> is NOT a requirement and the <hn> elements still 
exist. The choice a page author makes can be based on the usage 
environment (internal or web content), browser support, and further 
use of the document (e.g. transformation to PDF using XSL-FO).

>>>  2. It has to be possible to write new documents that use the section
>>>     elements and have the headers be automatically styled to the right
>>>     depth (and maybe automatically numbered, with appropriate CSS),
>>>     and yet still be readable in legacy UAs, without having to think
>>>     about old UAs. Basically, the header element has to be header-like
>>>     in old browsers.
>>
>> Let me just refer to my first (two) paragraph(s) here. 
> 
> "Basically the header element has to be header-like in old browsers". If 
> 'header-like' means anything other than 'has a heading-like appearance' 
> (in which case <font size="24"> might be heading-like), you've totally 
> ignored this point.

font-size: 24px IS heading-like, as far as visuals are concerned. CSS 
can also be used to specify an aural style for it, creating the same 
resemblance for visually impaired people not using a supporting browser.
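As a sketch of the CSS route, assuming a browser that applies style rules to elements it does not recognise, a stylesheet can give <h> a heading-like look based on how many <section> elements enclose it. The Python below only generates such rules; the font sizes and the function name are illustrative assumptions, not taken from any spec.

```python
# Illustrative sizes in em for nesting depths 0..5 (an assumption, roughly
# the sizes browsers commonly give h1...h6 by default).
SIZES_EM = [2.0, 1.5, 1.17, 1.0, 0.83, 0.67]


def h_rules(sizes=SIZES_EM):
    """Return one CSS rule per <section> nesting depth for the <h> element.

    Deeper selectors ('section section h') contain more type selectors, so
    they have higher specificity and override shallower ones ('section h')."""
    rules = []
    for depth, size in enumerate(sizes):
        selector = " ".join(["section"] * depth + ["h"])
        rules.append(
            f"{selector} {{ display: block; font-weight: bold; font-size: {size}em; }}"
        )
    return rules


print("\n".join(h_rules()))
```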

>>>  3. It shouldn't be too easy to end up with meaningless markup when
>>>     doing either of the above. So a random <h4> in the middle of an
>>>     <h2> and an <h3> has to be defined as meaning _something_.
>>
>> This is no different than the existing spec. This would mean a 4th 
>> level heading between a second- and a third-level heading. 
> 
> HTML 4 doesn't really specify how this should work.

HTML 4.01 says:
"Some people consider skipping heading levels to be bad practice. They 
accept H1 H2 H1 while they do not accept H1 H3 H1 since the heading 
level H2 is skipped."

XHTML 2.0 says:
"The practice of skipping heading levels is considered to be bad 
practice. The series h1 h2 h1 is acceptable, while h1 h3 h1 is not, 
since the heading level h2 has been skipped."

Skipping levels is considered bad practice (by some), but that is still a 
choice for the page author to make. It does not form an obstacle for UA 
implementation, and the DTD/Schema allows it.
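Because both quotes define the practice purely in terms of the sequence of heading levels, a checking tool (a validator warning, say, since the DTD/Schema accepts it) only needs that sequence. A rough Python sketch, using a regular expression instead of a real HTML parser, so it is illustrative only; the function name is hypothetical:

```python
import re


def skipped_levels(html):
    """Return (previous, next) level pairs where the next heading is more than
    one level deeper than the one before it, e.g. (1, 3) for h1 followed by h3.
    Going back up (h3 -> h1) is allowed, as in the spec examples."""
    # '<h' followed by a digit 1-6 matches opening tags only: '</h1>' starts
    # with '</', and <hr> or <head> have no digit after the 'h'.
    levels = [int(n) for n in re.findall(r"<h([1-6])\b", html, flags=re.IGNORECASE)]
    return [(a, b) for a, b in zip(levels, levels[1:]) if b > a + 1]


print(skipped_levels("<h1>x</h1><h2>y</h2><h1>z</h1>"))  # []
print(skipped_levels("<h1>x</h1><h3>y</h3><h1>z</h1>"))  # [(1, 3)]
```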

>> Inside sections one could let the section level determine the heading 
>> level and treat all headings the same
> 
> Now that I agree with.

It depends on the use. For rendering a document TOC, this would be the best.

>> , or use the highest level of either the section or the heading. I 
>> don't see a need to define this more specifically, as h1...h6 just don't 
>> go very well with sections.
> 
> In a backwards-compatible world, they have to interact somehow (if the 
> XHTML2 people haven't defined this yet they will have to or their 
> heading model will be totally broken).

XHTML 2.0 says:
"There are six levels of numbered headings in XHTML with h1 as the most 
important and h6 as the least. The visual presentation of headers can 
render more important headings in larger fonts than less important ones.

Structured headings use the single h element, in combination with the 
section element to indicate the structure of the document, and the 
nesting of the sections indicates the importance of the heading. The 
heading for the section is the one that is a child of the section element."

So it considers h elements more structured.

But, as you have obviously barely looked at XHTML 2.0, why not study the 
spec a little to see whether it would make sense to adopt it, and point 
out specific huge problem areas if you think it doesn't, instead of 
blindly arguing against it with what looks to me like a whole lot of 
prejudice?

>> That's the way it is, and it won't really harm anyone. 
> 
> Except anyone trying, say, to create a tree view of a document. Other 
> document formats allow tree views to be constructed. Saying that this 
> should be impossible in HTML seems rather shortsighted. There are other 
> types of UAs that want to know about headings too. Searchbots are an 
> obvious example.

If you know about 'HTML 5', there is no problem with constructing a tree 
view. See my comment with the Firefox extension.

As for searchbots, I already mentioned them and said that if page authors 
consider that a problem they can use <h1>...<h6>. However, there are 
many cases where using <h> instead would not be really problematic, for 
example for pages which don't care as much about Google rankings (many 
non-commercial / amateur sites come to mind, and besides, Google's 
ranking is mainly based on its 'PageRank' technology and only to a 
lesser extent on things like headings).

>>> At the moment what I'm thinking of doing is this (most of these ideas
>>> are in the draft at the moment, but mostly in contradictory ways):
>>
>> [...]
>>
>> I think this is all needlessly complicated.
>>
>> Note that for navigation XHTML 2.0 has <nl> Navigation Lists, which 
>> would correspond to your <navigation> tag. 
> 
>> A sidebar (which side?
> 
> Can you say 'not mixing presentation with content'

I'd say 'sidebar' by itself is presentation.

>> how is it different from navigation and why is navigation not a sidebar?)
> 
> Because a sidebar, typically, isn't something that contains navigation. 
> It is a piece of content that is related to the main text but not in the 
> flow of the main text. The spec needs to make this clear.

I see what you mean. I wouldn't call that a 'sidebar' though. That is 
associated with presentational stuff.

I believe there is another term for that, which I don't seem to recall 
at the moment. Darn :/. Anyone into magazine publishing here who can 
give me a hint? ^_^

Also, would there really be a need to declare that semantically? 
Wouldn't a <section class="sidebar"> be better? Note that I don't disagree 
with the tag per se (though I would rather see a different name for it); 
it is just a thought.

>>> An alternative would be to ask the CSS working group for an :or()
>>> selector of sorts, and then have:
>>>
>>>    :or(section, article, sidebar, navigation) h1 { /* h2 */ }
>>>
>>>    :or(section, article, sidebar, navigation)
>>>    :or(section, article, sidebar, navigation) h1 { /* h3 */ }
>>>
>>>    :or(section, article, sidebar, navigation)
>>>    :or(section, article, sidebar, navigation)
>>>    :or(section, article, sidebar, navigation) h1 { /* h4 */ }
>>>
>>> That might work.

Note that section h1, article h1, sidebar h1, navigation h1 would be 
just as verbose for a single level of nesting and wouldn't require new 
CSS selectors (although the plain list grows quickly once you add 
multiple levels of sections/articles/sidebars/navigations).
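To make the comparison concrete, here is a small Python sketch that writes out both forms for a given nesting depth: the :or() selector proposed above (hypothetical; it is not part of any CSS specification) and the equivalent list of plain descendant selectors that existing CSS accepts. The helper names are invented for illustration.

```python
from itertools import product

# The four container elements discussed in this thread.
CONTAINERS = ["section", "article", "sidebar", "navigation"]


def or_selector(depth):
    """Proposed (hypothetical) :or() form: one :or(...) group per nesting level."""
    group = ":or(" + ", ".join(CONTAINERS) + ")"
    return " ".join([group] * depth + ["h1"])


def expanded_selectors(depth):
    """Equivalent plain descendant selectors, valid in existing CSS:
    one selector per combination of container elements."""
    return [" ".join(combo + ("h1",)) for combo in product(CONTAINERS, repeat=depth)]


print(or_selector(1))
print(", ".join(expanded_selectors(1)))
for depth in (1, 2, 3):
    print(f"depth {depth}: {len(expanded_selectors(depth))} plain selectors")
# depth 1: 4, depth 2: 16, depth 3: 64 plain selectors
```

At depth 1 the plain list is the four selectors quoted above; each additional level multiplies the count by four, which is the case an :or() selector would shorten.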

Nevertheless, I'd say an :or pseudo-selector (like XSLT's |) would be a 
viable addition to CSS.

>> Woohoo.
>>
>> Note the amount of sarcasm in my voice, which can unfortunately not be 
>> transferred through this medium (well, I guess I could include some 
>> XVoice markup :)). Just use <section> with <h> headings and <nl> with 
>> <label> headings. 
> 
> But then, what does
> <section>
> <h4>Heading</h4>
> </section>
> 
> mean? It varies according to which UA you ask - a HTML4 UA would report 
> a single heading, a "HTML5" UA would not.

What? Of course a HTML 5 UA would 'report' (how, to whom, what do you 
mean by 'report'?) a single heading, because there is only one in the 
document.

>>>>> I don't disagree. But it is backwards compatible.
>>>>
>>>> Not really. If search engines don't get upgraded to support this new
>>>> kind of H1 semantic all kinds of documents can be indexed wrong or
>>>> they can be marked inappropriate because they mis-use the H1 element
>>>> in the eyes of the search engine. (The same as with creating a page
>>>> full of links, but now you are mis-using a heading element.)
>>>
>>> You are assuming that search engines trust authors to use <h1>
>>> elements correctly in the first place, and, more importantly, that
>>> they treat them differently to <h2> elements in a way that would be
>>> noticeable if this became widespread.
>>>
>>> I highly doubt this.
>>>
>>> Also, using <h> would have the same problem in reverse -- content
>>> would no longer be indexed as a header at all.
>>
>> That is up to the site author to decide, isn't it? Not all content 
>> needs a high search rank, and not all content is used on the web. I 
>> also think it is a slight adjustment for e.g. Google to make to their 
>> engine, so who knows, they might.
> 
> Who knows indeed. The point of being backwards compatible is that people 
> don't have to run the risk that product X will not be updated to the new 
> requirements. Seriously, how many sites will use the new markup if they 
> believe that it might decrease their search ranking (bearing in mind 
> that Google is quite secretive about such things)?

I would :).

People who intend to do more sophisticated stuff with their XHTML (like 
XSLT to generate TOCs and XSL-FO transformations to create PDFs) would.

People who would actually like to write XHTML 2.0 but be able to benefit 
from HTML 5 support and compatibility would :).

>> At least if you don't try, you can be sure they never will. In any 
>> case h1...h6 would not be deprecated so there is no reason not to use 
>> them if you want to. 
> 
> But how would they interact with <section>? That's the question, no? I 
> feel I'm missing something here...

I'd say that's indeed a bit ambiguous.

However, wanting to give a clarification for this wouldn't obstruct the 
adoption of <h>, which I am trying to make a case for (in the context of 
adopting XHTML 2.0 as a whole).

As far as <section> is concerned, <h1>...<h6> are just headings inside a 
section. They could be treated like a <h> tag by e.g. a TOC generating 
transformation. This is quite logical, because the whole problem is that 
'HTML 4 headings' are not attached to the document text, and the best 
approach to solve that is to assume all the text inside the same 
<section> (or in HTML 4 terms, until the next <h1>...<h6> heading, which 
is obviously a quite fragile assumption) belongs to it. However, it all 
also depends on what the author intends. After all, why on earth would an 
author create headings which don't match the sections? :)

When rendering, they should be rendered just as they currently are in 
HTML 4.01.
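The treatment described above, where headings inside <section> take their level from the section nesting whatever their element name, can be sketched in Python with the standard library's html.parser. This is a simplification under stated assumptions: it counts every heading rather than only the one that is a child of a section, and it ignores <article> and the other proposed section elements; the class name is made up for illustration.

```python
from html.parser import HTMLParser

# Every heading element counts the same; only the <section> nesting matters.
HEADINGS = {"h", "h1", "h2", "h3", "h4", "h5", "h6"}


class SectionOutline(HTMLParser):
    """Collects (depth, text) pairs, where depth is the number of <section>
    elements enclosing the heading."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []
        self._in_heading = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            self.depth += 1
        elif tag in HEADINGS:
            self._in_heading = True
            self._text = []

    def handle_endtag(self, tag):
        if tag == "section":
            self.depth -= 1
        elif tag in HEADINGS and self._in_heading:
            self.outline.append((self.depth, "".join(self._text).strip()))
            self._in_heading = False

    def handle_data(self, data):
        if self._in_heading:
            self._text.append(data)


outline = SectionOutline()
outline.feed(
    "<h1>Foo</h1>"
    "<section><h3>Bar</h3>"
    "<section><h>Deeper</h></section>"
    "<h6>Quuz</h6></section>"
)
outline.close()
for depth, text in outline.outline:
    print("  " * depth + text)
```

Run on the <h1>/<h3>/<h6> example from earlier in the thread (with one extra nested section added), it puts Foo at depth 0 and Bar and Quuz both at depth 1, i.e. the H1, H2, H2 reading.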

Basically <section> and numbered headings don't go well together and 
that's why it's so important to have a really unambiguous tag like <h> 
available.

>>> The other advantage of using the existing <hX> elements is that
>>> Assistive Technologies will continue working, reporting the section
>>> headers, instead of saying there are no headers on the page.
>>
>> Assistive Technologies don't work on pages using headers created with 
>> font tags or styled divs either. Assistive technologies can be updated.
> 
> can != will be
> In fact, a failure to work with existing technologies might be enough of 
> a barrier to adoption that people avoid "HTML5" at all, so products are 
> never updated to work with it.

Sigh. You keep on coming back to this. I NEVER said elements had to be 
removed from the spec. What's more, I explicitly stated that some of 
them should be deprecated (<h1>...<h6> not included, and I'd say <a> and 
<img> also shouldn't be), but remain in the spec ESPECIALLY because of 
backwards compatibility. I also stated that this would be an important 
difference and *merit* over XHTML 2.0.

However, over time, as support for the non-HTML 4 compatible tags grows, 
people will at least have a choice to start using the new stuff. And in 
a controlled environment such as intranet or private use this may be as 
soon as a major browser (e.g. Mozilla) adds support for it.

~Grauw

-- 
Ushiko-san! Kimi wa doushite, Ushiko-san!!