[whatwg] Re: <section> and headings

Wed Sep 1 03:58:24 PDT 2004

On 31 Aug 2004, at 16:14, Lachlan Hunt wrote:

> James Graham wrote:
>> On 29 Aug 2004, at 16:42, Lachlan Hunt wrote:
>>> James Graham wrote:
>
>> For example, in XHTML 2, how would I generate an outline from code 
>> like:
>> <section>
>>     <h>Level 1 Heading</h>
>>     <section>
>>         <h>Level 2 Heading</h>
>>         <section>
>>             <h2>Level 2/3? Heading</h2>
>>         </section>
>>     </section>
>> </section>
>
> In order to preserve the semantics of hn elements, I would produce 
> this outline:
>
> 1. Level 1 Heading
>    2. Level 2 Heading
>    2. Level 2/3? Heading
>
> Although it may not necessarily be structurally correct for that 
> document structure, the semantics of hn elements are preserved.

Ack. So presumably:
<section>
     <h>Level 1 Heading</h>
     <section>
         <h>Level 2 Heading</h>
         <section>
             <h4>Level 2/4? Heading</h4>
         </section>
     </section>
</section>

Would have a level 4 heading with no level 3 heading? Yuck. Sure, that's 
bad markup, but authors use it surprisingly often.

>
>> AFAIK (and I haven't looked very closely), there's nothing in the 
>> XHTML 2 spec to say that the above code is invalid and, given the way 
>> that authors use headings at present
>
> You are correct, it is not invalid, but there is a note about not 
> using numbered headings out of order.  There should also be an 
> informative note in the spec about how they should be used in such 
> circumstances.
>
>>> Doing this essentially says that h1 to h6 are exactly the same, 
>>> which they are not.
>> It says that h1...h6 do not define the depth of nesting in the 
>> document hierarchy. They may be used in some other way, e.g. to 
>> distinguish the main headline on a page.
>
> Which is different from their current definition, which says that they 
> do. Thus, your method is changing their semantics.

But it's making their semantics consistent with the way authors 
actually use HTML 4 headings. It's actually not all that consistent 
with the HTML 4 spec which states:

"A heading element briefly describes the topic of the section it 
introduces. Heading information may be used by user agents, for 
example, to construct a table of contents for a document automatically.

There are six levels of headings in HTML with H1 as the most important 
and H6 as the least. Visual browsers usually render more important 
headings in larger fonts than less important ones."

i.e. a strict reading requires only that headings denote importance, not 
structure. The fact is that the HTML 4 heading model is too weak to be 
reliably used for denoting document structure. This, in turn, prevents 
HTML clients from offering the sort of navigational aids (such as outline 
views) available in client applications for many other document 
formats. If we regard this as a problem that needs to be solved, we can 
either introduce entirely new markup that an existing legacy client 
would not recognise as a heading (e.g. the <h> element), or we can 
subtly alter the oft-abused semantics of existing elements. Given the 
poor quality of heading use on the web, the second approach is likely 
compatible with all existing UAs, and it is compatible with the most 
common use of the existing heading elements. I strongly prefer the 
second solution (and I think it's more compatible with the WHATWG 
philosophy).

>
>> If <section> is introduced to give structure then (as a developer of 
>> a UA-addon for creating document outlines), how should I deal
>> with documents that use both <section> ... and <h{n}>
>
> Having thought about this a little more, I think it would be 
> acceptable to define <section> as a way to semantically structure a 
> document, but to make an informative note about how heading levels 
> should not be used out of order, or nested poorly as in your examples 
> (not that it will stop people from doing it anyway).

I'm perfectly happy to have a note that mentions that headings should 
be used in-order for maximum HTML 4 compatibility.

> If <h> were added as well, then there should be a note saying something 
> like: numbered headings are equivalent to structured headings nested at 
> the level indicated by the number. For example, <h2> is the same as a 
> second-level <h>.
>
> <body>
>     <h>Heading 1</h>
>     <section>
>         <h>Heading 2a</h>
>         <h2>Heading 2b</h2>
>         <section>
>             <h2>Heading 2c</h2>
>         </section>
>     </section>
>     <h2>Heading 2d</h2>
> </body>
>
> Headings 2a to 2d would be equivalent for a document outline; however 
> 2c and 2d (though still valid) may not be structurally correct.  The 
> document outline for that example, as determined by the semantics of 
> the elements rather than the structure alone, would be:
>
> Heading 1
>     Heading 2a
>     Heading 2b
>     Heading 2c
>     Heading 2d

I disagree that this should be the interpretation of the markup above. 
Heading 2c should be a level lower than 2b. (I also think that only the 
first heading in a section should be regarded as a section title and 
the rest as subtitles, but I'm more ready to be persuaded that that's a 
bad idea.)
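To make the disagreement concrete, here is a minimal sketch in Python 
(standard library only) that computes both readings of the example above. 
The names OutlineBuilder, "numbered" and "nesting" are hypothetical labels 
for this illustration. It assumes that an <h> sits one level deeper than 
the number of <section> elements enclosing it (as in the Heading 1 / 
Heading 2a example), and that under the nesting rule an <hN> outside any 
<section> keeps its HTML 4 level N, which nobody in this thread has 
actually specified.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h", "h1", "h2", "h3", "h4", "h5", "h6"}


class OutlineBuilder(HTMLParser):
    """Collects (level, text) pairs for every heading in a fragment.

    rule="numbered": <hN> is always level N; <h> is 1 + section depth.
    rule="nesting":  inside a <section>, every heading is 1 + section depth;
                     outside any <section>, <hN> keeps level N (assumption).
    """

    def __init__(self, rule):
        super().__init__()
        self.rule = rule
        self.depth = 0        # number of currently open <section> elements
        self.current = None   # level of the heading being read, if any
        self.text = []
        self.outline = []

    def level_for(self, tag):
        if tag == "h":
            return self.depth + 1
        if self.rule == "nesting" and self.depth > 0:
            return self.depth + 1
        return int(tag[1])    # "h2" -> 2

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            self.depth += 1
        elif tag in HEADING_TAGS:
            self.current = self.level_for(tag)
            self.text = []

    def handle_endtag(self, tag):
        if tag == "section":
            self.depth -= 1
        elif tag in HEADING_TAGS and self.current is not None:
            self.outline.append((self.current, "".join(self.text).strip()))
            self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.text.append(data)


def outline(markup, rule):
    builder = OutlineBuilder(rule)
    builder.feed(markup)
    builder.close()
    return builder.outline


EXAMPLE = """
<body>
    <h>Heading 1</h>
    <section>
        <h>Heading 2a</h>
        <h2>Heading 2b</h2>
        <section>
            <h2>Heading 2c</h2>
        </section>
    </section>
    <h2>Heading 2d</h2>
</body>
"""

for rule in ("numbered", "nesting"):
    print(rule)
    for level, text in outline(EXAMPLE, rule):
        print("    " * (level - 1) + text)
```

Under these assumptions the two rules agree on every heading except 
Heading 2c, which the nesting rule places at level 3, one level below 2b.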

>
>>> , and for form controls.
>>> HTML4 already overloaded the attribute with 10 different uses; 8 of 
>>> them being presentational, and thus deprecated.
>
> (BTW, that should have said 3, not 8 presentational uses.  I forgot to 
> look it up and fix it before sending)
>
>> So, if it's already used for a bunch of things, why not use it some 
>> more?
>
> Well, so is font.  I've seen it used to mark up headings, paragraphs, 
> lists, tables? (I'm sure someone's tried ;-D). Why don't we find some 
> more uses for it?
>
> Basically, it's because it was an oversight that has thankfully been 
> cleaned up (mostly) in XHTML2. type is now used only for the content 
> type of an external resource or, in the case of scripts and 
> stylesheets, also for the content type of the element's own content.
> (I do think that inconsistency should also be fixed; it has been 
> discussed on www-html previously, but no solution has been found yet.)
>
> For backwards compatibility with HTML 4 forms type can also be used 
> for form controls, but it should not be used for any additional 
> purposes.  In HTML4, six instances of type actually did represent 
> content types of various things, one for form controls, and the 
> remaining three were presentational hints for ul, ol and li.  The 
> presentational hints were deprecated.  So, essentially it was left 
> with content type and form control, which is how it should stay for 
> the WHAT WG specs.

Yeah, OK, overuse of attribute names is a bad thing. My proposed 'type' 
was supposed to be broadly consistent with its use in 'input', where it 
represents predefined variations within a particular semantic group of 
elements.

>
>> It's certainly been useful in extending HTML 4 forms. (In fact, I'd 
>> rather have a means of defining reserved class names. In a different 
>> context <span class="_i"> is superior to <i> and has almost all the 
>> same benefits. But I can see that going down like a lead balloon :) )
>
> Values of the class attribute have always, and will always be author 
> defined values.  No language specification should define their values. 
> (However, specifications that define a specific use for the language 
> may define which values an author can use and what they mean. For 
> example, the GMPG defines that an XMDP [1] is written using HTML and 
> uses class="profile", and defines what it means.)
>
> Even with a syntax like "_i", there's no guarantee that an author, or 
> other spec similar to XMDP, hasn't already used, or will use and 
> define that value for other reasons.  Not only that, but <span 
> class="_i"> is not in any way superior to <i> (unless the author is 
> using "_i" for a semantic reason, rather than italics).  If anything, 
> it's worse because it just adds more characters for no reason.

The idea is that with spec-defined classes, one can have pre-defined 
presentational hints without needing presentational elements. This 
makes sense for something like italics which is commonly used for 
typographic purposes. Obviously this would require a reserved syntax 
for spec-specified classes. However, as you note, that ship sailed a 
long time ago.
>