[whatwg] WA1 - The Section Header Problem

Fri Nov 19 08:52:25 PST 2004

James Graham wrote:
> Matthew Raymond wrote:
>> James Graham wrote:
>>> Giving UAs options tends to lead to non-interoperable solutions.
>>
>>    Why do outlines built entirely by user agents have to interoperate?
> 
> Because one "User Agent" might be an editor which automatically inserts 
> a table of contents whilst the other might be a browser which shows an 
> outline, for example.

    How is that any different from right now, except that you're 
throwing <section> into the mix? Since I define that <section> structure 
must not be overridden, it sounds more like you want a greater 
definition of how the header elements themselves can be used to create 
an outline.

>> Furthermore, I'm not giving the user agents anything they didn't 
>> already have. I'm simply defining a way vendors can continue using 
>> <h1>-<h6> as they were using it before without breaking the structure 
>> created by <section> elements. 
> 
> Vendors of what?

    Vendors in my above statement are people who create user agents.

> And breaking the structure how?

    As previously stated, user agents would not be able to process 
header elements in such a way that they alter the structure created by 
the <section> elements. If the user agent vendor so chooses, the UA can 
append to that structure, but not alter it.

> I don't see how a
> clearly defined relationship between <hn> and <section> will break 
> anything.

    Are you kidding me? Look at this:

| <section>
|  <h1>Header A</h1>
|  <section>
|   <h2>Header B</h2>
|   <h1>Header C</h1>
|  </section>
| </section>

    If you only look at the headers, you have this structure:

Header A
+- Header B
Header C

    This violates the structure of the <section> elements. Therefore, 
the example above can only have the following structure:

Header A
|- Header B
+- Header C

    Now, let's change that example a bit:

| <section>
|  <h1>Header A</h1>
|  <section>
|   <h1>Header B</h1>
|   <h2>Header C</h2>
|  </section>
| </section>

    According to the header information alone, the structure could look 
like this:

Header A
Header B
+- Header C

    This violates the <section> structure, which looks like this:

Header A
|- Header B
+- Header C
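
    The headers-only readings in the two examples above can be 
reproduced with a minimal sketch, assuming a UA that ignores <section> 
entirely and nests each header under the nearest preceding header of a 
lower number. The function name and the (rank, text) input format are 
hypothetical, chosen only for illustration; HTML 4.01 does not define 
this algorithm.

```python
from typing import List, Tuple


def headers_only_outline(headings: List[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Return (depth, text) pairs using header ranks alone (assumed rule)."""
    open_ranks: List[int] = []  # ranks of the headers we are currently nested under
    result: List[Tuple[int, str]] = []
    for rank, text in headings:
        # A header closes every open header of the same or lower importance.
        while open_ranks and open_ranks[-1] >= rank:
            open_ranks.pop()
        result.append((len(open_ranks), text))
        open_ranks.append(rank)
    return result


# First example: <h1>A</h1>, <h2>B</h2>, <h1>C</h1>
first = headers_only_outline([(1, "Header A"), (2, "Header B"), (1, "Header C")])
# Second example: <h1>A</h1>, <h1>B</h1>, <h2>C</h2>
second = headers_only_outline([(1, "Header A"), (1, "Header B"), (2, "Header C")])

for depth, text in first + second:
    print("  " * depth + text)
```

    In both cases a header ends up at depth 0 even though it sits 
inside the nested <section>, which is the conflict described above.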

    What my model lets user agents do is interpret the headers in a way 
that preserves the existing structure, but can also append to it, so the 
second markup example above could, if the UA vendor so chooses, yield 
this structure:

Header A
+- Header B
    +- Header C
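
    One way a UA might implement "append, but don't alter" is sketched 
below. This is only an illustration under stated assumptions: the 
Heading and Section classes, the outline() function and its append flag 
are hypothetical names, and the rule that a nested <section> always sits 
exactly one level below its enclosing <section> is an assumption, not 
something the proposal spells out. Header ranks are compared only 
against other headers in the same <section>, so they can add depth but 
can never lift a header out of its section.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Heading:
    rank: int  # 1 for <h1> ... 6 for <h6>
    text: str


@dataclass
class Section:
    children: List[Union["Section", Heading]] = field(default_factory=list)


def outline(section: Section, base: int = 0, append: bool = True) -> List[Tuple[int, str]]:
    """Return (depth, text) pairs; section nesting fixes the minimum depth."""
    result: List[Tuple[int, str]] = []
    stack: List[Tuple[int, int]] = []  # (rank, depth) of headers in THIS section only
    for child in section.children:
        if isinstance(child, Section):
            # Assumption: a nested section is one level below its parent.
            result.extend(outline(child, base + 1, append))
            continue
        if append:
            # Ranks may only add levels below the section's base depth.
            while stack and stack[-1][0] >= child.rank:
                stack.pop()
            depth = stack[-1][1] + 1 if stack else base
        else:
            depth = base  # section structure only, header ranks ignored
        stack.append((child.rank, depth))
        result.append((depth, child.text))
    return result


first = Section([Heading(1, "Header A"),
                 Section([Heading(2, "Header B"), Heading(1, "Header C")])])
second = Section([Heading(1, "Header A"),
                  Section([Heading(1, "Header B"), Heading(2, "Header C")])])

for depth, text in outline(second):
    print("    " * depth + text)
```

    With append=False the second example gives the plain <section> 
structure (B and C both under A); with append=True it gives the three 
level structure shown above. The first example comes out the same either 
way, because an <h1> after an <h2> cannot escape its section.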

>> > If they don't know that why have any spec at all?
>>
>>    HTML 4.01, with regards to headers, has virtually no specification. 
>> This is the entire specification of it:
>>
>> "Heading information may be used by user agents, for example, to 
>> construct a table of contents for a document automatically."
> 
> I'm aware of the limitations of HTML 4 on this issue. That doesn't mean 
> we should do a bunch of stuff to make headings work properly and then 
> throw away that progress at the last minute "because HTML 4 got this 
> wrong".

    You're missing the point. HTML doesn't even define what "right" and 
"wrong" are. Therefore, in theory, we have no way of knowing how user 
agents may interpret headers. I'm simply suggesting that we go from a 
"do what you want" model to a "do what you want, but don't break the 
sections" model.

>>> What do you mean by "structural semantics" other than the ability of a 
>>> UA to infer a document structure, of which an outline is a 
>>> representation.  The whole concept of semantics is only useful 
>>> insofar as
>>> it allows UAs to do something and in the case of structure the 
>>> obvious thing for a UA to do is present the structure. What's the 
>>> point of speccing up a robust structural model if we don't specify 
>>> how one can infer structure from the model? You seem to be 
>>> sidestepping the problem rather than solving it.
>>
>>    I'm not sure what you're getting at here. The <hn> elements, as 
>> defined in HTML 4.01, don't have any semantic meaning related to 
>> structure.
> 
> Yes they do. They head (unmarked) sections of the document. It's poorly 
> defined in the spec how those document sections are related to each 
> other but, given the markup in the spec itself, we can infer how HTML 4 
> headings are supposed to be used.

    Not really. The example given uses <div> elements to create 
structure in the markup. From that example, you could make a better 
argument that <div> equals <section>. Allow me to quote:

    Let's format this in XHTML and replace "div" with "section":

    In the HTML 4.01 spec example, you don't even need importance levels 
to determine the document structure. In fact, from the example, there is 
the suggestion that <div> elements are required to associate the headers 
with their respective content, which is probably not a road we want to 
go down.

>> In fact, there is no definition in the specification of how to 
>> associate headers with content or even how they relate to each other. 
>> All I'm attempting to do is say that if vendors decide to make up 
>> their own set of rules regarding how to use that header information, 
>> those rules can't violate the structure created by the <section> 
>> elements. 
> 
> Why are vendors going to be making up their own rules at all? I really 
> don't understand what that gives anyone. Vendors, authors and users are 
> all better off if there is a single, working, header model (and, in 
> general, if the spec defines expected behavior as far as possible).

    Let's say you have a document with the following contents in the <body>:

| <h1>Heading A</h1>
| <p>...content...</p>
| <h2>Heading A.1</h2>
| <p>...content...</p>
| <h3>Heading A.1.1</h3>
| <p>...content...</p>
| <h2>Heading A.2</h2>
| <p>...content...</p>
| <h3>Heading A.2.1</h3>
| <p>...content...</p>

    Now let's say you copy the entire block into a section in another 
document:

| <section>
|  <h1>Heading A</h1>
|  <p>...content...</p>
|  <h2>Heading A.1</h2>
|  <p>...content...</p>
|  <h3>Heading A.1.1</h3>
|  <p>...content...</p>
|  <h2>Heading A.2</h2>
|  <p>...content...</p>
|  <h3>Heading A.2.1</h3>
|  <p>...content...</p>
| </section>

    If we define <hn> as having no structure inside a <section>, we 
lose what structure there was in the first document and force a 
structural rewrite of the markup. (Keep in mind, I use Notepad for my 
HTML, so the argument of letting the editor convert the code for you is 
a bit weak for me.) If you define specific rules for how headings work 
within a section, and those rules don't apply outside a <section>, you 
create confusion for webmasters. If you apply new rules to headings in a 
general manner, you are likely to break the way existing HTML 4.01 user 
agents handle the markup.
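
    The copy-and-paste case can be checked with a rough sketch using 
Python's standard html.parser module. The class name, the outline() 
helper and the nesting rule (ranks are compared only within the current 
<section>, and each <section> adds one level) are assumptions made for 
illustration, not behaviour defined by any spec or shipped by any UA.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class SectionAwareOutline(HTMLParser):
    """Collect (depth, text) pairs for headers, respecting <section> nesting."""

    def __init__(self):
        super().__init__()
        self.stacks = [[]]  # one (rank, depth) stack for the body and per open <section>
        self.items = []     # finished (depth, text) pairs
        self._open = None   # [depth, text] of the header currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            self.stacks.append([])
        elif tag in HEADING_TAGS:
            rank = int(tag[1])
            base = len(self.stacks) - 1  # number of enclosing sections
            stack = self.stacks[-1]
            # Only headers in the same section can be closed by this one.
            while stack and stack[-1][0] >= rank:
                stack.pop()
            depth = stack[-1][1] + 1 if stack else base
            stack.append((rank, depth))
            self._open = [depth, ""]

    def handle_data(self, data):
        if self._open is not None:
            self._open[1] += data

    def handle_endtag(self, tag):
        if tag == "section" and len(self.stacks) > 1:
            self.stacks.pop()
        elif tag in HEADING_TAGS and self._open is not None:
            self.items.append((self._open[0], self._open[1].strip()))
            self._open = None


def outline(markup):
    parser = SectionAwareOutline()
    parser.feed(markup)
    parser.close()
    return parser.items


LEGACY = """
<h1>Heading A</h1><p>...content...</p>
<h2>Heading A.1</h2><p>...content...</p>
<h3>Heading A.1.1</h3><p>...content...</p>
<h2>Heading A.2</h2><p>...content...</p>
<h3>Heading A.2.1</h3><p>...content...</p>
"""

legacy = outline(LEGACY)
pasted = outline("<section>" + LEGACY + "</section>")
print(legacy)
print(pasted)
```

    Under these assumptions the pasted copy keeps the same relative 
structure as the legacy document; every header is simply one level 
deeper because of the enclosing <section>.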

    UA vendors have already made up their own rules in the absence of 
good spec writing. The only question is whether we try to accommodate 
those rules or break them.

>>    I don't see a point to this limitation. In my model, an <hn> 
>> element renders the same regardless of where it's used, unless it's 
>> used as a tab label, and <h> = <h[level]> with regards to both 
>> semantics and presentation. Are you referring to levels greater than 
>> six? We can just make the default presentation for a header with a 
>> level greater than six the same as <h6>. Actually, correct me if I'm 
>> wrong, but would this work for that:
>>
>> |   h1, section[level=1] h { /* H1 styling */ }
>> |   h2, section[level=2] h { /* H2 styling */ }
>> |   h3, section[level=3] h { /* H3 styling */ }
>> |   h4, section[level=4] h { /* H4 styling */ }
>> |   h5, section[level=5] h { /* H5 styling */ }
>> |   h6, section[level=6] h { /* H6 styling */ }
>>
>>    Well, in theory, some jerk could do <section level="7">, I suppose. 
>> I wish you could do this in CSS:
>>
>> |   h6, section[level>=6] h { /* H6 styling */ } 
> 
> As other people have pointed out, the point of allowing nested 
> <sections> is that you don't have to do this. Besides, what does:
> <section level="25">
> <section level="3">
> </section>
> </section>
> 
> mean?

    It means the subsection is more important than the section. For 
instance, I may think that SGML is of trivial importance and that HTML 
is far more important, but since HTML is a child of SGML and not the 
other way around, HTML ends up a child of SGML.

> This idea seems to reproduce all the problems of the HTML4 heading
> model.

    If you mean that it preserves the HTML4 heading model as much as 
possible for backwards compatibility, then yes, it does.

> And I'm still trying to work out what the real advantage would be.

    Here are some advantages:

1) Legacy header markup renders in an HTML5 UA in the same way it would 
in the HTML4 version of the same UA.

2) Markup is rendered and processed in the same way by the UA regardless 
of whether it's inside a <section> element or not.

3) Cutting and pasting legacy header content doesn't change how it's 
processed unless it directly conflicts with the <section> structure.