[whatwg] A plea to Hixie to adopt <main>

Fri Nov 16 16:01:18 PST 2012

Very few of the e-mails on this thread added new information. Please 
remember to not post to the list if you are not adding information that 
has not been previously considered -- just repeating previously provided 
information is not going to change the result, and in the meantime it 
makes it harder for people to read the list.

On Thu, 8 Nov 2012, Steve Faulkner wrote:
> >
> > The reason there is no element <main> in the HTML spec currently is 
> > that there are no use cases for it that aren't already handled
> 
> The use cases data and rationale have been provided [1].
> https://dvcs.w3.org/hg/html-extensions/raw-file/tip/maincontent/index.html

This page doesn't seem to list any use cases.

> http://www.w3.org/html/wg/wiki/User:Sfaulkne/main-usecases#Introduction

This page has the following:

| Enable users to be able to navigate to and recognise the boundaries of 
| the main content area

Because of the likely authoring failures, <main> would do this no more 
reliably, and possibly less reliably, than is already possible with 
things like <aside>.

   http://lists.w3.org/Archives/Public/public-whatwg-archive/2012Nov/0067.html
   http://lists.w3.org/Archives/Public/public-whatwg-archive/2012Oct/0162.html

| Enable authors to style the main content area of a page specifically.

This is already possible with <div>. It would make sense to have a more 
specific element if there was a cowpath, but there isn't:

   http://lists.w3.org/Archives/Public/public-whatwg-archive/2012Oct/0162.html

| Enable authors to markup an area of a page as the main content area, 
| that builds on existing authoring practices

This is not a use case.

| Provide a means for browsers to map role, state and property information 
| to a HTML structure representing a common significant distinct content 
| structure.

I don't know what this means. It doesn't sound like a use case, though. 
Use cases are problems users currently face.

| Re-use an existing ARIA semantic that currently has to be bolted on by 
| authors and for which where there is already an existing relationship 
| and use with common HTML authoring practices.

This is not a use case. There's no reason to specify this role in the 
first place, if you use HTML semantics already. (If you did have to 
specify it, the entire Web would be unusable. Clearly that's not viable 
for a user agent or accessibility tool, so they have to be able to handle 
the lack of a "main" role already... and handle it well, for it's the most 
common situation to be in.)

| Bake in to HTML an existing ARIA role semantic which is interoperably 
| supported across browsers and AT and utilised in AT to provide 
| understanding and utility of an HTML content structure, to the benefit 
| of end users.

That's not a use case.

> http://lists.w3.org/Archives/Public/public-html/2012Oct/0109.html

Already responded to that here:

   http://lists.w3.org/Archives/Public/public-whatwg-archive/2012Oct/0162.html

So again, the reason there is no element <main> in the HTML spec currently 
is that there are no use cases for it that aren't already handled.

> Agreed that people get markup wrong, I don't agree with your supposition 
> that <main> would be just as prone to mistakes as the other elements you 
> cited.

With all due respect, you have to ignore the data to come to that 
conclusion. Look at your own data: authors put this semantic all over the 
place. There is _no_ evidence that they'd do better with <main>.

   http://lists.w3.org/Archives/Public/public-whatwg-archive/2012Oct/0162.html

> Did the years-old previous discussion take into account id value data or
> skip link data or role=main placement data?

I do not recall the specifics. ARIA didn't exist back then, though, so 
role=main placement clearly wasn't examined.

> What the relevant new data clearly indicates is that in approx 80% of 
> cases when authors identify the main area of content it is the part of 
> the content that does not include header, footer or navigation content.

Where do you get this number from?

> It also indicates that where skip links are present or role=main is used 
> their position correlates highly with the use of id values designating 
> the main content area of a page.

How do you determine this correlation? (Are you just using the word 
colloquially?)

What does this correlation mean? Are they all using both incorrectly? 
(That would get you good correlation too.)

What about pages that do not give skip links or role=main? (Pages that use 
those features are going to be disproportionally biased towards competent 
authors, which makes it dangerous to draw conclusions from that sample.)

> furthermore when ARIA role=main is used in 95% [3] of the cases in the 
> data sampled it is used once only which is a clear indicator that 
> authors get how to identify the main content area of a page.

I think that's a wildly optimistic conclusion. Lots of pages only use 
<body> once; that doesn't mean they use it correctly. :-)

> *  use of a descriptive id value to identify the main content area of a
> web page is common.
> (id="main"|id="content"|id="maincontent"|id="content-main"|id="main-content"
> used on 39% of the pages in the sample [2])

As I discuss in the e-mail cited several times above, the area they 
indicate with these IDs is not reliably the "main content". For example, 
it might or might not include the footer, sidebars, navigation links, 
headings, etc.
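
To make that check concrete, here is a minimal sketch, assuming Python's 
standard html.parser, of how one might test whether an element carrying 
one of those IDs also wraps a <header>, <footer>, <nav>, or <aside>. The 
names MAIN_IDS, NON_MAIN, MainIdAudit and non_main_inside are 
hypothetical, and the set of "non-main" elements is an assumption made 
for illustration, not something the survey or the spec defines.

```python
from html.parser import HTMLParser

# ID values taken from the survey quoted above.
MAIN_IDS = {"main", "content", "maincontent", "content-main", "main-content"}
# Assumed set of elements that are not "main content" (illustrative only).
NON_MAIN = {"header", "footer", "nav", "aside"}


class MainIdAudit(HTMLParser):
    """Records which NON_MAIN elements occur inside an id-marked element."""

    def __init__(self):
        super().__init__()
        self.container = None  # tag name of the id-marked element we are inside
        self.nesting = 0       # open elements that share that tag name
        self.found = set()     # NON_MAIN tag names seen inside the container

    def handle_starttag(self, tag, attrs):
        if self.container is None:
            # Not inside a candidate yet: enter one if the id matches.
            if dict(attrs).get("id") in MAIN_IDS:
                self.container, self.nesting = tag, 1
            return
        if tag == self.container:
            self.nesting += 1
        if tag in NON_MAIN:
            self.found.add(tag)

    def handle_endtag(self, tag):
        if self.container is not None and tag == self.container:
            self.nesting -= 1
            if self.nesting == 0:
                self.container = None  # left the id-marked element


def non_main_inside(html):
    """Return the sorted NON_MAIN tag names found inside id-marked elements."""
    audit = MainIdAudit()
    audit.feed(html)
    audit.close()
    return sorted(audit.found)
```

A non-empty result for a page means the region marked with the ID 
includes more than the content proper, which is the inconsistency 
described above. The sketch only looks at tag names, so it says nothing 
about sidebars or navigation built out of plain <div>s.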

>  * There is a strong correlation between use of role='main' on an element
> with id values of 'content' or 'main' or permutations. (when used = 101
> pages)  77% were on an element with id values of 'content' or 'main' or
> permutations.

I don't see what this tells us. Obviously if someone is going to mark an 
element as role=main, they'll use the same element for id=main. Doesn't 
mean it's correct use of role=main, nor that it's necessary use. (If they 
can use role=main, seems reasonable to expect them to use modern HTML 
elements too.)

> * There is a strong correlation between use of id values of 'content' or 
> 'main' or permutations as targets for 'skip to content'/'skip to main 
> content' links (when used = 67 pages) 78% of skip link targets # were 
> elements with id values of 'content' or 'main' or permutations.

Again, that seems obvious, but doesn't tell us anything useful.

> * There appears to be a strong correlation in the identification of 
> content areas (with id values of 'content' or 'main' or permutations.) 
> as what is described in the spec as appropriate content to be contained 
> with a <main> element

How do you determine this?

On Fri, 9 Nov 2012, Roger Hågensen wrote:
> 
> I'm wondering if maybe the following might satisfy both "camps" ?
> 
> Example1:
> <!doctype html>
> <html>
> <head>
> <title>test</title>
> </head>
>     <div>div before body</div>
>     <body>body text</body>
>     <div>div after body</div>
> </html>

This wouldn't be Web-compatible due to legacy <body> parsing behaviour.

On Sat, 10 Nov 2012, Mat Carey wrote:
> 
> Personally I would love to have a <main> element because I think there 
> is a really useful purpose; I find it much richer to use 
> <article><header/><main/><footer/></article> than 
> <article><header/><div/><footer/></article> but I have no specific 
> use-cases which are not currently supported just a general feeling that 
> we've documented a number of common idioms and this seems to be one 
> that's missing.

Why do you need an element there at all?

On Sat, 10 Nov 2012, Maciej Stachowiak wrote:
> 
> I personally think <main> would be useful. I don't think it has a huge 
> benefit, but it has modest benefits, like <aside>, <header>, <footer> 
> and <section>. I also think the implementation costs are low. The 
> reasons I think it has some benefits:
> 
> - Even though heuristics (such as the scooby-doo algorithm or even 
> guesses based on role or class, or the layout) will always be necessary 
> in some cases, it's still good to have a simple and relatively 
> trustworthy marker of the main content. This is useful both for 
> accessibility purposes and for other browser features that want to find 
> the main content. In many cases, we have found that even when semantics 
> can be heuristically inferred, having an explicit marker is still 
> useful. For example, you can usually guess that some text is an address, 
> but we still have a microformat that helps identify such data 
> unambiguously.

But we already have this. The main content is whatever content isn't 
marked up as not being main content (anything not marked up with <header>, 
<aside>, <nav>, etc).

> - From a language design perspective, it seems inelegant to identify the 
> main content solely by what it is not. I realize that this is a matter 
> of taste and that tastes may differ. By analogy, in imperative 
> programming languages that have a main function, it is generally marked 
> with a specific name rather than just by not being any of the non-main 
> functions. This is not perfectly analogous, but it still seems 
> motivating to me.

There are plenty of languages where this isn't the case. JavaScript. Bash 
scripting. Perl. Python. Lisp. BASIC. To some extent, Pascal. Assembly. 
ALGOL 68. APL. Erlang. F#. Logo. Etc.

> - The "Scooby-Doo algorithm" is not actually defined in HTML5 afaict so 
> I am not sure what the spec recommends to find the main content or which 
> elements should be excluded.

There's nothing in the spec that talks about "main content" currently. No 
implementor has asked for an algorithm to do that, and I hadn't considered 
it to be something that was needed -- it seems self-evident that the page 
is the content, and if you want to know what the content is that isn't, 
say, an <aside> or <nav> block, you just exclude those from the tree.
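
The exclusion described above could be sketched roughly as follows, 
assuming Python's standard html.parser. The names NON_MAIN, 
RemainingText and remaining_content are hypothetical, and the choice of 
which elements to exclude is an assumption for illustration; the spec 
defines no such list.

```python
from html.parser import HTMLParser

# Elements assumed to mark content that is not the main content.
# This set is illustrative; nothing in the spec defines it.
NON_MAIN = {"header", "footer", "nav", "aside"}


class RemainingText(HTMLParser):
    """Collects text that is not inside any NON_MAIN element."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # number of NON_MAIN elements currently open
        self.chunks = []  # text fragments left after exclusion

    def handle_starttag(self, tag, attrs):
        if tag in NON_MAIN:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in NON_MAIN and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.depth == 0:
            self.chunks.append(text)


def remaining_content(html):
    """Return the text fragments outside header, footer, nav and aside."""
    parser = RemainingText()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

This only handles a body fragment: it makes no attempt to skip <title>, 
<script> or <style> text, and it trusts the author's markup completely, 
so it is only as reliable as the author's use of those elements.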

> I presume that header, nav, footer and aside are excluded. What about 
> address? small? Arbitrary other elements with non-main ARIA landmark 
> roles?

What's the UI you're going for? I don't know that it makes sense to define 
something here. I don't know what we're defining.

> It seems insufficient to me to say that the use case of finding the main 
> content is satisfied by an algorithm that's ambiguous and not actually 
> defined anywhere. Given the state of play, authors have no way to be 
> confident that their main content can be identified correctly, and 
> implementors have no way to know how to find it.

What browser makes any attempt to identify "main content", and what do 
they mean by it?

> - I'm not confident that the sectioning elements in HTML5 exhaustively 
> cover all possible forms of non-main content.

What isn't covered?

> > This idea doesn't seem to address any pressing use-cases. I don't 
> > expect authors to use it as intended consistently enough for it to be 
> > useful in practice for things like Safari's Reader mode. You're stuck 
> > needing to use something like the Scooby-Doo algorithm most of the 
> > time anyways. I don't outright object, but I think our time would be 
> > better spent on addressing more pressing problems with the web 
> > platform.
>
> The same argument could have been made for <article>, but the 
> implementation cost was so low that the benefit didn't have to be huge. 
> I think the same applies to <main>.

The same argument _was_ made for <article>. The exact same argument. It 
was added so that you could indicate the main content. This is even 
mentioned here:

   https://developers.google.com/webmasters/state-of-the-web/2005/classes

The <main> element Steve is proposing isn't quite the same as <article>, 
because different people have different ideas of what class="main" is for, 
but <article> could just as easily have been spelt <main>. It was spelt 
<article> because that allowed it to cover some other (more useful) use 
cases as well, such as those around syndication.

On Wed, 14 Nov 2012, Silvia Pfeiffer wrote:
> On Wed, Nov 14, 2012 at 4:25 AM, Tim Leverett <zzzzbov at gmail.com> wrote:
> > > 
> > > Explicit author markup would make such a task so much easier.
> >
> > Only if every author marked up their code correctly. If some authors 
> > use incorrect markup, then an algorithm would still be necessary for 
> > determining if each usage was correct.
> 
> From a browser perspective, if there is one <main> element and it sits 
> within <body>, that would be sufficiently correct.

That's not at all clear. It depends what the use case is. If the use case 
is for helping users of ATs jump to the relevant point, and it turns out 
that pages misuse <main> a lot, or use it suboptimally a lot (like, on 20% 
of pages, as Steve suggests will happen with his stats above, though I 
don't know where they're derived from and am skeptical that it'd be even 
that good) then browsers are going to find themselves forced to find a 
better solution. This is what killed longdesc="", for instance (though in 
the case of longdesc="", the numbers were a lot worse than 20%).

> Whether it's semantically correct for a particular application, that's 
> not something the HTML spec should or could deal with.

I don't know about that; that's one of the spec's main purposes!

> We don't protect people from putting the wrong text in tags - not in 
> microdata, not in <article> or anywhere else.

We can't "protect" them, but we _can_ make it non-conforming.

On Wed, 14 Nov 2012, Tim Leverett wrote:
> 
> Pro: Adding a <main> element will provide a semantic element that 
> developers can use to indicate primary content of a document.

(But this is already possible in other ways.)

> Pro: Adding a <main> element will allow developers to use a format such as:
> <body>
>   <header />
>   <main />
>   <footer />
> </body>
> which tends to be quite clean and understandable (the easier it is to read
> code, the easier it is to fix code).

This is already possible in two ways:

 <body>
   <header />
   ...
   <footer />
 </body>

...or, if you need an element for whatever reason:

 <body>
   <header />
   <div>...</div>
   <footer />
 </body>

> Pro: Assistive technologies can use the <main> element as a means to 
> rapidly navigate to the primary content.

As mentioned above, this is already possible with existing elements, and a 
new element would only help if authors used it reliably. However, there is 
evidence to suggest that <main> wouldn't be reliably used in this way.

> Pro: The <main> element can only be used once per page. This forces the 
> author to decide exactly where the main focus of the page lies, rather 
> than relying on assumptions.

There are plenty of pages with multiple sections with main content; if we 
did introduce it, it would be once-per-section, not once-per-page.

> Con: The <main> element is supposed to exclude content that is 
> repetitious across pages, but content is often interspersed with blocks 
> of advertisements, modules, CTAs and the like.

That's one of the reasons authors are unlikely to use it reliably, yes.

On Thu, 15 Nov 2012, Ian Yang wrote:
>
> That's a good idea. We really need an element to wrap all the <p>s, 
> <ul>s, <ol>s, <figure>s, <table>s ... etc of a blog post.

That's called <article>.

On Wed, 14 Nov 2012, Steve Faulkner wrote:
> 
> The same can be said for any of the structural semantic elements, what 
> we know is that some authors mark up headings, nav, footer, articles etc 
> incorrectly or not at all.

The difference is that misusing those elements doesn't defeat their 
purpose, while misusing <main> leaves it with no reason for being. The use 
cases for <main> rely on it not being misused by the majority of authors. 
The use cases for things like <aside> benefit from it not being misused, 
but are not lost if other authors misuse them.

> What we also know is that user agents do not generally implement 
> heuristics to provide semantic information to users, they rely upon 
> explicit markup to expose semantic structures to convey meaning and 
> provide navigation of content.

The user agents that do the most to get semantics out of HTML pages, 
namely search engines, use absolutely gigantic amounts of heuristics.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'