[whatwg] [html5] Semantic elements and spec complexity

Wed Nov 10 09:24:45 PST 2004

Matthew Thomas wrote:

> On 10 Nov, 2004, at 3:48 AM, Ian Hickson wrote:
>
>> ...
>> The whiteboard in my office currently has a list of elements under 
>> the heading "HTML5 BLOCK LEVEL ELEMENTS", and I'm trying to work out 
>> how to make them work well (the elements in question are currently 
>> mentioned in the draft, but the draft doesn't handle headers at all 
>> well). I haven't looked at inline markup yet, but that's on the cards 
>> too.
>
>
> I believe the past 15 years of semantic markup have shown these three 
> things to be true:
>
> 1.  Most authors Just Don't Care about semantic markup.

This is true.

> They'll only use
>     it if it's the easiest way of getting the visual effect or behavior
>     they want in their own favorite browser, or if they can use it to
>     game search engines. (That's why authors use <ul> and <li>, for
>     example, but not <address>.) 

This isn't really true, in general. For example, many sites fail to use 
headings (and use odd combinations of <font> and <b> and <big> to get 
the same effect) even though headings are the easiest way to get 
something that looks kind of like a heading. In fact, unwanted styling on 
elements has an adverse effect on their use (e.g. I have heard people 
say "h1 shouldn't be used because the font is too big").

>
> 2.  Those authors who do care about semantic markup often do so
>     overzealously, using it even when it's not appropriate. For example,
>     they use <em> whenever they want italics or <strong> whenever they
>     want bold.

This is true. However, it only applies when there is a 1:1 mapping 
between a 'visual' and a 'semantic' element. The net effect is that, 
from the point of view of UAs, non-semantic elements are treated like the 
semantic ones (Lynx renders both <i> and <em> in purple, despite the fact 
that purple != italic).

>
> 3.  The more complex a markup language, the fewer people understand it,
>     the less conformant the average article will be, so the less useful
>     the Web's semantics will be. Current HTML authors may clamour for
>     new features, but they have forgotten what it was like to be a new
>     HTML author; and new authors are neither subscribed to this list nor
>     employed by browser vendors, so it is easy to forget about them. 

This is sort of true, although tutorials tend to focus on common 
elements such as <p>, <ul>, and <br>.

>
> So if <section>, <navigation>, <header>, <footer>, <article>, and 
> <sidebar> are introduced, with the default presentation currently 
> suggested {display: block; margin: 0;}, I predict the following:
>
> *   A greater number of Web developers will never use most of these
>     elements, but they will replace all occurrences of <div> on their
>     pages with <section> because it's more "semantic" (just like they
>     did with <em> for <i> and <strong> for <b>), and they will feel good
>     about it. 

This could be a problem. In principle, the 'visual difference' should be 
the link between <section> and headings. That might not be enough. Did I 
already suggest an official authoring guide for the spec that would 
explain the difference between <div> and <section> (and other such 
issues) in a more approachable way than the spec itself can? If not, it 
seems like a good idea.

> *   The vast majority of article producers (Weblogs and online
>     newspapers) will never use <article>, because there's no visual or
>     behavioral benefit from doing so. So <article> will never become a
>     reliable way of dissecting or aggregating pages. 

Again, I think this could be a problem.

> *   The number of knowledgable HTML authors, the proportion of HTML
>     pages that are valid, and therefore the overall usefulness of the
>     Web, will be less than it otherwise would have been because of
>     HTML's increased complexity. 

I tend to disagree here. It's not like there are hundreds of pages using 
<cite> where they want <i>. A few features will be abused and so won't 
be useful. The remainder will be used infrequently, but usually 
correctly.

> One way of improving this situation would be to reduce the number of 
> new elements -- forget about <article> and <footer>, for example.
>
> Another way would be to recommend more distinct default presentation 
> for each of the elements -- for example, default <article> to having a 
> drop cap, 

Hmm

> default <sidebar> to floating right

+1

> , default <header>, <footer>, and <navigation> to having a slightly 
> darker background than their parent element

It seems like there should be something more obvious that could be done 
for these elements. For <header> and <footer>, a border below and above, 
respectively, would seem obvious.

> , and default <header>...<li> and <footer>...<li> to inline 
> presentation. This would make authors more likely to choose the 
> appropriate element. 

An interesting side effect might be that users who don't understand 
enough CSS to turn these effects off would avoid the new elements 
entirely. That might be a good thing. Or it might not.

>
> A complementary long-term approach would be to deprecate the most 
> redundant and/or least effectual elements and attributes from HTML 
> 4.01 -- for example, <acronym>, <big>, <small>, <q>, <var>, 
> accesskey=, cite=, longdesc=, and name= -- in preparation for removing 
> them. This would eventually help reduce the complexity of the spec.

+1 in principle, but I'd argue a little with the list of things to 
deprecate :)