[whatwg] Don't change the semantics of elements

Fri May 11 05:54:55 PDT 2007

In response to  
http://www.autisticcuckoo.net/archive.php?id=2007/05/09/forward-towards-the-past  
I contacted Tommy Olsson to discuss the issue further, and we agreed to  
forward the discussion to the list. I've translated it from Swedish so any
grammatical errors are my fault. :-)

-------------8<---------------

It seems like the premises of the working group are something completely  
different from what I personally would have wished for further development  
of one of the most important standards on the web. That is yet another  
reason I don't want to be involved in the game. :)

> In the article you write:
>
>    Instead, it looks as if they are going to redefine the semantics of
>    existing element types so that old-school documents from the Bad Old
>    Days will be conforming to the new specification!
>
> Hmm. It's probably more on a case-to-case basis.

One single such case is bad enough, in my opinion. The W3C has already done
this, and we don't quite know what the result will be. They changed the
semantics of DL from definition list to some kind of generic value pair
list. For instance, they say that DL can be used for marking up dialogues.
How will that affect an application that has relied on a DL being a
definition list? One example is the DEFINE: feature in Google.

In my opinion it would have been considerably better if they had let DL
continue to be a definition list and instead added a new element type for
value pair lists. Although I suspect that Microsoft was holding things back
in that respect, by not wanting to put any work into development of IE.
(This was before Firefox's successes forced them to do so anyway.)

> When it comes to which semantics <p> should have you first need to ask
> the question: who will benefit from the definition of <p>? The one who
> authors HTML by hand? The one who implements a WYSIWYG editor? The
> above-mentioned analysis applications? Browser manufacturers?
> Several/all of them?

This is where I see the line between my point of view and the working  
group's. You look at the benefit of each specific definition, which I  
think means that you miss the forest for all the trees.

My standpoint is based on the fact that I learned HTML at the time the
specs were at cern.ch. It was before the W3C was founded and before HTML
got any version number. HTML was then a semantic markup language that was
very biased towards scientific documents, for natural reasons.

It became natural for me to think of HTML elements from a semantic point  
of view. The element type has nothing to do with presentation, but shall  
only mark up what things are. Unfortunately the range of semantic element  
types is very limited, but at least we can mark up headings, paragraphs,  
lists and tables.

The web's development during the second half of the 90's went in a totally
different direction, when designers and happy amateurs took over. I
thought it went off track, since HTML for me wasn't a presentational
language. The W3C agreed, and eventually released CSS, but the damage was
already done.

To me the question of who "benefits" from the definition of <p> is
irrelevant. The definition already exists and is unambiguous. A <p> tag
shall mark up a textual paragraph, and nothing else. Then of course there
are gray areas: is a byline a special case of a paragraph, or something
else?

We look at the overall picture from completely different perspectives, and
I don't think we will reach a common vision. Your outlook is probably
shared by a vast majority; 99.999% of those who create web pages have never
even read the HTML4 specification, after all, but see HTML from a
presentational aspect. For them <p> is probably just a <div> with
predefined margins, just like the HTML5 specification seems to suggest.

> Browsers can't do much fun with <p>, regardless of what the spec says
> that a <p> represents.

But the web isn't just about browsers. The web is (or should be, anyway)
about publishing information. One way to access this information is
to present the document in a browser, but it's far from the only
conceivable manner. If you think ahead a bit and have some imagination
you can probably come up with many interesting areas of use for
information on the web, presuming that it is marked up in a sensible way.
The semantic web that the W3C is talking about is something I find
clearly attractive.

> Analysing applications that operate on the entire web without prior
> agreement with the producer cannot rely on <p> == paragraph, because
> the web doesn't look like that and we can't change it. Regardless of what
> the spec says such apps will thus have to implement heuristics in order
> to decide what is a paragraph. (If there is a prior agreement with the
> producer it still doesn't matter what the spec says.)

It depends. An analysing application that tries to make some sort of
sense of today's tag soup has a Sisyphean task ahead of it. But an
analysis application that expects semantic correctness would, if it became
popular, be able to affect things in the right direction. Today's SEO
trend has to some degree led to a better understanding of semantics, e.g.
by spiders rewarding correctly marked up headings over tag soup with
FONT and B elements.

> A WYSIWYG editor probably has a hard time knowing whether what the user
> writes is a paragraph or not.

Yes, I have so far not seen one WYSIWYG editor that facilitates semantic
correctness. I also can't imagine what such a user interface would look
like. But surely there should be wiser minds in this world that can come
up with something?

> From that point of view it doesn't really matter how <p> is defined in
> the spec -- it doesn't change reality

No, I think it matters a lot. For those who don't read the spec (i.e.
99.999%) it obviously has no significance at all, but there has to be an
unambiguous semantic definition for each element type for the little
minority who actually want to do things right.

> Then the question is what the harm is if <p> is used for more things
> than just paragraphs. Who is harmed by markup such as
>    <form><p><label>Search: <input name="q"></label></p></form>

The one who has read the earlier HTML specifications and thinks that <p>
marks up a textual paragraph. Obviously not the one who looks at the
result in a graphical browser, but maybe the one who uses a completely
different UA.

Sure, you can drive in screws with a hammer. No SWAT team of murderous
carpenters will come and drag you off to prison for that. But those with
a little pride in their profession still use a chisel or a screwdriver.

> ...? Why is
>
>    <form><div><label>Search: <input name="q"></label></div></form>
>
> better?

It's only marginally better, by using a semantically neutral container
instead of abusing <p>. The correct thing is naturally to use a <fieldset>.
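
As a rough sketch of that <fieldset> alternative (the <legend> text is an
assumption added only for illustration; the rest mirrors the quoted
example), the search form might look like:

```html
<form>
  <fieldset>
    <legend>Search</legend>
    <label>Search: <input name="q"></label>
  </fieldset>
</form>
```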

This business of semantic meaning and correct markup is hard to get
across. I notice that daily, both at my job and on forums such as
SitePoint. The visual outlook ("the most important thing is that it looks
good") completely dominates over the structural ("it shall be correct
too").

I don't imagine that the world will get a collective aha experience and  
that HTML in the future will get used the way it was intended. But it  
doesn't stop me from at least preaching now and then for those who are  
interested. I can't save the web from the tag soup march, but I might be  
able to save a handful of people from getting stuck.

Let me just clarify that this isn't about me being conservative and
opposed to change. I don't grumble that "it was better before" or
sniff at "the youth of today". Possibly you can draw similarities to
authors of letters to the editor who sign their works with "friend
of order". :)

It's simply that I happen to think that the original idea with HTML is a
good one. Let HTML mark up structure and semantics, and leave all
presentation to CSS. To further develop HTML, add more semantic elements
that experience shows we need, such as <nl> (navigation list) and an
element type for value pair lists.

/Tommy

------------->8---------------

Regards,
-- 
Simon Pieters