[whatwg] ---

Wed Nov 5 16:32:57 PST 2008

First of all, I want to apologize. I'm afraid that the explosion
of frustration and disappointment in my last message to this list was
one of the triggers (if not the only or main one) that ignited the
conflict here. I'm really sorry for that: my only intention when I
joined this list was to contribute to making HTML5 the best it
can be, for web users, content authors, browser and tool implementers,
and all other affected parties; and that message didn't serve that
intention at all.

Before going on, I want to make clear that most of what I express in
my mails is an opinion or point of view. I, just like anyone else, may
be wrong at any time, and I'm more than willing to accept that I am when
I'm shown solid evidence or arguments proving so. If any of
you think I'm wrong about anything I might say or have said, just let
me know, and I'll listen (or read).
Yet my frustration didn't come from being proven wrong
or from finding disagreement on some or many suggestions, but from the fact
that most of the threads I tried to participate in ended up being
ignored or drifting into unrelated side-topics. I'm quite willing to
make one more effort to assume good faith, and to assume
that my messages went unnoticed, or that I failed to express myself so
miserably that nobody was able to get my point, or that they fell into oblivion
for some other unintentional reason, rather than thinking that I'm
being deliberately ignored. (I might add as well that a cold shower
really helps with that :P )

Now, let me add that my opinions are not just impulsive
thoughts, but are based on over five years of professional experience as a
webmaster, web application developer, and SEO specialist, plus roughly
five more as a hobbyist "web tinkerer" (ie: setting up dynamic
websites just for the sake of it, with no other purpose than seeing
how far I could go before messing things up). I've also been
programming since I was 8, so even if I don't write browsers or
authoring tools myself, I usually have a fairly good idea of
how easy or hard something would be to implement. I know perfectly well that
some people here have more and deeper experience than I do; yet I
still think that I have enough background to at least contribute
something useful.

My first opinion on the current discussion is that there is too much
going on. If a single off-topic side comment has been enough to lead
entire discussions astray, I don't think it is possible at all to
have a rational and useful discussion about so many topics at once.
I'd like to (briefly) reply to some of the comments posted in this
discussion, and even add a few of my own; but if somebody feels
anything I say here is worth replying to, then it's probably
also worth updating the subject to something relevant, splitting this
thread into more focused discussions.

To Pentasis's comment "Who ever said that the standards are here for
browsers?" and all the replies to it:
There are several facts to keep in mind here. First of all, there is a
significant group of web authors (I can't say how big it is
relative to web authors as a whole) who simply don't trust
browser vendors. And there are quite good reasons for that:
Microsoft and Netscape literally *negotiated* HTML3.2, with horrible
consequences that we are still suffering over a decade after.
Microsoft has single-handedly boycotted the proper adoption of XHTML1
with IE's obnoxious treatment of the "application/xhtml+xml" MIME type
(and, whether you like it or not, there are some cases where draconian
error handling is a feature rather than a drawback, quite apart from the
extensibility mechanisms XML offers). There has never been an HTML vs.
XHTML debate: IE took the choice away, forcing those who tried to use
XHTML to put in a lot of extra effort, and it's quite obvious that the
affected authors didn't like that much.
Microsoft has also stalled the evolution of the Web by taking
over a decade to decently implement CSS2. All those
authors who were eager to make the most of the new CSS when it was
published are quite pissed off by the fact that a single vendor denied
them all these new features, for no other reason than laziness and the
comfort of holding a monopoly over the market.
These are the most obvious and solid examples, and I hope people from other
sectors can get an idea from them of why there are so many
developers who are quite skeptical of browser vendors' choices and
claims.
Now, we have to add a second fact to the mix: the WHATWG is
mostly made up of representatives of browser makers (although Google has
recently entered the sector with its Chrome browser, I don't think Ian
should be seen as just another browser-maker representative; still,
we should keep in mind that it's up to them to replace him with
another editor if they feel like it). Add to that the extra fact
that the spec is copyrighted by three of these browser vendors. For
those of us who don't trust browser vendors this is, in the best case,
scary.
Hence, don't expect web authors (at least this subset of them) to
blindly trust the vendor-centric WHATWG (by vendor-centric I simply
mean that it's composed of browser vendors at its core) from the
beginning. By joining this list, providing our feedback, and sharing
our opinions and points of view, we are giving the group a chance to earn that
trust. How the group uses that chance is up to its members.
Of course, we are quite aware that browser vendors have the final say
on what they implement; or at least they think so... But content
authors have the final say on what we use to mark up our documents;
and currently we do have a choice: it is possible to perfectly render
XHTML2 pages on all currently used browsers (although IE up to 7 can
be a bit tricky, due to the lack of CSS2 support).
Someone (I won't name that person, because it was a private
conversation) told me that if HTML5 didn't meet the requirements of
browsers, in the end browsers would implement something else; the same
way XHTML2 didn't meet those requirements and browsers aren't going to
implement it (actually, by implementing XML, namespaces, and XSLT,
they are already implementing enough of XHTML2, but that's a separate
point). The same reasoning applies to content authors: if HTML5
doesn't meet authoring requirements, authors will end up authoring
with something else. Actually, to give a specific example, there is
only one issue that keeps me from using XHTML2 for my current website
project (and it's completely unrelated to browsers); and it would
currently be the best option: it serves my needs better than XHTML1.x
or HTML4.x; and HTML5 isn't mature enough yet to be even taken into
consideration for that site.
However, the main reason I joined these lists is that I think HTML5
has the potential to beat (read: become much better than) XHTML2.

Enough of that; let's go to the next point:

On Wed, Nov 5, 2008 at 10:11 AM, Henri Sivonen <hsivonen at iki.fi> wrote:
> On Nov 5, 2008, at 10:46, Pentasis wrote:
>
>> <var> is the best example I think. Why <var> but not <function> <operator>
>> <operand> etc. etc. etc.? And if code gets this attention why not language?
>> (<verb>, <noun> etc. etc.) If we do it like that it would never work.
>
> <var>, <cite> and <dfn> (and, one might argue, <em>) are legacy elements
> flowing out of a desire to replace <i> with something "semantic".
>
> Since the elements are part of the HTML legacy, there isn't a great
> rationale that would justify their inclusion today if they had never been in
> HTML and were proposed as new elements now.
This makes me wonder: is the backwards compatibility topic being dealt
with appropriately? For example, why keep <var> (and others), but drop
<big>? Why not keep <font> as well? It is part of the HTML legacy,
after all, and quite a large part if you look at the markup of
currently existing documents (I'd bet that it's among the three most
used elements on the current web, sharing the podium with <p> and <a>,
but I can't say for sure).
I think following HTML4's and XHTML1's approach of having
Transitional and Strict flavors wouldn't be a bad idea (I don't know
if the Frameset one would still be needed: <table> + <iframe>, or even
<iframe> + CSS's display: table-cell, seems much cleaner, more
flexible, and doesn't require authors to use two separate content
models for similar stuff). Browsers wouldn't need to care at all when
using their "tag-soup" parsers; and it would be just a matter of
feeding one DTD/schema or the other when using an XML parser. It would
allow separating "obsolete" stuff that is only kept for backwards
compatibility from truly structural stuff.
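For reference, this is how HTML4.01 already lets a document pick its flavor: the doctype names either the Strict DTD or the Transitional ("loose") one, and deprecated presentational elements such as <font> are only valid under the latter. This is just an illustration of the existing HTML4.01 mechanism, not a proposal for what HTML5 flavor identifiers should look like.

```html
<!-- HTML 4.01 Strict: deprecated presentational elements (e.g. <font>) are invalid -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">

<!-- HTML 4.01 Transitional: deprecated elements stay valid for backwards compatibility -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
  "http://www.w3.org/TR/html4/loose.dtd">
```

A validator simply checks the document against whichever DTD the doctype names, which is why the split costs browsers' tag-soup parsers nothing.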
The impact of doing this (ordered from the worst to the best side effects):
Validator implementers would face a good deal of extra work, since
they'd need to validate different kinds of documents (namely,
"Transitional soup", "Strict soup", "Transitional XML", and "Strict
XML").
Spec writers would have to properly define the flavors. At this early
stage, it could be enough to mark the appropriate stuff in the spec as
"transitional", and then write the DTDs or similar formalizations
once the content model becomes stable enough.
Authoring tool implementers wouldn't face too many issues: if they
already make a distinction between HTML4's / XHTML1's flavors, reusing
most of the code should be quite doable. As for those that don't separate
flavors, simply don't expect them to start now :P.
Browser implementers would be only trivially affected: they'd just need to
incorporate both flavors of the DTD/schema for XML parsing, and at
most add a bit of logic to ensure the appropriate one is fed to the
parser (but browsers are supposed to be doing this already when they
deal with XHTML1, so it shouldn't be an issue).
Authors who chose Strict doctypes would enjoy a succinct, efficient,
and non-bloated language allowing them to concisely and consistently
mark up their documents.
As soon as different kinds of UAs (including browsers, assistive
technologies, and search engines, among others) become aware (read:
are updated) of the new markup, users will enjoy a wide variety
of benefits. Better bookmarking, smarter hints by assistive
technologies, and more representative snippets in SERPs are the first
ones that come to mind.

Next point:
On Wed, Nov 5, 2008 at 10:22 AM, Markus Ernst <derernst at gmx.ch> wrote:
> Pentasis schrieb:
>> [...]
>> First of all, I want to make it absolutely clear that these ideas are
>> strictly dealing with context and semantics. I do not wish to interfere in
>> the technical part of the spec. I do understand that sometimes there are
>> ideas that may involve technical solutions. My first and foremost concern is
>> about having a specification that deals with the naming of elements and
>> their usage in such a way that this would give us a standard which will
>> enable us to markup content consistantly and flexibally without ambiguity,
>> and which is flexible enough to act on-the-fly (so we don't have to wait for
>> the next version of the spec if something is missing).
[...]
> If I understand you correctly, you suggest a very basic set of structural
> elements, which are to be flexibally qualified by the authors via the class
> attribute. The composition of that set should follow some kind of basic
> language logic.
>
> If I understand HTML correctly, it provides a limited set of pre-qualified
> elements, some of them with a more structural emphasis, some of them with a
> more semantic (or or even presentational) one. The composition of that set
> does not follow a higher logic, but the everyday needs of the common web
> author (or what the writers of the spec assume this is).
>
> (I hope this is understandable; I am not a native English speaker, either.)
>
> So, supposed I got these both correctly, you do not really talk about HTML,
> but about an alternative approach of marking up text documents. I
> personnally find thinking about alternative approaches very interesting and
> useful for opening up one's mind.
Actually, I think that what Pentasis is talking about is nothing else
than HTML in its earliest and purest form, untainted by the side
effects of the browser wars and the mistakes of the past. Although we
can't undo past mistakes, we can learn from them, and put some effort
into fixing them.
Initially, HTML was entirely structural: no presentation, and no
semantics. Just paragraphs, headings, anchors, and a few other things.
With HTML3.2, there was an attempt to make HTML presentational, and it
soundly failed. It was acknowledged as a mistake, and HTML4 (plus CSS)
put a good deal of work into fixing it: presentational stuff went out
(more precisely, was "deprecated"), and presentation was delegated to a
separate language (CSS). HTML kept only @class for hooking into external
information, and @style for when embedding was more appropriate. Then,
to make sure no one was left out, a Strict flavor of the language was
published, keeping it "pure", and a Transitional one, keeping all the
deprecated stuff to ease the transition and to enable
document-level backwards compatibility. I hope we all agree this was a
good solution and that it worked; but if somebody doesn't, please let
me know. (It's true that shortly afterwards came XHTML1, adding quite a bit
of confusion to the scene, but that's a separate topic.)
So, if it worked, why not reuse that approach? Why do we need to go
through the same mistakes again? OK, that's an easy one: we do
'cause we're human :P. Jokes aside, am I really the only one here
who sees this as exactly the same thing!?
Let me try to make it even clearer: after the 3.2 disaster, it was
found that: (1) presentational markup didn't do enough to properly
control the presentation of webpages; and (2) presentational markup
clashed so often with structural markup that markup itself was no longer
reliable for inferring the structure of a document: either
structure was sacrificed in favor of presentation, or presentation was
sacrificed for structure.
Now, Pentasis's initial posts pointed out a fact: semantic markup
doesn't do enough to properly describe the semantics of webpages. I
had already posted some comments and even a few examples showing how
semantics and structure can often clash, requiring one to be tweaked
to achieve the other. Doesn't that sound familiar? Can't we simply apply an
equivalent solution to the one we used for an equivalent problem ten
years ago?
Before going on, here are some of the simplest examples I posted a
couple of months ago about this issue:
<nav> is the only facility in the spec right now to describe
"navigation" semantics; but it also implies a "section" structure:
hence there is no means to express "navigation" semantics for
something that isn't structurally a "section" (for example, headings
of the recent changes to a site on the site's main page, linked to the
relevant sections, are clearly "navigation" stuff, but they are
definitely not sections).
Similarly, there is no way to mark something as "tangentially related"
without making it a "section" (with the <aside> element).
And what about something that's both "navigation" and
"tangentially related" (regardless of whether it is a section or not)?
For example, a list of "see also" stuff on a documentation page: you
would be forced to mark it up as <<a "navigation" section inside an
"aside" section>> or as <<an "aside" section inside a "navigation"
section>>: neither reflects the real structure of the page, but
they are the only ways to represent both semantics. I know these
examples are really simple, and the workarounds wouldn't really hurt
that much; but they should be enough to show how we are stepping into
the same issues with semantics that we did over a decade ago with
presentation. Do we really have to wait to be hurt by the issue before
solving it, when we can see it approaching so clearly? I don't know about
you, but I know I'm *not* a masochist, so I don't really want to get
hurt.
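To make the "see also" case concrete, here is a sketch of the two workarounds described above, using <nav> and <aside> as they appear in the current draft. The link targets (#install, #faq) are hypothetical placeholders. As I read the current outline rules, in both cases the outer element ends up as an extra, untitled section that doesn't exist in the page's real structure.

```html
<!-- Workaround 1: a "navigation" section inside an "aside" section -->
<aside>
  <nav>
    <h1>See also</h1>
    <ul>
      <li><a href="#install">Installation</a></li>
      <li><a href="#faq">FAQ</a></li>
    </ul>
  </nav>
</aside>

<!-- Workaround 2: an "aside" section inside a "navigation" section -->
<nav>
  <aside>
    <h1>See also</h1>
    <ul>
      <li><a href="#install">Installation</a></li>
      <li><a href="#faq">FAQ</a></li>
    </ul>
  </aside>
</nav>
```

Both nestings express the same two semantics; the only difference is which of them gets (mis)represented as the enclosing structure.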
Now, getting more specific, we'd need:
1) Some (external to HTML) way to describe semantics. (And no, I don't
think RDF, in its current form, is a solution for this; but maybe the
solution could be based on or inspired by RDF.) That should be to
semantics what CSS is to presentation. And we don't really need to
care about browsers quickly implementing it, or about legacy browsers
that don't implement it, because currently browsers don't care at all
about semantics (at least, not beyond displaying @title values and
default rendering, and rendering can be dealt with through CSS
anyway).
2) A way to hook these external semantics to arbitrary elements of a
page: we've already got @class for this :D
3) A way to add inline semantics when needed. I guess a "semantics"
attribute would be the most straightforward approach. As for the
format it would use, we can worry about that once we have solved 1).
If we had that, then we could:
1) Get rid of all the "wannabe semantic" elements that didn't really
work well enough, sending them to the
deprecated/transitional/supported-for-backwards-compatibility-only
limbo.
2) Get rid of all the *new* "wannabe semantic" elements that wouldn't
be really serving any purpose (ie: un-bloat the content model)
3) Have the simplest and cleanest markup, the most accurate
presentation mechanisms, and the richest semantic descriptions of the
last 10 (or even more) years, all in one package.

> I agree with you that there are many things in HTML that have a purely
> historic legitimation, such as the h1-h6 elements. <h level="n"> would be
> much more flexible. I personnally often get mad about the IMO totally
> unlogic set of form elements. I would highly appreciate such thigs to be
> cleaned up in a new HTML spec. But of course the task ot those who design
> HTML5 is not to re-invent the wheel, but to evolve the existing HTML in a
> highly backwards-compatible way.
I have already mentioned what I think about the
backwards-compatibility requirement, and the way it's being
approached.

Anyway, I think it's also worth pointing out the issue with headings:
currently, the spec recommends using <h1> for all levels of headings,
but that would make a hell of a mess on current browsers. Hasn't anybody
noticed that?
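A minimal sketch of the pattern I mean: every section uses <h1>, and the heading's rank is meant to be derived from how deeply its section is nested. The heading texts are made up. A browser that applies the new outline rules could treat the inner headings as lower-ranked; a browser that only knows HTML4 has no reason to, and would presumably render all three with the same default <h1> style and report them to assistive technologies as same-level headings.

```html
<body>
  <h1>Site name</h1>
  <section>
    <h1>Article title</h1>
    <section>
      <!-- Intended as a third-level heading; a pre-HTML5 browser
           sees just another top-level h1 -->
      <h1>Subsection title</h1>
    </section>
  </section>
</body>
```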

> I made the experience when I suggested a new set of form elements, that I
> did not get much response on those contributions. The same might happen to
> your suggestions, as they are on a more basic level, than the HTML5 works
> act on. I don't think you can blame the people working on HTML5 for this, as
> they are quite far in the process, and your suggestions do rather set new
> starting points, than contribute to the acutal state of the work.
These are quite different cases: the main issue with form elements is
that their functionality is normally hardcoded in the browser.
Pentasis's suggestions (and even my own) would only significantly affect
the spec itself and validators; and maybe future "smart browsing"
features that aren't yet implemented anyway.

Well, that's been a long enough message, and over 3 hours of typing
and reviewing stuff are now asking me for a cigarette, so I'll post
again soon with the "additional comments" I was planning to add.
I want to remind you all that this message mostly reflects my point of
view; and if someone disagrees I'm more than willing to pay attention
to your arguments. Also, I think it'd be good to start branching off
from here rather than keeping the multi-discussion on this thread.

Regards,
Eduard Pascual