[whatwg] several messages about HTML5
ian at hixie.ch
Tue Feb 20 16:05:47 PST 2007
On Tue, 20 Feb 2007, Vlad Alexander (xhtml.com) wrote:
> 4. One of the biggest problems with HTML is that content authors can get
> away with writing "tag soup".
Is it really a problem? Or is it the reason the Web is so wildly
successful? Would the Web have taken off in the same way if it worked like
most other systems, showing error messages whenever something was the
least bit wrong?
> As a result, most content authors don't feel the need to write markup to
> specification. When markup is not written to specification, CSS may not
> may not be able to process content as the author intended.
Having draconian error handling -- the term we use for just not allowing
errors instead of having silent error recovery like HTML does -- is not
the only solution for getting consistent behaviour between browsers. The
approach that we have taken with HTML5 is to define what _any_ document
means, even if it is invalid -- down to the last detail, so that every
browser will handle every document in an equivalent way, whether the
document is conformant or not. (It's the same technique CSS uses.)
> Why not put an end to "tag soup" by requiring user-agents to only accept
> markup written to specification?
There are literally dozens if not hundreds of billions of documents
already on the Web. A study of a sample of several billion of those
documents with a test implementation of the HTML5 Parser specification
that I did at Google put a very conservative estimate of the fraction of
those pages with markup errors at more than 78%. When I tweaked it a bit
to look at a few more errors, the number was 93%. And those are only core
syntax errors -- it didn't count misuse of HTML, like putting a <p>
element inside an <ol> element.
If we required browsers to refuse those documents, then you couldn't
browse over 90% of the Web.
But consider -- if one browser showed error messages on half the Web, and
another browser showed no errors and instead showed the Web roughly as the
author intended. Which browser would the average person use?
If we want to make HTML5 successful, we have to make sure the browser
vendors pay attention to it. Any requirements that make their market share
go down relative to browsers who aren't following the spec will
immediately be ignored.
> 5. X/HTML 5 has a construct for adding additional semantics to existing
> elements using predefined class names. Predefined class names could be
> the most controversial part of X/HTML 5, because the implementation
> overloads the class attribute. XHTML 2 provides similar functionality
> using the role attribute. Which approach is better and why?
The proposal to have predefined class names is still very much in the air,
we're mostly waiting for author and implementation feedback to see if it
is workable. Currently the HTML5 spec leaves a number of things unanswered
(like what happens if two classes on an element are contradictory), so
it's definitely not finished.
I haven't been able to work out what the role="" attribute does or how it
is supposed to be implemented. It has a spec, but that spec is really
unclear. So I can't really comment on it.
> 6. The font element is a terrible construct, primarily because content
> creators using authoring tools use the font element instead of semantic
> markup. The X/HTML 5 spec supports the font element when content is
> authored using WYSIWYG editors. What is the rationale for this? Why
> would WYSIWYG editors get an exemption? And is this exemption going to
> make the Web less accessible?
Again, the whole <font> element and WYSIWYG section is up in the air, the
current text is just a strawman and we've received lots of good feedback
on it which will need to be taken into consideration.
The main reason that WYSIWYG editors would be given an exception is that
the state of the art in user interface today doesn't have a good solution
for making a semantic editor usable by the average person. We could
require editors to do this, but since nobody knows how to do it, it would
be a stupid requirement. Again, we have to compromise on perfection to
actually address real-world needs and constraints.
> 7. The XHTML 5 spec says that "generally speaking, authors are
> discouraged from trying to use XML on the Web". Why write an XML spec
> like XHTML 5 and then discourage authors from using it? Why not just
> drop support for XML (XHTML 5)?
Some people are going to use XML with HTML5 whatever we do. It's a simple
thing to do -- XML is a metalanguage for describing tree structures, HTML5
is a tree structure, it's obvious that XML can be used to describe HTML5.
The problem is that if we don't specify it, then everyone who thinks it is
obvious and goes ahead and does it will do it in a slightly different way,
and we'll have an interoperability nightmare. So instead we bite the
bullet and define how it must work if people do it.
> 8. The chair of the HTML Working Group at W3C, Steven Pemberton, said
> "HTML is a mess!" and "rather than being designed, HTML just grew, by
> different people just adding stuff to it".
He's right! This continues to this day. We're trying to bring some level
of sanity to the process, though!
> Since HTML is poorly designed, why is it worth preserving? Or is HTML
> fixable? If so, how does X/HTML 5 fix it?
The original reason I got involved in this work is that I realised that
the human race has written literally billions of electronic documents, but
without ever actually saying how they should be processed. If, in a
thousand years, someone found a trove of HTML documents and decided they
would right an HTML browser to view them, they couldn't do it! Even with
the existing HTML specs -- HTML4, SGML, DOM2 HTML, etc -- a perfect
implementation couldn't render the vast majority of documents as they were
Every major Web browser vendor today spends at least half the resources
allocated to developing the browser on reverse-engineering the _other_
browsers to work out how things should work. For example, if you have:
<p> <b> Hello <i> Crurel </b> World! </i> </p>
are and what their parents are, what should happen? It turns out that,
before HTML5, there were no specs that defined this!
I decided that for the sake of our future generations we should document
exactly how to process today's documents, so that when they look back,
they can still reimplement HTML browsers and get our data back, even if
they no longer have access to Microsoft Internet Explorer's source code.
(Since IE is proprietary, it is unlikely that its source code will survive
far into the future. We may have more luck with some of the other
browsers, whose sources are more or less freely available.)
Once I got into actually documenting HTML for the future, I came to see
that the effort could also have more immediate benefits, for example
today's browser vendors would be able to use one spec instead of reverse
engineering each other; and we could add new features for authors.
> 9. Supporters of X/HTML 5 call XHTML 2 radical. History has shown us
> that radical technological change is often controversial, but in the end
> is the best choice. For example, in the last 40 years, the technology
> for delivering music has change radically, from vinyl, to cassette, to
> CD, to purely digital. Why should the Web shy away from a radical
> technological change?
If we look at music we see several key factors. The move from Vinyl to
Cassette came with radical new benefits like the shock-resistance and the
read-write nature of the media. Also, notice how cassette tape didn't
replace vinyl; they co-existed with different audiences for a long time.
The introduction of digital optical media (CDs) also introduced radical
new benefits: significantly higher quality and loss-less reproduction. CDs
replaced Vinyl, because they did the same thing, but radically better.
However, tapes continued to be used for years, since CD-RW was not widely
available straight away. We're now seeing a move from CD collections to
audio stored on digital magnetic media (like iPods), but if you look
carefully you'll see that a very clear migration paths exists from Audio
CDs and tapes to portable media players: CDs can be ripped and stored on
the new media, and the new media can play to tape adapters that plug into
car stereos like old cassettes.
So we see that successful radical change requires two key features:
* Radical new benefits to offset the pain of change
* Backwards-compatibility with the old technology
I'm sure there will come a day where new technologies exist which do
indeed shift the Web radically to new worlds, but I don't think we've seen
In a way, though, HTML5 itself is a radical change. Not for what it
specifies, but for the way it's been created. It's the first really open,
collaborative community that has taken on a task of this magnitude
(nothing less than rewriting the book on the core Web language) --
hopefully, future technologies will also follow this model, instead of the
more traditional behind-closed-doors, corporate-sponsored approach that
most other technologies have used.
> 10. In the minds of most people, HTML is dead and X/HTML 5 is perceived
> as an attempt to resurrect it. Given this perception, how can you
> succeed in marketing HTML to consumers (those who build Web sites)?
According to some other research we did for the HTML5 effort, over 95% of
the Web today is HTML, with the rest being mostly a smattering of PDF,
Word, and plain text. Under what definition of the word could it be
considered "dead"? Web designers never stopped writing HTML pages. I don't
think we'll have any difficulty convincing them to continue doing so.
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
More information about the whatwg