[whatwg] Thoughts on HTML 5
bhawkeslewis at googlemail.com
Thu Dec 18 07:47:23 PST 2008
Giovanni Campagna wrote:
> 2008/12/17 Ian Hickson <ian at hixie.ch>
> > This doesn't cost any time in HTML either, since the tokeniser doesn't
> > need to worry about what tags have end tags; the tree construction side
> > just drops unexpected end tags on the floor.
> I don't think authors expect tags to disappear.
Perhaps (got any actual evidence about author expectations in this
case?), but that's not a problem for tokenizer performance. You're
"shifting the goalposts".
Anyway, if we're talking authorial expectations, ordinary authors don't
[…] to be an unrecoverable error, but it is in XHTML.
It's not as if either of these syntaxes makes sense to ordinary people or
was even intended to. The original authoring model for HTML was
supposed to be "paragraph" and "anchor", mediated by some sort of
vaguely WYSIWYG-type editor, not angle-bracketed tags.
> > don't check for insertion modes
> > Having an insertion mode isn't particularly a performance cost. (It
> > affects code footprint, but that's about it.)
> 1) it needs more code (one branch per insertion mode): more code always costs
> some performance, even if only to load a bigger executable
> 2) it needs code to select the insertion mode for the next element
> (when the spec says to reset the insertion mode): in the worst case it
> has to compare nodeName 18 times
> > That's the same as HTML.
> No, it is not. HTML defines special behaviour for the following elements:
> address, area, article, aside, base, basefont, bgsound, blockquote,
> body, br, center, col, colgroup, command, datagrid, dd, details, dialog,
> dir, div, dl, dt, embed, eventsource, fieldset, figure, footer, form,
> frame, frameset, h1, h2, h3, h4, h5, h6, head, header, hr, iframe, img,
> input, isindex, li, link, listing, menu, meta, nav, noembed, noframes,
> noscript, ol, p, param, plaintext, pre, script, section, select, spacer,
> style, tbody, textarea, tfoot, thead, title, tr, ul, and wbr.
> I think that is far too many to say that it is like XML.
> > There are a number of HTML5 parser implementations, and data suggests
> > there is no particular performance gain.
> There are no actual HTML5 parser implementations, only HTML4-compatible
> parsers with new syntax.
Ahem, there are several:
> > There's no guessing in HTML either; all input streams have very specific
> > and required results.
> Actually, there's nothing that really says that <div><p>some
> text</p><p>some more text</p></div> is more correct than <div><p>some
> text<p>some more text</p></p></div>
> It is only when writing the specification that you guess the first
> possibility is what the author meant. You are guessing, not the browser.
A conforming browser will interpret the markup as specified by the
specification, so there is no difference.
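To make "no guessing" concrete, here is a toy Python sketch of just the few tree-construction rules the markup above exercises: text, div, p, and unmatched end tags. It is not a conforming parser; the regex tokenizer and every function name are hypothetical, and the spec's "in scope" check is simplified to "is open anywhere on the stack". As I understand the current WHATWG rules for p, a new p or div closes an open p, and a stray </p> produces an empty paragraph, so the same input always yields the same tree.

```python
import re

# Toy tokenizer (hypothetical): start tags, end tags and text only; no attributes or comments.
TOKEN_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>|([^<]+)")


class Element:
    def __init__(self, name):
        self.name = name
        self.children = []  # child Elements and text strings


def serialize(node):
    """Serialize the children of node back to markup."""
    out = []
    for child in node.children:
        if isinstance(child, str):
            out.append(child)
        else:
            out.append(f"<{child.name}>{serialize(child)}</{child.name}>")
    return "".join(out)


def parse(markup):
    root = Element("#root")
    stack = [root]  # stack of open elements; stack[-1] is the current node

    def insert(name):
        element = Element(name)
        stack[-1].children.append(element)
        stack.append(element)

    def has_open(name):
        # Simplification: the spec checks "in scope" with boundaries such as
        # table or button; none of those occur in this subset.
        return any(el.name == name for el in stack[1:])

    def pop_until(name):
        # Pop elements until one with this name has been popped.
        while stack.pop().name != name:
            pass

    for match in TOKEN_RE.finditer(markup):
        slash, name, text = match.groups()
        if text is not None:
            stack[-1].children.append(text)
            continue
        name = name.lower()
        if not slash:
            if name in ("p", "div") and has_open("p"):
                pop_until("p")  # a new <p> or <div> implicitly closes an open <p>
            insert(name)
        elif name == "p":
            if not has_open("p"):
                insert("p")  # a stray </p> yields an empty <p></p>
            pop_until("p")
        elif has_open(name):
            pop_until(name)  # also closes any elements left open inside it
        # any other end tag with no matching open element is dropped on the floor
    return root


print(serialize(parse("<div><p>some text<p>some more text</p></p></div>")))
# <div><p>some text</p><p>some more text</p><p></p></div>
```

The same sketch also drops an unexpected end tag without the tokenizer knowing anything about it: parsing <div>a</span>b</div> gives <div>ab</div>.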
> Every input, even from the most
> trustworthy source, must be parsed for errors and then checked after
In practice, people find this very hard with XML, and most web publishing
systems (WordPress etc.) don't work like this even if they should.
Also, much of the web is ad-supported. The ads ecosystem is based around
including markup from trusted sources. Those including the markup are
generally not able to exert much control over the included markup, even
when they are some of the biggest publishers on the web. Getting ads to
have user-friendly HTML (e.g. alt attributes for image links) is nigh
impossible; trying to get conforming HTML is a wet dream; and trying to
get ads in valid XML is likely to be a complete non-starter. Why would
an ad creator bother, when they could choose a different partner and use
their old text/html ads?
> And if an end user finds an error, he will probably report it to the
> owner of the website, who in turn will report it (quite angrily) to the web
> designer. Something like: "What on earth are you doing in front of the
> coffee machine? I don't pay you to rest! Fix that website immediately!"
"Probably" - got any empirical evidence for that? I don't usually report
errors in websites I visit (even _I_ usually have other things to do
with my time).
In any case, avoiding angry customers complaining because XML threw a
fatal error that would have been handled gracefully in HTML is an
infinitely stronger incentive for developers to keep using text/html
than anything the spec might say on the matter, so this isn't a
persuasive argument for switching to application/xhtml+xml.
> > Well, they've ignored it for the past 7 years, so why would they change?
> Nobody has told users that they are browsing a deprecated website. If
> something like the IE7 information bar (i.e. a non-modal bar that can be
> disabled and does not annoy the user, but is immediately visible) appeared
> on websites sent as text/html, I think companies wouldn't like their sites
> tagged as "deprecated" and would port them to application/xhtml+xml in no
> time (can you imagine what "deprecated" could mean for a news website?)
Indeed, they would be upset. And they might even try porting it.
However, there's little incentive for browser makers to throw
information bars over the majority of the existing web just to assuage
your desire for people to switch to XML.
In fact, there are disincentives for browser vendors to include such an
information bar since:
1. Users will complain about error messages on sites that have always
worked just fine. ("I'm switching back to IE8.")
2. Users will be trained to ignore error messages since sites work just
fine even with a finger-wagging information bar slapped across the top,
which is a security risk.
Even persuading browser vendors to include an indication of whether a
website is valid or not has been a non-starter for every browser except
iCab - and even iCab has dropped that indication in the latest version.
> > Anyway, it isn't clear that we would _want_ to deprecate HTML, even if we
> > had any real choice in the matter.
> I'm not sure I understood your sentence (sorry, English is not my
> native language). Anyway, you just have to put an "authoring
> requirement" for text/html
Ian was effectively asking: "Why deprecate text/html?" You appear to be
trying to answer: "How would we deprecate text/html?" which is a
different question (and I've indicated some problems with your
> Gradually, n° 3 will disappear, because there's no actual need for HTML.
Except on the ad-supported web…