[whatwg] Various HTML element feedback
ian at hixie.ch
Tue Jun 5 16:53:03 PDT 2012
On Sat, 21 Jan 2012, Jukka K. Korpela wrote:
> 2012-01-21 0:30, Ian Hickson wrote:
> > On Tue, 26 Jul 2011, Jukka K. Korpela wrote:
> > >
> > > I don't think you have clarified whether <var> is suitable for
> > > physical quantities, but I guess you meant to imply it, even though
> > > there is not a single example about markup for physical quantities.
> > Given that the spec contains the exact example you gave (E=mc^2), and
> > given that the definition explicitly includes "an identifier
> > representing a constant" as one of the uses for the element, I have to
> > disagree with your assessment.
> Now that you have added that example, the text implies that <var> is the
> suggested markup for symbols of physical quantities. It is still
> somewhat odd that this is expressed via an example only, and the basic
> prose says: "The var element represents a variable. This could be an
> actual variable in a mathematical expression or programming context, an
> identifier representing a constant, a function parameter, or just be a
> term used as a placeholder in prose." None of the examples covers
> symbols of physical quantities, and yet they are probably more common
> in texts in general (as opposed to mathematics and programming) than the
> examples given.
I don't really understand why you think the text you quote doesn't
cover symbols of physical quantities (also known as "variables" or
"constants" depending on the specific symbol in question), but in the
interests of moving on, I've made the spec redundantly unambiguous on this
front by listing "a symbol identifying a physical quantity" explicitly.
> > > On the other hand, it seems that it doesnât really matter. The
> > > <var> element has now been defined to have such a wide and vague
> > > meaning that it is pointless to use it. There is little reason to
> > > expect that any software will ever pay attention to <var> markup on
> > > any semantic basis.
> > You seem to imply that there was reason to expect so before, which is
> > certainly news to me!
> I have rather been optimistic about future developments for markup
> elements that have been defined exactly enough to warrant meaningful
> semantics-based processing. For example, most of the uses mentioned in
> current text imply that <var> element contents should be kept intact in
> automatic language translation.
That continues to be the case, so I don't know why you conclude that using
it is now pointless.
> > I would not really expect these elements to be used for anything other
> > than styling hooks.
> That might be realistic, especially as there is no significant semantic
> clarification in sight in general. This raises the question why we could
> not just return to the original design with some physical markup like
> <i>, <b>, and <u> together with <span> that was added later.
I think you'll find the "original design" of HTML isn't what you think it
is (or at least, it's certainly not as presentational as you imply above),
but that's neither here nor there. The reasons for eschewing
presentational markup in favour of more semantic/structural markup are
> What's the idea of wasting time in wondering which markup to choose,
> among several vaguely described alternatives, when it all ends up with
> being comparable to arbitrary author-named styles in word processing?
I would point you to this article:
...but I think you probably already know of it.
> The advantage of using <i>, <b>, and <u> is that they have defined
> default rendering (even in the absence of CSS) and universal support in
That is _an_ advantage, yes. Not the only one.
> > > So authors will use <i> if they think italics is semantically
> > > essential, and <var> won't be used much.
> > That seems to be the status quo.
> So why not simply define <i> as recommended and describe <var>, <cite>,
> <em>, and <dfn> as deprecated but supported alternatives?
What benefit does empty deprecation have? It's not like we can ever remove
these elements altogether. What harm do they cause?
If we have to keep them, we are better served by embracing them and giving
them renewed purpose and vigour, rather than being ashamed of them.
> This would make authoring simpler without any real cost. There's
> little reason to tell authors to use "semantic markup" if we don't
> think it has real effect on anything.
It does have an effect. It has many effects. It makes maintenance easier,
it makes it easier to transition from project to project, it makes it
easier to work on other people's markup, it makes it significantly easier
to dramatically change a site's appearance, it makes it easier to create and
apply custom tools to extract information from the documents, it makes it
easier for search engines to guess at author intent, it makes it easier
for the documents to be repurposed for other media, it makes it easier for
be used and mixed...
> > However, some authors like the ease of maintenance that comes from
> > using elements as a general classification mechanism and classes to
> > provide fine-grained control, and it is mostly for them that HTML
> > provides a variety of more specific elements like <var>.
> This implies a burden on learning, teaching, and using HTML. Anyone who
> seriously tries to understand HTML will ask, for example, which of
> <var>, <cite>, <em>, <dfn>, <i>, <span>, <abbr> he should use in
> particular situations.
Indeed, which is why the HTML specification has been written with this
audience in mind (amongst others). http://developers.whatwg.org/ in
particular presents the answers to those questions in various different
ways for author convenience, and I encourage people to write further
documents that build on these.
Honestly, the question of which element to use when is one of the least
complicated questions new Web authors will face, despite our best efforts
at keeping the Web simple to write for. Just look at, e.g., the new APIs
> > > > > Too bad there's no example of<var> used in programming context.
> > > > > The current wording suggests that it would be normal, when
> > > > > discussing programming, to write, say, "Then we define the
> > > > > variable <var>myFoo</var> of type <code>fooType</code> with
> > > > > initial value <code>"Foo"</code> - -", which really makes no
> > > > > sense, even if we use both <var> and <code> for myFoo.
> > > >
> > > > Why does it make no sense?
> Because <var> does not imply that the contents is computer code. Yet a
> variable name in programming is surely code if a type name or a literal
> is. And using <code><var>myFoo</var></code> is clumsy, and it makes the
> text appear in italics by default, which is probably unsuitable
> (monospace italics doesn't work well). Why would an author use markup
> that by default causes rendering that he does *not* want, when there's
> the option of using <span>?
The option is there if the author wants it. No author is forced to use any
markup at all, just like we use none in this e-mail exchange.
> > > Because it implies that in default rendering, identifiers of
> > > variables appear in italics whereas identifiers of types or classes
> > > do not. Why would anyone use extra <var> markup when it has no other
> > > implications than requiring extra CSS code to remove (when possible)
> > > italics?
> > To enable easier maintenance of the markup and easy self-documenting
> > styling, same as pretty much all of HTML.
> I don't see how <code><var>myFoo</var></code> would ease maintenance
> and be self-documenting, as opposed to e.g. <span class="code
The benefit of <code> and <var> over <span class="code var"> is that they
are well-known names that future authors don't have to learn when they
start working on your old project after you've moved on. The more
knowledge like this that we can bake into the language, the easier
transitioning between projects can become. Naturally there's a point of
diminishing returns where the cost of knowing all the elements is higher
than the benefit from having them; this is a balancing act that we must
perform as part of designing the language.
<code> and <var> ease maintenance and are more self-documenting than
solutions such as <span class="style1"> or <tt>, which is what I assumed
you were comparing this case to.
On Tue, 24 Jan 2012, Jukka K. Korpela wrote:
> 2012-01-24 1:18, Ian Hickson wrote:
> > <u>, for instance, was only added after rather compelling use cases
> > were presented.
> The only use cases mentioned in the current version of "the living
> standard" are "labeling the text as being a proper name in Chinese text
> (a Chinese proper name mark)" and "labeling the text as being misspelt".
> These are semantically so remote that using the same element for them is
> artificial, to put it mildly.
They are both cases of unarticulated, though explicitly rendered,
> What are the actual benefits of using <u> instead of <span>?
I presume you mean what is the benefit of:
...for some arbitrary value of "x". The benefits are as discussed above --
easier maintenance, easier for people to transition between projects, etc
-- plus the extra benefit in this case of being far shorter. :-) It also
leads to slightly cleaner-looking CSS. And there's the advantage you
> The only difference is that with <u>, the default rendering on common
> browsers will use underlining.
This isn't the only advantage, but it is one, yes.
> This is the true meaning of <u>, and abstract, vague "semantics" will
> not help authors but confuse them.
I disagree, for the reasons on which I have expounded above.
> What is _compelling_ about markup for proper names in Chinese? HTML has
> had no markup for proper names in any language. Why introduce markup for
> them in one language, with the assumption that a specific rendering
> convention, now apparently rare, will be used?
This particular semantic is actually relatively common in that locale. For
further discussion on this I recommend seeing earlier threads on this list
or the Wikipedia page on the Chinese proper name mark.
> What is _compelling_ about markup for misspellings?
It's a feature that is necessary in text editors, for which we previously
did not have a good solution.
> How many web pages use such markup and need it, and why is it compelling
> that <u> be available to them?
WYSIWYG text editors are becoming more and more common on the Web. For
example, many blogs, social networks, CMSes, wikis, etc, have them.
> What is so _semantic_ about it if it can mean Chinese proper name _or_
> misspelled word?
It means "unarticulated, though explicitly rendered, non-textual
annotations". Those are just examples of such annotations.
> > > > By reusing existing elements, we are able to support them without
> > > > having to wait for new elements to be implemented.
> > >
> > > Several new elements have been added without such concerns.
> > Again, you are incorrect. The concerns were very much present.
> There was, for example, no support for <mark>. Maybe there is now, but I
> doubt it. Why wasn't an existing element, like <font>, used for it?
We considered that (indeed, we tried various element names for what
eventually became <mark>, including at one point <u>, IIRC). In the end we
settled on <mark> because reusing an existing element did not provide a
compelling advantage in the same way as it has for some of the other
elements, e.g. <small>. Specifically, none of the existing elements had
the default rendering we wanted, and no significant fraction of the sites
that were working around the absence of <mark> used any particular element
in a consistent way to encode that semantic.
> Which "support" was needed? Right, underlining. So what's so difficult
> in saying that <u> is just as semantic as <span>, except that <u> is by
> default underlined?
It's not difficult at all, it's just not as useful, and has disadvantages
that are worth avoiding (e.g. suggesting that HTML is media-dependent).
There are other more subtle examples of element reuse in the contemporary
HTML specification. For example, we now have XHTML2's <section>/<h>
concept in HTML. However, instead of <h> we reused <h1>, because it had
the right existing handling and default presentation. By careful
construction of the outline algorithm, we were able to actually make this
<h> element and the HTML4 <h1> element work in a completely compatible
way, while leveraging existing implementation support.
> > With <u>, many of the actual uses of the element can be seen as uses
> > of both the old presentational meaning and the new media-independent
> > meaning without conflict.
> That's because "the new media-independent meaning" has been formulated
> so vaguely that it can be ignored and the presentational meaning
> understood as the real one. But people who will try to take the text for
> real will get hopelessly confused (until someone comes to the rescue
> saying "oh, <u> _really_ means underline").
I see no reason to believe that will occur to any significant degree.
> > I would no more think we need an element for "bolder" than I would
> > think we need an element for "louder" in speech synthesis or an
> > element for "bigger hand gestures" in sign-language interpretation
> > (not that I'm aware of a sign-language HTML UA, but there's no
> > fundamental reason one couldn't exist in the future). When you start
> > from the fundamental position that these media are no more important
> > than each other, it is really hard to see why we would ever introduce
> > "phrase-level typographic features".
> It's not that hard if you think that HTML is all about markup for
> written languages.
But it isn't.
On Wed, 25 Jan 2012, Bjartur Thorlacius wrote:
> On Wed, 25 Jan 2012 22:26:31 -0000, Ian Hickson <ian at hixie.ch> wrote:
> > Actually, they are remarkably similar. I think it's anachronistic to
> > consider that the utterances of the site owner are in some way
> > distinct from the utterances of the site readers.
> While I do agree with you (for a change), identifying authors of
> <article>s is undeniably useful. First-posters in a thread or
> syndication may be styled differently from other posters, <article>s by
> authors of outer <article>s may be emphasized in some way, <article>s by
> certain authors may be omitted, etc. I find no algorithm for identifying
> authors of <article>s in the spec. Should the value of the first href
> attribute in a footer be assumed to identify the author? That seems
> bound to break, early and often.
It's the value of the href="" attribute of the links in the <article>
element that have the rel=author keyword.
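As a rough illustration of that rule, the Python sketch below collects, for each <article>, the href="" values of links whose rel="" attribute contains the author keyword, crediting each link to its nearest enclosing <article>. It is a sketch under stated assumptions: it uses the standard library's html.parser, which tokenizes but does not build a conforming HTML tree (so implied end tags and misnested markup are not handled), it only looks at <a> elements, and the names ArticleAuthorParser and article_authors are hypothetical.

```python
from html.parser import HTMLParser


class ArticleAuthorParser(HTMLParser):
    """Collect rel=author link targets per <article> (hypothetical helper).

    Assumption: html.parser only tokenizes, so this relies on explicit
    </article> end tags and does not reproduce HTML5 tree construction.
    """

    def __init__(self):
        super().__init__()
        self._open = []     # one list of author hrefs per currently open <article>
        self.articles = []  # author lists of closed <article>s, in closing order

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self._open.append([])
        elif tag == "a" and self._open:
            attributes = dict(attrs)
            # rel="" holds space-separated keywords; compare them lowercased
            keywords = (attributes.get("rel") or "").lower().split()
            href = attributes.get("href")
            if "author" in keywords and href is not None:
                # credit the link to the nearest enclosing <article>
                self._open[-1].append(href)

    def handle_endtag(self, tag):
        if tag == "article" and self._open:
            self.articles.append(self._open.pop())


def article_authors(markup):
    parser = ArticleAuthorParser()
    parser.feed(markup)
    parser.close()
    return parser.articles


page = """
<article>
  <h1>Post</h1>
  <p>By <a rel="author" href="/users/alice">Alice</a></p>
  <article>
    <p>Comment by <a href="/users/bob" rel="author nofollow">Bob</a></p>
  </article>
</article>
"""
print(article_authors(page))  # [['/users/bob'], ['/users/alice']]
```

Results are recorded as each <article> closes, so the nested comment's author appears before the outer post's author.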
On Sat, 4 Feb 2012, Marat Tanalin | tanalin.com wrote:
> AFAIK, the limitation "list items must be direct children of list" has
> been invented long before common containers (DIV/SPAN) have been
> invented. So, while it was reasonable initially to disallow alien
> _structural_ children of lists (for example, H2 as direct child of UL
> would be semantically pointless indeed), it's currently unreasonable to
> disallow common containers as nonstructural children of lists.
The problem is that parsing in lists is highly unintuitive. For example:
Now add <div>s:
What do you think the DOM looks like?
Kiwi, Pear, and Orange are siblings.
On Sat, 4 Feb 2012, Ambrose LI wrote:
> I don't really think that's fixable. We need a new element that is truly
> structural to begin with that can functionally represent real
> paragraphs. BLOCKQUOTE is broken in the same way, but at least that can
> theoretically be fixed because its end tag is mandatory.
Note that as it is specified, <div> can be used instead of <p> with
basically no loss of semantics. (This is because the spec defines
"paragraph" in a way that doesn't depend on <p>.)
[ Some of the following e-mails were cross-posted to WHATWG and www-style;
as a general rule please avoid cross-posting as it results in splintered
threads when people who aren't on both lists reply on the thread. ]
On Sat, 4 Feb 2012, fantasai wrote:
> On 02/03/2012 12:22 PM, Ian Hickson wrote:
> > On Tue, 10 Jan 2012, Hugh Guiney wrote:
> > >
> > > As I understand it, the main reason for rejecting <di> was that it
> > > solves a problem that is allegedly CSS's job, but as an author who
> > > uses <dl>s quite extensively, adding a grouping element would really
> > > make my life a lot easier.
> > There are a number of places in HTML where it would be nice to be able
> > to group things together -- just look at how often people stick <div>s
> > in their pages for no purpose whatsoever other than styling.
> > This shouldn't be necessary. It's a limitation of CSS.
> > The right solution is for CSS to provide some pseudo-element or other
> > mechanism that introduces an anonymous container into the rendering
> > tree that wraps the elements you want to wrap.
> I don't think this is a CSS problem. I think it's an HTML problem. It's
> not just that you might want to style definition items, you might also
> want to tag them with an ID so you can use them as a target anchor.
You can do that by just putting the ID on the first one.
> Or pick them up and do interesting things with them via script.
Scripts have a concept of DocumentFragment and concept of DOM Ranges that
make this kind of manipulation possible.
> But you can't do any of these three things because you can't wrap them
> in an element.
You do not need an element to do these things.
> Pseudo-elements are a non-trivial thing to spec, and a non-trivial thing
> to implement, and a comparatively confusing thing to use. Yet you're
> suggesting that we use them to solve a problem that is not even entirely
> solved by a new pseudo-element, rather than defining an appropriate HTML
> element for the job. So what is it about defining a real element that is
> so problematic that we're considering a pseudo-element here?
It's not a markup problem. If pseudo-elements are too complicated, then
another solution should be found. There's no reason CSS has to be complicated.
On Mon, 5 Mar 2012, Hugh Guiney wrote:
> On Mon, Mar 5, 2012 at 12:09 AM, Ian Hickson <ian at hixie.ch> wrote:
> > The only thing it adds to the grouping is the ability to have a
> > subsection that is then followed by more content from the subsection's
> > parent section. You couldn't do that with <hx> alone. However, for
> > <section> that's more of a negative than a positive, really (it makes
> > more sense for <aside>, <nav>, and <article>; <section> only allows it
> > for consistency).
> In what ways is that a negative?
It's highly confusing for a section heading to _not_ apply to text that
follows it without another heading coming in between. You never see it in
print, because it would be so confusing.
> > The spec doesn't generally include design rationale. (If anyone would
> > like to help maintain our rationale documentation, please let me know.
> > We're always in need of volunteers there.)
> What type of work is involved?
Basically combing the mailing lists and quizzing people on IRC in order to
write "behind the scenes"-style documentation describing how the spec
ended up being the way it is.
> > <di> doesn't exist. The ability to have multiple types of authoring
> > style isn't the reason for <section>'s existence. It's just a
> > side-effect of now having two different ways to mark up sections. It's
> > not actually a good thing in language design to have multiple ways to
> > do something (despite what Perl might have us believe!).
> HTML is full of multiple ways to do things: a run of text can avoid <p>
> and be the child of a <div> if the author prefers
Yes, this is an unfortunate development from the HTML 4.x days, IMHO. I
don't think it's a good thing.
> a <footer> can be at the bottom or top of a section
That's two different things, I don't see how that is an example of
multiple ways of doing the same thing.
> authors can continue to use Microformats despite the existence of
> microdata, etc.
I won't deny that the situation with embedded data annotation syntaxes is
somewhat of a mess, though IMHO that's primarily because when microdata
was written, RDFa and microformats did not sufficiently and effectively
address the given use cases and there was no obvious way to evolve them
that would solve the problems.
> If language idealism was a tenable goal on the Web then the WHATWG
> wouldn't exist and we would all be using XHTML 2 right now (which had
> <di>, for that matter).
Language idealism is indeed a goal. It's just not the only goal, and
sometimes, many times, other things take priority.
> > <section> wasn't introduced as a stop-gap measure.
> > There's no such thing as a stop-gap measure on the Web. We can't add
> > something then remove it. Once we've added something, it's part of the
> > platform, forever. That's why we have to be careful to only add things
> > that make sense on the long term.
> I only said "stopgap" because you posited CSS grouping as the ideal we
> should be striving for, when this method would work today. I actually
> don't think this should be taken out at a later time, as CSS grouping
> only addresses the issue of styling. It does not address the fact
> that, as I outlined in my original post, it is impossible with <dl>'s
> current parsing model to specify a named value followed by an unnamed
> value, since the unnamed value would be subsumed into the preceding
> group and be interpreted as an alternate value for it.
Sure you can:
<dt><!-- no name :-( -->
Generally though I wouldn't recommend such markup. It's highly confusing
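To make the grouping behaviour Hugh describes concrete, here is a minimal Python sketch of how consecutive <dt>/<dd> children fall into name-value groups. It assumes the <dl>'s children have already been reduced to (tag, text) pairs; the function name name_value_groups is hypothetical, and the simplified rule it applies is an illustration rather than the spec's own algorithm.

```python
from typing import List, Tuple

# (names taken from <dt>s, values taken from <dd>s)
Group = Tuple[List[str], List[str]]


def name_value_groups(children: List[Tuple[str, str]]) -> List[Group]:
    """Split a <dl>'s children, given as (tag, text) pairs, into name-value groups.

    Simplified rule (an illustration, not the spec's algorithm): a <dt> that
    follows a <dd> closes the current group and starts a new one; every <dd>
    joins the group currently being built; other tags are skipped.
    """
    groups: List[Group] = []
    names: List[str] = []
    values: List[str] = []
    for tag, text in children:
        if tag == "dt":
            if values:
                groups.append((names, values))
                names, values = [], []
            names.append(text)
        elif tag == "dd":
            values.append(text)
    if names or values:
        groups.append((names, values))
    return groups


without_empty_dt = [("dt", "Colour"), ("dd", "Red"), ("dd", "Blue")]
with_empty_dt = [("dt", "Colour"), ("dd", "Red"), ("dt", ""), ("dd", "Blue")]

print(name_value_groups(without_empty_dt))  # [(['Colour'], ['Red', 'Blue'])]
print(name_value_groups(with_empty_dt))     # [(['Colour'], ['Red']), ([''], ['Blue'])]
```

Without the empty <dt>, "Blue" becomes a second value for "Colour"; with it, "Blue" lands in its own group whose only name is empty.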
On Mon, 5 Mar 2012, Gray Zhang wrote:
> <dl> (and <di> in this discussion) refers to **definition**, but in most
> cases above, it should be a key-value pair rather than a
> term-description pair; the former is more general semantically, so I
> think we should have a general-purpose key-value pair element such as
> <pair>, with <pn> (pair name) and <pc> (pair content). In such a case, a user
> info form could be:
> <pc>John Smith</pc>
> This is more semantically and structurally correct: <pn> should be a phrasing
> element while <pc> is a flow element, so you can put flow/phrasing
> content in <pc>. Although styling <pair> to align
> vertically/horizontally is a matter of CSS, HTML gives a more
> definitive tag name.
We actually do have a mechanism more like what you describe:
...for instance, with a <table> around the whole thing. It allows pretty
much arbitrarily complicated two-dimensional data to be marked up.
<dl> itself is no longer defined in terms of "definitions", by the way.
Its definition is "an association list consisting of zero or more
name-value groups (a description list)".
On Wed, 25 Apr 2012, Andrés Sanhueza wrote:
> Currently, a <footer> tag is not allowed to appear inside a <header> tag
> and vice-versa. That does make sense when sticking to what the names of
> the elements imply, yet conceptually I see no reason a <footer> as in
> "textual metadata of a section" can't be inside a <header> ("lead of a
> section"). Could this be considered to be allowed?
On Thu, 26 Apr 2012, Benjamin Hawkes-Lewis wrote:
> Do you have a real example where you think that markup would be useful?
> If user agents provide commands to navigate to headers and footers,
> nesting them could make navigation confusing.
On Thu, 26 Apr 2012, Tab Atkins Jr. wrote:
> One was presented in another thread - according to the definition of
> <footer>, it appears that authorship information is most appropriate to
> put there. But sometimes the byline is placed inside the "header" area,
> which is reasonably marked up with a <header>. So, it makes sense to be
> able to nest the <footer> within the <header>.
On Fri, 27 Apr 2012, Benjamin Hawkes-Lewis wrote:
> Isn't that use case addressed by <address>?
On Thu, 26 Apr 2012, Andrés Sanhueza wrote:
> No. <address> is much narrower and indicated in the spec as such.
> Bylines can also contain the date or, in blog post, links to tags.
On Wed, 2 May 2012, Benjamin Hawkes-Lewis wrote:
> Good point.
> Can you provide an example where you'd want to put a <footer> *inside* a
> <header> rather than after it like so:
> <p><time datetime="2012-04-30">30 April 2012</time></p>
> <p><address>John Doe</address></p>
> <li><a href="/tags/politics">politics</a>
> <li><a href="/tags/environment">environment</a>
> <p>Article body…</p>
> It's worth noting that the definition of <header> is broad enough to
> allow byline, date, and tags ("group of introductory or navigational
> So you could also do:
> <p><time datetime="2012-04-30">30 April 2012</time></p>
> <p><address>John Doe</address></p>
> <li><a href="/tags/politics">politics</a>
> <li><a href="/tags/environment">environment</a>
> <p>Article body…</p>
> Personally, I think it might be easier to understand and provide user
> agent behaviors if we were to define header and footer as the header and
> footer of sections, and then require:
> [start section]
> [zero or more aside elements]
> [zero or one header element]
> [other material]
> [zero or one footer element]
> [zero or more aside elements]
> [end section]
> This way, if you hit a navigation key for footer you go to the end of a
> section, like you'd expect.
> Allowing <aside> before <header> or after <footer> is mostly a
> concession to ad publishing.
> In other words: define <header> and <footer> by their structural role
> rather than their contents per se.
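Benjamin's proposed ordering is a small regular grammar over a section's children, so a user agent or conformance checker could test it mechanically. A minimal Python sketch, assuming the children are given as a list of tag names; follows_proposed_order is a hypothetical helper, and the rule it encodes is the proposal in this thread, not something the spec requires. "Other material" is taken here to mean any element other than header or footer.

```python
import re
from typing import List

# Hypothetical checker for the ordering proposed above (not an HTML spec rule):
#   [zero or more aside] [zero or one header] [other material]
#   [zero or one footer] [zero or more aside]
# Each tag name in the joined sequence is followed by a single space.
_PROPOSED_ORDER = re.compile(
    r"(?:aside )*"                   # leading asides
    r"(?:header )?"                  # optional header
    r"(?:(?!header |footer )\S+ )*"  # body: anything but header/footer
    r"(?:footer )?"                  # optional footer
    r"(?:aside )*"                   # trailing asides
)


def follows_proposed_order(children: List[str]) -> bool:
    """Return True if a section's child tag names fit the proposed order."""
    sequence = "".join(name.lower() + " " for name in children)
    return _PROPOSED_ORDER.fullmatch(sequence) is not None


print(follows_proposed_order(["aside", "header", "h1", "p", "footer", "aside"]))  # True
print(follows_proposed_order(["header", "p", "footer", "p"]))                     # False
```

The second call fails because content other than <aside> follows the <footer>, which is the case the proposal rules out so that "go to footer" lands at the end of the section.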
On Sun, 29 Apr 2012, Maciej Stachowiak wrote:
> It may be useful to have distinctive markup to identify a byline within
> a header. But placing a <footer> element inside a <header> element does
> not seem like the most clear way to do that. I expect most authors would
> not think to use it that way, and content consumers would have a hard
> time distinguishing intentional cases of such use from authoring errors.
On Sun, 29 Apr 2012, Andrés Sanhueza wrote:
> The problem is that I don't see a semantic difference between the current
> definition of <footer> and a header byline to warrant a new element or
> convention. While <header> is clearly 'introductory' stuff, <footer> is
> defined as meta info of its section, so its position on the code is not
> fixed as long it complies with that. <header> is not a section element
> so it may not cause an issue in that regard. It has been stated that
> the element names aren't determinant of their semantics, and that applies
> to previously redefined elements which were originally named after
> presentational aspects. Their new definitions give them a semantic
> purpose, yet making them still distinct from other elements.
While it's true that the advice in the spec can lead one to wonder about
whether one should put metadata in a <footer> which should then be in a
<header>, I think, as Maciej suggests above, that that would lead to very
confusing markup. The spec doesn't _require_ that bylines be in the
<footer>. I've added a note to the spec to help clarify this.
On Mon, 30 Apr 2012, Andrés Sanhueza wrote:
> The <u> element was made conforming due to widespread usage and for some
> cases where other elements weren't suitable. However, I feel that the
> current definition is not very clear, as it gives two somewhat unrelated
> uses for it: misspelled text and proper names in Chinese.
It only actually gives one use: "a span of text with an unarticulated,
though explicitly rendered, non-textual annotation". The two cases you
give are examples of such annotations.
On Wed, 2 May 2012, Shaun Moss wrote:
> I know it's contentious, but as a teacher it's very simple to teach students
> of HTML5 that:
> <u> = underline
> <b> = bold
> <i> = italic
> <s> = strikethrough
> Of course, I also teach <strong> and <em>, but the simplest way to teach
> <b> and <i> is that it's merely an easy way to create bold or italic
> text when the meaning of <strong> or <em> doesn't apply. They represent
> a convenience that spares the author the work of using span tags and
> creating a CSS class with font-weight or font-style properties. <u> is
> the same, just an easy way to create underlined text. It doesn't really
> need semantics piled on top of it - that just makes it harder to teach
> and learn. But using Chinese names or misspelled text as /examples/ of
> when to use <u> is another matter.
> I grok the desire to have all tags defined semantically, but if the
> semantic definitions add unnecessary complexity, then it just seems like
> a kludge. Anyone can understand <b> = bold.
What is bold, when you are using a screen reader, a speech synthesiser, a
sign-language interpreter, a braille display, a text mode display with no
font control, or a one-pass teletype?
On Wed, 2 May 2012, Ashley Sheridan wrote:
> It still seems more important to ask why something should be bold or
> italic. Surely getting students into the mindset of describing their
> data is more beneficial?
On Wed, 2 May 2012, Shaun Moss wrote:
> Sure, I agree - so, deprecate the <b>, <i>, <u> and <s> tags then.
All four of these, as described in HTML4, are gone.
There are three elements in HTML today that happen to have the same names
as three of those four, but they are not the same elements and do not have
the same meanings.
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'