[whatwg] Various HTML element feedback
Jukka K. Korpela
jkorpela at cs.tut.fi
Tue Jun 5 21:39:40 PDT 2012
2012-06-06 2:53, Ian Hickson wrote:
>> I have rather been optimistic about future developments for markup
>> elements that have been defined exactly enough to warrant meaningful
>> semantics-based processing. For example, most of the uses mentioned in
>> current text imply that <var> element contents should be kept intact in
>> automatic language translation.
> That continues to be the case, so I don't know why you conclude that using
> it is now pointless.
It is worse than pointless, if the definition of <var> covers "a term
used as a placeholder in prose". Such expressions should definitely not
be kept intact in automatic language translation.
The definition of <var> is so broad that it is questionable whether
*anything* useful can be assumed in automated processing. If it were
defined more technically, without that placeholder idea, we could fairly
certainly say that the content should be treated as a technical notation
that should be left untranslated (as such notations are normally
international), ignored in spelling checks, treated as equivalent to
unknown nouns in syntax analysis of human language text, etc.
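To make this concrete, here is a minimal sketch of what a translation or spell-checking preprocessor could do if <var> were defined narrowly. It is only an illustration under that assumption: the names TranslatableText and split_for_translation are hypothetical, and nothing here is taken from the spec or from any existing tool. It uses Python's standard html.parser module.

```python
from html.parser import HTMLParser


class TranslatableText(HTMLParser):
    """Collect text runs, flagging those inside <var> as do-not-translate.

    Assumption (not in any spec): <var> content is technical notation.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.var_depth = 0   # nesting depth of currently open <var> elements
        self.segments = []   # list of (text, translatable) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "var":
            self.var_depth += 1

    def handle_endtag(self, tag):
        if tag == "var" and self.var_depth > 0:
            self.var_depth -= 1

    def handle_data(self, data):
        if data:
            # Text is translatable only when no <var> is open.
            self.segments.append((data, self.var_depth == 0))


def split_for_translation(html):
    """Return (text, translatable) pairs for an HTML fragment."""
    parser = TranslatableText()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.segments
```

Given "Let <var>n</var> be odd.", the "n" comes out flagged as not translatable, which is what one wants. Given "Press <var>key</var>", the word "key" is flagged in exactly the same way, although a translator should translate it. That is the problem with the placeholder clause.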
>> So why not simply define <i> recommended and describe <var>,<cite>,
>> <em>, and <dfn> as deprecated but supported alternatives?
> What benefit does empty deprecation have?
Declaring some features as "obsolete" is effectively deprecation; I just
used the term "deprecate" as per HTML 4.01 because I find it more
descriptive. Anyway, defining those elements as deprecated/obsolete
would be no less and no more "empty" than the current statements about
obsolete status. Validators/checkers would issue messages (hopefully
just warnings) about them, and tutorials would probably describe them as
secondary if at all.
Reducing alternatives, from five to one in this case, makes the
recommendations simpler and helps authors because they need not spend
time in making choices between the elements. Such choices can be tough,
if you try to play by the declared "semantics", especially if it is
vague (to a normal reader of a spec).
My point is: either make elements like <var>, <cite>, <em>, <dfn>, <i>
defined so that the differences can be utilized in automatic processing,
or just bundle them together, to <i>.
> It's not like we can ever remove
> these elements altogether.
Oh, in 20 or 30 years, I think browsers could drop support for some of them.
> What harm do they cause?
Unnecessary complication to the language, artificial "semantics" that do
not actually define meanings, and confusion among those authors who try
to take semantics and specifications seriously. Oh, and pointless
variation in markup and added complexity of styling.
> If we have to keep them, we are better served by embracing them and giving
> them renewed purpose and vigour, rather than being ashamed of them.
I think this summarizes well the idea behind some of the most contrived
"semantic" definitions. It was a brave attempt, but it failed. No normal
author will ever get your idea of the new meaning for <b> and <i>, for example.
And since, for example, the <font> markup needs to be supported for a
long time, how come *it* has not got a new, semantic definition?
If <var>, <cite>, <em>, <dfn> were obsoleted/deprecated in favor of
<i>, they would still need to be defined in the spec, of course. But the
definition could simply state that they are outdated elements that
should not be used by authors and should be treated by browsers as
equivalent to <i>.
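As a rough sketch of how little such a rule would demand from tools, the following rewrites the four elements to <i> in a markup string. The function name normalize_to_i is hypothetical, and a regular expression is used only to keep the example short: it does not understand comments, scripts, or other places where tag-like text is not a tag, so a real tool would use an HTML parser.

```python
import re

# Elements proposed above to be treated as equivalent to <i>.
LEGACY_ITALIC = ("var", "cite", "em", "dfn")

# Match "<name" or "</name" only when the name is complete, i.e. followed
# by whitespace, "/" or ">". This keeps <embed> from matching as <em>.
_TAG = re.compile(
    r"<(/?)(?:%s)(?=[\s/>])" % "|".join(LEGACY_ITALIC),
    re.IGNORECASE,
)


def normalize_to_i(html):
    """Rewrite <var>, <cite>, <em>, <dfn> start and end tags to <i>."""
    return _TAG.sub(r"<\1i", html)
```

Attributes are left alone, so '<em class="x">a</em>' becomes '<i class="x">a</i>' and any class-based styling keeps working.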
>> This would make authoring simpler without any real cost. There’s
>> little reason to tell authors to use “semantic markup” if we don’t
>> think it has real effect on anything.
> It does have an effect. It has many effects. It makes maintenance easier,
> it makes it easier to transition from project to project, it makes it
> easier to work on other people's markup, it makes it significantly easier
> to dramatically change a site's appearance, it makes it easier to create
> apply custom tools to extract information from the documents, it makes it
> easier for search engines to guess at author intent, it makes it easier
> for the documents to be repurposed for other media, it makes it easier for
> be used and mixed...
I've often seen such arguments, even in situations where it is
strikingly obvious that they don't apply. The argumentation sounds like
a matter of faith or principle rather than practical considerations.
Many of the arguments relate to authoring style, coding principles, and
organization of work, rather than something that belongs to a general
specification. For example, the ease of working on other people's markup
in a collaborative environment depends on a large number of factors,
including the overall structures, appearance of markup (lower vs. upper
case, use of quotes, omission of omissible tags, indentations, empty
lines), principles of choosing id and class names, use of comments, etc.
General specifications cannot and need not handle such issues. And, say,
the use of <b> vs. <strong>, given their current definitions, is quite
comparable to regulating the use of class attributes.
The other major part of the argumentation refers to assumed automatic
processing. These are mostly just assumptions, or wishes, often presented
as if they were facts. But they *could* be turned into reality, in part. This
is just the reason why I have asked for semantic clarifications. No one
can reasonably base automatic processing on definitions like those for
<var>, <b>, etc. now.
Let legacy be legacy, instead of trying to convert it to "semantics".
The semantics of physical markup is the visual appearance. It is best to
describe it simply and openly (and accurately - for example, what <i>
really means in legacy markup, and will mean in browsers in the
foreseeable future, is italic *or* oblique *or* algorithmically slanted).
>> What is _compelling_ about markup for misspellings?
> It's a feature that is necessary in text editors, for which we previously
> did not have a good solution.
I would not call it a solution to say that the <b> markup, which
actually means bold face to any existing relevant software, should be
used for specialized meanings. How could anyone, or any software,
reading markup guess whether <b> means "misspelling", or "Chinese name",
or some entirely different "unarticulated, though explicitly rendered,
non-textual annotation"? Such things can be resolved via classes, to
some extent, but then the artificial "semantic" definition for <b> is