[whatwg] Various HTML element feedback

Mon Aug 27 17:20:32 PDT 2012

On Wed, 6 Jun 2012, Jukka K. Korpela wrote:
> 2012-06-06 2:53, Ian Hickson wrote:
> > > 
> > > I have rather been optimistic about future developments for markup 
> > > elements that have been defined exactly enough to warrant meaningful 
> > > semantics-based processing. For example, most of the uses mentioned 
> > > in current text imply that <var> element contents should be kept 
> > > intact in automatic language translation.
> > 
> > That continues to be the case, so I don't know why you conclude that 
> > using it is now pointless.
> 
> It is worse than pointless, if the definition of <var> covers "a term 
> used as a placeholder in prose". Such expressions should definitely not 
> be kept intact in automatic language translation.

They shouldn't be kept intact, but they still need special semantic 
processing to not break the page's meaning during translation (e.g. 
ensuring that the same variable name is always translated the same way).

> > > So why not simply define <i> recommended and describe <var>,<cite>, 
> > > <em>, and <dfn> as deprecated but supported alternatives?
> > 
> > What benefit does empty deprecation have?
> 
> Declaring some features as "obsolete" is effectively deprecation; I just 
> used the term "deprecate" as per HTML 4.01 because I find it more 
> descriptive. Anyway, defining those elements as deprecated/obsolete 
> would be no less and no more "empty" than the current statements about 
> obsolete status. Validators/checkers would issue messages (hopefully 
> just warnings) about them, and tutorials would probably describe them as 
> secondary if at all.

I don't see any benefit to obsoleting these elements. They are useful for 
various purposes. Even the HTML spec uses them (all of the above, in fact) 
to obtain special behaviour (e.g. the cross-referencing system uses 
<dfn>). In general having a variety of elements provides authors with good 
hooks for styling, too.

> Reducing alternatives, from five to one in this case, makes the 
> recommendations simpler and helps authors because they need not spend 
> time in making choices between the elements. Such choices can be tough, 
> if you try to play by the declared "semantics", especially if it is 
> vague (to a normal reader of a spec).
> 
> My point is: either make elements like <var>, <cite>, <em>, <dfn>, <i> 
> defined so that the differences can be utilized in automatic processing, 
> or just bundle them together, to <i>.

I certainly agree that we shouldn't go to a DocBook level of element 
variety, but reducing the available elements to a mere handful doesn't make 
any sense either. We have to strike a balance, taking into account what 
elements have historically been available (and thus which authors are 
familiar with), what use cases might argue for new ones, which elements 
have been most used or not used, etc.

> > It's not like we can ever remove these elements altogether.
> 
> Oh, in 20 or 30 years, I think browsers could support to some of them.

I'm not sure what you meant to write, but I don't see why 1992-2012 would 
be harder than 2012-2032 in terms of dropping these elements.

> > What harm do they cause?
> 
> Unnecessary complication to the language, artificial "semantics" that do 
> not actually define meanings, and confusion among those authors who try 
> to take semantics and specifications seriously. Oh, and pointless 
> variation in markup and added complexity of styling.

I disagree that these are really serious problems, or that their magnitude 
outweighs the benefits here.

> > If we have to keep them, we are better served by embracing them and 
> > giving them renewed purpose and vigour, rather than being ashamed of 
> > them.
> 
> I think this summarizes well the idea behind some of the most contrived 
> "semantic" definitions. It was a brave attempt, but it failed. No normal 
> author will ever get your idea of the new meaning for <b> and <i>, for 
> example.

I guess we shall see. :-)

> And since, for example, the <font> markup needs to be supported for a 
> long time, how come *it* has not got a new, semantic definition?

I didn't start from <b> and look for a use case. People presented use 
cases, and when looking for a solution, <b> fit the bill. Same with 
<small>, etc. We did at one point have <font> in the spec, but the use 
case that supported its inclusion was later solved in a different way (I 
forget what it was) and we ended up removing it again. If a use case is 
presented for which <font> is a good fit, we can use it again.

> > > This would make authoring simpler without any real cost. There’s 
> > > little reason to tell authors to use “semantic markup” if we 
> > > don’t think it has real effect on anything.
> > 
> > It does have an effect. It has many effects. It makes maintenance 
> > easier, it makes it easier to transition from project to project, it 
> > makes it easier to work on other people's markup, it makes it 
> > significantly easier to dramatically change a site's appearance, it 
> > makes it easier to create and apply custom tools to extract information 
> > from the documents, it makes it easier for search engines to guess at 
> > author intent, it makes it easier for the documents to be repurposed 
> > for other media, it makes it easier for documents to be "remixed", it 
> > makes it easier for JavaScript libraries to be used and mixed...
> 
> I've often seen such arguments, even in situations where it is 
> strikingly obvious that they don't apply. The argumentation sounds like 
> a matter of faith or principle rather than practical considerations.

As far as I can see, they are practical considerations.

> Many of the arguments relate to authoring style, coding principles, and 
> organization of work, rather than something that belongs to a general 
> specification.

I disagree with the premise of this statement.

> For example, the ease of working on other people's markup in a 
> collaborative environment depends on a large number of factors, 
> including the overall structures, appearance of markup (lower vs. upper 
> case, use of quotes, omission of omissible tags, indentations, empty 
> lines), principles of choosing id and class names, use of comments, etc. 
> General specifications cannot and need not handle such issues.

Well, they could, clearly (e.g. XHTML mandates the use of lowercase 
element names, disallows the omission of most optional tags, and requires 
certain quotation styles). We are constrained in the manner in which we 
can control these things in text/html.

> The other major part of the argumentation refers to assumed automatic 
> processing. This is mostly just assumptions, or wishes, often presented 
> as if they were facts. But they *could* be turned to reality, in part. This 
> is just the reason why I have asked for semantic clarifications. No one 
> can reasonably base automatic processing on definitions like those for 
> <var>, <b>, etc. now.

They are reality in many cases. As noted above, the HTML spec itself uses 
the semantics of a number of HTML elements (e.g. <h1> and <dfn>) to 
perform automated processing (e.g. the ToC and the xrefs).
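To illustrate the kind of processing meant here, the following is a minimal sketch, not the actual tooling used to generate the spec. It uses Python's standard html.parser to collect <h1>-<h6> headings (for a ToC) and <dfn> terms (the targets a cross-referencer would link to). The class name SemanticIndexer is hypothetical.

```python
from html.parser import HTMLParser


class SemanticIndexer(HTMLParser):
    """Hypothetical helper: gathers headings for a table of contents and
    <dfn> terms for cross-referencing, relying only on element semantics."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.toc = []    # (heading level, heading text), in document order
        self.terms = []  # text of each <dfn>, in document order
        self._open = []  # stack of (tag, text chunks) for tracked open elements

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so "H2" arrives as "h2".
        if tag in self.HEADINGS or tag == "dfn":
            self._open.append((tag, []))

    def handle_data(self, data):
        # Text counts toward every tracked ancestor, so a <dfn> nested in a
        # heading contributes to both the term list and the heading text.
        for _, chunks in self._open:
            chunks.append(data)

    def handle_endtag(self, tag):
        if not self._open or self._open[-1][0] != tag:
            return  # not an element we are tracking
        _, chunks = self._open.pop()
        text = " ".join("".join(chunks).split())  # collapse whitespace
        if tag == "dfn":
            self.terms.append(text)
        else:
            self.toc.append((int(tag[1]), text))


doc = ("<h1>Elements</h1>"
       "<h2>The <dfn>var</dfn> element</h2><p>Use <var>n</var>.</p>"
       "<h2>The <dfn>dfn</dfn> element</h2>")
indexer = SemanticIndexer()
indexer.feed(doc)
indexer.close()
print(indexer.toc)    # [(1, 'Elements'), (2, 'The var element'), (2, 'The dfn element')]
print(indexer.terms)  # ['var', 'dfn']
```

Nothing in the sketch depends on class names or styling. It works only because <h1>-<h6> and <dfn> have agreed-upon meanings. That same property makes it break down when authors misuse those elements.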

> > > What is _compelling_ about markup for misspellings?
> > 
> > It's a feature that is necessary in text editors, for which we 
> > previously did not have a good solution.
> 
> I would not call it a solution to say that the <b> markup, which 
> actually means bold face to any existing relevant software, should be 
> used for specialized meanings. How could anyone, or any software, 
> reading markup guess whether <b> means "misspelling", or "Chinese name", 
> or some entirely different "unarticulated, though explicitly rendered, 
> non-textual annotation"? Such things can be resolved via classes, to 
> some extent, but then the artificial "semantic" definition for <b> is 
> pointless.

<b> means "a span of text to which attention is being drawn for 
utilitarian purposes without conveying any extra importance and with no 
implication of an alternate voice or mood", not "misspelling" or "Chinese 
name". I assume you meant <u>, in which case it means "a span of text with 
an unarticulated, though explicitly rendered, non-textual annotation", 
which does, as the spec points out, describe both "misspelling" and 
"Chinese name". I'm not aware of a use case that needs to distinguish 
these that isn't also the software generating them, so the semantic 
doesn't need to be any more precise.

In any case, no software can reliably guess whether <h1>, as used in an 
actual page, actually means "heading" or "large text in a poem" or 
"graffiti" or "warning". People misuse elements, but that doesn't mean that 
having semantics is a lost cause. The same applies to elements with 
somewhat vague semantics.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'