[whatwg] Semantics in HTML
Henri Sivonen
hsivonen at iki.fi
Sat Nov 4 06:40:30 PST 2006
On Nov 2, 2006, at 00:17, Anne van Kesteren wrote:
> On Wed, 01 Nov 2006 20:55:58 +0100, James Graham <jg307 at cam.ac.uk>
> wrote:
>> To take a slight detour into the (hopefully not too) abstract,
>> what do people think the fundamental point of semantics in HTML is?
I think the fundamental point is to allow programmatic processing of
documents in ways that are *useful* and that semantic markup makes
*practical* but that would be considerably less practical with
presentation-based heuristics, *and* to enable that processing without
requiring those who want to do it to negotiate with the author (or to
enable the author to get off-the-shelf software for processing his/her
own documents).
Rendering for media different from the author's primary target is
such processing done in software controlled by others.
Indexing documents and taking extracts for display in search results
is such processing done in software controlled by others.
Generating a table of contents could be a case of the author wanting
to get off-the-shelf software that works with his/her own documents.
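A minimal sketch of what such off-the-shelf table-of-contents software
could look like, assuming Python's standard html.parser module. The
names TocBuilder and build_toc are hypothetical, chosen only for this
illustration, and the sketch assumes headings are marked up as h1-h6:

```python
from html.parser import HTMLParser


class TocBuilder(HTMLParser):
    """Collects (level, text) pairs for h1-h6 headings (hypothetical helper)."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.entries = []    # finished (level, text) pairs, in document order
        self._level = None   # level of the heading currently being read
        self._parts = []     # text fragments seen inside that heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        # Text inside inline children (e.g. <em>) also arrives here.
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._level is not None:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._parts).split())
            self.entries.append((self._level, text))
            self._level = None


def build_toc(html):
    """Return the list of (level, text) heading entries found in html."""
    parser = TocBuilder()
    parser.feed(html)
    parser.close()
    return parser.entries
```

The author does not have to write any of this; it works on any
document that uses heading elements for headings, which is what makes
the markup worth using.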
So I think the merit of semantic elements in HTML should not be
judged in terms of the willingness of semanticists to express stuff
but instead the merit should be judged against the willingness of
software developers to write software that consumes the expression
for a useful purpose and whether authors in general are
incentivized to support such processing (either knowingly or as a
side effect of accomplishing other goals).
> Those elements should then not have any presentational aspect
Why not?
To serve media-independent presentation, having reasonable
presentations for different media is more useful than having a
semantic definition.
(What kinds of different media there can be is limited by how you can
deliver data into a human. In the absence of direct-to-brain
transfers, you are in practice limited to visual, aural and tactile
media.)
> We probably don't want things like:
>
> <sci-fi-serie-title>Stargate Atlantis</sci-fi-serie-title>
>
>
> Although I suppose that at some point you do want to be able to
> express the latter.
I think we should not care if someone wants to *express* it unless
there is notable practical interest in *consuming* the expression.
(Not "would be cool" interest but "would write software" interest.)
>> Henri has been talking about the possibility of making HTML5 more
>> "semantically lax", and here Anne is interested in where it is not
>> "semantically pure", presumably with a desire to fix it.
My point is that if the semantics for a given element are not precise
enough or authors aren't incentivized to use them properly so that
non-presentation use of the semantics becomes impossible or
prohibitively impractical, what is left is use for media-independent
rendering and at that point it is enough to define the element in terms
of default presentation or, if the element doesn't have a
distinguishing default presentation, not include the element.
Example with existing markup:
<dl> has a well-understood default presentation (at least for visual
media), but on the real Web, it doesn't have precise enough semantics
to allow heuristic-free reasoning such as compiling a search database
of definitions for words by scraping the Web. Yet, <dl> is useful for
achieving a particular kind of organization of pieces of text (list
of items where the items have an inline label and a block of text) in
a backwards-compatible way that works even in unstyled HTML.
Therefore, it is useful to have <dl> around as a media-independent
grouping device that doesn't have profound semantics.
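To make the point concrete, here is a rough sketch, assuming Python's
standard html.parser module, of a scraper that pulls (label, text)
pairs out of <dl> markup. The class name DlPairs and the sample
documents are hypothetical. The extraction itself is mechanical; what
the markup cannot tell the scraper is whether a given pair is actually
a definition:

```python
from html.parser import HTMLParser


class DlPairs(HTMLParser):
    """Collects (dt text, dd text) pairs (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.pairs = []       # finished (label, text) pairs
        self._current = None  # "dt" or "dd" while inside one, else None
        self._label = ""      # text of the most recent dt
        self._buf = []        # text fragments of the element being read

    def _flush(self):
        # Finish whichever dt/dd is open, if any.
        if self._current is None:
            return
        text = " ".join("".join(self._buf).split())
        if self._current == "dt":
            self._label = text
        else:
            self.pairs.append((self._label, text))
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._flush()  # copes with omitted </dt> and </dd> end tags
            self._current = tag
            self._buf = []

    def handle_data(self, data):
        if self._current is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag in ("dt", "dd", "dl"):
            self._flush()


def dl_pairs(html):
    """Return all (label, text) pairs found in dt/dd markup in html."""
    parser = DlPairs()
    parser.feed(html)
    parser.close()
    return parser.pairs


glossary = "<dl><dt>HTML</dt><dd>A markup language.</dd></dl>"
dialogue = "<dl><dt>Alice</dt><dd>Shall we go?</dd></dl>"
print(dl_pairs(glossary + dialogue))
```

Both lists come out as structurally identical pairs, so a database of
"definitions" built this way would need heuristics to tell the
glossary entry from the line of dialogue.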
Example against introducing new markup:
In discussions where <i> is assumed to be axiomatically evil and
semantic alternatives are sought, it often comes up that in text
discussing biology the taxonomical Latin names of organisms are
italicized. Should HTML have an element for marking up a piece of
text as a biological taxonomical name? I say no. For data mining
(including search engines) it is easier to compile a list of known
taxonomical names and compare strings against that list than to
badger every biologist to use the semantic element. As for
presentation, <i> works just fine. The effects of <i> on aural or
tactile media probably won't be so bad that most authors would be
willing to take special steps. For authors themselves getting off-the-
shelf software that does useful things, the case is probably too
specific and lacks processing use cases to create a market. What
authors might want to do, though, is to use the taxonomical names as
terms in an index in print. However, for that use case to cover
different kinds of text with index terms, you'd want something more
generic than markup for biological taxonomical names. (An index is
not needed for interactive screen media, because you can search for
any string anyway.)
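The list-comparison approach for data mining can be sketched roughly
as follows, assuming Python's standard re module. KNOWN_TAXA is a tiny
stand-in for a real compiled list of names, and find_taxa is a
hypothetical function name; neither comes from any existing tool:

```python
import re

# Tiny stand-in for a real compiled list of known names (sample data only).
KNOWN_TAXA = {"Homo sapiens", "Escherichia coli", "Canis lupus"}


def find_taxa(text, known=KNOWN_TAXA):
    """Return the known names that occur in text, ordered by first position."""
    hits = []
    for name in known:
        # Match the name as whole words; no markup in the text is needed.
        match = re.search(r"\b" + re.escape(name) + r"\b", text)
        if match:
            hits.append((match.start(), name))
    return [name for _, name in sorted(hits)]


sample = "Wolves (Canis lupus) and humans (Homo sapiens) coexist."
print(find_taxa(sample))
```

This works on plain text and on documents that merely italicize the
names, without any cooperation from the author.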
>> [...] I also don't know which view best fits my position because I
>> don't really understand what people are trying to achieve with
>> (the markup in) HTML -- I think there are things I would change in
>> the current draft, but there seems little point talking about
>> which markup elements should or shouldn't exist without having
>> some overall framework against which the merit of various
>> proposals can be measured.
+1.
--
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/