[html5] Usefulness of language annotations
Jukka K. Korpela
jukka.k.korpela at kolumbus.fi
Mon Aug 11 10:38:20 PDT 2014
2014-08-11 20:09, Jens O. Meiert wrote:
> Hence I want to fish for arguments here: How useful are language
> annotations via @lang?
They are of rather limited usefulness in practice. There has been much
talk about them in theory for several years, but it has mostly remained
theory. They are recommended in WCAG, but few actual benefits have been
cited.
I think HTML specifications should just describe @lang as it has been
defined, as declarative markup, without recommending or discouraging
its use. The reason is that its real usefulness is a separate issue that
varies according to what browsers, search engines, and other software
actually do with it. And this probably depends on things external to
HTML specifications.
> 1) Do user agents, including assistive technology, use this
> information in a way that is *actually* relevant and meaningful to the
> user?
They do, to some extent.
> 2) Isn’t, or shouldn’t, language determination primarily be made a
> user agent, and not a developer responsibility?
At the logical level, specifying content language is the author’s (or
“developer’s”) responsibility. To take this to the extreme, consider an
HTML document where the only text content is the word “hat”. The entire
meaning depends on the intended language. If you declare, say, lang=sv,
the content means “hate”; if lang=de, it means “has”. If this sounds
contrived, consider an HTML document consisting just of an image and its
caption, which can be very short.
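
To make the "hat" example concrete, here is a minimal sketch of the two declarations. The fragments are hypothetical, reduced to a single element each, and only show where the declaration goes:

```html
<!-- Declared as Swedish: "hat" means "hate" -->
<p lang="sv">hat</p>

<!-- Declared as German: "hat" means "has" -->
<p lang="de">hat</p>
```

The text content is identical in both cases; only the lang value distinguishes the two readings.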
In practice, browsers don’t try to determine content language from the
content itself. Some search engines do. I think it is well known that
Google ignores @lang, because it is so often just wrong (e.g., lang="en"
emitted by authoring software, with no regard to the actual content
language), and can usually guess the language from a sentence or two
pretty well. It sometimes makes mistakes, e.g. taking Norwegian for
Danish or vice versa, or Slovak for Czech or vice versa, but the
important thing is that it works for the vast majority of cases.
> 3) Does it matter at all?
For a document as a whole, automatic language guessing works
sufficiently well. When it does not (e.g. in a speech browser with no
such guessing), the user needs to select the reading mode manually.
Inconvenient, but usually not a big issue.
The specific situation where @lang might matter is change of language in
a multilingual document. Language guessing is easily misled if there are
short quotations in other languages. You can see this if you use
Microsoft Word (which has a good language guesser) and write in
different languages in the same document, with language guessing
enabled. Word usually guesses right, except for short fragments in
another language, and it may misjudge the exact location of a language
change. So language markup could help. The problem is that it is very
tedious to produce, and the potential gain is rather small now, and in
the foreseeable future.
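
As an illustration of what such markup involves, here is a minimal sketch of an English paragraph with two short foreign-language fragments. The sentence is made up for the example, and the choice of q and cite as carrier elements is just one plausible way to do it:

```html
<!-- Hypothetical English paragraph with short French and German fragments -->
<p lang="en">She shrugged, said <q lang="fr">c'est la vie</q>, and went back
to reading <cite lang="de">Die Zeit</cite>.</p>
```

Each fragment needs its own element with its own lang attribute, which is where the tedium comes from in a document with many such changes.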
--
Yucca, http://www.cs.tut.fi/~jkorpela/