[whatwg] HTML: A DOM attribute that returns the language of a node

Jukka K. Korpela jkorpela at cs.tut.fi
Thu Aug 8 07:29:31 PDT 2013

2013-08-08 2:57, Ryosuke Niwa wrote:

> On Aug 2, 2013, at 6:10 AM, Jukka K. Korpela <jkorpela at cs.tut.fi>
> wrote:
>> But regarding the effect of language markup on fonts, the effect is
>> limited to situations where the font is not specified in a style
>> sheet. This is a rather uncommon scenario these days; authors are
>> more than eager to set fonts.
> Do you have actual statistics to support this point?

No, it’s just an impression from looking at numerous pages and their 
coding as well as views presented in authors’ forums.

> As far as I
> checked, neither baidu.com nor yahoo.com.tw seems to explicitly
> specify a Chinese font.

They both have font-family settings, slightly different, but basically 
the most common setup (sorry, no statistics on this either), which uses 
Arial (possibly with Helvetica as a second option, which does not change 
much). So, granted, they don’t specify a Chinese font in the sense of 
including any specific fonts containing CJK characters in the 
font-family list.

Baidu doesn’t set lang either, so they seem to be accepting, for any 
characters not covered by Arial, whatever happens to be in each 
browser’s list of fallback fonts, when no information about content 
language is available. Yahoo.com.tw sets lang="zh-tw", so they do care, 
but only to the extent that the fallback font should be one intended for 
Traditional Chinese.

So the lang markup may affect fonts, but only under some conditions. And 
if you care about fonts, as an author, then an explicit list of font 
alternatives has better chances of creating the desired result.

>> It is true that they might specify a font list where none of the
>> fonts supports some characters that might be entered, and then a
>> fallback font would be used. However, using “annotations”
>> (presumably, lang attributes, along with extra <span> elements when
>> needed) does not sound like a feasible approach to this.
> Whether it’s feasible or not, that’s what we have been doing due to
> the Han unification.  If we could, we’ll undo the Han unification and
> use different glyphs for each character but we can’t do that at this
> point in time.

If a page contains text to be rendered using different forms 
(Traditional Chinese, Simplified Chinese, Japanese, Korean) for Han 
characters, you will need to control the rendering somehow. Using lang 
markup might be the most appropriate approach in theory, but it’s 
indirect and less effective than just setting different fonts (via 
font-family lists that contain reasonably many alternatives).

But even if lang attributes are used, I don’t think the issue has much 
relevance to the original question here. A DOM attribute that returns 
the language of a node would be useful for that purpose only if you 
intend to affect rendering via JavaScript.
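
To make concrete what such an attribute would have to compute, here is a 
minimal TypeScript sketch. It is only an illustration under stated 
assumptions: LangNode, effectiveLang and fontListFor are hypothetical 
names, not part of any existing DOM interface, and the font names are 
merely examples of fonts designed for each language form.

```typescript
// Minimal stand-in for a DOM node. `lang` and `parent` are hypothetical
// field names used for illustration only.
interface LangNode {
  lang?: string;            // value of the lang attribute, if present
  parent?: LangNode | null; // parent node, if any
}

// Walk up the ancestor chain and return the nearest declared lang value.
// An explicitly empty lang ("") means "unknown" in HTML and stops the
// search; "" is also returned when no ancestor declares a language.
function effectiveLang(node: LangNode | null | undefined): string {
  for (let n = node; n; n = n.parent) {
    if (n.lang !== undefined) return n.lang;
  }
  return "";
}

// Map a language tag to a font-family list. The font names are example
// choices (an assumption), not a recommendation.
function fontListFor(lang: string): string {
  const tag = lang.toLowerCase();
  if (tag.startsWith("ja")) return '"MS Mincho", serif';
  if (tag.startsWith("ko")) return '"Batang", serif';
  if (tag === "zh-tw" || tag === "zh-hk") return '"PMingLiU", serif';
  if (tag.startsWith("zh")) return '"SimSun", serif';
  return "serif"; // unknown language: leave the choice to the browser
}
```

A script could then call fontListFor(effectiveLang(node)) and assign the 
result to the node’s font-family, which is exactly the “affect rendering 
via JavaScript” case; a style sheet with language selectors would not 
need the attribute at all.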

