[whatwg] HTML: A DOM attribute that returns the language of a node

Ryosuke Niwa rniwa at apple.com
Mon Aug 12 16:50:29 PDT 2013

On Aug 8, 2013, at 7:29 AM, Jukka K. Korpela <jkorpela at cs.tut.fi> wrote:

> 2013-08-08 2:57, Ryosuke Niwa wrote:
>> On Aug 2, 2013, at 6:10 AM, Jukka K. Korpela <jkorpela at cs.tut.fi>
>> wrote:
> [...]
>>> But regarding the effect of language markup on fonts, the effect is
>>> limited to situations where the font is not specified in a style
>>> sheet. This is a rather uncommon scenario these days; authors are
>>> more than eager to set fonts.
>> Do you have actual statistics to support this point?
> No, it’s just an impression from looking at numerous pages and their coding as well as views presented in authors’ forums.
>> As far as I
>> checked, neither baidu.com nor yahoo.com.tw seems to explicitly
>> specify a Chinese font.
> They both have font-family settings, slightly different, but basically the most common (sorry, no statistic on this either) setup that uses Arial (possibly with Helvetica as second option, which does not change much). So, granted, they don’t specify a Chinese font in the sense of including any specific fonts containing CJK characters in the font-family list.
> Baidu doesn’t set lang either, so they seem to be accepting, for any characters not covered by Arial, whatever happens to be in each browser’s list of fallback fonts, when no information about content language is available. Yahoo.com.tw sets lang="zh-tw", so they do care, but only to the extent that the fallback font should be one intended for Traditional Chinese.
> So the lang markup may affect fonts, but only under some conditions. And if you care about fonts, as an author, then an explicit list of font alternatives has better chances of creating the desired result.

That's not a practical solution because we can't possibly know the list of Chinese & Japanese fonts available by default in all operating systems.

>>> It is true that they might specify a font list where none of the
>>> fonts supports some characters that might be entered, and then a
>>> fallback font would be used. However, using “annotations”
>>> (presumably, lang attributes, along with extra <span> elements when
>>> needed) does not sound like a feasible approach to this.
>> Whether it’s feasible or not, that’s what we have been doing due to
>> the Han unification.  If we could, we would undo the Han unification and
>> use different glyphs for each character, but we can’t do that at this
>> point in time.
> If a page contains texts to be rendered using different forms (Traditional Chinese, Simplified Chinese, Japanese, Korean) for Han characters, you will need to control the rendering somehow. Using lang markup might be theoretically most adequate, but it’s indirect and less effective than just setting different fonts (via font-family lists that contain reasonably many alternatives).

Controlling the rendering isn't the goal here.  The point is to use the correct glyph in each language so that each character is recognizable by users.  Again, specifying a font name is not a practical solution as authors have no way of knowing the list of Chinese & Japanese fonts provided by all current and future operating systems.

> But even if lang attributes are used, I don’t think the issue has much relevance to the original question here. A DOM attribute that returns the language of a node would be useful for the purpose only if you intend to affect rendering via JavaScript.

No.  The point is that any code that attempts to move or copy contents must preserve the effective value of the lang attribute.
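
To make that concrete, here is a rough sketch of what such code would have to do. It assumes a hypothetical minimal node model (`LangNode`, `effectiveLang`, and `copyPreservingLang` are names made up for illustration, not an existing or proposed DOM API), and it ignores details such as xml:lang and document-level language defaults.

```typescript
// Hypothetical minimal node model (an assumption for illustration, not the DOM API).
interface LangNode {
  lang?: string;            // explicit lang attribute, if any
  parent: LangNode | null;  // containing node, or null at the root
  text: string;
}

// Walk up the ancestor chain and return the nearest explicit lang value.
// Returns the empty string when neither the node nor any ancestor declares one.
function effectiveLang(node: LangNode): string {
  for (let n: LangNode | null = node; n !== null; n = n.parent) {
    if (n.lang !== undefined) {
      return n.lang;
    }
  }
  return "";
}

// Copy a node under a new parent. If the destination context would give the
// copy a different effective language than it had at the source, pin the
// source language on the copy with an explicit lang value.
function copyPreservingLang(node: LangNode, newParent: LangNode): LangNode {
  const sourceLang = effectiveLang(node);
  const copy: LangNode = { text: node.text, parent: newParent };
  if (node.lang !== undefined) {
    copy.lang = node.lang;
  }
  if (effectiveLang(copy) !== sourceLang) {
    copy.lang = sourceLang;
  }
  return copy;
}
```

For example, copying an unannotated node out of a lang="ja" document into a lang="zh-TW" document has to produce a copy with an explicit lang of "ja"; otherwise the same Han characters may be drawn with Traditional Chinese glyphs after the paste. Today that ancestor walk has to be written by hand, which is where a DOM attribute returning the language of a node would help.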

- R. Niwa
