[whatwg] Entity parsing [trema/diaeresis vs umlaut]
Henri Sivonen
hsivonen at iki.fi
Thu Jun 28 05:03:21 PDT 2007
On Jun 28, 2007, at 14:51, Křištof Želechovski wrote:
> I admit that the fact that the ligature œ is not
> included in the character set (and, consequently, that the character set
> ISO-8859-1 cannot be used for encoding French text, which I find kind of
> stunning because of the popularity of the French language) provides a much
> simpler explanation to the observable phenomenon.
This discussion is not relevant to the WHATWG or HTML5. HTML5 is
defined in terms of Unicode, and Unicode covers both English and
French (and quite a bit more). Anyone is free to use all that
expressiveness directly by encoding documents as UTF-8.
Entities and legacy encodings don't add any expressiveness; they just
expand to Unicode. The details of how this is handled are constrained
by legacy, not by political correctness.
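To make the point concrete, here is a small sketch using Python's standard `html` module, whose `unescape` follows the HTML5 named character reference table. It shows that `&oelig;` is only another spelling of U+0153, that ISO-8859-1 has no byte for that character, and that UTF-8 encodes it without trouble. The example word and the choice of Python are illustrative assumptions; this is not how any browser's parser is implemented.

```python
import html

# A named character reference is only a spelling of a Unicode code point:
# after unescaping, "&oelig;" and a literal "œ" are the same character.
entity_text = html.unescape("c&oelig;ur")   # "cœur" (French for "heart")
literal_text = "c\u0153ur"
assert entity_text == literal_text
print(hex(ord(entity_text[1])))             # 0x153 (LATIN SMALL LIGATURE OE)

# The German-flavoured name "&ouml;" likewise just expands to a code point.
print(hex(ord(html.unescape("&ouml;"))))    # 0xf6

# ISO-8859-1 has no slot for U+0153, so encoding the text fails ...
try:
    literal_text.encode("iso-8859-1")
except UnicodeEncodeError as err:
    print("ISO-8859-1 cannot encode:", err.object[err.start:err.end])

# ... while UTF-8 can encode every Unicode code point.
print(literal_text.encode("utf-8"))         # b'c\xc5\x93ur'
```

Whichever way the character arrives (entity, legacy byte, or UTF-8 bytes), the parser ends up with the same code point; only the UTF-8 route avoids the lookup table altogether.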
P.S. Before anyone slaps me for being politically incorrect or
insensitive, I'd like to point out that my native language uses
characters whose entity names are biased towards German terminology.
But this isn't the slightest technical problem. Let's move on.
--
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/