[whatwg] Entity parsing [trema/diaeresis vs umlaut]

Henri Sivonen hsivonen at iki.fi
Thu Jun 28 05:03:21 PDT 2007

On Jun 28, 2007, at 14:51, Křištof Želechovski wrote:

> I admit that the fact that the ligature œ is not included in the
> character set (and, consequently, that the character set ISO-8859-1
> cannot be used for encoding French text, which I find kind of
> stunning because of the popularity of the French language) provides
> a much simpler explanation to the observable phenomenon.

This discussion is not relevant to the WHATWG or HTML5. HTML5 is  
defined in terms of Unicode and Unicode covers both English and  
French (and quite a bit more). Anyone is free to use all that
expressiveness directly by encoding documents as UTF-8.

Entities or legacy encodings don't add any expressiveness. They just
expand to Unicode. The details of how this is handled are constrained
by legacy, not by political correctness.
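
To make the expansion concrete, here is a minimal Python sketch. It
assumes the standard library's html.unescape as a stand-in for an
HTML5 entity decoder, and the sample word "œuvre" is only an
illustration. It shows named entities resolving to ordinary Unicode
code points, and the œ ligature encoding as UTF-8 but not as
ISO-8859-1.

```python
import html

# Named character references are just alternative spellings of
# Unicode characters; decoding them yields ordinary code points.
for ref in ("&ouml;", "&auml;", "&oelig;"):
    ch = html.unescape(ref)
    print(ref, "->", ch, f"U+{ord(ch):04X}")

# UTF-8 can encode the oe ligature directly, so no entity is needed.
utf8_bytes = "œuvre".encode("utf-8")

# ISO-8859-1 has no code point for the oe ligature (U+0153),
# so encoding the same text in that legacy charset fails.
try:
    "œuvre".encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

print("UTF-8 bytes:", utf8_bytes)
print("Encodable as ISO-8859-1:", latin1_ok)
```

Whatever name an entity carries, the parser only ever sees the
resulting code point, which is why the naming has no technical
consequence.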

P.S. Before anyone slaps me for being politically incorrect or
insensitive, I'd like to point out that my native language uses
characters whose entity names are biased towards German terminology.
But this isn't the slightest technical problem. Let's move on.

Henri Sivonen
hsivonen at iki.fi

