[whatwg] Encodings and the web

NARUSE, Yui naruse at airemix.jp
Sun Jan 8 07:20:45 PST 2012


(2012/01/08 23:32), Anne van Kesteren wrote:
> On Sun, 08 Jan 2012 01:37:14 +0100, NARUSE, Yui <naruse at airemix.jp> wrote:
>> = Legacy multi-octet Chinese (traditional) encodings
>>
>> Mozilla supports another Big5 variant, Big5-UAO.
>> http://bugs.ruby-lang.org/issues/1784
> 
> As part of the big5 encoding, right? It sounds like it's a good idea to adopt that. I don't think there's much concern about table size these days, though obviously the less complexity the better.

CC to the original reporter.
Could you help by describing the current situation in Taiwan?

>> == iso-2022-jp
>> === The to Unicode algorithm
>> ==== Based on iso-2022-jp state
>> ===== ASCII state
>> ====== Based on octet:
>> ======= Otherwise
>>> If the fatal flag is set, return failure.
>>> Otherwise, emit the fallback code point.
>>
>> Just FYI, IE and Opera show these bytes as Katakana.
>> If octet is greater than 0xA0 and less than 0xE0, value is octet + 0xFEC0.
>>
>> Moreover IE shows any shift_jis characters here.
>> It seems that IE uses the same converter for both iso-2022-jp and shift_jis.
> 
> I have filed a bug on Opera to become more strict like Webkit/Gecko. If there is some evidence that approach is wrong though, we can turn it around.

There is an old variant of ISO-2022-JP called "JIS8".
JIS8 was in use before RFC 1468 was written, and is still used in some areas,
for example bank-to-bank information exchange.
The "8" in JIS8 refers to the 8-bit bytes used to express Katakana, as described above.
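
As a rough sketch of the mapping quoted above (octet + 0xFEC0 for octets between 0xA0 and 0xE0), the lenient behaviour could look something like the following Python. The function names, the `lenient` switch, and the use of U+FFFD as the fallback code point are my own assumptions for illustration, not taken from the spec or from any browser, and escape sequence handling is left out entirely.

```python
# Assumption: U+FFFD stands in for "the fallback code point".
FALLBACK = "\uFFFD"


def jis8_katakana_code_point(octet):
    """Return the halfwidth Katakana code point for an 8-bit octet, or None.

    Octets strictly between 0xA0 and 0xE0 map to octet + 0xFEC0,
    so 0xA1 becomes U+FF61 and 0xDF becomes U+FF9F.
    """
    if 0xA0 < octet < 0xE0:
        return octet + 0xFEC0
    return None


def decode_ascii_state_octet(octet, lenient=False, fatal=False):
    """Decode one octet seen in the ASCII state (hypothetical helper).

    lenient=True applies the 8-bit Katakana mapping (the IE/Opera behaviour
    described above); lenient=False treats such octets as errors.
    Escape sequences are not handled in this sketch.
    """
    if octet < 0x80:
        return chr(octet)
    if lenient:
        code_point = jis8_katakana_code_point(octet)
        if code_point is not None:
            return chr(code_point)
    if fatal:
        # Stand-in for "return failure" in the algorithm.
        raise ValueError("invalid octet 0x%02X in ASCII state" % octet)
    return FALLBACK
```

With `lenient=True` the octet 0xB1 comes out as U+FF71 (halfwidth Katakana "A"); with the strict default it comes out as the fallback code point instead.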

So I can't say at this time that it is a bug in Opera.
It depends on how many sites use such 8-bit Katakana.

-- 
NARUSE, Yui  <naruse at airemix.jp>


