[whatwg] Is EBCDIC support needed for not breaking the Web?

Henri Sivonen hsivonen at iki.fi
Sun Jun 1 06:45:26 PDT 2008


The HTML5 draft says that authors should not use EBCDIC-based
encodings. This is laxer than the requirement that authors must not
use, and user agents must not support, CESU-8, UTF-7, BOCU-1 and SCSU.

In general, now that UTF-8 exists and is ubiquitously supported,
proliferation of encodings is costly and doesn't expand the
expressiveness of HTML, which is parsed into a Unicode DOM anyway.
Moreover, encodings that are not ASCII supersets are potential
security risks, since the string "<script>" may be represented by
different bytes than in ASCII, leading to potential privilege
escalation if a server-side gatekeeper and a user agent give different
meanings to the bytes.
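
To make the byte mismatch concrete, here is a minimal Python sketch. It
assumes cp037 (one EBCDIC code page that ships with Python's standard
codecs) as the EBCDIC-based encoding; the gatekeeper function is
hypothetical and deliberately naive, not a description of any real
server-side filter.

```python
# Sketch: the same markup has different bytes in ASCII and in an
# EBCDIC-based encoding (cp037 is assumed here as a representative).
PAYLOAD = "<script>alert(1)</script>"

ascii_bytes = PAYLOAD.encode("ascii")
ebcdic_bytes = PAYLOAD.encode("cp037")


def naive_gatekeeper_allows(data: bytes) -> bool:
    """Hypothetical filter that assumes an ASCII-compatible encoding
    and rejects input containing the ASCII bytes for "<script"."""
    return b"<script" not in data


# The gatekeeper catches the ASCII form...
print(naive_gatekeeper_allows(ascii_bytes))
# ...but lets the EBCDIC form through, because none of the bytes match.
print(naive_gatekeeper_allows(ebcdic_bytes))
# A user agent that decodes the bytes as cp037 still sees the script.
print(ebcdic_bytes.decode("cp037"))
```

If the user agent honors the EBCDIC label while the gatekeeper scans for
ASCII bytes, the two disagree about what the bytes mean, which is the
situation described above.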

For these reasons, if EBCDIC-based encodings don't need to be  
supported in order to Support Existing Content, it would be beneficial  
never to add support for them and, thus, ban them like CESU-8, UTF-7,  
BOCU-1 and SCSU.

I asked Hixie for examples of sites or browsers that require/support  
EBCDIC-based encodings. He had none. I examined the encoding menus of  
Firefox 3b5, Safari 3.1 and Opera 9.5 beta (on Leopard) and IE8 beta 1  
(on English XP SP3). None of them expose EBCDIC-based encodings in the  
UI. (All the IBM encodings Firefox exposes turn out to be ASCII-based.)

This makes me wonder: Do the top browsers support any EBCDIC-based  
encodings but just without exposing them in the UI? If not, can there  
be any notable EBCDIC-based Web content?

I suspect that EBCDIC isn't actually Web-relevant.

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/




