[whatwg] Drop UTF-32
Michael Day
mikeday at yeslogic.com
Tue May 15 02:35:50 PDT 2007
Hi,
Suggestion: drop UTF-32 from the character encoding detection section of
HTML5, and even better, discourage or forbid user agents from
implementing support for UTF-32.
Why:
- It's not widely used. In fact, has UTF-32 ever been used at all,
outside of test suites?
- It's not widely implemented. For example, the expat XML parser does
not support it, and nobody cares.
- When it is supported, people get it wrong, and the bugs are never
fixed because no one uses UTF-32 anyway and no one cares.
For an example of this, see html5lib 0.9, which implements the BOM
detection algorithm but gets it wrong by checking for UTF-16 before
checking for UTF-32. Because the UTF-16LE BOM (FF FE) is a prefix of
the UTF-32LE BOM (FF FE 00 00), the UTF-16 test always succeeds first and
little-endian UTF-32 is always misidentified as UTF-16. But no one cares,
as no one uses UTF-32 anyway.
- UTF-32 is horrendously inefficient for just about all real-world
text (it spends four bytes on every character, even plain ASCII), and its
use should not be encouraged on the web. Really, UTF-32 only exists as a
tutorial example of how Unicode can be encoded, not as a practical
character encoding that people should actually use.
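To make the html5lib ordering bug concrete, here is a minimal Python sketch of BOM sniffing. It is not html5lib's actual code: the names BOMS, sniff_bom and BUGGY_ORDER are hypothetical, and only the five Unicode BOMs are considered. The correct and faulty versions differ only in the order in which the marks are tried.

```python
# Known byte order marks, ordered so the 4-byte UTF-32 marks are tried
# before the 2-byte UTF-16 marks (FF FE is a prefix of FF FE 00 00).
BOMS = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]

# Hypothetical faulty order: UTF-16 marks first, then UTF-32 and UTF-8.
BUGGY_ORDER = BOMS[3:] + BOMS[:3]


def sniff_bom(data: bytes, order=BOMS):
    """Return (encoding, bom_length) for the first matching BOM, or (None, 0)."""
    for bom, encoding in order:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


# A UTF-32LE document: BOM followed by "hi" encoded as UTF-32LE.
data = b"\xff\xfe\x00\x00" + "hi".encode("utf-32-le")
print(sniff_bom(data))               # ('utf-32-le', 4)
print(sniff_bom(data, BUGGY_ORDER))  # ('utf-16-le', 2): misidentified
print(data[4:].decode("utf-32-le"))  # hi
```

Even with the correct order, FF FE 00 00 stays ambiguous: a UTF-16LE document whose first character after the BOM is U+0000 starts with the same four bytes, so a detector has to pick one reading.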
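The size cost of UTF-32 mentioned above can be checked directly. The following Python sketch counts the bytes each Unicode encoding form needs for a given string; the helper name encoded_sizes and the two sample strings are illustrative choices, and the BOM-less "-le" codecs are used so that no byte order mark is included in the counts.

```python
def encoded_sizes(text: str) -> dict:
    """Bytes needed for `text` in each encoding form (no BOM counted)."""
    return {
        enc: len(text.encode(enc))
        for enc in ("utf-8", "utf-16-le", "utf-32-le")
    }


# UTF-32 always spends 4 bytes per code point, whatever the script.
for text in ("Hello, world", "こんにちは"):
    print(repr(text), len(text), encoded_sizes(text))
```

For the ASCII sample this gives 12, 24 and 48 bytes for UTF-8, UTF-16 and UTF-32; for the five-character Japanese greeting, 15, 10 and 20. Since UTF-8 and UTF-16 never need more than four bytes per code point, UTF-32 is never smaller than either of them.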
Please drop UTF-32 and save implementors from worrying about it when no
one uses it and no one should use it.
Thanks,
Michael
--
Print XML with Prince!
http://www.princexml.com