[html5] r4282 - [a] (0) discourage use of HZ-GB-2312; explain why.

whatwg at whatwg.org
Thu Oct 22 20:13:39 PDT 2009


Author: ianh
Date: 2009-10-22 20:13:34 -0700 (Thu, 22 Oct 2009)
New Revision: 4282

Modified:
   complete.html
   index
   source
Log:
[a] (0) discourage use of HZ-GB-2312; explain why.

Modified: complete.html
===================================================================
--- complete.html	2009-10-23 02:34:24 UTC (rev 4281)
+++ complete.html	2009-10-23 03:13:34 UTC (rev 4282)
@@ -11888,12 +11888,13 @@
   <a href=#ascii-compatible-character-encoding>ASCII-compatible character encoding</a>.</p>
 
   <p>Authors should not use JIS-X-0208 <!-- x-JIS0208 -->
-  (JIS_C6226-1983), JIS-X-0212 (JIS_X0212-1990), encodings based on
-  ISO-2022<!-- http://krijnhoetmer.nl/irc-logs/whatwg/20090628#l-422
-  -->, and encodings based on EBCDIC. Authors should not use
-  UTF-32. Authors must not use the CESU-8, UTF-7, BOCU-1 and SCSU
-  encodings.
+  (JIS_C6226-1983), JIS-X-0212 (JIS_X0212-1990), HZ-GB-2312<!-- has
+  crazy handling of ASCII "~" -->, encodings based on ISO-2022<!--
+  http://krijnhoetmer.nl/irc-logs/whatwg/20090628#l-422 -->, and
+  encodings based on EBCDIC. Authors should not use UTF-32.
+  Authors must not use the CESU-8, UTF-7, BOCU-1 and SCSU encodings.
   <a href=#refsRFC1345>[RFC1345]</a><!-- for the JIS types -->
+  <a href=#refsRFC1842>[RFC1842]</a><!-- HZ-GB-2312 -->
   <a href=#refsRFC1468>[RFC1468]</a><!-- ISO-2022-JP -->
   <a href=#refsRFC2237>[RFC2237]</a><!-- ISO-2022-JP-1 -->
   <a href=#refsRFC1554>[RFC1554]</a><!-- ISO-2022-JP-2 -->
@@ -11907,8 +11908,18 @@
   <!-- no idea what to reference for EBCDIC, so... -->
   </p>
 
+  <p class=note>Most of these encodings are discouraged because of
+  security concerns. If a hostile user can contribute text to a site
+  using these encodings, bugs in the site's whitelisting filter or in
+  a user agent can easily lead to the filter interpreting the
+  contribution as "safe" while the user agent interprets the same
+  contribution as containing a <code><a href=#script>script</a></code> element. This would
+  enable cross-site scripting attacks. By avoiding these encodings,
+  and always providing a <a href=#character-encoding-declaration>character encoding declaration</a>,
+  an author is less likely to run into this kind of problem.</p>
+
   <p>Authors are encouraged to use UTF-8. Conformance checkers may
-  advise against authors using legacy encodings.</p>
+  advise authors against using legacy encodings.</p>
 
   <div class=impl>
 
@@ -86522,6 +86533,13 @@
    Encoding for Internet Messages</a></cite>, U. Choi, K. Chon, H. Park. IETF,
    December 1993.</dd>
 
+   <dt id=refsRFC1842>[RFC1842]</dt>
+
+   <dd><cite><a href=http://www.ietf.org/rfc/rfc1842.txt>ASCII
+   Printable Characters-Based Chinese Character Encoding for Internet
+   Messages</a></cite>, Y. Wei, Y. Zhang, J. Li, J. Ding, Y. Jiang.
+   IETF, August 1995.</dd>
+
    <dt id=refsRFC1922>[RFC1922]</dt>
    <dd><cite><a href=http://www.ietf.org/rfc/rfc1922.txt>Chinese Character
    Encoding for Internet Messages</a></cite>, HF. Zhu, DY. Hu, ZG. Wang, TC. Kao,

Modified: index
===================================================================
--- index	2009-10-23 02:34:24 UTC (rev 4281)
+++ index	2009-10-23 03:13:34 UTC (rev 4282)
@@ -11718,12 +11718,13 @@
   <a href=#ascii-compatible-character-encoding>ASCII-compatible character encoding</a>.</p>
 
   <p>Authors should not use JIS-X-0208 <!-- x-JIS0208 -->
-  (JIS_C6226-1983), JIS-X-0212 (JIS_X0212-1990), encodings based on
-  ISO-2022<!-- http://krijnhoetmer.nl/irc-logs/whatwg/20090628#l-422
-  -->, and encodings based on EBCDIC. Authors should not use
-  UTF-32. Authors must not use the CESU-8, UTF-7, BOCU-1 and SCSU
-  encodings.
+  (JIS_C6226-1983), JIS-X-0212 (JIS_X0212-1990), HZ-GB-2312<!-- has
+  crazy handling of ASCII "~" -->, encodings based on ISO-2022<!--
+  http://krijnhoetmer.nl/irc-logs/whatwg/20090628#l-422 -->, and
+  encodings based on EBCDIC. Authors should not use UTF-32.
+  Authors must not use the CESU-8, UTF-7, BOCU-1 and SCSU encodings.
   <a href=#refsRFC1345>[RFC1345]</a><!-- for the JIS types -->
+  <a href=#refsRFC1842>[RFC1842]</a><!-- HZ-GB-2312 -->
   <a href=#refsRFC1468>[RFC1468]</a><!-- ISO-2022-JP -->
   <a href=#refsRFC2237>[RFC2237]</a><!-- ISO-2022-JP-1 -->
   <a href=#refsRFC1554>[RFC1554]</a><!-- ISO-2022-JP-2 -->
@@ -11737,8 +11738,18 @@
   <!-- no idea what to reference for EBCDIC, so... -->
   </p>
 
+  <p class=note>Most of these encodings are discouraged because of
+  security concerns. If a hostile user can contribute text to a site
+  using these encodings, bugs in the site's whitelisting filter or in
+  a user agent can easily lead to the filter interpreting the
+  contribution as "safe" while the user agent interprets the same
+  contribution as containing a <code><a href=#script>script</a></code> element. This would
+  enable cross-site scripting attacks. By avoiding these encodings,
+  and always providing a <a href=#character-encoding-declaration>character encoding declaration</a>,
+  an author is less likely to run into this kind of problem.</p>
+
   <p>Authors are encouraged to use UTF-8. Conformance checkers may
-  advise against authors using legacy encodings.</p>
+  advise authors against using legacy encodings.</p>
 
   <div class=impl>
 
@@ -77700,6 +77711,13 @@
    Encoding for Internet Messages</a></cite>, U. Choi, K. Chon, H. Park. IETF,
    December 1993.</dd>
 
+   <dt id=refsRFC1842>[RFC1842]</dt>
+
+   <dd><cite><a href=http://www.ietf.org/rfc/rfc1842.txt>ASCII
+   Printable Characters-Based Chinese Character Encoding for Internet
+   Messages</a></cite>, Y. Wei, Y. Zhang, J. Li, J. Ding, Y. Jiang.
+   IETF, August 1995.</dd>
+
    <dt id=refsRFC1922>[RFC1922]</dt>
    <dd><cite><a href=http://www.ietf.org/rfc/rfc1922.txt>Chinese Character
    Encoding for Internet Messages</a></cite>, HF. Zhu, DY. Hu, ZG. Wang, TC. Kao,

Modified: source
===================================================================
--- source	2009-10-23 02:34:24 UTC (rev 4281)
+++ source	2009-10-23 03:13:34 UTC (rev 4282)
@@ -12379,12 +12379,13 @@
   <span>ASCII-compatible character encoding</span>.</p>
 
   <p>Authors should not use JIS-X-0208 <!-- x-JIS0208 -->
-  (JIS_C6226-1983), JIS-X-0212 (JIS_X0212-1990), encodings based on
-  ISO-2022<!-- http://krijnhoetmer.nl/irc-logs/whatwg/20090628#l-422
-  -->, and encodings based on EBCDIC. Authors should not use
-  UTF-32. Authors must not use the CESU-8, UTF-7, BOCU-1 and SCSU
-  encodings.
+  (JIS_C6226-1983), JIS-X-0212 (JIS_X0212-1990), HZ-GB-2312<!-- has
+  crazy handling of ASCII "~" -->, encodings based on ISO-2022<!--
+  http://krijnhoetmer.nl/irc-logs/whatwg/20090628#l-422 -->, and
+  encodings based on EBCDIC. Authors should not use UTF-32.
+  Authors must not use the CESU-8, UTF-7, BOCU-1 and SCSU encodings.
   <a href="#refsRFC1345">[RFC1345]</a><!-- for the JIS types -->
+  <a href="#refsRFC1842">[RFC1842]</a><!-- HZ-GB-2312 -->
   <a href="#refsRFC1468">[RFC1468]</a><!-- ISO-2022-JP -->
   <a href="#refsRFC2237">[RFC2237]</a><!-- ISO-2022-JP-1 -->
   <a href="#refsRFC1554">[RFC1554]</a><!-- ISO-2022-JP-2 -->
@@ -12398,8 +12399,18 @@
   <!-- no idea what to reference for EBCDIC, so... -->
   </p>
 
+  <p class="note">Most of these encodings are discouraged because of
+  security concerns. If a hostile user can contribute text to a site
+  using these encodings, bugs in the site's whitelisting filter or in
+  a user agent can easily lead to the filter interpreting the
+  contribution as "safe" while the user agent interprets the same
+  contribution as containing a <code>script</code> element. This would
+  enable cross-site scripting attacks. By avoiding these encodings,
+  and always providing a <span>character encoding declaration</span>,
+  an author is less likely to run into this kind of problem.</p>
+
   <p>Authors are encouraged to use UTF-8. Conformance checkers may
-  advise against authors using legacy encodings.</p>
+  advise authors against using legacy encodings.</p>
 
   <div class="impl">
 
@@ -95692,6 +95703,13 @@
    Encoding for Internet Messages</a></cite>, U. Choi, K. Chon, H. Park. IETF,
    December 1993.</dd>
 
+   <dt id="refsRFC1842">[RFC1842]</dt>
+
+   <dd><cite><a href="http://www.ietf.org/rfc/rfc1842.txt">ASCII
+   Printable Characters-Based Chinese Character Encoding for Internet
+   Messages</a></cite>, Y. Wei, Y. Zhang, J. Li, J. Ding, Y. Jiang.
+   IETF, August 1995.</dd>
+
    <dt id="refsRFC1922">[RFC1922]</dt>
    <dd><cite><a href="http://www.ietf.org/rfc/rfc1922.txt">Chinese Character
    Encoding for Internet Messages</a></cite>, HF. Zhu, DY. Hu, ZG. Wang, TC. Kao,
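
The note added in this revision describes a filter and a user agent
disagreeing about what the same bytes mean. A minimal Python sketch of
that kind of mismatch for HZ-GB-2312 follows. The payload and variable
names are illustrative assumptions, not part of the commit, and Python's
built-in "hz" codec merely stands in for whichever decoder a filter or
user agent might use.

```python
# Illustrative only: the payload and names below are assumptions, not part of r4282.
# In HZ-GB-2312 (RFC 1842), "~{" switches to two-byte GB mode and "~}" switches
# back, so ASCII-looking bytes between them are not read as ASCII by an HZ decoder.
payload = b"~{<script>~}"

# A component that decodes the bytes as HZ-GB-2312 consumes the eight bytes of
# "<script>" as four two-byte units, so it sees no markup.
# errors="replace" substitutes U+FFFD for any pair without a GB2312 mapping.
as_hz = payload.decode("hz", errors="replace")

# A component that treats the same bytes as ASCII sees a literal start tag.
as_ascii = payload.decode("ascii")

print("HZ view contains '<script>':   ", "<script>" in as_hz)
print("ASCII view contains '<script>':", "<script>" in as_ascii)
```

If the filter takes one view and the user agent the other, the filter can
pass text that the user agent treats as markup, which is the situation
the note warns about.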
