[html5] r3772 - [e] (0) Move the character encoding stuff down to the HTML syntax section since [...]

whatwg at whatwg.org
Tue Sep 8 23:47:18 PDT 2009


Author: ianh
Date: 2009-09-08 23:47:17 -0700 (Tue, 08 Sep 2009)
New Revision: 3772

Modified:
   index
   source
Log:
[e] (0) Move the character encoding stuff down to the HTML syntax section since we don't want to override XML here.

Modified: index
===================================================================
--- index	2009-09-09 05:31:55 UTC (rev 3771)
+++ index	2009-09-09 06:47:17 UTC (rev 3772)
@@ -223,24 +223,23 @@
      <li><a href=#concept-http-equivalent><span class=secno>2.6.1 </span>Protocol concepts</a></li>
      <li><a href=#encrypted-http-and-related-security-concerns><span class=secno>2.6.2 </span>Encrypted HTTP and related security concerns</a></li>
      <li><a href=#content-type-sniffing><span class=secno>2.6.3 </span>Determining the type of a resource</a></ol></li>
-   <li><a href=#character-encodings-0><span class=secno>2.7 </span>Character encodings</a></li>
-   <li><a href=#common-dom-interfaces><span class=secno>2.8 </span>Common DOM interfaces</a>
+   <li><a href=#common-dom-interfaces><span class=secno>2.7 </span>Common DOM interfaces</a>
     <ol>
-     <li><a href=#reflecting-content-attributes-in-idl-attributes><span class=secno>2.8.1 </span>Reflecting content attributes in IDL attributes</a></li>
-     <li><a href=#collections-0><span class=secno>2.8.2 </span>Collections</a>
+     <li><a href=#reflecting-content-attributes-in-idl-attributes><span class=secno>2.7.1 </span>Reflecting content attributes in IDL attributes</a></li>
+     <li><a href=#collections-0><span class=secno>2.7.2 </span>Collections</a>
       <ol>
-       <li><a href=#htmlcollection-0><span class=secno>2.8.2.1 </span>HTMLCollection</a></li>
-       <li><a href=#htmlallcollection-0><span class=secno>2.8.2.2 </span>HTMLAllCollection</a></li>
-       <li><a href=#htmlformcontrolscollection-0><span class=secno>2.8.2.3 </span>HTMLFormControlsCollection</a></li>
-       <li><a href=#htmloptionscollection-0><span class=secno>2.8.2.4 </span>HTMLOptionsCollection</a></li>
-       <li><a href=#htmlpropertycollection-0><span class=secno>2.8.2.5 </span>HTMLPropertyCollection</a></ol></li>
-     <li><a href=#domtokenlist-0><span class=secno>2.8.3 </span>DOMTokenList</a></li>
-     <li><a href=#domsettabletokenlist-0><span class=secno>2.8.4 </span>DOMSettableTokenList</a></li>
-     <li><a href=#safe-passing-of-structured-data><span class=secno>2.8.5 </span>Safe passing of structured data</a></li>
-     <li><a href=#domstringmap-0><span class=secno>2.8.6 </span>DOMStringMap</a></li>
-     <li><a href=#dom-feature-strings><span class=secno>2.8.7 </span>DOM feature strings</a></li>
-     <li><a href=#exceptions><span class=secno>2.8.8 </span>Exceptions</a></li>
-     <li><a href=#garbage-collection><span class=secno>2.8.9 </span>Garbage collection</a></ol></ol></li>
+       <li><a href=#htmlcollection-0><span class=secno>2.7.2.1 </span>HTMLCollection</a></li>
+       <li><a href=#htmlallcollection-0><span class=secno>2.7.2.2 </span>HTMLAllCollection</a></li>
+       <li><a href=#htmlformcontrolscollection-0><span class=secno>2.7.2.3 </span>HTMLFormControlsCollection</a></li>
+       <li><a href=#htmloptionscollection-0><span class=secno>2.7.2.4 </span>HTMLOptionsCollection</a></li>
+       <li><a href=#htmlpropertycollection-0><span class=secno>2.7.2.5 </span>HTMLPropertyCollection</a></ol></li>
+     <li><a href=#domtokenlist-0><span class=secno>2.7.3 </span>DOMTokenList</a></li>
+     <li><a href=#domsettabletokenlist-0><span class=secno>2.7.4 </span>DOMSettableTokenList</a></li>
+     <li><a href=#safe-passing-of-structured-data><span class=secno>2.7.5 </span>Safe passing of structured data</a></li>
+     <li><a href=#domstringmap-0><span class=secno>2.7.6 </span>DOMStringMap</a></li>
+     <li><a href=#dom-feature-strings><span class=secno>2.7.7 </span>DOM feature strings</a></li>
+     <li><a href=#exceptions><span class=secno>2.7.8 </span>Exceptions</a></li>
+     <li><a href=#garbage-collection><span class=secno>2.7.9 </span>Garbage collection</a></ol></ol></li>
  <li><a href=#dom><span class=secno>3 </span>Semantics, structure, and APIs of HTML documents</a>
   <ol>
    <li><a href=#documents><span class=secno>3.1 </span>Documents</a>
@@ -851,8 +850,9 @@
      <li><a href=#the-input-stream><span class=secno>9.2.2 </span>The input stream</a>
       <ol>
        <li><a href=#determining-the-character-encoding><span class=secno>9.2.2.1 </span>Determining the character encoding</a></li>
-       <li><a href=#preprocessing-the-input-stream><span class=secno>9.2.2.2 </span>Preprocessing the input stream</a></li>
-       <li><a href=#changing-the-encoding-while-parsing><span class=secno>9.2.2.3 </span>Changing the encoding while parsing</a></ol></li>
+       <li><a href=#character-encodings-0><span class=secno>9.2.2.2 </span>Character encodings</a></li>
+       <li><a href=#preprocessing-the-input-stream><span class=secno>9.2.2.3 </span>Preprocessing the input stream</a></li>
+       <li><a href=#changing-the-encoding-while-parsing><span class=secno>9.2.2.4 </span>Changing the encoding while parsing</a></ol></li>
      <li><a href=#parse-state><span class=secno>9.2.3 </span>Parse state</a>
       <ol>
        <li><a href=#the-insertion-mode><span class=secno>9.2.3.1 </span>The insertion mode</a></li>
@@ -5147,117 +5147,11 @@
   </div>
 
 
-  <div class=impl>
 
-  <h3 id=character-encodings-0><span class=secno>2.7 </span>Character encodings</h3>
+  <h3 id=common-dom-interfaces><span class=secno>2.7 </span>Common DOM interfaces</h3>
 
-  <p>User agents must at a minimum support the UTF-8 and Windows-1252
-  encodings, but may support more.</p>
+  <h4 id=reflecting-content-attributes-in-idl-attributes><span class=secno>2.7.1 </span>Reflecting content attributes in IDL attributes</h4>
 
-  <p class=note>It is not unusual for Web browsers to support dozens
-  if not upwards of a hundred distinct character encodings.</p>
-
-  <p>User agents must support the preferred MIME name of every
-  character encoding they support that has a preferred MIME name, and
-  should support all the IANA-registered aliases of every character
-  encoding they support. <a href=#refsIANACHARSET>[IANACHARSET]</a></p>
-
-  <p>When comparing a string specifying a character encoding with the
-  name or alias of a character encoding to determine if they are
-  equal, user agents must remove any leading or trailing <a href=#space-character title="space character">space characters</a> in both names, and
-  then perform the comparison in an <a href=#ascii-case-insensitive>ASCII
-  case-insensitive</a> manner.</p>
-
-<!-- this bit will be replaced by actual alias registrations in due course -->
-
-  <p>In addition, user agents must support the aliases given in the
-  following table for every character encoding they support, so that
-  labels from the first column are treated as equivalent to the labels
-  given in the corresponding cell from the second column on the same
-  row.</p>
-
-  <table><caption>Additional character encoding aliases</caption>
-   <thead><tr><th> Alias <th> Corresponding encoding <th> References
-   <tbody><tr><td> x-sjis <td> windows-31J <td>
-         <a href=#refsSHIFTJIS>[SHIFTJIS]</a>
-         <a href=#refsWIN31J>[WIN31J]</a>
-    <tr><td> windows-932 <td> windows-31J <td>
-         <a href=#refsWIN31J>[WIN31J]</a>
-    <tr><td> x-x-big5 <td> Big5 <td>
-         <a href=#refsBIG5>[BIG5]</a>
-   </table><!-- end of bit that will be replaced by actual alias registrations in due course --><hr><p>When a user agent would otherwise use an encoding given in the
-  first column of the following table to either convert content to
-  Unicode characters or convert Unicode characters to bytes, it must
-  instead use the encoding given in the cell in the second column of
-  the same row. When a byte or sequence of bytes is treated
-  differently due to this encoding aliasing, it is said to have been
-  <dfn id=misinterpreted-for-compatibility>misinterpreted for compatibility</dfn>.</p>
-
-  <table><caption>Character encoding overrides</caption>
-   <thead><tr><th> Input encoding <th> Replacement encoding <th> References
-   <tbody><!-- how about EUC-JP? --><tr><td> EUC-KR <td> windows-949 <td>
-         <a href=#refsEUCKR>[EUCKR]</a>
-         <a href=#refsWIN949>[WIN949]</a>
-    <tr><td> GB2312 <td> GBK <td>
-         <a href=#refsRFC1345>[RFC1345]</a>
-         <a href=#refsGBK>[GBK]</a>
-    <tr><td> GB_2312-80 <td> GBK <td>
-         <a href=#refsRFC1345>[RFC1345]</a>
-         <a href=#refsGBK>[GBK]</a>
-    <tr><td> ISO-8859-1 <td> windows-1252 <td>
-         <a href=#refsRFC1345>[RFC1345]</a>
-         <a href=#refsWIN1252>[WIN1252]</a>
-    <tr><td> ISO-8859-9 <td> windows-1254 <td>
-         <a href=#refsRFC1345>[RFC1345]</a>
-         <a href=#refsWIN1254>[WIN1254]</a>
-    <tr><td> ISO-8859-11 <td> windows-874 <td>
-         <a href=#refsISO885911>[ISO885911]</a>
-         <a href=#refsWIN874>[WIN874]</a>
-    <tr><td> KS_C_5601-1987 <td> windows-949 <td>
-         <a href=#refsRFC1345>[RFC1345]</a>
-         <a href=#refsWIN949>[WIN949]</a>
-    <tr><td> Shift_JIS <td> windows-31J <td>
-         <a href=#refsSHIFTJIS>[SHIFTJIS]</a>
-         <a href=#refsWIN31J>[WIN31J]</a>
-    <tr><td> TIS-620 <td> windows-874 <td>
-         <a href=#refsTIS620>[TIS620]</a>
-         <a href=#refsWIN874>[WIN874]</a>
-    <tr><td> US-ASCII <td> windows-1252 <td>
-         <a href=#refsRFC1345>[RFC1345]</a>
-         <a href=#refsWIN1252>[WIN1252]</a>
-   </table><p class=note>The requirement to treat certain encodings as other
-  encodings according to the table above is a <a href=#willful-violation>willful
-  violation</a> of the W3C Character Model specification, motivated
-  by a desire for compatibility with legacy content. <a href=#refsCHARMOD>[CHARMOD]</a></p>
-
-  <p>When a user agent is to use the UTF-16 encoding but no BOM has
-  been found, user agents must default to UTF-16LE.</p>
-
-  <p class=note>The requirement to default UTF-16 to LE rather than
-  BE is a <a href=#willful-violation>willful violation</a> of RFC 2781, motivated by a
-  desire for compatibility with legacy content. <a href=#refsCHARMOD>[CHARMOD]</a></p>
-
-  <hr><p>User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU
-  encodings. <a href=#refsCESU8>[CESU8]</a> <a href=#refsUTF7>[UTF7]</a> <a href=#refsBOCU1>[BOCU1]</a> <a href=#refsSCSU>[SCSU]</a></p>
-
-  <p>Support for encodings based on EBCDIC is not recommended. This
-  encoding is rarely used for publicly-facing Web content.</p>
-
-  <p>Support for UTF-32 is not recommended. This encoding is rarely
-  used, and frequently implemented incorrectly.</p>
-
-  <p class=note>This specification does not make any attempt to
-  support EBCDIC-based encodings and UTF-32 in its algorithms; support
-  and use of these encodings can thus lead to unexpected behavior in
-  implementations of this specification.</p>
-
-  </div>
-
-
-  <h3 id=common-dom-interfaces><span class=secno>2.8 </span>Common DOM interfaces</h3>
-
-  <h4 id=reflecting-content-attributes-in-idl-attributes><span class=secno>2.8.1 </span>Reflecting content attributes in IDL attributes</h4>
-
   <p>Some <span title="IDL attribute">IDL attributes</span> are
   defined to <dfn id=reflect>reflect</dfn> a particular <span>content
   attribute</span>. This means that on getting, the IDL attribute
@@ -5436,7 +5330,7 @@
   </div>
 
 
-  <h4 id=collections-0><span class=secno>2.8.2 </span>Collections</h4>
+  <h4 id=collections-0><span class=secno>2.7.2 </span>Collections</h4>
 
   <p>The <code><a href=#htmlcollection>HTMLCollection</a></code>, <code><a href=#htmlallcollection>HTMLAllCollection</a></code>,
   <code><a href=#htmlformcontrolscollection>HTMLFormControlsCollection</a></code>,
@@ -5472,7 +5366,7 @@
   </div>
 
 
-  <h5 id=htmlcollection-0><span class=secno>2.8.2.1 </span>HTMLCollection</h5>
+  <h5 id=htmlcollection-0><span class=secno>2.7.2.1 </span>HTMLCollection</h5>
 
   <p>The <code><a href=#htmlcollection>HTMLCollection</a></code> interface represents a generic
   <a href=#collections title=collections>collection</a> of elements.</p>
@@ -5567,7 +5461,7 @@
   </div>
 
 
-  <h5 id=htmlallcollection-0><span class=secno>2.8.2.2 </span>HTMLAllCollection</h5>
+  <h5 id=htmlallcollection-0><span class=secno>2.7.2.2 </span>HTMLAllCollection</h5>
 
   <p>The <code><a href=#htmlallcollection>HTMLAllCollection</a></code> interface represents a generic
   <a href=#collections title=collections>collection</a> of elements just like
@@ -5684,7 +5578,7 @@
   </div>
 
 
-  <h5 id=htmlformcontrolscollection-0><span class=secno>2.8.2.3 </span>HTMLFormControlsCollection</h5>
+  <h5 id=htmlformcontrolscollection-0><span class=secno>2.7.2.3 </span>HTMLFormControlsCollection</h5>
 
   <p>The <code><a href=#htmlformcontrolscollection>HTMLFormControlsCollection</a></code> interface represents
   a <a href=#collections title=collections>collection</a> of <a href=#category-listed title=category-listed>listed</a> elements in <code><a href=#the-form-element>form</a></code>
@@ -5817,7 +5711,7 @@
 --></div>
 
 
-  <h5 id=htmloptionscollection-0><span class=secno>2.8.2.4 </span>HTMLOptionsCollection</h5>
+  <h5 id=htmloptionscollection-0><span class=secno>2.7.2.4 </span>HTMLOptionsCollection</h5>
 
   <p>The <code><a href=#htmloptionscollection>HTMLOptionsCollection</a></code> interface represents a
   list of <code><a href=#the-option-element>option</a></code> elements. It is always rooted on a
@@ -5980,7 +5874,7 @@
   </ol><!-- see also http://ln.hixie.ch/?start=1161042744&count=1 --></div>
 
 
-  <h5 id=htmlpropertycollection-0><span class=secno>2.8.2.5 </span>HTMLPropertyCollection</h5>
+  <h5 id=htmlpropertycollection-0><span class=secno>2.7.2.5 </span>HTMLPropertyCollection</h5>
 
   <p>The <code><a href=#htmlpropertycollection>HTMLPropertyCollection</a></code> interface represents a
   <a href=#collections title=collections>collection</a> of elements that add
@@ -6081,7 +5975,7 @@
   </div>
 
 
-  <h4 id=domtokenlist-0><span class=secno>2.8.3 </span>DOMTokenList</h4>
+  <h4 id=domtokenlist-0><span class=secno>2.7.3 </span>DOMTokenList</h4>
 
   <p>The <code><a href=#domtokenlist>DOMTokenList</a></code> interface represents an interface
   to an underlying string that consists of a <a href=#set-of-space-separated-tokens>set of
@@ -6264,7 +6158,7 @@
   </div>
 
 
-  <h4 id=domsettabletokenlist-0><span class=secno>2.8.4 </span>DOMSettableTokenList</h4>
+  <h4 id=domsettabletokenlist-0><span class=secno>2.7.4 </span>DOMSettableTokenList</h4>
 
   <p>The <code><a href=#domsettabletokenlist>DOMSettableTokenList</a></code> interface is the same as the
   <code><a href=#domtokenlist>DOMTokenList</a></code> interface, except that it allows the
@@ -6296,7 +6190,7 @@
 
   <div class=impl>
 
-  <h4 id=safe-passing-of-structured-data><span class=secno>2.8.5 </span>Safe passing of structured data</h4>
+  <h4 id=safe-passing-of-structured-data><span class=secno>2.7.5 </span>Safe passing of structured data</h4>
 
   <p>When a user agent is required to obtain a <dfn id=structured-clone>structured
   clone</dfn> of an object, it must run the following algorithm, which
@@ -6423,7 +6317,7 @@
   </dl></div>
 
 
-  <h4 id=domstringmap-0><span class=secno>2.8.6 </span>DOMStringMap</h4>
+  <h4 id=domstringmap-0><span class=secno>2.7.6 </span>DOMStringMap</h4>
 
   <p>The <code><a href=#domstringmap>DOMStringMap</a></code> interface represents a set of
   name-value pairs. It exposes these using the scripting language's
@@ -6506,7 +6400,7 @@
   </div>
 
 
-  <h4 id=dom-feature-strings><span class=secno>2.8.7 </span>DOM feature strings</h4>
+  <h4 id=dom-feature-strings><span class=secno>2.7.7 </span>DOM feature strings</h4>
 
   <p>DOM3 Core defines mechanisms for checking for interface support,
   and for obtaining implementations of interfaces, using <a href=http://www.w3.org/TR/DOM-Level-3-Core/core.html#DOMFeatures>feature
@@ -6528,7 +6422,7 @@
   </div>
 
 
-  <h4 id=exceptions><span class=secno>2.8.8 </span>Exceptions</h4>
+  <h4 id=exceptions><span class=secno>2.7.8 </span>Exceptions</h4>
 
   <p>The following <code>DOMException</code> codes are defined in DOM
   Core. <a href=#refsDOMCORE>[DOMCORE]</a></p>
@@ -6560,7 +6454,7 @@
    <li value=82><dfn id=serialize_err><code>SERIALIZE_ERR</code></dfn></li> <!-- actually defined in dom3ls -->
   </ol><div class=impl>
 
-  <h4 id=garbage-collection><span class=secno>2.8.9 </span>Garbage collection</h4>
+  <h4 id=garbage-collection><span class=secno>2.7.9 </span>Garbage collection</h4>
 
   <p>There is an <dfn id=implied-strong-reference>implied strong reference</dfn> from any IDL
   attribute that returns a pre-existing object to that object.</p>
@@ -60798,9 +60692,112 @@
   use for the input stream.</p>
 
 
+  <h5 id=character-encodings-0><span class=secno>9.2.2.2 </span>Character encodings</h5>
 
-  <h5 id=preprocessing-the-input-stream><span class=secno>9.2.2.2 </span>Preprocessing the input stream</h5>
+  <p>User agents must at a minimum support the UTF-8 and Windows-1252
+  encodings, but may support more.</p>
 
+  <p class=note>It is not unusual for Web browsers to support dozens
+  if not upwards of a hundred distinct character encodings.</p>
+
+  <p>User agents must support the preferred MIME name of every
+  character encoding they support that has a preferred MIME name, and
+  should support all the IANA-registered aliases of every character
+  encoding they support. <a href=#refsIANACHARSET>[IANACHARSET]</a></p>
+
+  <p>When comparing a string specifying a character encoding with the
+  name or alias of a character encoding to determine if they are
+  equal, user agents must remove any leading or trailing <a href=#space-character title="space character">space characters</a> in both names, and
+  then perform the comparison in an <a href=#ascii-case-insensitive>ASCII
+  case-insensitive</a> manner.</p>
+
+<!-- this bit will be replaced by actual alias registrations in due course -->
+
+  <p>In addition, user agents must support the aliases given in the
+  following table for every character encoding they support, so that
+  labels from the first column are treated as equivalent to the labels
+  given in the corresponding cell from the second column on the same
+  row.</p>
+
+  <table><caption>Additional character encoding aliases</caption>
+   <thead><tr><th> Alias <th> Corresponding encoding <th> References
+   <tbody><tr><td> x-sjis <td> windows-31J <td>
+         <a href=#refsSHIFTJIS>[SHIFTJIS]</a>
+         <a href=#refsWIN31J>[WIN31J]</a>
+    <tr><td> windows-932 <td> windows-31J <td>
+         <a href=#refsWIN31J>[WIN31J]</a>
+    <tr><td> x-x-big5 <td> Big5 <td>
+         <a href=#refsBIG5>[BIG5]</a>
+   </table><!-- end of bit that will be replaced by actual alias registrations in due course --><hr><p>When a user agent would otherwise use an encoding given in the
+  first column of the following table to either convert content to
+  Unicode characters or convert Unicode characters to bytes, it must
+  instead use the encoding given in the cell in the second column of
+  the same row. When a byte or sequence of bytes is treated
+  differently due to this encoding aliasing, it is said to have been
+  <dfn id=misinterpreted-for-compatibility>misinterpreted for compatibility</dfn>.</p>
+
+  <table><caption>Character encoding overrides</caption>
+   <thead><tr><th> Input encoding <th> Replacement encoding <th> References
+   <tbody><!-- how about EUC-JP? --><tr><td> EUC-KR <td> windows-949 <td>
+         <a href=#refsEUCKR>[EUCKR]</a>
+         <a href=#refsWIN949>[WIN949]</a>
+    <tr><td> GB2312 <td> GBK <td>
+         <a href=#refsRFC1345>[RFC1345]</a>
+         <a href=#refsGBK>[GBK]</a>
+    <tr><td> GB_2312-80 <td> GBK <td>
+         <a href=#refsRFC1345>[RFC1345]</a>
+         <a href=#refsGBK>[GBK]</a>
+    <tr><td> ISO-8859-1 <td> windows-1252 <td>
+         <a href=#refsRFC1345>[RFC1345]</a>
+         <a href=#refsWIN1252>[WIN1252]</a>
+    <tr><td> ISO-8859-9 <td> windows-1254 <td>
+         <a href=#refsRFC1345>[RFC1345]</a>
+         <a href=#refsWIN1254>[WIN1254]</a>
+    <tr><td> ISO-8859-11 <td> windows-874 <td>
+         <a href=#refsISO885911>[ISO885911]</a>
+         <a href=#refsWIN874>[WIN874]</a>
+    <tr><td> KS_C_5601-1987 <td> windows-949 <td>
+         <a href=#refsRFC1345>[RFC1345]</a>
+         <a href=#refsWIN949>[WIN949]</a>
+    <tr><td> Shift_JIS <td> windows-31J <td>
+         <a href=#refsSHIFTJIS>[SHIFTJIS]</a>
+         <a href=#refsWIN31J>[WIN31J]</a>
+    <tr><td> TIS-620 <td> windows-874 <td>
+         <a href=#refsTIS620>[TIS620]</a>
+         <a href=#refsWIN874>[WIN874]</a>
+    <tr><td> US-ASCII <td> windows-1252 <td>
+         <a href=#refsRFC1345>[RFC1345]</a>
+         <a href=#refsWIN1252>[WIN1252]</a>
+   </table><p class=note>The requirement to treat certain encodings as other
+  encodings according to the table above is a <a href=#willful-violation>willful
+  violation</a> of the W3C Character Model specification, motivated
+  by a desire for compatibility with legacy content. <a href=#refsCHARMOD>[CHARMOD]</a></p>
+
+  <p>When a user agent is to use the UTF-16 encoding but no BOM has
+  been found, user agents must default to UTF-16LE.</p>
+
+  <p class=note>The requirement to default UTF-16 to LE rather than
+  BE is a <a href=#willful-violation>willful violation</a> of RFC 2781, motivated by a
+  desire for compatibility with legacy content. <a href=#refsCHARMOD>[CHARMOD]</a></p>
+
+  <hr><p>User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU
+  encodings. <a href=#refsCESU8>[CESU8]</a> <a href=#refsUTF7>[UTF7]</a> <a href=#refsBOCU1>[BOCU1]</a> <a href=#refsSCSU>[SCSU]</a></p>
+
+  <p>Support for encodings based on EBCDIC is not recommended. This
+  encoding is rarely used for publicly-facing Web content.</p>
+
+  <p>Support for UTF-32 is not recommended. This encoding is rarely
+  used, and frequently implemented incorrectly.</p>
+
+  <p class=note>This specification does not make any attempt to
+  support EBCDIC-based encodings and UTF-32 in its algorithms; support
+  and use of these encodings can thus lead to unexpected behavior in
+  implementations of this specification.</p>
+
+
+
+  <h5 id=preprocessing-the-input-stream><span class=secno>9.2.2.3 </span>Preprocessing the input stream</h5>
+
   <p>Given an encoding, the bytes in the input stream must be
   converted to Unicode characters for the tokenizer, as described by
   the rules for that encoding, except that the leading U+FEFF BYTE
@@ -60874,7 +60871,7 @@
   the stream, but rather the lack of any further characters.</p>
 
 
-  <h5 id=changing-the-encoding-while-parsing><span class=secno>9.2.2.3 </span>Changing the encoding while parsing</h5>
+  <h5 id=changing-the-encoding-while-parsing><span class=secno>9.2.2.4 </span>Changing the encoding while parsing</h5>
 
   <p>When the parser requires the user agent to <dfn id=change-the-encoding>change the
   encoding</dfn>, it must run the following steps. This might happen

Modified: source
===================================================================
--- source	2009-09-09 05:31:55 UTC (rev 3771)
+++ source	2009-09-09 06:47:17 UTC (rev 3772)
@@ -4857,138 +4857,7 @@
   </div>
 
 
-  <div class="impl">
 
-  <h3>Character encodings</h3>
-
-  <p>User agents must at a minimum support the UTF-8 and Windows-1252
-  encodings, but may support more.</p>
-
-  <p class="note">It is not unusual for Web browsers to support dozens
-  if not upwards of a hundred distinct character encodings.</p>
-
-  <p>User agents must support the preferred MIME name of every
-  character encoding they support that has a preferred MIME name, and
-  should support all the IANA-registered aliases of every character
-  encoding they support. <a
-  href="#refsIANACHARSET">[IANACHARSET]</a></p>
-
-  <p>When comparing a string specifying a character encoding with the
-  name or alias of a character encoding to determine if they are
-  equal, user agents must remove any leading or trailing <span
-  title="space character">space characters</span> in both names, and
-  then perform the comparison in an <span>ASCII
-  case-insensitive</span> manner.</p>
-
-<!-- this bit will be replaced by actual alias registrations in due course -->
-
-  <p>In addition, user agents must support the aliases given in the
-  following table for every character encoding they support, so that
-  labels from the first column are treated as equivalent to the labels
-  given in the corresponding cell from the second column on the same
-  row.</p>
-
-  <table>
-   <caption>Additional character encoding aliases</caption>
-   <thead>
-    <tr> <th> Alias <th> Corresponding encoding <th> References
-   <tbody>
-    <tr> <td> x-sjis <td> windows-31J <td>
-         <a href="#refsSHIFTJIS">[SHIFTJIS]</a>
-         <a href="#refsWIN31J">[WIN31J]</a>
-    <tr> <td> windows-932 <td> windows-31J <td>
-         <a href="#refsWIN31J">[WIN31J]</a>
-    <tr> <td> x-x-big5 <td> Big5 <td>
-         <a href="#refsBIG5">[BIG5]</a>
-   </tbody>
-  </table>
-
-<!-- end of bit that will be replaced by actual alias registrations in due course -->
-
-  <hr>
-
-  <p>When a user agent would otherwise use an encoding given in the
-  first column of the following table to either convert content to
-  Unicode characters or convert Unicode characters to bytes, it must
-  instead use the encoding given in the cell in the second column of
-  the same row. When a byte or sequence of bytes is treated
-  differently due to this encoding aliasing, it is said to have been
-  <dfn>misinterpreted for compatibility</dfn>.</p>
-
-  <table>
-   <caption>Character encoding overrides</caption>
-   <thead>
-    <tr> <th> Input encoding <th> Replacement encoding <th> References
-   <tbody>
-    <!-- how about EUC-JP? -->
-    <tr> <td> EUC-KR <td> windows-949 <td>
-         <a href="#refsEUCKR">[EUCKR]</a>
-         <a href="#refsWIN949">[WIN949]</a>
-    <tr> <td> GB2312 <td> GBK <td>
-         <a href="#refsRFC1345">[RFC1345]</a>
-         <a href="#refsGBK">[GBK]</a>
-    <tr> <td> GB_2312-80 <td> GBK <td>
-         <a href="#refsRFC1345">[RFC1345]</a>
-         <a href="#refsGBK">[GBK]</a>
-    <tr> <td> ISO-8859-1 <td> windows-1252 <td>
-         <a href="#refsRFC1345">[RFC1345]</a>
-         <a href="#refsWIN1252">[WIN1252]</a>
-    <tr> <td> ISO-8859-9 <td> windows-1254 <td>
-         <a href="#refsRFC1345">[RFC1345]</a>
-         <a href="#refsWIN1254">[WIN1254]</a>
-    <tr> <td> ISO-8859-11 <td> windows-874 <td>
-         <a href="#refsISO885911">[ISO885911]</a>
-         <a href="#refsWIN874">[WIN874]</a>
-    <tr> <td> KS_C_5601-1987 <td> windows-949 <td>
-         <a href="#refsRFC1345">[RFC1345]</a>
-         <a href="#refsWIN949">[WIN949]</a>
-    <tr> <td> Shift_JIS <td> windows-31J <td>
-         <a href="#refsSHIFTJIS">[SHIFTJIS]</a>
-         <a href="#refsWIN31J">[WIN31J]</a>
-    <tr> <td> TIS-620 <td> windows-874 <td>
-         <a href="#refsTIS620">[TIS620]</a>
-         <a href="#refsWIN874">[WIN874]</a>
-    <tr> <td> US-ASCII <td> windows-1252 <td>
-         <a href="#refsRFC1345">[RFC1345]</a>
-         <a href="#refsWIN1252">[WIN1252]</a>
-   </tbody>
-  </table>
-
-  <p class="note">The requirement to treat certain encodings as other
-  encodings according to the table above is a <span>willful
-  violation</span> of the W3C Character Model specification, motivated
-  by a desire for compatibility with legacy content. <a
-  href="#refsCHARMOD">[CHARMOD]</a></p>
-
-  <p>When a user agent is to use the UTF-16 encoding but no BOM has
-  been found, user agents must default to UTF-16LE.</p>
-
-  <p class="note">The requirement to default UTF-16 to LE rather than
-  BE is a <span>willful violation</span> of RFC 2781, motivated by a
-  desire for compatibility with legacy content. <a
-  href="#refsCHARMOD">[CHARMOD]</a></p>
-
-  <hr>
-
-  <p>User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU
-  encodings. <a href="#refsCESU8">[CESU8]</a> <a
-  href="#refsUTF7">[UTF7]</a> <a href="#refsBOCU1">[BOCU1]</a> <a
-  href="#refsSCSU">[SCSU]</a></p>
-
-  <p>Support for encodings based on EBCDIC is not recommended. This
-  encoding is rarely used for publicly-facing Web content.</p>
-
-  <p>Support for UTF-32 is not recommended. This encoding is rarely
-  used, and frequently implemented incorrectly.</p>
-
-  <p class="note">This specification does not make any attempt to
-  support EBCDIC-based encodings and UTF-32 in its algorithms; support
-  and use of these encodings can thus lead to unexpected behavior in
-  implementations of this specification.</p>
-
-  </div>
-
-
   <h3>Common DOM interfaces</h3>
 
   <h4>Reflecting content attributes in IDL attributes</h4>
@@ -75338,7 +75207,135 @@
   use for the input stream.</p>
 
 
+  <h5>Character encodings</h5>
 
+  <p>User agents must at a minimum support the UTF-8 and Windows-1252
+  encodings, but may support more.</p>
+
+  <p class="note">It is not unusual for Web browsers to support dozens
+  if not upwards of a hundred distinct character encodings.</p>
+
+  <p>User agents must support the preferred MIME name of every
+  character encoding they support that has a preferred MIME name, and
+  should support all the IANA-registered aliases of every character
+  encoding they support. <a
+  href="#refsIANACHARSET">[IANACHARSET]</a></p>
+
+  <p>When comparing a string specifying a character encoding with the
+  name or alias of a character encoding to determine if they are
+  equal, user agents must remove any leading or trailing <span
+  title="space character">space characters</span> in both names, and
+  then perform the comparison in an <span>ASCII
+  case-insensitive</span> manner.</p>
+
+<!-- this bit will be replaced by actual alias registrations in due course -->
+
+  <p>In addition, user agents must support the aliases given in the
+  following table for every character encoding they support, so that
+  labels from the first column are treated as equivalent to the labels
+  given in the corresponding cell from the second column on the same
+  row.</p>
+
+  <table>
+   <caption>Additional character encoding aliases</caption>
+   <thead>
+    <tr> <th> Alias <th> Corresponding encoding <th> References
+   <tbody>
+    <tr> <td> x-sjis <td> windows-31J <td>
+         <a href="#refsSHIFTJIS">[SHIFTJIS]</a>
+         <a href="#refsWIN31J">[WIN31J]</a>
+    <tr> <td> windows-932 <td> windows-31J <td>
+         <a href="#refsWIN31J">[WIN31J]</a>
+    <tr> <td> x-x-big5 <td> Big5 <td>
+         <a href="#refsBIG5">[BIG5]</a>
+   </tbody>
+  </table>
+
+<!-- end of bit that will be replaced by actual alias registrations in due course -->
+
+  <hr>
+
+  <p>When a user agent would otherwise use an encoding given in the
+  first column of the following table to either convert content to
+  Unicode characters or convert Unicode characters to bytes, it must
+  instead use the encoding given in the cell in the second column of
+  the same row. When a byte or sequence of bytes is treated
+  differently due to this encoding aliasing, it is said to have been
+  <dfn>misinterpreted for compatibility</dfn>.</p>
+
+  <table>
+   <caption>Character encoding overrides</caption>
+   <thead>
+    <tr> <th> Input encoding <th> Replacement encoding <th> References
+   <tbody>
+    <!-- how about EUC-JP? -->
+    <tr> <td> EUC-KR <td> windows-949 <td>
+         <a href="#refsEUCKR">[EUCKR]</a>
+         <a href="#refsWIN949">[WIN949]</a>
+    <tr> <td> GB2312 <td> GBK <td>
+         <a href="#refsRFC1345">[RFC1345]</a>
+         <a href="#refsGBK">[GBK]</a>
+    <tr> <td> GB_2312-80 <td> GBK <td>
+         <a href="#refsRFC1345">[RFC1345]</a>
+         <a href="#refsGBK">[GBK]</a>
+    <tr> <td> ISO-8859-1 <td> windows-1252 <td>
+         <a href="#refsRFC1345">[RFC1345]</a>
+         <a href="#refsWIN1252">[WIN1252]</a>
+    <tr> <td> ISO-8859-9 <td> windows-1254 <td>
+         <a href="#refsRFC1345">[RFC1345]</a>
+         <a href="#refsWIN1254">[WIN1254]</a>
+    <tr> <td> ISO-8859-11 <td> windows-874 <td>
+         <a href="#refsISO885911">[ISO885911]</a>
+         <a href="#refsWIN874">[WIN874]</a>
+    <tr> <td> KS_C_5601-1987 <td> windows-949 <td>
+         <a href="#refsRFC1345">[RFC1345]</a>
+         <a href="#refsWIN949">[WIN949]</a>
+    <tr> <td> Shift_JIS <td> windows-31J <td>
+         <a href="#refsSHIFTJIS">[SHIFTJIS]</a>
+         <a href="#refsWIN31J">[WIN31J]</a>
+    <tr> <td> TIS-620 <td> windows-874 <td>
+         <a href="#refsTIS620">[TIS620]</a>
+         <a href="#refsWIN874">[WIN874]</a>
+    <tr> <td> US-ASCII <td> windows-1252 <td>
+         <a href="#refsRFC1345">[RFC1345]</a>
+         <a href="#refsWIN1252">[WIN1252]</a>
+   </tbody>
+  </table>
+
+  <p class="note">The requirement to treat certain encodings as other
+  encodings according to the table above is a <span>willful
+  violation</span> of the W3C Character Model specification, motivated
+  by a desire for compatibility with legacy content. <a
+  href="#refsCHARMOD">[CHARMOD]</a></p>
+
+  <p>When a user agent is to use the UTF-16 encoding but no BOM has
+  been found, user agents must default to UTF-16LE.</p>
+
+  <p class="note">The requirement to default UTF-16 to LE rather than
+  BE is a <span>willful violation</span> of RFC 2781, motivated by a
+  desire for compatibility with legacy content. <a
+  href="#refsCHARMOD">[CHARMOD]</a></p>
+
+  <hr>
+
+  <p>User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU
+  encodings. <a href="#refsCESU8">[CESU8]</a> <a
+  href="#refsUTF7">[UTF7]</a> <a href="#refsBOCU1">[BOCU1]</a> <a
+  href="#refsSCSU">[SCSU]</a></p>
+
+  <p>Support for encodings based on EBCDIC is not recommended. This
+  encoding is rarely used for publicly-facing Web content.</p>
+
+  <p>Support for UTF-32 is not recommended. This encoding is rarely
+  used, and frequently implemented incorrectly.</p>
+
+  <p class="note">This specification does not make any attempt to
+  support EBCDIC-based encodings and UTF-32 in its algorithms; support
+  and use of these encodings can thus lead to unexpected behavior in
+  implementations of this specification.</p>
+
+
+
   <h5>Preprocessing the input stream</h5>
 
   <p>Given an encoding, the bytes in the input stream must be

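Note on the relocated text: the "Character encodings" section moved by this
revision describes a label comparison, an alias table, an override table, and
a UTF-16 default. The following is a minimal Python sketch of how those rules
might fit together; it is not part of the commit. The function and table names
are hypothetical, the set of "space characters" is assumed to be U+0020,
U+0009, U+000A, U+000C and U+000D (the diff only links to that definition),
and applying aliases before overrides is one possible reading of the text.
Results are returned in lowercase purely for convenience.

```python
# Assumption: the spec's "space characters" (not spelled out in this diff).
HTML_SPACE_CHARACTERS = " \t\n\f\r"

# "Additional character encoding aliases" table from the diff, lowercased.
ALIASES = {
    "x-sjis": "windows-31j",
    "windows-932": "windows-31j",
    "x-x-big5": "big5",
}

# "Character encoding overrides" table from the diff, lowercased.
OVERRIDES = {
    "euc-kr": "windows-949",
    "gb2312": "gbk",
    "gb_2312-80": "gbk",
    "iso-8859-1": "windows-1252",
    "iso-8859-9": "windows-1254",
    "iso-8859-11": "windows-874",
    "ks_c_5601-1987": "windows-949",
    "shift_jis": "windows-31j",
    "tis-620": "windows-874",
    "us-ascii": "windows-1252",
}

# Only A-Z are folded, so the comparison is ASCII case-insensitive.
_ASCII_LOWER = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz"
)


def normalize_label(label: str) -> str:
    """Strip leading/trailing space characters, then fold ASCII case."""
    return label.strip(HTML_SPACE_CHARACTERS).translate(_ASCII_LOWER)


def labels_match(a: str, b: str) -> bool:
    """Compare two encoding labels the way the moved paragraph describes."""
    return normalize_label(a) == normalize_label(b)


def effective_encoding(label: str, has_bom: bool = False) -> str:
    """Resolve a label through the alias and override tables (hypothetical)."""
    name = normalize_label(label)
    name = ALIASES.get(name, name)      # extra aliases
    name = OVERRIDES.get(name, name)    # "misinterpreted for compatibility"
    if name == "utf-16" and not has_bom:
        name = "utf-16le"               # default to LE when no BOM was found
    return name
```

For example, effective_encoding(" ISO-8859-1 ") yields "windows-1252", and
effective_encoding("UTF-16") yields "utf-16le" unless has_bom is set.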