[html5] r7360 - [giow] (3) Make a BOM override HTTP headers. Fixing https://www.w3.org/Bugs/Publ [...]

whatwg at whatwg.org
Sat Sep 15 20:55:56 PDT 2012


Author: ianh
Date: 2012-09-15 20:55:55 -0700 (Sat, 15 Sep 2012)
New Revision: 7360

Modified:
   complete.html
   index
   source
Log:
[giow] (3) Make a BOM override HTTP headers.
Fixing https://www.w3.org/Bugs/Public/show_bug.cgi?id=17810
Affected topics: HTML Syntax and Parsing

Modified: complete.html
===================================================================
--- complete.html	2012-09-16 03:27:25 UTC (rev 7359)
+++ complete.html	2012-09-16 03:55:55 UTC (rev 7360)
@@ -88430,10 +88430,6 @@
 
    </li>
 
-   <li><p>If the transport layer specifies an encoding, and it is
-   supported, return that encoding with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
-   <i>certain</i>, and abort these steps.</li>
-
    <li>
 
     <p>The user agent may wait for more bytes of the resource to be
@@ -88455,14 +88451,22 @@
 
    </li>
 
-   <li><p>For each of the rows in the following table, starting with
-   the first one and going down, if there are as many or more bytes
-   available than the number of bytes in the first column, and the
-   first bytes of the file match the bytes given in the first column,
-   then return the encoding given in the cell in the second column of
-   that row, with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
-   <i>certain</i>, and abort these steps:</p>
+   <li>
 
+    <!-- Doing this step before honouring HTTP is important for supporting
+            http://kb.dsqq.cn/html/2012-09/16/node_193.htm
+         which is encoded as UTF-8 but is incorrectly labeled as
+            Content-Type: text/html; charset=GB2312
+    -->
+
+    <p>For each of the rows in the following table, starting with the
+    first one and going down, if there are as many or more bytes
+    available than the number of bytes in the first column, and the
+    first bytes of the file match the bytes given in the first column,
+    then return the encoding given in the cell in the second column of
+    that row, with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
+    <i>certain</i>, and abort these steps:</p>
+
     <!-- this table is present in several forms in this file; keep them in sync -->
     <table><thead><tr><th>Bytes in Hexadecimal
        <th>Encoding
@@ -88485,12 +88489,24 @@
        <td>UTF-EBCDIC
 -->
     </table><p class=note>This step looks for Unicode Byte Order Marks
-   (BOMs).</li>
+    (BOMs).</p>
 
+    <p class=note>That this step happens before the next one
+    honoring the HTTP <code><a href=#content-type>Content-Type</a></code> header is a
+    <a href=#willful-violation>willful violation</a> of the HTTP specification,
+    motivated by a desire to be maximally compatible with legacy
+    content. <a href=#refsHTTP>[HTTP]</a></p>
+
+   </li>
+
+   <li><p>If the transport layer specifies an encoding, and it is
+   supported, return that encoding with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
+   <i>certain</i>, and abort these steps.</li>
+
    <li>
 
-    <p>Otherwise, optionally <a href=#prescan-a-byte-stream-to-determine-its-encoding title="prescan a byte stream to
-    determine its encoding">prescan the byte stream to determine its
+    <p>Optionally <a href=#prescan-a-byte-stream-to-determine-its-encoding title="prescan a byte stream to determine its
+    encoding">prescan the byte stream to determine its
     encoding</a>. The <var title="">end condition</var> is that the
     user agent decides that scanning further bytes would not be
     efficient. User agents are encouraged to only prescan the first

Modified: index
===================================================================
--- index	2012-09-16 03:27:25 UTC (rev 7359)
+++ index	2012-09-16 03:55:55 UTC (rev 7360)
@@ -88430,10 +88430,6 @@
 
    </li>
 
-   <li><p>If the transport layer specifies an encoding, and it is
-   supported, return that encoding with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
-   <i>certain</i>, and abort these steps.</li>
-
    <li>
 
     <p>The user agent may wait for more bytes of the resource to be
@@ -88455,14 +88451,22 @@
 
    </li>
 
-   <li><p>For each of the rows in the following table, starting with
-   the first one and going down, if there are as many or more bytes
-   available than the number of bytes in the first column, and the
-   first bytes of the file match the bytes given in the first column,
-   then return the encoding given in the cell in the second column of
-   that row, with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
-   <i>certain</i>, and abort these steps:</p>
+   <li>
 
+    <!-- Doing this step before honouring HTTP is important for supporting
+            http://kb.dsqq.cn/html/2012-09/16/node_193.htm
+         which is encoded as UTF-8 but is incorrectly labeled as
+            Content-Type: text/html; charset=GB2312
+    -->
+
+    <p>For each of the rows in the following table, starting with the
+    first one and going down, if there are as many or more bytes
+    available than the number of bytes in the first column, and the
+    first bytes of the file match the bytes given in the first column,
+    then return the encoding given in the cell in the second column of
+    that row, with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
+    <i>certain</i>, and abort these steps:</p>
+
     <!-- this table is present in several forms in this file; keep them in sync -->
     <table><thead><tr><th>Bytes in Hexadecimal
        <th>Encoding
@@ -88485,12 +88489,24 @@
        <td>UTF-EBCDIC
 -->
     </table><p class=note>This step looks for Unicode Byte Order Marks
-   (BOMs).</li>
+    (BOMs).</p>
 
+    <p class=note>That this step happens before the next one
+    honoring the HTTP <code><a href=#content-type>Content-Type</a></code> header is a
+    <a href=#willful-violation>willful violation</a> of the HTTP specification,
+    motivated by a desire to be maximally compatible with legacy
+    content. <a href=#refsHTTP>[HTTP]</a></p>
+
+   </li>
+
+   <li><p>If the transport layer specifies an encoding, and it is
+   supported, return that encoding with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
+   <i>certain</i>, and abort these steps.</li>
+
    <li>
 
-    <p>Otherwise, optionally <a href=#prescan-a-byte-stream-to-determine-its-encoding title="prescan a byte stream to
-    determine its encoding">prescan the byte stream to determine its
+    <p>Optionally <a href=#prescan-a-byte-stream-to-determine-its-encoding title="prescan a byte stream to determine its
+    encoding">prescan the byte stream to determine its
     encoding</a>. The <var title="">end condition</var> is that the
     user agent decides that scanning further bytes would not be
     efficient. User agents are encouraged to only prescan the first

Modified: source
===================================================================
--- source	2012-09-16 03:27:25 UTC (rev 7359)
+++ source	2012-09-16 03:55:55 UTC (rev 7360)
@@ -102588,11 +102588,6 @@
 
    </li>
 
-   <li><p>If the transport layer specifies an encoding, and it is
-   supported, return that encoding with the <span
-   title="concept-encoding-confidence">confidence</span>
-   <i>certain</i>, and abort these steps.</p></li>
-
    <li>
 
     <p>The user agent may wait for more bytes of the resource to be
@@ -102615,15 +102610,23 @@
 
    </li>
 
-   <li><p>For each of the rows in the following table, starting with
-   the first one and going down, if there are as many or more bytes
-   available than the number of bytes in the first column, and the
-   first bytes of the file match the bytes given in the first column,
-   then return the encoding given in the cell in the second column of
-   that row, with the <span
-   title="concept-encoding-confidence">confidence</span>
-   <i>certain</i>, and abort these steps:</p>
+   <li>
 
+    <!-- Doing this step before honouring HTTP is important for supporting
+            http://kb.dsqq.cn/html/2012-09/16/node_193.htm
+         which is encoded as UTF-8 but is incorrectly labeled as
+            Content-Type: text/html; charset=GB2312
+    -->
+
+    <p>For each of the rows in the following table, starting with the
+    first one and going down, if there are as many or more bytes
+    available than the number of bytes in the first column, and the
+    first bytes of the file match the bytes given in the first column,
+    then return the encoding given in the cell in the second column of
+    that row, with the <span
+    title="concept-encoding-confidence">confidence</span>
+    <i>certain</i>, and abort these steps:</p>
+
     <!-- this table is present in several forms in this file; keep them in sync -->
     <table>
      <thead>
@@ -102655,13 +102658,26 @@
 -->
     </table>
 
-   <p class="note">This step looks for Unicode Byte Order Marks
-   (BOMs).</p></li>
+    <p class="note">This step looks for Unicode Byte Order Marks
+    (BOMs).</p>
 
+    <p class="note">That this step happens before the next one
+    honoring the HTTP <code>Content-Type</code> header is a
+    <span>willful violation</span> of the HTTP specification,
+    motivated by a desire to be maximally compatible with legacy
+    content. <a href="#refsHTTP">[HTTP]</a></p>
+
+   </li>
+
+   <li><p>If the transport layer specifies an encoding, and it is
+   supported, return that encoding with the <span
+   title="concept-encoding-confidence">confidence</span>
+   <i>certain</i>, and abort these steps.</p></li>
+
    <li>
 
-    <p>Otherwise, optionally <span title="prescan a byte stream to
-    determine its encoding">prescan the byte stream to determine its
+    <p>Optionally <span title="prescan a byte stream to determine its
+    encoding">prescan the byte stream to determine its
     encoding</span>. The <var title="">end condition</var> is that the
     user agent decides that scanning further bytes would not be
     efficient. User agents are encouraged to only prescan the first



