[html5] r7360 - [giow] (3) Make a BOM override HTTP headers. Fixing https://www.w3.org/Bugs/Publ [...]
whatwg at whatwg.org
Sat Sep 15 20:55:56 PDT 2012
Author: ianh
Date: 2012-09-15 20:55:55 -0700 (Sat, 15 Sep 2012)
New Revision: 7360
Modified:
   complete.html
   index
   source
Log:
[giow] (3) Make a BOM override HTTP headers.
Fixing https://www.w3.org/Bugs/Public/show_bug.cgi?id=17810
Affected topics: HTML Syntax and Parsing
Modified: complete.html
===================================================================
--- complete.html 2012-09-16 03:27:25 UTC (rev 7359)
+++ complete.html 2012-09-16 03:55:55 UTC (rev 7360)
@@ -88430,10 +88430,6 @@
</li>
- <li><p>If the transport layer specifies an encoding, and it is
- supported, return that encoding with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
- <i>certain</i>, and abort these steps.</li>
-
<li>
<p>The user agent may wait for more bytes of the resource to be
@@ -88455,14 +88451,22 @@
</li>
- <li><p>For each of the rows in the following table, starting with
- the first one and going down, if there are as many or more bytes
- available than the number of bytes in the first column, and the
- first bytes of the file match the bytes given in the first column,
- then return the encoding given in the cell in the second column of
- that row, with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
- <i>certain</i>, and abort these steps:</p>
+ <li>
+ <!-- Doing this step before honouring HTTP is important for supporting
+ http://kb.dsqq.cn/html/2012-09/16/node_193.htm
+ which is encoded as UTF-8 but is incorrectly labeled as
+ Content-Type: text/html; charset=GB2312
+ -->
+
+ <p>For each of the rows in the following table, starting with the
+ first one and going down, if there are as many or more bytes
+ available than the number of bytes in the first column, and the
+ first bytes of the file match the bytes given in the first column,
+ then return the encoding given in the cell in the second column of
+ that row, with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
+ <i>certain</i>, and abort these steps:</p>
+
<!-- this table is present in several forms in this file; keep them in sync -->
<table><thead><tr><th>Bytes in Hexadecimal
<th>Encoding
@@ -88485,12 +88489,24 @@
<td>UTF-EBCDIC
-->
</table><p class=note>This step looks for Unicode Byte Order Marks
- (BOMs).</li>
+ (BOMs).</p>
+ <p class=note>That this step happens before the next one
+ honoring the HTTP <code><a href=#content-type>Content-Type</a></code> header is a
+ <a href=#willful-violation>willful violation</a> of the HTTP specification,
+ motivated by a desire to be maximally compatible with legacy
+ content. <a href=#refsHTTP>[HTTP]</a></p>
+
+ </li>
+
+ <li><p>If the transport layer specifies an encoding, and it is
+ supported, return that encoding with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
+ <i>certain</i>, and abort these steps.</li>
+
<li>
- <p>Otherwise, optionally <a href=#prescan-a-byte-stream-to-determine-its-encoding title="prescan a byte stream to
- determine its encoding">prescan the byte stream to determine its
+ <p>Optionally <a href=#prescan-a-byte-stream-to-determine-its-encoding title="prescan a byte stream to determine its
+ encoding">prescan the byte stream to determine its
encoding</a>. The <var title="">end condition</var> is that the
user agent decides that scanning further bytes would not be
efficient. User agents are encouraged to only prescan the first
Modified: index
===================================================================
--- index 2012-09-16 03:27:25 UTC (rev 7359)
+++ index 2012-09-16 03:55:55 UTC (rev 7360)
@@ -88430,10 +88430,6 @@
</li>
- <li><p>If the transport layer specifies an encoding, and it is
- supported, return that encoding with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
- <i>certain</i>, and abort these steps.</li>
-
<li>
<p>The user agent may wait for more bytes of the resource to be
@@ -88455,14 +88451,22 @@
</li>
- <li><p>For each of the rows in the following table, starting with
- the first one and going down, if there are as many or more bytes
- available than the number of bytes in the first column, and the
- first bytes of the file match the bytes given in the first column,
- then return the encoding given in the cell in the second column of
- that row, with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
- <i>certain</i>, and abort these steps:</p>
+ <li>
+ <!-- Doing this step before honouring HTTP is important for supporting
+ http://kb.dsqq.cn/html/2012-09/16/node_193.htm
+ which is encoded as UTF-8 but is incorrectly labeled as
+ Content-Type: text/html; charset=GB2312
+ -->
+
+ <p>For each of the rows in the following table, starting with the
+ first one and going down, if there are as many or more bytes
+ available than the number of bytes in the first column, and the
+ first bytes of the file match the bytes given in the first column,
+ then return the encoding given in the cell in the second column of
+ that row, with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
+ <i>certain</i>, and abort these steps:</p>
+
<!-- this table is present in several forms in this file; keep them in sync -->
<table><thead><tr><th>Bytes in Hexadecimal
<th>Encoding
@@ -88485,12 +88489,24 @@
<td>UTF-EBCDIC
-->
</table><p class=note>This step looks for Unicode Byte Order Marks
- (BOMs).</li>
+ (BOMs).</p>
+ <p class=note>That this step happens before the next one
+ honoring the HTTP <code><a href=#content-type>Content-Type</a></code> header is a
+ <a href=#willful-violation>willful violation</a> of the HTTP specification,
+ motivated by a desire to be maximally compatible with legacy
+ content. <a href=#refsHTTP>[HTTP]</a></p>
+
+ </li>
+
+ <li><p>If the transport layer specifies an encoding, and it is
+ supported, return that encoding with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
+ <i>certain</i>, and abort these steps.</li>
+
<li>
- <p>Otherwise, optionally <a href=#prescan-a-byte-stream-to-determine-its-encoding title="prescan a byte stream to
- determine its encoding">prescan the byte stream to determine its
+ <p>Optionally <a href=#prescan-a-byte-stream-to-determine-its-encoding title="prescan a byte stream to determine its
+ encoding">prescan the byte stream to determine its
encoding</a>. The <var title="">end condition</var> is that the
user agent decides that scanning further bytes would not be
efficient. User agents are encouraged to only prescan the first
Modified: source
===================================================================
--- source 2012-09-16 03:27:25 UTC (rev 7359)
+++ source 2012-09-16 03:55:55 UTC (rev 7360)
@@ -102588,11 +102588,6 @@
</li>
- <li><p>If the transport layer specifies an encoding, and it is
- supported, return that encoding with the <span
- title="concept-encoding-confidence">confidence</span>
- <i>certain</i>, and abort these steps.</p></li>
-
<li>
<p>The user agent may wait for more bytes of the resource to be
@@ -102615,15 +102610,23 @@
</li>
- <li><p>For each of the rows in the following table, starting with
- the first one and going down, if there are as many or more bytes
- available than the number of bytes in the first column, and the
- first bytes of the file match the bytes given in the first column,
- then return the encoding given in the cell in the second column of
- that row, with the <span
- title="concept-encoding-confidence">confidence</span>
- <i>certain</i>, and abort these steps:</p>
+ <li>
+ <!-- Doing this step before honouring HTTP is important for supporting
+ http://kb.dsqq.cn/html/2012-09/16/node_193.htm
+ which is encoded as UTF-8 but is incorrectly labeled as
+ Content-Type: text/html; charset=GB2312
+ -->
+
+ <p>For each of the rows in the following table, starting with the
+ first one and going down, if there are as many or more bytes
+ available than the number of bytes in the first column, and the
+ first bytes of the file match the bytes given in the first column,
+ then return the encoding given in the cell in the second column of
+ that row, with the <span
+ title="concept-encoding-confidence">confidence</span>
+ <i>certain</i>, and abort these steps:</p>
+
<!-- this table is present in several forms in this file; keep them in sync -->
<table>
<thead>
@@ -102655,13 +102658,26 @@
-->
</table>
- <p class="note">This step looks for Unicode Byte Order Marks
- (BOMs).</p></li>
+ <p class="note">This step looks for Unicode Byte Order Marks
+ (BOMs).</p>
+ <p class="note">That this step happens before the next one
+ honoring the HTTP <code>Content-Type</code> header is a
+ <span>willful violation</span> of the HTTP specification,
+ motivated by a desire to be maximally compatible with legacy
+ content. <a href="#refsHTTP">[HTTP]</a></p>
+
+ </li>
+
+ <li><p>If the transport layer specifies an encoding, and it is
+ supported, return that encoding with the <span
+ title="concept-encoding-confidence">confidence</span>
+ <i>certain</i>, and abort these steps.</p></li>
+
<li>
- <p>Otherwise, optionally <span title="prescan a byte stream to
- determine its encoding">prescan the byte stream to determine its
+ <p>Optionally <span title="prescan a byte stream to determine its
+ encoding">prescan the byte stream to determine its
encoding</span>. The <var title="">end condition</var> is that the
user agent decides that scanning further bytes would not be
efficient. User agents are encouraged to only prescan the first
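
The reordering in the diff above can be illustrated with a minimal Python sketch. It is not the specification's algorithm in full: it covers only the BOM step and the transport-layer step in their new order. The function names `sniff_encoding` and `is_supported` are hypothetical, the BOM rows are assumed from the specification's table (which this diff elides), and "supported" is approximated here by whether Python's `codecs` module knows the label.

```python
import codecs

# BOM rows assumed from the specification's table; the diff above elides them.
BOM_TABLE = [
    (b"\xfe\xff", "utf-16be"),
    (b"\xff\xfe", "utf-16le"),
    (b"\xef\xbb\xbf", "utf-8"),
]


def is_supported(label):
    """Hypothetical stand-in for 'the encoding is supported' (uses Python's codec registry)."""
    try:
        codecs.lookup(label)
        return True
    except LookupError:
        return False


def sniff_encoding(first_bytes, transport_charset=None):
    """Return (encoding, confidence), or (None, None) to fall through to later steps."""
    # Step moved earlier by r7360: a BOM wins, even over the HTTP charset.
    for bom, encoding in BOM_TABLE:
        if len(first_bytes) >= len(bom) and first_bytes.startswith(bom):
            return encoding, "certain"
    # Step moved later by r7360: honor the transport-layer charset if supported.
    if transport_charset and is_supported(transport_charset):
        return transport_charset, "certain"
    # Remaining steps (prescan and so on) are outside this sketch.
    return None, None
```

Under these assumptions, a page like the one cited in the source comment (UTF-8 with a BOM, served as `charset=GB2312`) resolves to UTF-8, because `sniff_encoding(b"\xef\xbb\xbf<html>", "GB2312")` matches the BOM before the header is consulted. Before this revision the header would have been honored first.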