[html5] r8618 - [giow] (2) Make <meta charset=x-user-defined> turn into windows-1252 for legacy [...]
whatwg at whatwg.org
Wed May 7 16:32:27 PDT 2014
Author: ianh
Date: 2014-05-07 16:32:17 -0700 (Wed, 07 May 2014)
New Revision: 8618
Modified:
complete.html
index
source
Log:
[giow] (2) Make <meta charset=x-user-defined> turn into windows-1252 for legacy reasons
Fixing https://www.w3.org/Bugs/Public/show_bug.cgi?id=23940
Affected topics: HTML Syntax and Parsing
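Below is a minimal Python sketch of the charset handling the prescan gains in this revision. It assumes the `<meta>` label has already been resolved to a canonical encoding name; label lookup per [ENCODING] is not shown. The names `normalize_meta_charset`, `prescan_charset_step` and the small `SUPPORTED` set are hypothetical and chosen for illustration only. A real user agent supports many more encodings.

```python
from typing import Optional

# Assumption: names are canonical Encoding Standard names, already resolved
# from whatever label appeared in <meta charset> (resolution not shown).
UTF16_ENCODINGS = {"UTF-16BE", "UTF-16LE"}

# Hypothetical stand-in for "a supported character encoding".
SUPPORTED = {"UTF-8", "windows-1252", "ISO-8859-2", "Shift_JIS"}


def normalize_meta_charset(charset: str) -> str:
    """Apply the two legacy overrides for an encoding declared in <meta>."""
    if charset in UTF16_ENCODINGS:
        # A UTF-16 declaration found by an ASCII-based prescan cannot be right.
        charset = "UTF-8"
    if charset == "x-user-defined":
        # Step added in r8618: meta-declared x-user-defined means windows-1252.
        charset = "windows-1252"
    return charset


def prescan_charset_step(charset: str) -> Optional[str]:
    """Return the normalized encoding, or None to mean 'jump to next byte'."""
    charset = normalize_meta_charset(charset)
    if charset not in SUPPORTED:
        return None  # unsupported: keep scanning for another declaration
    return charset


if __name__ == "__main__":
    for label in ("x-user-defined", "UTF-16LE", "KOI8-R"):
        print(label, "->", prescan_charset_step(label))
```

Returning `None` stands in for the spec's "jump to the step below labeled *next byte*". What the prescan does with a supported charset lies outside the hunk shown, so the sketch simply returns the normalized name.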
Modified: complete.html
===================================================================
--- complete.html 2014-05-07 22:52:21 UTC (rev 8617)
+++ complete.html 2014-05-07 23:32:17 UTC (rev 8618)
@@ -87923,9 +87923,14 @@
<li><p>If <var title="">need pragma</var> is true but <var title="">got pragma</var> is
false, then jump to the step below labeled <i>next byte</i>.</li>
+ <!-- the next two steps are redundant with steps in the 'change the encoding' algorithm -->
+
<li><p>If <var title="">charset</var> is <a href=#a-utf-16-encoding>a UTF-16 encoding</a>, change the value of
<var title="">charset</var> to UTF-8.</li>
+ <li><p>If <var title="">charset</var> is the x-user-defined encoding, change the value of
+ <var title="">charset</var> to Windows-1252. <a href=#refsENCODING>[ENCODING]</a></li>
+
<li><p>If <var title="">charset</var> is not a supported character encoding, then jump to the
step below labeled <i>next byte</i>.</li>
@@ -88133,13 +88138,20 @@
failed to find a character encoding, or if it found a character encoding that was not the actual
encoding of the file.</p>
+ <!--CLEANUP--><!-- use <p>s -->
<ol><li>If the encoding that is already being used to interpret the input stream is <a href=#a-utf-16-encoding>a UTF-16
encoding</a>, then set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
<i>certain</i> and abort these steps. The new encoding is ignored; if it was anything but the
same encoding, then it would be clearly incorrect.</li>
+ <!-- the next two steps are redundant with similar logic in the sniffer -->
+ <!-- if you add anything else here, then factor it out into a common algorithm -->
+
<li>If the new encoding is <a href=#a-utf-16-encoding>a UTF-16 encoding</a>, change it to UTF-8.</li>
+ <li>If the new encoding is the x-user-defined encoding, change it to Windows-1252. <a href=#refsENCODING>[ENCODING]</a></li> <!-- apparently this was a Chrome invention, later
+ picked up by Mozilla -->
+
<li>If the new encoding is identical or equivalent to the encoding that is already being used to
interpret the input stream, then set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to <i>certain</i> and abort these steps.
This happens when the encoding information found in the file matches what the <a href=#encoding-sniffing-algorithm>encoding
@@ -88166,8 +88178,13 @@
encoding. The resource will be misinterpreted. User agents may notify the user of the situation,
to aid in application development.</li>
- </ol><h5 id=preprocessing-the-input-stream><span class=secno>12.2.2.5 </span>Preprocessing the input stream</h5>
+ </ol><p class=note>This algorithm is only invoked when a new encoding is found declared on a
+ <code><a href=#the-meta-element>meta</a></code> element.</p> <!-- this is important for the x-user-defined stuff in particular
+ -->
+
+ <h5 id=preprocessing-the-input-stream><span class=secno>12.2.2.5 </span>Preprocessing the input stream</h5>
+
<p>The <dfn id=input-stream>input stream</dfn> consists of the characters pushed into it as the <a href=#the-input-byte-stream>input byte
stream</a> is decoded or from the various APIs that directly manipulate the input stream.</p>
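The first four steps of the "change the encoding" algorithm, as they read after this change, might be sketched as follows. The `ParserState` class, its string-valued `confidence` field and the function name are hypothetical. "Identical or equivalent" is reduced to plain string equality. The later steps of the algorithm, only partly visible in the hunk, are not modeled, so the function just reports whether it aborted early. Per the note this revision adds, the algorithm only runs for encodings declared in a `meta` element, so the x-user-defined override here applies only to such declarations.

```python
from dataclasses import dataclass

# Assumption: canonical Encoding Standard names are used throughout.
UTF16_ENCODINGS = {"UTF-16BE", "UTF-16LE"}


@dataclass
class ParserState:
    # Hypothetical model of the parser's decoding state.
    encoding: str
    confidence: str  # e.g. "tentative" or "certain"


def change_the_encoding(state: ParserState, new_encoding: str) -> bool:
    """Run the steps shown in the diff.

    Returns True if the algorithm aborted here, False if it would go on
    to the later steps (not modeled in this sketch).
    """
    # Step 1: a stream already decoded as UTF-16 ignores in-band declarations.
    if state.encoding in UTF16_ENCODINGS:
        state.confidence = "certain"
        return True
    # Steps 2-3: the same overrides the prescan applies.
    if new_encoding in UTF16_ENCODINGS:
        new_encoding = "UTF-8"
    if new_encoding == "x-user-defined":
        new_encoding = "windows-1252"
    # Step 4: the declaration agrees with the encoding already in use
    # (simplified here to exact name equality).
    if new_encoding == state.encoding:
        state.confidence = "certain"
        return True
    return False
```

For example, a page first decoded as windows-1252 that later declares `<meta charset=x-user-defined>` now ends with confidence *certain* at step 4 instead of going on to the later steps.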
Modified: index
===================================================================
--- index 2014-05-07 22:52:21 UTC (rev 8617)
+++ index 2014-05-07 23:32:17 UTC (rev 8618)
@@ -87923,9 +87923,14 @@
<li><p>If <var title="">need pragma</var> is true but <var title="">got pragma</var> is
false, then jump to the step below labeled <i>next byte</i>.</li>
+ <!-- the next two steps are redundant with steps in the 'change the encoding' algorithm -->
+
<li><p>If <var title="">charset</var> is <a href=#a-utf-16-encoding>a UTF-16 encoding</a>, change the value of
<var title="">charset</var> to UTF-8.</li>
+ <li><p>If <var title="">charset</var> is the x-user-defined encoding, change the value of
+ <var title="">charset</var> to Windows-1252. <a href=#refsENCODING>[ENCODING]</a></li>
+
<li><p>If <var title="">charset</var> is not a supported character encoding, then jump to the
step below labeled <i>next byte</i>.</li>
@@ -88133,13 +88138,20 @@
failed to find a character encoding, or if it found a character encoding that was not the actual
encoding of the file.</p>
+ <!--CLEANUP--><!-- use <p>s -->
<ol><li>If the encoding that is already being used to interpret the input stream is <a href=#a-utf-16-encoding>a UTF-16
encoding</a>, then set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
<i>certain</i> and abort these steps. The new encoding is ignored; if it was anything but the
same encoding, then it would be clearly incorrect.</li>
+ <!-- the next two steps are redundant with similar logic in the sniffer -->
+ <!-- if you add anything else here, then factor it out into a common algorithm -->
+
<li>If the new encoding is <a href=#a-utf-16-encoding>a UTF-16 encoding</a>, change it to UTF-8.</li>
+ <li>If the new encoding is the x-user-defined encoding, change it to Windows-1252. <a href=#refsENCODING>[ENCODING]</a></li> <!-- apparently this was a Chrome invention, later
+ picked up by Mozilla -->
+
<li>If the new encoding is identical or equivalent to the encoding that is already being used to
interpret the input stream, then set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to <i>certain</i> and abort these steps.
This happens when the encoding information found in the file matches what the <a href=#encoding-sniffing-algorithm>encoding
@@ -88166,8 +88178,13 @@
encoding. The resource will be misinterpreted. User agents may notify the user of the situation,
to aid in application development.</li>
- </ol><h5 id=preprocessing-the-input-stream><span class=secno>12.2.2.5 </span>Preprocessing the input stream</h5>
+ </ol><p class=note>This algorithm is only invoked when a new encoding is found declared on a
+ <code><a href=#the-meta-element>meta</a></code> element.</p> <!-- this is important for the x-user-defined stuff in particular
+ -->
+
+ <h5 id=preprocessing-the-input-stream><span class=secno>12.2.2.5 </span>Preprocessing the input stream</h5>
+
<p>The <dfn id=input-stream>input stream</dfn> consists of the characters pushed into it as the <a href=#the-input-byte-stream>input byte
stream</a> is decoded or from the various APIs that directly manipulate the input stream.</p>
Modified: source
===================================================================
--- source 2014-05-07 22:52:21 UTC (rev 8617)
+++ source 2014-05-07 23:32:17 UTC (rev 8618)
@@ -96732,9 +96732,14 @@
<li><p>If <var data-x="">need pragma</var> is true but <var data-x="">got pragma</var> is
false, then jump to the step below labeled <i>next byte</i>.</p></li>
+ <!-- the next two steps are redundant with steps in the 'change the encoding' algorithm -->
+
<li><p>If <var data-x="">charset</var> is <span>a UTF-16 encoding</span>, change the value of
<var data-x="">charset</var> to UTF-8.</p></li>
+ <li><p>If <var data-x="">charset</var> is the x-user-defined encoding, change the value of
+ <var data-x="">charset</var> to Windows-1252. <a href="#refsENCODING">[ENCODING]</a></p></li>
+
<li><p>If <var data-x="">charset</var> is not a supported character encoding, then jump to the
step below labeled <i>next byte</i>.</p></li>
@@ -96991,6 +96996,7 @@
failed to find a character encoding, or if it found a character encoding that was not the actual
encoding of the file.</p>
+ <!--CLEANUP--><!-- use <p>s -->
<ol>
<li>If the encoding that is already being used to interpret the input stream is <span>a UTF-16
@@ -96998,8 +97004,15 @@
<i>certain</i> and abort these steps. The new encoding is ignored; if it was anything but the
same encoding, then it would be clearly incorrect.</li>
+ <!-- the next two steps are redundant with similar logic in the sniffer -->
+ <!-- if you add anything else here, then factor it out into a common algorithm -->
+
<li>If the new encoding is <span>a UTF-16 encoding</span>, change it to UTF-8.</li>
+ <li>If the new encoding is the x-user-defined encoding, change it to Windows-1252. <a
+ href="#refsENCODING">[ENCODING]</a></p></li> <!-- apparently this was a Chrome invention, later
+ picked up by Mozilla -->
+
<li>If the new encoding is identical or equivalent to the encoding that is already being used to
interpret the input stream, then set the <span
data-x="concept-encoding-confidence">confidence</span> to <i>certain</i> and abort these steps.
@@ -97031,7 +97044,11 @@
</ol>
+ <p class="note">This algorithm is only invoked when a new encoding is found declared on a
+ <code>meta</code> element.</p> <!-- this is important for the x-user-defined stuff in particular
+ -->
+
<h5>Preprocessing the input stream</h5>
<p>The <dfn>input stream</dfn> consists of the characters pushed into it as the <span>input byte