[html5] r8618 - [giow] (2) Make <meta charset=x-user-defined> turn into windows-1252 for legacy [...]

Wed May 7 16:32:27 PDT 2014

Author: ianh
Date: 2014-05-07 16:32:17 -0700 (Wed, 07 May 2014)
New Revision: 8618

Modified:
   complete.html
   index
   source
Log:
[giow] (2) Make <meta charset=x-user-defined> turn into windows-1252 for legacy reasons
Fixing https://www.w3.org/Bugs/Public/show_bug.cgi?id=23940
Affected topics: HTML Syntax and Parsing
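
For readers skimming the diff below, here is a minimal sketch of the two legacy overrides this revision touches, plus the first steps of the "change the encoding" algorithm that repeat them. It is illustrative, not the spec's normative text. The function names, the `UTF16_NAMES` set, and the `"undecided"` marker are assumptions made for this example, and encoding names are compared as plain lowercase strings rather than resolved through the label table in [ENCODING].

```python
# Hypothetical, simplified stand-in for "a UTF-16 encoding"; the real
# label lookup is defined by the Encoding Standard, not by this set.
UTF16_NAMES = {"utf-16", "utf-16le", "utf-16be"}


def normalize_declared_encoding(name):
    """Apply the two overrides shared by the sniffer and 'change the encoding'."""
    key = name.strip().lower()
    if key in UTF16_NAMES:
        # A meta-declared UTF-16 encoding is treated as UTF-8.
        return "utf-8"
    if key == "x-user-defined":
        # New in this revision: x-user-defined becomes windows-1252.
        return "windows-1252"
    return key


def change_the_encoding(current, declared):
    """Sketch of the opening steps only; returns (encoding, confidence)."""
    current_key = current.strip().lower()
    if current_key in UTF16_NAMES:
        # The declared encoding is ignored when UTF-16 is already in use.
        return (current_key, "certain")
    new = normalize_declared_encoding(declared)
    if new == current_key:
        # The declaration matches what is already being used.
        return (new, "certain")
    # The remaining steps are outside the hunks shown in this diff, so the
    # sketch stops here with a placeholder marker.
    return (new, "undecided")
```

Both algorithms call for the same normalization, which is why the comments added in this revision suggest factoring it out into a common algorithm if further overrides are added.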

Modified: complete.html
===================================================================

--- complete.html	2014-05-07 22:52:21 UTC (rev 8617)
+++ complete.html	2014-05-07 23:32:17 UTC (rev 8618)
@@ -87923,9 +87923,14 @@
        <li><p>If <var title="">need pragma</var> is true but <var title="">got pragma</var> is
        false, then jump to the step below labeled <i>next byte</i>.</li>
 
+       <!-- the next two steps are redundant with steps in the 'change the encoding' algorithm -->
+
        <li><p>If <var title="">charset</var> is <a href=#a-utf-16-encoding>a UTF-16 encoding</a>, change the value of
        <var title="">charset</var> to UTF-8.</li>
 
+       <li><p>If <var title="">charset</var> is the x-user-defined encoding, change the value of
+       <var title="">charset</var> to Windows-1252. <a href=#refsENCODING>[ENCODING]</a></li>
+
        <li><p>If <var title="">charset</var> is not a supported character encoding, then jump to the
        step below labeled <i>next byte</i>.</li>
 
@@ -88133,13 +88138,20 @@
   failed to find a character encoding, or if it found a character encoding that was not the actual
   encoding of the file.</p>
 
+  <!--CLEANUP--><!-- use <p>s -->
   <ol><li>If the encoding that is already being used to interpret the input stream is <a href=#a-utf-16-encoding>a UTF-16
    encoding</a>, then set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
    <i>certain</i> and abort these steps. The new encoding is ignored; if it was anything but the
    same encoding, then it would be clearly incorrect.</li>
 
+   <!-- the next two steps are redundant with similar logic in the sniffer -->
+   <!-- if you add anything else here, then factor it out into a common algorithm -->
+
    <li>If the new encoding is <a href=#a-utf-16-encoding>a UTF-16 encoding</a>, change it to UTF-8.</li>
 
+   <li>If the new encoding is the x-user-defined encoding, change it to Windows-1252. <a href=#refsENCODING>[ENCODING]</a></li> <!-- apparently this was a Chrome invention, later
+   picked up by Mozilla -->
+
    <li>If the new encoding is identical or equivalent to the encoding that is already being used to
    interpret the input stream, then set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to <i>certain</i> and abort these steps.
    This happens when the encoding information found in the file matches what the <a href=#encoding-sniffing-algorithm>encoding
@@ -88166,8 +88178,13 @@
    encoding. The resource will be misinterpreted. User agents may notify the user of the situation,
    to aid in application development.</li>
 
-  </ol><h5 id=preprocessing-the-input-stream><span class=secno>12.2.2.5 </span>Preprocessing the input stream</h5>
+  </ol><p class=note>This algorithm is only invoked when a new encoding is found declared on a
+  <code><a href=#the-meta-element>meta</a></code> element.</p> <!-- this is important for the x-user-defined stuff in particular
+  -->
 
+
+  <h5 id=preprocessing-the-input-stream><span class=secno>12.2.2.5 </span>Preprocessing the input stream</h5>
+
   <p>The <dfn id=input-stream>input stream</dfn> consists of the characters pushed into it as the <a href=#the-input-byte-stream>input byte
   stream</a> is decoded or from the various APIs that directly manipulate the input stream.</p>
 

Modified: index
===================================================================
--- index	2014-05-07 22:52:21 UTC (rev 8617)
+++ index	2014-05-07 23:32:17 UTC (rev 8618)
@@ -87923,9 +87923,14 @@
        <li><p>If <var title="">need pragma</var> is true but <var title="">got pragma</var> is
        false, then jump to the step below labeled <i>next byte</i>.</li>
 
+       <!-- the next two steps are redundant with steps in the 'change the encoding' algorithm -->
+
        <li><p>If <var title="">charset</var> is <a href=#a-utf-16-encoding>a UTF-16 encoding</a>, change the value of
        <var title="">charset</var> to UTF-8.</li>
 
+       <li><p>If <var title="">charset</var> is the x-user-defined encoding, change the value of
+       <var title="">charset</var> to Windows-1252. <a href=#refsENCODING>[ENCODING]</a></li>
+
        <li><p>If <var title="">charset</var> is not a supported character encoding, then jump to the
        step below labeled <i>next byte</i>.</li>
 
@@ -88133,13 +88138,20 @@
   failed to find a character encoding, or if it found a character encoding that was not the actual
   encoding of the file.</p>
 
+  <!--CLEANUP--><!-- use <p>s -->
   <ol><li>If the encoding that is already being used to interpret the input stream is <a href=#a-utf-16-encoding>a UTF-16
    encoding</a>, then set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
    <i>certain</i> and abort these steps. The new encoding is ignored; if it was anything but the
    same encoding, then it would be clearly incorrect.</li>
 
+   <!-- the next two steps are redundant with similar logic in the sniffer -->
+   <!-- if you add anything else here, then factor it out into a common algorithm -->
+
    <li>If the new encoding is <a href=#a-utf-16-encoding>a UTF-16 encoding</a>, change it to UTF-8.</li>
 
+   <li>If the new encoding is the x-user-defined encoding, change it to Windows-1252. <a href=#refsENCODING>[ENCODING]</a></li> <!-- apparently this was a Chrome invention, later
+   picked up by Mozilla -->
+
    <li>If the new encoding is identical or equivalent to the encoding that is already being used to
    interpret the input stream, then set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to <i>certain</i> and abort these steps.
    This happens when the encoding information found in the file matches what the <a href=#encoding-sniffing-algorithm>encoding
@@ -88166,8 +88178,13 @@
    encoding. The resource will be misinterpreted. User agents may notify the user of the situation,
    to aid in application development.</li>
 
-  </ol><h5 id=preprocessing-the-input-stream><span class=secno>12.2.2.5 </span>Preprocessing the input stream</h5>
+  </ol><p class=note>This algorithm is only invoked when a new encoding is found declared on a
+  <code><a href=#the-meta-element>meta</a></code> element.</p> <!-- this is important for the x-user-defined stuff in particular
+  -->
 
+
+  <h5 id=preprocessing-the-input-stream><span class=secno>12.2.2.5 </span>Preprocessing the input stream</h5>
+
   <p>The <dfn id=input-stream>input stream</dfn> consists of the characters pushed into it as the <a href=#the-input-byte-stream>input byte
   stream</a> is decoded or from the various APIs that directly manipulate the input stream.</p>
 

Modified: source
===================================================================
--- source	2014-05-07 22:52:21 UTC (rev 8617)
+++ source	2014-05-07 23:32:17 UTC (rev 8618)
@@ -96732,9 +96732,14 @@
        <li><p>If <var data-x="">need pragma</var> is true but <var data-x="">got pragma</var> is
        false, then jump to the step below labeled <i>next byte</i>.</p></li>
 
+       <!-- the next two steps are redundant with steps in the 'change the encoding' algorithm -->
+
        <li><p>If <var data-x="">charset</var> is <span>a UTF-16 encoding</span>, change the value of
        <var data-x="">charset</var> to UTF-8.</p></li>
 
+       <li><p>If <var data-x="">charset</var> is the x-user-defined encoding, change the value of
+       <var data-x="">charset</var> to Windows-1252. <a href="#refsENCODING">[ENCODING]</a></p></li>
+
        <li><p>If <var data-x="">charset</var> is not a supported character encoding, then jump to the
        step below labeled <i>next byte</i>.</p></li>
 
@@ -96991,6 +96996,7 @@
   failed to find a character encoding, or if it found a character encoding that was not the actual
   encoding of the file.</p>
 
+  <!--CLEANUP--><!-- use <p>s -->
   <ol>
 
    <li>If the encoding that is already being used to interpret the input stream is <span>a UTF-16
@@ -96998,8 +97004,15 @@
    <i>certain</i> and abort these steps. The new encoding is ignored; if it was anything but the
    same encoding, then it would be clearly incorrect.</li>
 
+   <!-- the next two steps are redundant with similar logic in the sniffer -->
+   <!-- if you add anything else here, then factor it out into a common algorithm -->
+
    <li>If the new encoding is <span>a UTF-16 encoding</span>, change it to UTF-8.</li>
 
+   <li>If the new encoding is the x-user-defined encoding, change it to Windows-1252. <a
+   href="#refsENCODING">[ENCODING]</a></li> <!-- apparently this was a Chrome invention, later
+   picked up by Mozilla -->
+
    <li>If the new encoding is identical or equivalent to the encoding that is already being used to
    interpret the input stream, then set the <span
    data-x="concept-encoding-confidence">confidence</span> to <i>certain</i> and abort these steps.
@@ -97031,7 +97044,11 @@
 
   </ol>
 
+  <p class="note">This algorithm is only invoked when a new encoding is found declared on a
+  <code>meta</code> element.</p> <!-- this is important for the x-user-defined stuff in particular
+  -->
 
+
   <h5>Preprocessing the input stream</h5>
 
   <p>The <dfn>input stream</dfn> consists of the characters pushed into it as the <span>input byte