[html5] r4993 - [giow] (2) Change how character encodings are sniffed to require an http-equiv a [...]

whatwg at whatwg.org
Sun Apr 11 23:43:46 PDT 2010


Author: ianh
Date: 2010-04-11 23:43:45 -0700 (Sun, 11 Apr 2010)
New Revision: 4993

Modified:
   complete.html
   index
   source
Log:
[giow] (2) Change how character encodings are sniffed to require an http-equiv attribute, and to only process one character encoding per <meta> element, even if attributes are duplicated.
Fixing http://www.w3.org/Bugs/Public/show_bug.cgi?id=9225
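
For readers who want the new control flow at a glance, the steps added by this revision can be sketched roughly as follows. This is an illustrative sketch only, not text from the spec: the names sniff_meta, extract_encoding_from_content_type, SUPPORTED and UTF16 are hypothetical, the Content-Type parser is a simplified stand-in for the spec's algorithm, and attribute names and values are lowercased here as an assumption. The hunks below do not show a step that adds a name to the attribute list, so the sketch assumes each name is recorded as it is seen.

```python
import re

# Hypothetical stand-ins for the user agent's encoding tables.
SUPPORTED = {"utf-8", "windows-1252", "iso-8859-1", "shift_jis"}
UTF16 = {"utf-16", "utf-16le", "utf-16be"}


def extract_encoding_from_content_type(value):
    """Simplified stand-in for the spec's Content-Type extraction algorithm."""
    match = re.search(r'charset\s*=\s*["\']?([^\s;"\']+)', value, re.IGNORECASE)
    return match.group(1).lower() if match else None


def sniff_meta(attributes):
    """Return an encoding name for one <meta> element, or None to move on.

    `attributes` is a list of (name, value) pairs in source order.
    Returning None corresponds to jumping to the second step of the
    overall "two step" algorithm.
    """
    attribute_list = set()   # names already seen on this element
    got_pragma = False
    mode = None
    charset = None

    for name, value in attributes:
        name = name.lower()
        if name in attribute_list:
            continue                     # duplicated attribute: ignore it
        attribute_list.add(name)         # assumption: names are recorded here

        if name == "http-equiv":
            if value.lower() == "content-type":
                got_pragma = True
        elif name == "charset":
            if charset is None:
                charset = value.strip().lower()
                mode = "charset"
        elif name == "content":
            extracted = extract_encoding_from_content_type(value)
            if extracted is not None and charset is None:
                charset = extracted
                mode = "pragma"

    # "Processing" steps.
    if mode is None:
        return None
    if mode == "pragma" and not got_pragma:
        return None                      # content= without http-equiv
    if charset in UTF16:
        charset = "utf-8"
    if charset not in SUPPORTED:
        return None
    return charset                       # confidence: tentative
```

Under these assumptions, a content attribute with no accompanying http-equiv="content-type" yields no encoding, while a charset attribute is honoured on its own, and a second copy of any attribute on the same element is ignored.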

Modified: complete.html
===================================================================
--- complete.html	2010-04-12 05:48:33 UTC (rev 4992)
+++ complete.html	2010-04-12 06:43:45 UTC (rev 4993)
@@ -74090,37 +74090,72 @@
          0x2F byte (the one in sequence of characters matched
          above).</li>
 
-         <li><p><a href=#concept-get-attributes-when-sniffing title=concept-get-attributes-when-sniffing>Get
-         an attribute</a> and its value. If no attribute was
-         sniffed, then skip this inner set of steps, and jump to the
-         second step in the overall "two step" algorithm.</li>
+         <li><p>Let <var title="">attribute list</var> be an empty
+         list of strings.</li> <!-- so long as we only care about
+         http-equiv, content, and charset, this can be a 3-bit
+         bitfield -->
 
-         <li><p>If the attribute's name is neither "<code title="">charset</code>" nor "<code title="">content</code>",
-         then return to step 2 in these inner steps.</li>
+         <li><p>Let <var title="">got pragma</var> be false.</li>
 
-         <li><p>If the attribute's name is "<code title="">charset</code>", let <var title="">charset</var> be
-         the attribute's value, interpreted as a character
-         encoding.</li>
+         <li><p>Let <var title="">mode</var> be null.</li>
 
-         <li><p>Otherwise, the attribute's name is "<code title="">content</code>": apply the <a href=#algorithm-for-extracting-an-encoding-from-a-content-type>algorithm for
-         extracting an encoding from a Content-Type</a>, giving the
-         attribute's value as the string to parse. If an encoding is
-         returned, let <var title="">charset</var> be that
-         encoding. Otherwise, return to step 2 in these inner
-         steps.</li>
+         <li><p>Let <var title="">charset</var> be the null value
+         (which, for the purposes of this algorithm, is distinct from
+         an unrecognised encoding or the empty string).</li>
 
+         <li><p><i>Attributes</i>: <a href=#concept-get-attributes-when-sniffing title=concept-get-attributes-when-sniffing>Get an
+         attribute</a> and its value. If no attribute was sniffed,
+         then jump to the <i>processing</i> step below.</li>
+
+         <li><p>If the attribute's name is already in <var title="">attribute list</var>, then return to the step
+         labeled <i>attributes</i>.</p>
+
+         <li>
+
+          <p>Run the appropriate step from the following list, if one
+          applies:</p>
+
+          <dl class=switch><dt>If the attribute's name is "<code title="">http-equiv</code>"</dt>
+
+           <dd><p>If the attribute's value is "<code title="">content-type</code>", then set <var title="">got
+           pragma</var> to true.</dd>
+
+           <dt>If the attribute's name is "<code title="">charset</code>"</dt>
+
+           <dd><p>If <var title="">charset</var> is still set to null,
+           let <var title="">charset</var> be the encoding
+           corresponding to the attribute's value, and set <var title="">mode</var> to "charset".</dd>
+
+           <dt>If the attribute's name is "<code title="">content</code>"</dt>
+
+           <dd><p>Apply the <a href=#algorithm-for-extracting-an-encoding-from-a-content-type>algorithm for extracting an encoding
+           from a Content-Type</a>, giving the attribute's value as
+           the string to parse. If an encoding is returned, and if
+           <var title="">charset</var> is still set to null, let <var title="">charset</var> be the encoding returned, and set
+           <var title="">mode</var> to "pragma".</dd>
+
+          </dl></li>
+
+         <li><p>Return to the step labeled <i>attributes</i>.</li>
+
+         <li><p><i>Processing</i>: If <var title="">mode</var> is
+         null, then jump to the second step of the overall "two step"
+         algorithm.</li>
+
+         <li><p>If <var title="">mode</var> is "pragma" but <var title="">got pragma</var> is false, then jump to the second
+         step of the overall "two step" algorithm.</li>
+
          <li><p>If <var title="">charset</var> is a UTF-16 encoding,
          change the value of <var title="">charset</var> to
          UTF-8.</li>
 
-         <li><p>If <var title="">charset</var> is a supported
-         character encoding, then return the given encoding, with
-         <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
+         <li><p>If <var title="">charset</var> is not a supported
+         character encoding, then jump to the second step of the
+         overall "two step" algorithm.</li>
+
+         <li><p>Return the encoding given by <var title="">charset</var>, with <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
          <i>tentative</i>, and abort all these steps.</li>
 
-         <li><p>Otherwise, return to step 2 in these inner
-         steps.</li>
-
         </ol></dd>
 
        <dt>A sequence of bytes starting with a 0x3C byte (ASCII <), optionally a 0x2F byte (ASCII /), and finally a byte in the range 0x41-0x5A or 0x61-0x7A (an ASCII letter)</dt>

Modified: index
===================================================================
--- index	2010-04-12 05:48:33 UTC (rev 4992)
+++ index	2010-04-12 06:43:45 UTC (rev 4993)
@@ -67362,37 +67362,72 @@
          0x2F byte (the one in sequence of characters matched
          above).</li>
 
-         <li><p><a href=#concept-get-attributes-when-sniffing title=concept-get-attributes-when-sniffing>Get
-         an attribute</a> and its value. If no attribute was
-         sniffed, then skip this inner set of steps, and jump to the
-         second step in the overall "two step" algorithm.</li>
+         <li><p>Let <var title="">attribute list</var> be an empty
+         list of strings.</li> <!-- so long as we only care about
+         http-equiv, content, and charset, this can be a 3-bit
+         bitfield -->
 
-         <li><p>If the attribute's name is neither "<code title="">charset</code>" nor "<code title="">content</code>",
-         then return to step 2 in these inner steps.</li>
+         <li><p>Let <var title="">got pragma</var> be false.</li>
 
-         <li><p>If the attribute's name is "<code title="">charset</code>", let <var title="">charset</var> be
-         the attribute's value, interpreted as a character
-         encoding.</li>
+         <li><p>Let <var title="">mode</var> be null.</li>
 
-         <li><p>Otherwise, the attribute's name is "<code title="">content</code>": apply the <a href=#algorithm-for-extracting-an-encoding-from-a-content-type>algorithm for
-         extracting an encoding from a Content-Type</a>, giving the
-         attribute's value as the string to parse. If an encoding is
-         returned, let <var title="">charset</var> be that
-         encoding. Otherwise, return to step 2 in these inner
-         steps.</li>
+         <li><p>Let <var title="">charset</var> be the null value
+         (which, for the purposes of this algorithm, is distinct from
+         an unrecognised encoding or the empty string).</li>
 
+         <li><p><i>Attributes</i>: <a href=#concept-get-attributes-when-sniffing title=concept-get-attributes-when-sniffing>Get an
+         attribute</a> and its value. If no attribute was sniffed,
+         then jump to the <i>processing</i> step below.</li>
+
+         <li><p>If the attribute's name is already in <var title="">attribute list</var>, then return to the step
+         labeled <i>attributes</i>.</p>
+
+         <li>
+
+          <p>Run the appropriate step from the following list, if one
+          applies:</p>
+
+          <dl class=switch><dt>If the attribute's name is "<code title="">http-equiv</code>"</dt>
+
+           <dd><p>If the attribute's value is "<code title="">content-type</code>", then set <var title="">got
+           pragma</var> to true.</dd>
+
+           <dt>If the attribute's name is "<code title="">charset</code>"</dt>
+
+           <dd><p>If <var title="">charset</var> is still set to null,
+           let <var title="">charset</var> be the encoding
+           corresponding to the attribute's value, and set <var title="">mode</var> to "charset".</dd>
+
+           <dt>If the attribute's name is "<code title="">content</code>"</dt>
+
+           <dd><p>Apply the <a href=#algorithm-for-extracting-an-encoding-from-a-content-type>algorithm for extracting an encoding
+           from a Content-Type</a>, giving the attribute's value as
+           the string to parse. If an encoding is returned, and if
+           <var title="">charset</var> is still set to null, let <var title="">charset</var> be the encoding returned, and set
+           <var title="">mode</var> to "pragma".</dd>
+
+          </dl></li>
+
+         <li><p>Return to the step labeled <i>attributes</i>.</li>
+
+         <li><p><i>Processing</i>: If <var title="">mode</var> is
+         null, then jump to the second step of the overall "two step"
+         algorithm.</li>
+
+         <li><p>If <var title="">mode</var> is "pragma" but <var title="">got pragma</var> is false, then jump to the second
+         step of the overall "two step" algorithm.</li>
+
          <li><p>If <var title="">charset</var> is a UTF-16 encoding,
          change the value of <var title="">charset</var> to
          UTF-8.</li>
 
-         <li><p>If <var title="">charset</var> is a supported
-         character encoding, then return the given encoding, with
-         <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
+         <li><p>If <var title="">charset</var> is not a supported
+         character encoding, then jump to the second step of the
+         overall "two step" algorithm.</li>
+
+         <li><p>Return the encoding given by <var title="">charset</var>, with <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
          <i>tentative</i>, and abort all these steps.</li>
 
-         <li><p>Otherwise, return to step 2 in these inner
-         steps.</li>
-
         </ol></dd>
 
        <dt>A sequence of bytes starting with a 0x3C byte (ASCII <), optionally a 0x2F byte (ASCII /), and finally a byte in the range 0x41-0x5A or 0x61-0x7A (an ASCII letter)</dt>

Modified: source
===================================================================
--- source	2010-04-12 05:48:33 UTC (rev 4992)
+++ source	2010-04-12 06:43:45 UTC (rev 4993)
@@ -84379,40 +84379,87 @@
          0x2F byte (the one in sequence of characters matched
          above).</p></li>
 
-         <li><p><span title="concept-get-attributes-when-sniffing">Get
-         an attribute</span> and its value. If no attribute was
-         sniffed, then skip this inner set of steps, and jump to the
-         second step in the overall "two step" algorithm.</p></li>
+         <li><p>Let <var title="">attribute list</var> be an empty
+         list of strings.</p></li> <!-- so long as we only care about
+         http-equiv, content, and charset, this can be a 3-bit
+         bitfield -->
 
-         <li><p>If the attribute's name is neither "<code
-         title="">charset</code>" nor "<code title="">content</code>",
-         then return to step 2 in these inner steps.</p></li>
+         <li><p>Let <var title="">got pragma</var> be false.</p></li>
 
-         <li><p>If the attribute's name is "<code
-         title="">charset</code>", let <var title="">charset</var> be
-         the attribute's value, interpreted as a character
-         encoding.</p></li>
+         <li><p>Let <var title="">mode</var> be null.</p></li>
 
-         <li><p>Otherwise, the attribute's name is "<code
-         title="">content</code>": apply the <span>algorithm for
-         extracting an encoding from a Content-Type</span>, giving the
-         attribute's value as the string to parse. If an encoding is
-         returned, let <var title="">charset</var> be that
-         encoding. Otherwise, return to step 2 in these inner
-         steps.</p></li>
+         <li><p>Let <var title="">charset</var> be the null value
+         (which, for the purposes of this algorithm, is distinct from
+         an unrecognised encoding or the empty string).</p></li>
 
+         <li><p><i>Attributes</i>: <span
+         title="concept-get-attributes-when-sniffing">Get an
+         attribute</span> and its value. If no attribute was sniffed,
+         then jump to the <i>processing</i> step below.</p></li>
+
+         <li><p>If the attribute's name is already in <var
+         title="">attribute list</var>, then return to the step
+         labeled <i>attributes</i>.</p>
+
+         <li>
+
+          <p>Run the appropriate step from the following list, if one
+          applies:</p>
+
+          <dl class="switch">
+
+           <dt>If the attribute's name is "<code
+           title="">http-equiv</code>"</dt>
+
+           <dd><p>If the attribute's value is "<code
+           title="">content-type</code>", then set <var title="">got
+           pragma</var> to true.</p></dd>
+
+           <dt>If the attribute's name is "<code
+           title="">charset</code>"</dt>
+
+           <dd><p>If <var title="">charset</var> is still set to null,
+           let <var title="">charset</var> be the encoding
+           corresponding to the attribute's value, and set <var
+           title="">mode</var> to "charset".</p></dd>
+
+           <dt>If the attribute's name is "<code
+           title="">content</code>"</dt>
+
+           <dd><p>Apply the <span>algorithm for extracting an encoding
+           from a Content-Type</span>, giving the attribute's value as
+           the string to parse. If an encoding is returned, and if
+           <var title="">charset</var> is still set to null, let <var
+           title="">charset</var> be the encoding returned, and set
+           <var title="">mode</var> to "pragma".</p></dd>
+
+          </dl>
+
+         </li>
+
+         <li><p>Return to the step labeled <i>attributes</i>.</p></li>
+
+         <li><p><i>Processing</i>: If <var title="">mode</var> is
+         null, then jump to the second step of the overall "two step"
+         algorithm.</p></li>
+
+         <li><p>If <var title="">mode</var> is "pragma" but <var
+         title="">got pragma</var> is false, then jump to the second
+         step of the overall "two step" algorithm.</p></li>
+
          <li><p>If <var title="">charset</var> is a UTF-16 encoding,
          change the value of <var title="">charset</var> to
          UTF-8.</p></li>
 
-         <li><p>If <var title="">charset</var> is a supported
-         character encoding, then return the given encoding, with
-         <span title="concept-encoding-confidence">confidence</span>
+         <li><p>If <var title="">charset</var> is not a supported
+         character encoding, then jump to the second step of the
+         overall "two step" algorithm.</p></li>
+
+         <li><p>Return the encoding given by <var
+         title="">charset</var>, with <span
+         title="concept-encoding-confidence">confidence</span>
          <i>tentative</i>, and abort all these steps.</p></li>
 
-         <li><p>Otherwise, return to step 2 in these inner
-         steps.</p></li>
-
         </ol>
 
        </dd>



