[html5] r6990 - [e] (0) Factor out the prescan algorithm for reuse in other specs. Fixing https: [...]

Mon Feb 13 13:07:00 PST 2012

Author: ianh
Date: 2012-02-13 13:06:58 -0800 (Mon, 13 Feb 2012)
New Revision: 6990

Modified:
   complete.html
   index
   source
Log:
[e] (0) Factor out the prescan algorithm for reuse in other specs.
Fixing https://www.w3.org/Bugs/Public/show_bug.cgi?id=14284
Affected topics: HTML Syntax and Parsing

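The prescan algorithm that this revision factors out can be sketched in Python as below. This is a minimal, non-authoritative sketch. It is not the specification's text or any browser's implementation, and it rests on several assumptions:

- The caller's end condition is modeled as a plain byte limit ("limit", defaulting to the 1024 bytes the note recommends).
- SUPPORTED is a hypothetical stand-in for the user agent's set of supported encodings.
- CONTENT_CHARSET is a simplified, hypothetical regular-expression stand-in for the algorithm for extracting an encoding from a meta element, which the diff references but does not show.
- Running out of bytes surfaces as Python's IndexError (or ValueError from bytes.index), which prescan treats as aborting unsuccessfully.

The names prescan, get_attribute and _scan_meta are illustrative only.

```python
import re

SPACES = b"\t\n\x0c\r "  # 0x09 TAB, 0x0A LF, 0x0C FF, 0x0D CR, 0x20 space
SUPPORTED = {"utf-8", "windows-1252", "iso-8859-2", "shift_jis"}  # hypothetical
# Simplified, hypothetical stand-in for "extracting an encoding from a meta element".
CONTENT_CHARSET = re.compile(r"charset[\t\n\f\r ]*=[\t\n\f\r ]*"
                             r"(?:\"([^\"]*)\"|'([^']*)'|([^\t\n\f\r ;\"'][^\t\n\f\r ;]*))")

def _char(b):
    return chr(b + 0x20) if 0x41 <= b <= 0x5A else chr(b)  # lowercase ASCII A-Z only

def get_attribute(data, pos):
    """Return (name, value, new_pos), or (None, None, pos) if there is no attribute."""
    while data[pos] in SPACES + b"/":
        pos += 1
    if data[pos] == 0x3E:  # '>': there isn't one
        return None, None, pos
    name = ""
    while True:  # "Attribute name" step
        b = data[pos]
        if b == 0x3D and name:  # '=' after a non-empty name
            break
        if b in SPACES:  # "Spaces" step: only an '=' may follow
            while data[pos] in SPACES:
                pos += 1
            if data[pos] != 0x3D:
                return name, "", pos
            break
        if b in b"/>":
            return name, "", pos
        name += _char(b)
        pos += 1
    pos += 1  # past the '='
    while data[pos] in SPACES:  # "Value" step
        pos += 1
    b, value = data[pos], ""
    if b == 0x3E:
        return name, "", pos
    if b in b"\"'":  # quoted: read up to the matching quote
        while True:
            pos += 1
            if data[pos] == b:
                return name, value, pos + 1
            value += _char(data[pos])
    while b not in SPACES + b">":  # unquoted: ends at a space or '>'
        value += _char(b)
        pos += 1
        b = data[pos]
    return name, value, pos

def _scan_meta(data, pos):
    """Steps for a '<meta' tag: returns (encoding or None, new_pos)."""
    seen, got_pragma, need_pragma, charset = set(), False, None, None
    while True:
        name, value, pos = get_attribute(data, pos)
        if name is None:
            break
        if name in seen:  # repeated attributes are ignored
            continue
        seen.add(name)
        if name == "http-equiv" and value == "content-type":
            got_pragma = True
        elif name == "content":
            m = CONTENT_CHARSET.search(value)
            if m and charset is None:
                charset, need_pragma = next(g for g in m.groups() if g is not None), True
        elif name == "charset":
            charset, need_pragma = value, False
    if need_pragma is None or (need_pragma and not got_pragma):
        return None, pos
    charset = charset.strip(" \t\n\x0c\r")
    if charset in ("utf-16", "utf-16le", "utf-16be"):  # a UTF-16 encoding becomes UTF-8
        charset = "utf-8"
    return (charset if charset in SUPPORTED else None), pos

def prescan(data, limit=1024):
    """Return an encoding label, or None if the prescan aborts unsuccessfully."""
    data, pos = data[:limit], 0  # assumed end condition: only the first `limit` bytes
    try:
        while pos < len(data):
            head = data[pos:pos + 6]
            if head.startswith(b"<!--"):  # comment: jump to the '>' that ends '-->'
                pos = data.index(b"-->", pos + 2) + 2
            elif re.match(rb"<meta[\t\n\f\r /]", head, re.IGNORECASE):
                enc, pos = _scan_meta(data, pos + 5)
                if enc is not None:
                    return enc
            elif re.match(rb"</?[A-Za-z]", head):  # start or end tag
                while data[pos] not in SPACES + b">":  # skip the tag name
                    pos += 1
                name = ""
                while name is not None:  # skip the tag's attributes
                    name, _, pos = get_attribute(data, pos)
            elif head[:2] in (b"<!", b"</", b"<?"):
                pos = data.index(b">", pos + 1)
            pos += 1  # "Next byte"
    except (IndexError, ValueError):  # ran out of bytes, or no closing '>' was found
        pass
    return None
```

Under these assumptions, prescan(b'<meta charset="UTF-8">') returns 'utf-8'. A charset found in a content attribute only counts when an http-equiv="content-type" attribute is also present. Attribute values of other tags, such as a title containing "<meta ...>", are skipped rather than scanned. Mapping "out of bytes" onto IndexError keeps the sketch short, but it would also swallow genuine indexing bugs. A fuller implementation might use an explicit sentinel instead.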
Modified: complete.html
===================================================================

--- complete.html	2012-02-11 18:45:11 UTC (rev 6989)
+++ complete.html	2012-02-13 21:06:58 UTC (rev 6990)
@@ -240,7 +240,7 @@
 
   <header class=head id=head><p><a class=logo href=http://www.whatwg.org/><img alt=WHATWG height=101 src=/images/logo width=101></a></p>
    <hgroup><h1 class=allcaps>HTML</h1>
-    <h2 class="no-num no-toc">Living Standard — Last Updated 11 February 2012</h2>
+    <h2 class="no-num no-toc">Living Standard — Last Updated 13 February 2012</h2>
    </hgroup><dl><dt><strong>Web developer edition:</strong></dt>
     <dd><strong><a href=http://developers.whatwg.org/>http://developers.whatwg.org/</a></strong></dd>
     <dt>Multiple-page version:</dt>
@@ -81188,10 +81188,10 @@
   parse of the document with the real encoding.</p>
 
   <p id=documentEncoding>User agents must use the following
-  algorithm (the <dfn id=encoding-sniffing-algorithm>encoding sniffing algorithm</dfn>) to determine
-  the character encoding to use when decoding a document in the first
-  pass. This algorithm takes as input any out-of-band metadata
-  available to the user agent (e.g. the <a href=#content-type title=Content-Type>Content-Type metadata</a> of the document)
+  algorithm, called the <dfn id=encoding-sniffing-algorithm>encoding sniffing algorithm</dfn>, to
+  determine the character encoding to use when decoding a document in
+  the first pass. This algorithm takes as input any out-of-band
+  metadata available to the user agent (e.g. the <a href=#content-type title=Content-Type>Content-Type metadata</a> of the document)
   and all the bytes available so far, and returns an encoding and a
   <dfn id=concept-encoding-confidence title=concept-encoding-confidence>confidence</dfn>. The
   confidence is either <i>tentative</i>, <i>certain</i>, or
@@ -81227,9 +81227,9 @@
 
     <p class=note>The authoring conformance requirements for
     character encoding declarations limit them to only appearing <a href=#charset1024>in the first 1024 bytes</a>. User agents are
-    therefore encouraged to use the preparse algorithm below (part of
-    these steps) on the first 1024 bytes, but not to stall beyond
-    that.</p>
+    therefore encouraged to use the prescan algorithm below (as
+    invoked by these steps) on the first 1024 bytes, but not to stall
+    beyond that.</p>
 
    </li>
 
@@ -81265,317 +81265,28 @@
     </table><p class=note>This step looks for Unicode Byte Order Marks
    (BOMs).</li>
 
-   <li><p>Otherwise, the user agent will have to search for explicit
-   character encoding information in the file itself. This should
-   proceed as follows:
+   <li>
 
-    <p>Let <var title="">position</var> be a pointer to a byte in the
-    input stream, initially pointing at the first byte. If at any
-    point during these substeps the user agent either runs out of
-    bytes or decides that scanning further bytes would not be
-    efficient, then skip to the next step of the overall character
-    encoding detection algorithm. User agents may decide that scanning
-    <em>any</em> bytes is not efficient, in which case these substeps
-    are entirely skipped.</p>
+    <p>Otherwise, optionally <a href=#prescan-a-byte-stream-to-determine-its-encoding title="prescan a byte stream to
+    determine its encoding">prescan the byte stream to determine its
+    encoding</a>. The <var title="">end condition</var> is that the
+    user agent decides that scanning further bytes would not be
+    efficient. User agents are encouraged to only prescan the first
+    1024 bytes. User agents may decide that scanning <em>any</em>
+    bytes is not efficient, in which case this step is entirely
+    skipped.</p>
 
-    <p>Now, repeat the following "two" steps until the algorithm
-    aborts (either because user agent aborts, as described above, or
-    because a character encoding is found):</p>
+    <p>The aforementioned algorithm either aborts unsuccessfully or
+    returns a character encoding. If it returns a character encoding,
+    then this algorithm must be aborted, returning the same encoding,
+    with <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
+    <i>tentative</i>.</p>
 
-    <ol><li><p>If <var title="">position</var> points to:</p>
-
-      <dl class=switch><dt>A sequence of bytes starting with: 0x3C 0x21 0x2D 0x2D (ASCII '<!--')</dt>
-       <dd>
-
-        <p>Advance the <var title="">position</var> pointer so that it
-        points at the first 0x3E byte which is preceded by two 0x2D
-        bytes (i.e. at the end of an ASCII '-->' sequence) and comes
-        after the 0x3C byte that was found. (The two 0x2D bytes can be
-        the same as the those in the '<!--' sequence.)</p>
-
-       </dd>
-
-       <dt>A sequence of bytes starting with: 0x3C, 0x4D or 0x6D, 0x45 or 0x65, 0x54 or 0x74, 0x41 or 0x61, and one of 0x09, 0x0A, 0x0C, 0x0D, 0x20, 0x2F (case-insensitive ASCII '<meta' followed by a space or slash)</dt>
-       <dd>
-
-        <ol><li><p>Advance the <var title="">position</var> pointer so
-         that it points at the next 0x09, 0x0A, 0x0C, 0x0D, 0x20, or
-         0x2F byte (the one in sequence of characters matched
-         above).</li>
-
-         <li><p>Let <var title="">attribute list</var> be an empty
-         list of strings.</li> <!-- so long as we only care about
-         http-equiv, content, and charset, this can be a 3-bit
-         bitfield -->
-
-         <li><p>Let <var title="">got pragma</var> be false.</li>
-
-         <li><p>Let <var title="">need pragma</var> be null.</li>
-
-         <li><p>Let <var title="">charset</var> be the null value
-         (which, for the purposes of this algorithm, is distinct from
-         an unrecognised encoding or the empty string).</li>
-
-         <li><p><i>Attributes</i>: <a href=#concept-get-attributes-when-sniffing title=concept-get-attributes-when-sniffing>Get an
-         attribute</a> and its value. If no attribute was sniffed,
-         then jump to the <i>processing</i> step below.</li>
-
-         <li><p>If the attribute's name is already in <var title="">attribute list</var>, then return to the step
-         labeled <i>attributes</i>.</p>
-
-         <li><p>Add the attribute's name to <var title="">attribute
-         list</var>.</p>
-
-         <li>
-
-          <p>Run the appropriate step from the following list, if one
-          applies:</p>
-
-          <dl class=switch><dt>If the attribute's name is "<code title="">http-equiv</code>"</dt>
-
-           <dd><p>If the attribute's value is "<code title="">content-type</code>", then set <var title="">got
-           pragma</var> to true.</dd>
-
-           <dt>If the attribute's name is "<code title="">content</code>"</dt>
-
-           <dd><p>Apply the <a href=#algorithm-for-extracting-an-encoding-from-a-meta-element>algorithm for extracting an encoding
-           from a <code>meta</code> element</a>, giving the
-           attribute's value as the string to parse. If an encoding is
-           returned, and if <var title="">charset</var> is still set
-           to null, let <var title="">charset</var> be the encoding
-           returned, and set <var title="">need pragma</var> to
-           true.</dd>
-
-           <dt>If the attribute's name is "<code title="">charset</code>"</dt>
-
-           <dd><p>Let <var title="">charset</var> be the encoding
-           corresponding to the attribute's value, and set <var title="">need pragma</var> to false.</dd>
-
-          </dl></li>
-
-         <li><p>Return to the step labeled <i>attributes</i>.</li>
-
-         <li><p><i>Processing</i>: If <var title="">need pragma</var>
-         is null, then jump to the second step of the overall "two
-         step" algorithm.</li>
-
-         <li><p>If <var title="">need pragma</var> is true but <var title="">got pragma</var> is false, then jump to the second
-         step of the overall "two step" algorithm.</li>
-
-         <li><p>If <var title="">charset</var> is <a href=#a-utf-16-encoding>a UTF-16
-         encoding</a>, change the value of <var title="">charset</var> to UTF-8.</li>
-
-         <li><p>If <var title="">charset</var> is not a supported
-         character encoding, then jump to the second step of the
-         overall "two step" algorithm.</li>
-
-         <li><p>Return the encoding given by <var title="">charset</var>, with <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
-         <i>tentative</i>, and abort all these steps.</li>
-
-        </ol></dd>
-
-       <dt>A sequence of bytes starting with a 0x3C byte (ASCII <), optionally a 0x2F byte (ASCII /), and finally a byte in the range 0x41-0x5A or 0x61-0x7A (an ASCII letter)</dt>
-       <dd>
-
-        <ol><li><p>Advance the <var title="">position</var> pointer so
-         that it points at the next 0x09 (ASCII TAB), 0x0A (ASCII LF),
-         0x0C (ASCII FF), 0x0D (ASCII CR), 0x20 (ASCII space), or 0x3E
-         (ASCII >) byte.</li>
-
-         <li><p>Repeatedly <a href=#concept-get-attributes-when-sniffing title=concept-get-attributes-when-sniffing>get an
-         attribute</a> until no further attributes can be found,
-         then jump to the second step in the overall "two step"
-         algorithm.</li>
-
-        </ol></dd>
-
-       <dt>A sequence of bytes starting with: 0x3C 0x21 (ASCII '<!')</dt>
-       <dt>A sequence of bytes starting with: 0x3C 0x2F (ASCII '</')</dt>
-       <dt>A sequence of bytes starting with: 0x3C 0x3F (ASCII '<?')</dt>
-       <dd>
-
-        <p>Advance the <var title="">position</var> pointer so that it
-        points at the first 0x3E byte (ASCII >) that comes after the
-        0x3C byte that was found.</p>
-
-       </dd>
-
-       <dt>Any other byte</dt>
-       <dd>
-
-        <p>Do nothing with that byte.</p>
-
-       </dd>
-
-      </dl></li>
-
-     <li>Move <var title="">position</var> so it points at the next
-     byte in the input stream, and return to the first step of this
-     "two step" algorithm.</li>
-
-    </ol><p>When the above "two step" algorithm says to <dfn id=concept-get-attributes-when-sniffing title=concept-get-attributes-when-sniffing>get an
-    attribute</dfn>, it means doing this:</p>
-
-    <ol><li><p>If the byte at <var title="">position</var> is one of 0x09
-     (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR),
-     0x20 (ASCII space), or 0x2F (ASCII /) then advance <var title="">position</var> to the next byte and redo this
-     substep.</li>
-
-     <li><p>If the byte at <var title="">position</var> is 0x3E (ASCII
-     >), then abort the "get an attribute" algorithm. There isn't
-     one.</li>
-
-     <li><p>Otherwise, the byte at <var title="">position</var> is the
-     start of the attribute name. Let <var title="">attribute
-     name</var> and <var title="">attribute value</var> be the empty
-     string.</li>
-
-     <li><p><i>Attribute name</i>: Process the byte at <var title="">position</var> as follows:</p>
-
-      <dl class=switch><dt>If it is 0x3D (ASCII =), and the <var title="">attribute
-       name</var> is longer than the empty string</dt>
-
-       <dd>Advance <var title="">position</var> to the next byte and
-       jump to the step below labeled <i>value</i>.</dd>
-
-       <dt>If it is 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII
-       FF), 0x0D (ASCII CR), or 0x20 (ASCII space)</dt>
-
-       <dd>Jump to the step below labeled <i>spaces</i>.</dd>
-
-       <dt>If it is 0x2F (ASCII /) or 0x3E (ASCII >)</dt>
-
-       <dd>Abort the "get an attribute" algorithm. The attribute's
-       name is the value of <var title="">attribute name</var>, its
-       value is the empty string.</dd>
-
-       <dt>If it is in the range 0x41 (ASCII A) to 0x5A (ASCII
-       Z)</dt>
-
-       <dd>Append the Unicode character with code point <span title=""><var title="">b</var>+0x20</span> to <var title="">attribute name</var> (where <var title="">b</var> is
-       the value of the byte at <var title="">position</var>). (This
-       converts the input to lowercase.)</dd>
-
-       <dt>Anything else</dt>
-
-       <dd>Append the Unicode character with the same code point as the
-       value of the byte at <var title="">position</var>) to <var title="">attribute name</var>. (It doesn't actually matter how
-       bytes outside the ASCII range are handled here, since only
-       ASCII characters can contribute to the detection of a character
-       encoding.)</dd>
-
-      </dl></li>
-
-     <li><p>Advance <var title="">position</var> to the next byte and
-     return to the previous step.</li>
-
-     <li><p><i>Spaces</i>: If the byte at <var title="">position</var> is one of 0x09 (ASCII TAB), 0x0A (ASCII
-     LF), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space) then
-     advance <var title="">position</var> to the next byte, then,
-     repeat this step.</li>
-
-     <li><p>If the byte at <var title="">position</var> is
-     <em>not</em> 0x3D (ASCII =), abort the "get an attribute"
-     algorithm. The attribute's name is the value of <var title="">attribute name</var>, its value is the empty
-     string.</li>
-
-     <li><p>Advance <var title="">position</var> past the 0x3D (ASCII
-     =) byte.</li>
-
-     <li><p><i>Value</i>: If the byte at <var title="">position</var> is one of 0x09 (ASCII TAB), 0x0A (ASCII
-     LF), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space) then
-     advance <var title="">position</var> to the next byte, then,
-     repeat this step.</li>
-
-     <li><p>Process the byte at <var title="">position</var> as
-     follows:</p>
-
-      <dl class=switch><dt>If it is 0x22 (ASCII ") or 0x27 (ASCII ')</dt>
-
-       <dd>
-
-        <ol><li>Let <var title="">b</var> be the value of the byte at
-         <var title="">position</var>.</li>
-
-         <li>Advance <var title="">position</var> to the next
-         byte.</li>
-
-         <li>If the value of the byte at <var title="">position</var>
-         is the value of <var title="">b</var>, then advance <var title="">position</var> to the next byte and abort the "get
-         an attribute" algorithm. The attribute's name is the value of
-         <var title="">attribute name</var>, and its value is the
-         value of <var title="">attribute value</var>.</li>
-
-         <li>Otherwise, if the value of the byte at <var title="">position</var> is in the range 0x41 (ASCII A) to
-         0x5A (ASCII Z), then append a Unicode character to <var title="">attribute value</var> whose code point is 0x20 more
-         than the value of the byte at <var title="">position</var>.</li>
-
-         <li>Otherwise, append a Unicode character to <var title="">attribute value</var> whose code point is the same as
-         the value of the byte at <var title="">position</var>.</li>
-
-         <li>Return to the second step in these substeps.</li>
-
-        </ol></dd>
-
-       <dt>If it is 0x3E (ASCII >)</dt>
-
-       <dd>Abort the "get an attribute" algorithm. The attribute's
-       name is the value of <var title="">attribute name</var>, its
-       value is the empty string.</dd>
-
-
-       <dt>If it is in the range 0x41 (ASCII A) to 0x5A (ASCII
-       Z)</dt>
-
-       <dd>Append the Unicode character with code point <span title=""><var title="">b</var>+0x20</span> to <var title="">attribute
-       value</var> (where <var title="">b</var> is the value of the
-       byte at <var title="">position</var>). Advance <var title="">position</var> to the next byte.</dd>
-
-       <dt>Anything else</dt>
-
-       <dd>Append the Unicode character with the same code point as the
-       value of the byte at <var title="">position</var>) to <var title="">attribute value</var>. Advance <var title="">position</var> to the next byte.</dd>
-
-      </dl></li>
-
-     <li><p>Process the byte at <var title="">position</var> as
-     follows:</p>
-
-      <dl class=switch><dt>If it is 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII
-       FF), 0x0D (ASCII CR), 0x20 (ASCII space), or 0x3E (ASCII
-       >)</dt>
-
-       <dd>Abort the "get an attribute" algorithm. The attribute's
-       name is the value of <var title="">attribute name</var> and its
-       value is the value of <var title="">attribute value</var>.</dd>
-
-       <dt>If it is in the range 0x41 (ASCII A) to 0x5A (ASCII
-       Z)</dt>
-
-       <dd>Append the Unicode character with code point <span title=""><var title="">b</var>+0x20</span> to <var title="">attribute
-       value</var> (where <var title="">b</var> is the value of the
-       byte at <var title="">position</var>).</dd>
-
-       <dt>Anything else</dt>
-
-       <dd>Append the Unicode character with the same code point as the
-       value of the byte at <var title="">position</var>) to <var title="">attribute value</var>.</dd>
-
-      </dl></li>
-
-     <li><p>Advance <var title="">position</var> to the next byte and
-     return to the previous step.</li>
-
-    </ol><p>For the sake of interoperability, user agents should not use a
-    pre-scan algorithm that returns different results than the one
-    described above. (But, if you do, please at least let us know, so
-    that we can improve this algorithm and benefit everyone...)</p>
-
    </li>
 
-   <li><p>If the user agent has information on the likely encoding for
-   this page, e.g. based on the encoding of the page when it was last
-   visited, then return that encoding, with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
+   <li><p>Otherwise, if the user agent has information on the likely
+   encoding for this page, e.g. based on the encoding of the page when
+   it was last visited, then return that encoding, with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
    <i>tentative</i>, and abort these steps.</li>
 
    <li>
@@ -81719,18 +81430,328 @@
   as the user agent uses the returned value to select the decoder to
   use for the input stream.</p>
 
+  <hr><p>When an algorithm requires a user agent to <dfn id=prescan-a-byte-stream-to-determine-its-encoding>prescan a byte
+  stream to determine its encoding</dfn>, given some defined <var title="">end condition</var>, then it must run the following steps.
+  These steps either abort unsuccessfully or return a character
+  encoding.</p>
+
+  <ol><li>
+
+    <p>Let <var title="">position</var> be a pointer to a byte in the
+    input stream, initially pointing at the first byte. If at any
+    point during these steps the user agent either runs out of bytes
+    or reaches its <var title="">end condition</var>, then abort the
+    <a href=#prescan-a-byte-stream-to-determine-its-encoding>prescan a byte stream to determine its encoding</a>
+    algorithm unsuccessfully.</p>
+
+   </li>
+
+   <li>
+
+    <p><i>Loop</i>: If <var title="">position</var> points to:</p>
+
+    <dl class=switch><dt>A sequence of bytes starting with: 0x3C 0x21 0x2D 0x2D (ASCII '<!--')</dt>
+     <dd>
+
+      <p>Advance the <var title="">position</var> pointer so that it
+      points at the first 0x3E byte which is preceded by two 0x2D
+      bytes (i.e. at the end of an ASCII '-->' sequence) and comes
+      after the 0x3C byte that was found. (The two 0x2D bytes can be
+      the same as those in the '<!--' sequence.)</p>
+
+     </dd>
+
+     <dt>A sequence of bytes starting with: 0x3C, 0x4D or 0x6D, 0x45 or 0x65, 0x54 or 0x74, 0x41 or 0x61, and one of 0x09, 0x0A, 0x0C, 0x0D, 0x20, 0x2F (case-insensitive ASCII '<meta' followed by a space or slash)</dt>
+     <dd>
+
+      <ol><li><p>Advance the <var title="">position</var> pointer so
+       that it points at the next 0x09, 0x0A, 0x0C, 0x0D, 0x20, or
+       0x2F byte (the one in the sequence of bytes matched
+       above).</li>
+
+       <li><p>Let <var title="">attribute list</var> be an empty
+       list of strings.</li> <!-- so long as we only care about
+       http-equiv, content, and charset, this can be a 3-bit
+       bitfield -->
+
+       <li><p>Let <var title="">got pragma</var> be false.</li>
+
+       <li><p>Let <var title="">need pragma</var> be null.</li>
+
+       <li><p>Let <var title="">charset</var> be the null value
+       (which, for the purposes of this algorithm, is distinct from
+       an unrecognised encoding or the empty string).</li>
+
+       <li><p><i>Attributes</i>: <a href=#concept-get-attributes-when-sniffing title=concept-get-attributes-when-sniffing>Get an
+       attribute</a> and its value. If no attribute was sniffed,
+       then jump to the <i>processing</i> step below.</li>
+
+       <li><p>If the attribute's name is already in <var title="">attribute list</var>, then return to the step
+       labeled <i>attributes</i>.</p>
+
+       <li><p>Add the attribute's name to <var title="">attribute
+       list</var>.</p>
+
+       <li>
+
+        <p>Run the appropriate step from the following list, if one
+        applies:</p>
+
+        <dl class=switch><dt>If the attribute's name is "<code title="">http-equiv</code>"</dt>
+
+         <dd><p>If the attribute's value is "<code title="">content-type</code>", then set <var title="">got
+         pragma</var> to true.</dd>
+
+         <dt>If the attribute's name is "<code title="">content</code>"</dt>
+
+         <dd><p>Apply the <a href=#algorithm-for-extracting-an-encoding-from-a-meta-element>algorithm for extracting an encoding
+         from a <code>meta</code> element</a>, giving the
+         attribute's value as the string to parse. If an encoding is
+         returned, and if <var title="">charset</var> is still set
+         to null, let <var title="">charset</var> be the encoding
+         returned, and set <var title="">need pragma</var> to
+         true.</dd>
+
+         <dt>If the attribute's name is "<code title="">charset</code>"</dt>
+
+         <dd><p>Let <var title="">charset</var> be the encoding
+         corresponding to the attribute's value, and set <var title="">need pragma</var> to false.</dd>
+
+        </dl></li>
+
+       <li><p>Return to the step labeled <i>attributes</i>.</li>
+
+       <li><p><i>Processing</i>: If <var title="">need pragma</var> is
+       null, then jump to the step below labeled <i>next
+       byte</i>.</li>
+
+       <li><p>If <var title="">need pragma</var> is true but <var title="">got pragma</var> is false, then jump to the step below
+       labeled <i>next byte</i>.</li>
+
+       <li><p>If <var title="">charset</var> is <a href=#a-utf-16-encoding>a UTF-16
+       encoding</a>, change the value of <var title="">charset</var> to UTF-8.</li>
+
+       <li><p>If <var title="">charset</var> is not a supported
+       character encoding, then jump to the step below labeled <i>next
+       byte</i>.</li>
+
+       <li><p>Abort the <a href=#prescan-a-byte-stream-to-determine-its-encoding>prescan a byte stream to determine its
+       encoding</a> algorithm, returning the encoding given by <var title="">charset</var>.</li>
+
+      </ol></dd>
+
+     <dt>A sequence of bytes starting with a 0x3C byte (ASCII <), optionally a 0x2F byte (ASCII /), and finally a byte in the range 0x41-0x5A or 0x61-0x7A (an ASCII letter)</dt>
+     <dd>
+
+      <ol><li><p>Advance the <var title="">position</var> pointer so
+       that it points at the next 0x09 (ASCII TAB), 0x0A (ASCII LF),
+       0x0C (ASCII FF), 0x0D (ASCII CR), 0x20 (ASCII space), or 0x3E
+       (ASCII >) byte.</li>
+
+       <li><p>Repeatedly <a href=#concept-get-attributes-when-sniffing title=concept-get-attributes-when-sniffing>get an
+       attribute</a> until no further attributes can be found, then
+       jump to the step below labeled <i>next byte</i>.</li>
+
+      </ol></dd>
+
+     <dt>A sequence of bytes starting with: 0x3C 0x21 (ASCII '<!')</dt>
+     <dt>A sequence of bytes starting with: 0x3C 0x2F (ASCII '</')</dt>
+     <dt>A sequence of bytes starting with: 0x3C 0x3F (ASCII '<?')</dt>
+     <dd>
+
+      <p>Advance the <var title="">position</var> pointer so that it
+      points at the first 0x3E byte (ASCII >) that comes after the
+      0x3C byte that was found.</p>
+
+     </dd>
+
+     <dt>Any other byte</dt>
+     <dd>
+
+      <p>Do nothing with that byte.</p>
+
+     </dd>
+
+    </dl></li>
+
+   <li><i>Next byte</i>: Move <var title="">position</var> so it
+   points at the next byte in the input stream, and return to the step
+   above labeled <i>loop</i>.</li>
+
+  </ol><p>When the <a href=#prescan-a-byte-stream-to-determine-its-encoding>prescan a byte stream to determine its
+  encoding</a> algorithm says to <dfn id=concept-get-attributes-when-sniffing title=concept-get-attributes-when-sniffing>get an attribute</dfn>,
+  it means doing this:</p>
+
+  <ol><li><p>If the byte at <var title="">position</var> is one of 0x09
+   (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR),
+   0x20 (ASCII space), or 0x2F (ASCII /) then advance <var title="">position</var> to the next byte and redo this
+   step.</li>
+
+   <li><p>If the byte at <var title="">position</var> is 0x3E (ASCII
+   >), then abort the <a href=#concept-get-attributes-when-sniffing title=concept-get-attributes-when-sniffing>get an
+   attribute</a> algorithm. There isn't one.</li>
+
+   <li><p>Otherwise, the byte at <var title="">position</var> is the
+   start of the attribute name. Let <var title="">attribute name</var>
+   and <var title="">attribute value</var> be the empty
+   string.</li>
+
+   <li><p><i>Attribute name</i>: Process the byte at <var title="">position</var> as follows:</p>
+
+    <dl class=switch><dt>If it is 0x3D (ASCII =), and the <var title="">attribute
+     name</var> is longer than the empty string</dt>
+
+     <dd>Advance <var title="">position</var> to the next byte and
+     jump to the step below labeled <i>value</i>.</dd>
+
+     <dt>If it is 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII
+     FF), 0x0D (ASCII CR), or 0x20 (ASCII space)</dt>
+
+     <dd>Jump to the step below labeled <i>spaces</i>.</dd>
+
+     <dt>If it is 0x2F (ASCII /) or 0x3E (ASCII >)</dt>
+
+     <dd>Abort the <a href=#concept-get-attributes-when-sniffing title=concept-get-attributes-when-sniffing>get an
+     attribute</a> algorithm. The attribute's name is the value of
+     <var title="">attribute name</var>, its value is the empty
+     string.</dd>
+
+     <dt>If it is in the range 0x41 (ASCII A) to 0x5A (ASCII
+     Z)</dt>
+
+     <dd>Append the Unicode character with code point <span title=""><var title="">b</var>+0x20</span> to <var title="">attribute name</var> (where <var title="">b</var> is
+     the value of the byte at <var title="">position</var>). (This
+     converts the input to lowercase.)</dd>
+
+     <dt>Anything else</dt>
+
+     <dd>Append the Unicode character with the same code point as the
+     value of the byte at <var title="">position</var> to <var title="">attribute name</var>. (It doesn't actually matter how
+     bytes outside the ASCII range are handled here, since only
+     ASCII characters can contribute to the detection of a character
+     encoding.)</dd>
+
+    </dl></li>
+
+   <li><p>Advance <var title="">position</var> to the next byte and
+   return to the previous step.</li>
+
+   <li><p><i>Spaces</i>: If the byte at <var title="">position</var> is one of 0x09 (ASCII TAB), 0x0A (ASCII
+   LF), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space) then
+   advance <var title="">position</var> to the next byte, then
+   repeat this step.</li>
+
+   <li><p>If the byte at <var title="">position</var> is <em>not</em>
+   0x3D (ASCII =), abort the <a href=#concept-get-attributes-when-sniffing title=concept-get-attributes-when-sniffing>get an
+   attribute</a> algorithm. The attribute's name is the value of
+   <var title="">attribute name</var>, its value is the empty
+   string.</li>
+
+   <li><p>Advance <var title="">position</var> past the 0x3D (ASCII
+   =) byte.</li>
+
+   <li><p><i>Value</i>: If the byte at <var title="">position</var> is one of 0x09 (ASCII TAB), 0x0A (ASCII
+   LF), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space) then
+   advance <var title="">position</var> to the next byte, then
+   repeat this step.</li>
+
+   <li><p>Process the byte at <var title="">position</var> as
+   follows:</p>
+
+    <dl class=switch><dt>If it is 0x22 (ASCII ") or 0x27 (ASCII ')</dt>
+
+     <dd>
+
+      <ol><li>Let <var title="">b</var> be the value of the byte at
+       <var title="">position</var>.</li>
+
+       <li><i>Quote loop</i>: Advance <var title="">position</var> to
+       the next byte.</li>
+
+       <li>If the value of the byte at <var title="">position</var> is
+       the value of <var title="">b</var>, then advance <var title="">position</var> to the next byte and abort the "get an
+       attribute" algorithm. The attribute's name is the value of <var title="">attribute name</var>, and its value is the value of
+       <var title="">attribute value</var>.</li>
+
+       <li>Otherwise, if the value of the byte at <var title="">position</var> is in the range 0x41 (ASCII A) to 0x5A
+       (ASCII Z), then append a Unicode character to <var title="">attribute value</var> whose code point is 0x20 more
+       than the value of the byte at <var title="">position</var>.</li>
+
+       <li>Otherwise, append a Unicode character to <var title="">attribute value</var> whose code point is the same as
+       the value of the byte at <var title="">position</var>.</li>
+
+       <li>Return to the step above labeled <i>quote loop</i>.</li>
+
+      </ol></dd>
+
+     <dt>If it is 0x3E (ASCII >)</dt>
+
+     <dd>Abort the <a href=#concept-get-attributes-when-sniffing title=concept-get-attributes-when-sniffing>get an
+     attribute</a> algorithm. The attribute's name is the value of
+     <var title="">attribute name</var>, its value is the empty
+     string.</dd>
+
+
+     <dt>If it is in the range 0x41 (ASCII A) to 0x5A (ASCII
+     Z)</dt>
+
+     <dd>Append the Unicode character with code point <span title=""><var title="">b</var>+0x20</span> to <var title="">attribute value</var> (where <var title="">b</var> is
+     the value of the byte at <var title="">position</var>). Advance
+     <var title="">position</var> to the next byte.</dd>
+
+     <dt>Anything else</dt>
+
+     <dd>Append the Unicode character with the same code point as the
+     value of the byte at <var title="">position</var> to <var title="">attribute value</var>. Advance <var title="">position</var> to the next byte.</dd>
+
+    </dl></li>
+
+   <li><p>Process the byte at <var title="">position</var> as
+   follows:</p>
+
+    <dl class=switch><dt>If it is 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII
+     FF), 0x0D (ASCII CR), 0x20 (ASCII space), or 0x3E (ASCII
+     >)</dt>
+
+     <dd>Abort the <a href=#concept-get-attributes-when-sniffing title=concept-get-attributes-when-sniffing>get an
+     attribute</a> algorithm. The attribute's name is the value of
+     <var title="">attribute name</var> and its value is the value of
+     <var title="">attribute value</var>.</dd>
+
+     <dt>If it is in the range 0x41 (ASCII A) to 0x5A (ASCII Z)</dt>
+
+     <dd>Append the Unicode character with code point <span title=""><var title="">b</var>+0x20</span> to <var title="">attribute value</var> (where <var title="">b</var> is
+     the value of the byte at <var title="">position</var>).</dd>
+
+     <dt>Anything else</dt>
+
+     <dd>Append the Unicode character with the same code point as the
+     value of the byte at <var title="">position</var> to <var title="">attribute value</var>.</dd>
+
+    </dl></li>
+
+   <li><p>Advance <var title="">position</var> to the next byte and
+   return to the previous step.</li>
+
+  </ol><p>For the sake of interoperability, user agents should not use a
+  prescan algorithm that returns different results than the one
+  described above. (But, if you do, please at least let us know, so
+  that we can improve this algorithm and benefit everyone...)</p>
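The "get an attribute" steps above can be sketched in Python as follows. This is a non-normative sketch under stated assumptions: the bytes available so far are held in a `bytes` object, `position` is an integer index into it, and running out of bytes raises `EndOfInput`, a hypothetical exception standing in for aborting the prescan unsuccessfully. The names `get_attribute`, `EndOfInput` and `_lower` are illustrative and do not come from the spec.

```python
# Non-normative sketch of the "get an attribute" steps used by the prescan.
# Assumptions: `buf` holds the bytes available so far and `pos` is an index
# into it; EndOfInput is a hypothetical signal for "ran out of bytes", which
# the prescan treats as an unsuccessful abort.

SPACE = b"\t\n\x0c\r "  # TAB, LF, FF, CR, space


class EndOfInput(Exception):
    """Hypothetical: the prescan ran out of bytes."""


def _lower(b: int) -> str:
    # ASCII A-Z become a-z; any other byte maps to the code point with the
    # same value (non-ASCII bytes cannot affect the detected encoding).
    return chr(b + 0x20) if 0x41 <= b <= 0x5A else chr(b)


def get_attribute(buf: bytes, pos: int):
    """Return ((name, value), new_pos), or (None, pos) if there is no attribute."""

    def at(i: int) -> int:
        if i >= len(buf):
            raise EndOfInput
        return buf[i]

    # Skip whitespace and slashes before the attribute.
    while at(pos) in SPACE or at(pos) == 0x2F:
        pos += 1
    if at(pos) == 0x3E:  # '>': there isn't an attribute
        return None, pos
    name, value = "", ""

    # "Attribute name" step.
    while True:
        b = at(pos)
        if b == 0x3D and name:  # '=' after a non-empty name
            pos += 1
            break
        if b in SPACE:  # "Spaces" step
            while at(pos) in SPACE:
                pos += 1
            if at(pos) != 0x3D:
                return (name, ""), pos
            pos += 1  # past the '='
            break
        if b in (0x2F, 0x3E):  # '/' or '>'
            return (name, ""), pos
        name += _lower(b)
        pos += 1

    # "Value" step: skip whitespace after the '='.
    while at(pos) in SPACE:
        pos += 1
    b = at(pos)
    if b in (0x22, 0x27):  # quoted value; b is the quote byte
        while True:
            pos += 1  # "quote loop"
            if at(pos) == b:
                return (name, value), pos + 1
            value += _lower(at(pos))
    if b == 0x3E:  # '>' right after '='
        return (name, ""), pos
    value += _lower(b)
    pos += 1
    # Unquoted value: runs until whitespace or '>'.
    while True:
        b = at(pos)
        if b in SPACE or b == 0x3E:
            return (name, value), pos
        value += _lower(b)
        pos += 1
```

For example, `get_attribute(b' charset="UTF-8">', 0)` returns `(("charset", "utf-8"), 16)`, leaving the index on the closing `>`. Raising an exception for end of input keeps every step free of explicit length checks, at the cost of assuming the caller catches it.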
+
 <!--(removed this since the specs are being changed)
-  <p class="note">This algorithm is a <span>willful violation</span>
-  of the HTTP specification, which requires that the encoding be
-  assumed to be ISO-8859-1 in the absence of a <span>character
-  encoding declaration</span> to the contrary, and of RFC 2046,
-  which requires that the encoding be assumed to be US-ASCII in the
-  absence of a <span>character encoding declaration</span> to the
-  contrary. This specification's third approach is motivated by a
+  <p class="note">These algorithms are a <span>willful
+  violation</span> of the HTTP specification, which requires that the
+  encoding be assumed to be ISO-8859-1 in the absence of a
+  <span>character encoding declaration</span> to the contrary, and of
+  RFC 2046, which requires that the encoding be assumed to be US-ASCII
+  in the absence of a <span>character encoding declaration</span> to
+  the contrary. This specification's third approach is motivated by a
   desire to be maximally compatible with legacy content. <a
   href="#refsHTTP">[HTTP]</a> <a href="#refsRFC2046">[RFC2046]</a></p>
 -->
 
+
+
   <h5 id=character-encodings-0><span class=secno>12.2.2.2 </span>Character encodings</h5>
 
   <p>User agents must at a minimum support the UTF-8 and Windows-1252

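The dispatch performed at the step labeled <i>loop</i> of the factored-out prescan algorithm (given in full in the index diff that follows) can be sketched in Python as below. This is a non-normative sketch assuming the whole available input is in a `bytes` object; `dispatch`, `_next_of` and the returned kind strings are hypothetical names. It only decides which branch applies and where `position` ends up; the attribute handling for the `meta` and generic-tag branches is left to "get an attribute".

```python
# Non-normative sketch of the "loop" switch in the prescan algorithm.
# Assumption: `buf` holds all bytes available so far; a new position of
# None means the input ran out, i.e. the prescan aborts unsuccessfully.

SPACE_OR_SLASH = b"\t\n\x0c\r /"
TAG_END = b"\t\n\x0c\r >"


def _next_of(buf: bytes, start: int, stops: bytes):
    """Index of the first byte at or after `start` that is in `stops`, or None."""
    for i in range(start, len(buf)):
        if buf[i] in stops:
            return i
    return None


def dispatch(buf: bytes, pos: int):
    """Return (kind, new_pos) for the bytes at `pos`, checking branches in spec order."""
    if buf.startswith(b"<!--", pos):
        # First '>' preceded by "--"; searching from pos + 2 lets the
        # dashes of "<!--" count, so "<!-->" is a complete comment.
        end = buf.find(b"-->", pos + 2)
        return "comment", (None if end == -1 else end + 2)
    if (buf[pos:pos + 5].lower() == b"<meta"
            and pos + 5 < len(buf) and buf[pos + 5] in SPACE_OR_SLASH):
        # Position moves to the space or slash that followed "<meta".
        return "meta", pos + 5
    if buf.startswith(b"<", pos):
        i = pos + 1
        if buf.startswith(b"/", i):
            i += 1
        if i < len(buf) and (0x41 <= buf[i] <= 0x5A or 0x61 <= buf[i] <= 0x7A):
            # Generic start or end tag: skip to whitespace or '>'.
            return "tag", _next_of(buf, i, TAG_END)
        if buf[pos + 1:pos + 2] in (b"!", b"/", b"?"):
            # "<!", "</" or "<?" not matched above: skip to the next '>'.
            end = buf.find(b">", pos + 1)
            return "skip", (None if end == -1 else end)
    return "other", pos  # any other byte: do nothing
```

Because the branches are tested in the order the spec lists them, `</p>` is handled as a generic tag rather than by the `</` skip branch, and `<metadata>` is not mistaken for a `meta` element.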
Modified: index
===================================================================
--- index	2012-02-11 18:45:11 UTC (rev 6989)
+++ index	2012-02-13 21:06:58 UTC (rev 6990)
@@ -240,7 +240,7 @@
 
   <header class=head id=head><p><a class=logo href=http://www.whatwg.org/><img alt=WHATWG height=101 src=/images/logo width=101></a></p>
    <hgroup><h1 class=allcaps>HTML</h1>
-    <h2 class="no-num no-toc">Living Standard — Last Updated 11 February 2012</h2>
+    <h2 class="no-num no-toc">Living Standard — Last Updated 13 February 2012</h2>
    </hgroup><dl><dt><strong>Web developer edition:</strong></dt>
     <dd><strong><a href=http://developers.whatwg.org/>http://developers.whatwg.org/</a></strong></dd>
     <dt>Multiple-page version:</dt>
@@ -81188,10 +81188,10 @@
   parse of the document with the real encoding.</p>
 
   <p id=documentEncoding>User agents must use the following
-  algorithm (the <dfn id=encoding-sniffing-algorithm>encoding sniffing algorithm</dfn>) to determine
-  the character encoding to use when decoding a document in the first
-  pass. This algorithm takes as input any out-of-band metadata
-  available to the user agent (e.g. the <a href=#content-type title=Content-Type>Content-Type metadata</a> of the document)
+  algorithm, called the <dfn id=encoding-sniffing-algorithm>encoding sniffing algorithm</dfn>, to
+  determine the character encoding to use when decoding a document in
+  the first pass. This algorithm takes as input any out-of-band
+  metadata available to the user agent (e.g. the <a href=#content-type title=Content-Type>Content-Type metadata</a> of the document)
   and all the bytes available so far, and returns an encoding and a
   <dfn id=concept-encoding-confidence title=concept-encoding-confidence>confidence</dfn>. The
   confidence is either <i>tentative</i>, <i>certain</i>, or
@@ -81227,9 +81227,9 @@
 
     <p class=note>The authoring conformance requirements for
     character encoding declarations limit them to only appearing <a href=#charset1024>in the first 1024 bytes</a>. User agents are
-    therefore encouraged to use the preparse algorithm below (part of
-    these steps) on the first 1024 bytes, but not to stall beyond
-    that.</p>
+    therefore encouraged to use the prescan algorithm below (as
+    invoked by these steps) on the first 1024 bytes, but not to stall
+    beyond that.</p>
 
    </li>
 
@@ -81265,317 +81265,28 @@
     </table><p class=note>This step looks for Unicode Byte Order Marks
    (BOMs).</li>
 
-   <li><p>Otherwise, the user agent will have to search for explicit
-   character encoding information in the file itself. This should
-   proceed as follows:
+   <li>
 
-    <p>Let <var title="">position</var> be a pointer to a byte in the
-    input stream, initially pointing at the first byte. If at any
-    point during these substeps the user agent either runs out of
-    bytes or decides that scanning further bytes would not be
-    efficient, then skip to the next step of the overall character
-    encoding detection algorithm. User agents may decide that scanning
-    <em>any</em> bytes is not efficient, in which case these substeps
-    are entirely skipped.</p>
+    <p>Otherwise, optionally <a href=#prescan-a-byte-stream-to-determine-its-encoding title="prescan a byte stream to
+    determine its encoding">prescan the byte stream to determine its
+    encoding</a>. The <var title="">end condition</var> is that the
+    user agent decides that scanning further bytes would not be
+    efficient. User agents are encouraged to only prescan the first
+    1024 bytes. User agents may decide that scanning <em>any</em>
+    bytes is not efficient, in which case this step is skipped
+    entirely.</p>
 
-    <p>Now, repeat the following "two" steps until the algorithm
-    aborts (either because user agent aborts, as described above, or
-    because a character encoding is found):</p>
+    <p>The aforementioned algorithm either aborts unsuccessfully or
+    returns a character encoding. If it returns a character encoding,
+    then this algorithm must be aborted, returning the same encoding,
+    with <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
+    <i>tentative</i>.</p>
 
-    <ol><li><p>If <var title="">position</var> points to:</p>
-
-      <dl class=switch><dt>A sequence of bytes starting with: 0x3C 0x21 0x2D 0x2D (ASCII '<!--')</dt>
-       <dd>
-
-        <p>Advance the <var title="">position</var> pointer so that it
-        points at the first 0x3E byte which is preceded by two 0x2D
-        bytes (i.e. at the end of an ASCII '-->' sequence) and comes
-        after the 0x3C byte that was found. (The two 0x2D bytes can be
-        the same as the those in the '<!--' sequence.)</p>
-
-       </dd>
-
-       <dt>A sequence of bytes starting with: 0x3C, 0x4D or 0x6D, 0x45 or 0x65, 0x54 or 0x74, 0x41 or 0x61, and one of 0x09, 0x0A, 0x0C, 0x0D, 0x20, 0x2F (case-insensitive ASCII '<meta' followed by a space or slash)</dt>
-       <dd>
-
-        <ol><li><p>Advance the <var title="">position</var> pointer so
-         that it points at the next 0x09, 0x0A, 0x0C, 0x0D, 0x20, or
-         0x2F byte (the one in sequence of characters matched
-         above).</li>
-
-         <li><p>Let <var title="">attribute list</var> be an empty
-         list of strings.</li> <!-- so long as we only care about
-         http-equiv, content, and charset, this can be a 3-bit
-         bitfield -->
-
-         <li><p>Let <var title="">got pragma</var> be false.</li>
-
-         <li><p>Let <var title="">need pragma</var> be null.</li>
-
-         <li><p>Let <var title="">charset</var> be the null value
-         (which, for the purposes of this algorithm, is distinct from
-         an unrecognised encoding or the empty string).</li>
-
-         <li><p><i>Attributes</i>: <a href=#concept-get-attributes-when-sniffing title=concept-get-attributes-when-sniffing>Get an
-         attribute</a> and its value. If no attribute was sniffed,
-         then jump to the <i>processing</i> step below.</li>
-
-         <li><p>If the attribute's name is already in <var title="">attribute list</var>, then return to the step
-         labeled <i>attributes</i>.</p>
-
-         <li><p>Add the attribute's name to <var title="">attribute
-         list</var>.</p>
-
-         <li>
-
-          <p>Run the appropriate step from the following list, if one
-          applies:</p>
-
-          <dl class=switch><dt>If the attribute's name is "<code title="">http-equiv</code>"</dt>
-
-           <dd><p>If the attribute's value is "<code title="">content-type</code>", then set <var title="">got
-           pragma</var> to true.</dd>
-
-           <dt>If the attribute's name is "<code title="">content</code>"</dt>
-
-           <dd><p>Apply the <a href=#algorithm-for-extracting-an-encoding-from-a-meta-element>algorithm for extracting an encoding
-           from a <code>meta</code> element</a>, giving the
-           attribute's value as the string to parse. If an encoding is
-           returned, and if <var title="">charset</var> is still set
-           to null, let <var title="">charset</var> be the encoding
-           returned, and set <var title="">need pragma</var> to
-           true.</dd>
-
-           <dt>If the attribute's name is "<code title="">charset</code>"</dt>
-
-           <dd><p>Let <var title="">charset</var> be the encoding
-           corresponding to the attribute's value, and set <var title="">need pragma</var> to false.</dd>
-
-          </dl></li>
-
-         <li><p>Return to the step labeled <i>attributes</i>.</li>
-
-         <li><p><i>Processing</i>: If <var title="">need pragma</var>
-         is null, then jump to the second step of the overall "two
-         step" algorithm.</li>
-
-         <li><p>If <var title="">need pragma</var> is true but <var title="">got pragma</var> is false, then jump to the second
-         step of the overall "two step" algorithm.</li>
-
-         <li><p>If <var title="">charset</var> is <a href=#a-utf-16-encoding>a UTF-16
-         encoding</a>, change the value of <var title="">charset</var> to UTF-8.</li>
-
-         <li><p>If <var title="">charset</var> is not a supported
-         character encoding, then jump to the second step of the
-         overall "two step" algorithm.</li>
-
-         <li><p>Return the encoding given by <var title="">charset</var>, with <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
-         <i>tentative</i>, and abort all these steps.</li>
-
-        </ol></dd>
-
-       <dt>A sequence of bytes starting with a 0x3C byte (ASCII <), optionally a 0x2F byte (ASCII /), and finally a byte in the range 0x41-0x5A or 0x61-0x7A (an ASCII letter)</dt>
-       <dd>
-
-        <ol><li><p>Advance the <var title="">position</var> pointer so
-         that it points at the next 0x09 (ASCII TAB), 0x0A (ASCII LF),
-         0x0C (ASCII FF), 0x0D (ASCII CR), 0x20 (ASCII space), or 0x3E
-         (ASCII >) byte.</li>
-
-         <li><p>Repeatedly <a href=#concept-get-attributes-when-sniffing title=concept-get-attributes-when-sniffing>get an
-         attribute</a> until no further attributes can be found,
-         then jump to the second step in the overall "two step"
-         algorithm.</li>
-
-        </ol></dd>
-
-       <dt>A sequence of bytes starting with: 0x3C 0x21 (ASCII '<!')</dt>
-       <dt>A sequence of bytes starting with: 0x3C 0x2F (ASCII '</')</dt>
-       <dt>A sequence of bytes starting with: 0x3C 0x3F (ASCII '<?')</dt>
-       <dd>
-
-        <p>Advance the <var title="">position</var> pointer so that it
-        points at the first 0x3E byte (ASCII >) that comes after the
-        0x3C byte that was found.</p>
-
-       </dd>
-
-       <dt>Any other byte</dt>
-       <dd>
-
-        <p>Do nothing with that byte.</p>
-
-       </dd>
-
-      </dl></li>
-
-     <li>Move <var title="">position</var> so it points at the next
-     byte in the input stream, and return to the first step of this
-     "two step" algorithm.</li>
-
-    </ol><p>When the above "two step" algorithm says to <dfn id=concept-get-attributes-when-sniffing title=concept-get-attributes-when-sniffing>get an
-    attribute</dfn>, it means doing this:</p>
-
-    <ol><li><p>If the byte at <var title="">position</var> is one of 0x09
-     (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR),
-     0x20 (ASCII space), or 0x2F (ASCII /) then advance <var title="">position</var> to the next byte and redo this
-     substep.</li>
-
-     <li><p>If the byte at <var title="">position</var> is 0x3E (ASCII
-     >), then abort the "get an attribute" algorithm. There isn't
-     one.</li>
-
-     <li><p>Otherwise, the byte at <var title="">position</var> is the
-     start of the attribute name. Let <var title="">attribute
-     name</var> and <var title="">attribute value</var> be the empty
-     string.</li>
-
-     <li><p><i>Attribute name</i>: Process the byte at <var title="">position</var> as follows:</p>
-
-      <dl class=switch><dt>If it is 0x3D (ASCII =), and the <var title="">attribute
-       name</var> is longer than the empty string</dt>
-
-       <dd>Advance <var title="">position</var> to the next byte and
-       jump to the step below labeled <i>value</i>.</dd>
-
-       <dt>If it is 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII
-       FF), 0x0D (ASCII CR), or 0x20 (ASCII space)</dt>
-
-       <dd>Jump to the step below labeled <i>spaces</i>.</dd>
-
-       <dt>If it is 0x2F (ASCII /) or 0x3E (ASCII >)</dt>
-
-       <dd>Abort the "get an attribute" algorithm. The attribute's
-       name is the value of <var title="">attribute name</var>, its
-       value is the empty string.</dd>
-
-       <dt>If it is in the range 0x41 (ASCII A) to 0x5A (ASCII
-       Z)</dt>
-
-       <dd>Append the Unicode character with code point <span title=""><var title="">b</var>+0x20</span> to <var title="">attribute name</var> (where <var title="">b</var> is
-       the value of the byte at <var title="">position</var>). (This
-       converts the input to lowercase.)</dd>
-
-       <dt>Anything else</dt>
-
-       <dd>Append the Unicode character with the same code point as the
-       value of the byte at <var title="">position</var>) to <var title="">attribute name</var>. (It doesn't actually matter how
-       bytes outside the ASCII range are handled here, since only
-       ASCII characters can contribute to the detection of a character
-       encoding.)</dd>
-
-      </dl></li>
-
-     <li><p>Advance <var title="">position</var> to the next byte and
-     return to the previous step.</li>
-
-     <li><p><i>Spaces</i>: If the byte at <var title="">position</var> is one of 0x09 (ASCII TAB), 0x0A (ASCII
-     LF), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space) then
-     advance <var title="">position</var> to the next byte, then,
-     repeat this step.</li>
-
-     <li><p>If the byte at <var title="">position</var> is
-     <em>not</em> 0x3D (ASCII =), abort the "get an attribute"
-     algorithm. The attribute's name is the value of <var title="">attribute name</var>, its value is the empty
-     string.</li>
-
-     <li><p>Advance <var title="">position</var> past the 0x3D (ASCII
-     =) byte.</li>
-
-     <li><p><i>Value</i>: If the byte at <var title="">position</var> is one of 0x09 (ASCII TAB), 0x0A (ASCII
-     LF), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space) then
-     advance <var title="">position</var> to the next byte, then,
-     repeat this step.</li>
-
-     <li><p>Process the byte at <var title="">position</var> as
-     follows:</p>
-
-      <dl class=switch><dt>If it is 0x22 (ASCII ") or 0x27 (ASCII ')</dt>
-
-       <dd>
-
-        <ol><li>Let <var title="">b</var> be the value of the byte at
-         <var title="">position</var>.</li>
-
-         <li>Advance <var title="">position</var> to the next
-         byte.</li>
-
-         <li>If the value of the byte at <var title="">position</var>
-         is the value of <var title="">b</var>, then advance <var title="">position</var> to the next byte and abort the "get
-         an attribute" algorithm. The attribute's name is the value of
-         <var title="">attribute name</var>, and its value is the
-         value of <var title="">attribute value</var>.</li>
-
-         <li>Otherwise, if the value of the byte at <var title="">position</var> is in the range 0x41 (ASCII A) to
-         0x5A (ASCII Z), then append a Unicode character to <var title="">attribute value</var> whose code point is 0x20 more
-         than the value of the byte at <var title="">position</var>.</li>
-
-         <li>Otherwise, append a Unicode character to <var title="">attribute value</var> whose code point is the same as
-         the value of the byte at <var title="">position</var>.</li>
-
-         <li>Return to the second step in these substeps.</li>
-
-        </ol></dd>
-
-       <dt>If it is 0x3E (ASCII >)</dt>
-
-       <dd>Abort the "get an attribute" algorithm. The attribute's
-       name is the value of <var title="">attribute name</var>, its
-       value is the empty string.</dd>
-
-
-       <dt>If it is in the range 0x41 (ASCII A) to 0x5A (ASCII
-       Z)</dt>
-
-       <dd>Append the Unicode character with code point <span title=""><var title="">b</var>+0x20</span> to <var title="">attribute
-       value</var> (where <var title="">b</var> is the value of the
-       byte at <var title="">position</var>). Advance <var title="">position</var> to the next byte.</dd>
-
-       <dt>Anything else</dt>
-
-       <dd>Append the Unicode character with the same code point as the
-       value of the byte at <var title="">position</var>) to <var title="">attribute value</var>. Advance <var title="">position</var> to the next byte.</dd>
-
-      </dl></li>
-
-     <li><p>Process the byte at <var title="">position</var> as
-     follows:</p>
-
-      <dl class=switch><dt>If it is 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII
-       FF), 0x0D (ASCII CR), 0x20 (ASCII space), or 0x3E (ASCII
-       >)</dt>
-
-       <dd>Abort the "get an attribute" algorithm. The attribute's
-       name is the value of <var title="">attribute name</var> and its
-       value is the value of <var title="">attribute value</var>.</dd>
-
-       <dt>If it is in the range 0x41 (ASCII A) to 0x5A (ASCII
-       Z)</dt>
-
-       <dd>Append the Unicode character with code point <span title=""><var title="">b</var>+0x20</span> to <var title="">attribute
-       value</var> (where <var title="">b</var> is the value of the
-       byte at <var title="">position</var>).</dd>
-
-       <dt>Anything else</dt>
-
-       <dd>Append the Unicode character with the same code point as the
-       value of the byte at <var title="">position</var>) to <var title="">attribute value</var>.</dd>
-
-      </dl></li>
-
-     <li><p>Advance <var title="">position</var> to the next byte and
-     return to the previous step.</li>
-
-    </ol><p>For the sake of interoperability, user agents should not use a
-    pre-scan algorithm that returns different results than the one
-    described above. (But, if you do, please at least let us know, so
-    that we can improve this algorithm and benefit everyone...)</p>
-
    </li>
 
-   <li><p>If the user agent has information on the likely encoding for
-   this page, e.g. based on the encoding of the page when it was last
-   visited, then return that encoding, with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
+   <li><p>Otherwise, if the user agent has information on the likely
+   encoding for this page, e.g. based on the encoding of the page when
+   it was last visited, then return that encoding, with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
    <i>tentative</i>, and abort these steps.</li>
 
    <li>
@@ -81719,18 +81430,328 @@
   as the user agent uses the returned value to select the decoder to
   use for the input stream.</p>
 
+  <hr><p>When an algorithm requires a user agent to <dfn id=prescan-a-byte-stream-to-determine-its-encoding>prescan a byte
+  stream to determine its encoding</dfn>, given some defined <var title="">end condition</var>, then it must run the following steps.
+  These steps either abort unsuccessfully or return a character
+  encoding.</p>
+
+  <ol><li>
+
+    <p>Let <var title="">position</var> be a pointer to a byte in the
+    input stream, initially pointing at the first byte. If at any
+    point during these steps the user agent either runs out of bytes
+    or reaches its <var title="">end condition</var>, then abort the
+    <a href=#prescan-a-byte-stream-to-determine-its-encoding>prescan a byte stream to determine its encoding</a>
+    algorithm unsuccessfully.</p>
+
+   </li>
+
+   <li>
+
+    <p><i>Loop</i>: If <var title="">position</var> points to:</p>
+
+    <dl class=switch><dt>A sequence of bytes starting with: 0x3C 0x21 0x2D 0x2D (ASCII '<!--')</dt>
+     <dd>
+
+      <p>Advance the <var title="">position</var> pointer so that it
+      points at the first 0x3E byte which is preceded by two 0x2D
+      bytes (i.e. at the end of an ASCII '-->' sequence) and comes
+      after the 0x3C byte that was found. (The two 0x2D bytes can be
+      the same as those in the '<!--' sequence.)</p>
+
+     </dd>
+
+     <dt>A sequence of bytes starting with: 0x3C, 0x4D or 0x6D, 0x45 or 0x65, 0x54 or 0x74, 0x41 or 0x61, and one of 0x09, 0x0A, 0x0C, 0x0D, 0x20, 0x2F (case-insensitive ASCII '<meta' followed by a space or slash)</dt>
+     <dd>
+
+      <ol><li><p>Advance the <var title="">position</var> pointer so
+       that it points at the next 0x09, 0x0A, 0x0C, 0x0D, 0x20, or
+       0x2F byte (the one in the sequence of characters matched
+       above).</li>
+
+       <li><p>Let <var title="">attribute list</var> be an empty
+       list of strings.</li> <!-- so long as we only care about
+       http-equiv, content, and charset, this can be a 3-bit
+       bitfield -->
+
+       <li><p>Let <var title="">got pragma</var> be false.</li>
+
+       <li><p>Let <var title="">need pragma</var> be null.</li>
+
+       <li><p>Let <var title="">charset</var> be the null value
+       (which, for the purposes of this algorithm, is distinct from
+       an unrecognised encoding or the empty string).</li>
+
+       <li><p><i>Attributes</i>: <a href=#concept-get-attributes-when-sniffing title=concept-get-attributes-when-sniffing>Get an
+       attribute</a> and its value. If no attribute was sniffed,
+       then jump to the <i>processing</i> step below.</li>
+
+       <li><p>If the attribute's name is already in <var title="">attribute list</var>, then return to the step
+       labeled <i>attributes</i>.</p>
+
+       <li><p>Add the attribute's name to <var title="">attribute
+       list</var>.</p>
+
+       <li>
+
+        <p>Run the appropriate step from the following list, if one
+        applies:</p>
+
+        <dl class=switch><dt>If the attribute's name is "<code title="">http-equiv</code>"</dt>
+
+         <dd><p>If the attribute's value is "<code title="">content-type</code>", then set <var title="">got
+         pragma</var> to true.</dd>
+
+         <dt>If the attribute's name is "<code title="">content</code>"</dt>
+
+         <dd><p>Apply the <a href=#algorithm-for-extracting-an-encoding-from-a-meta-element>algorithm for extracting an encoding
+         from a <code>meta</code> element</a>, giving the
+         attribute's value as the string to parse. If an encoding is
+         returned, and if <var title="">charset</var> is still set
+         to null, let <var title="">charset</var> be the encoding
+         returned, and set <var title="">need pragma</var> to
+         true.</dd>
+
+         <dt>If the attribute's name is "<code title="">charset</code>"</dt>
+
+         <dd><p>Let <var title="">charset</var> be the encoding
+         corresponding to the attribute's value, and set <var title="">need pragma</var> to false.</dd>
+
+        </dl></li>
+
+       <li><p>Return to the step labeled <i>attributes</i>.</li>
+
+       <li><p><i>Processing</i>: If <var title="">need pragma</var> is
+       null, then jump to the step below labeled <i>next
+       byte</i>.</li>
+
+       <li><p>If <var title="">need pragma</var> is true but <var title="">got pragma</var> is false, then jump to the step below
+       labeled <i>next byte</i>.</li>
+
+       <li><p>If <var title="">charset</var> is <a href=#a-utf-16-encoding>a UTF-16
+       encoding</a>, change the value of <var title="">charset</var> to UTF-8.</li>
+
+       <li><p>If <var title="">charset</var> is not a supported
+       character encoding, then jump to the step below labeled <i>next
+       byte</i>.</li>
+
+       <li><p>Abort the <a href=#prescan-a-byte-stream-to-determine-its-encoding>prescan a byte stream to determine its
+       encoding</a> algorithm, returning the encoding given by <var title="">charset</var>.</li>
+
+      </ol></dd>
+
+     <dt>A sequence of bytes starting with a 0x3C byte (ASCII <), optionally a 0x2F byte (ASCII /), and finally a byte in the range 0x41-0x5A or 0x61-0x7A (an ASCII letter)</dt>
+     <dd>
+
+      <ol><li><p>Advance the <var title="">position</var> pointer so
+       that it points at the next 0x09 (ASCII TAB), 0x0A (ASCII LF),
+       0x0C (ASCII FF), 0x0D (ASCII CR), 0x20 (ASCII space), or 0x3E
+       (ASCII >) byte.</li>
+
+       <li><p>Repeatedly <a href=#concept-get-attributes-when-sniffing title=concept-get-attributes-when-sniffing>get an
+       attribute</a> until no further attributes can be found, then
+       jump to the step below labeled <i>next byte</i>.</li>
+
+      </ol></dd>
+
+     <dt>A sequence of bytes starting with: 0x3C 0x21 (ASCII '<!')</dt>
+     <dt>A sequence of bytes starting with: 0x3C 0x2F (ASCII '</')</dt>
+     <dt>A sequence of bytes starting with: 0x3C 0x3F (ASCII '<?')</dt>
+     <dd>
+
+      <p>Advance the <var title="">position</var> pointer so that it
+      points at the first 0x3E byte (ASCII >) that comes after the
+      0x3C byte that was found.</p>
+
+     </dd>
+
+     <dt>Any other byte</dt>
+     <dd>
+
+      <p>Do nothing with that byte.</p>
+
+     </dd>
+
+    </dl></li>
+
+   <li><i>Next byte</i>: Move <var title="">position</var> so it
+   points at the next byte in the input stream, and return to the step
+   above labeled <i>loop</i>.</li>
+
+  </ol><p>When the <a href=#prescan-a-byte-stream-to-determine-its-encoding>prescan a byte stream to determine its
+  encoding</a> algorithm says to <dfn id=concept-get-attributes-when-sniffing title=concept-get-attributes-when-sniffing>get an attribute</dfn>,
+  it means doing this:</p>
+
+  <ol><li><p>If the byte at <var title="">position</var> is one of 0x09
+   (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR),
+   0x20 (ASCII space), or 0x2F (ASCII /) then advance <var title="">position</var> to the next byte and redo this
+   step.</li>
+
+   <li><p>If the byte at <var title="">position</var> is 0x3E (ASCII
+   >), then abort the <a href=#concept-get-attributes-when-sniffing title=concept-get-attributes-when-sniffing>get an
+   attribute</a> algorithm. There isn't one.</li>
+
+   <li><p>Otherwise, the byte at <var title="">position</var> is the
+   start of the attribute name. Let <var title="">attribute name</var>
+   and <var title="">attribute value</var> be the empty
+   string.</li>
+
+   <li><p><i>Attribute name</i>: Process the byte at <var title="">position</var> as follows:</p>
+
+    <dl class=switch><dt>If it is 0x3D (ASCII =), and the <var title="">attribute
+     name</var> is longer than the empty string</dt>
+
+     <dd>Advance <var title="">position</var> to the next byte and
+     jump to the step below labeled <i>value</i>.</dd>
+
+     <dt>If it is 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII
+     FF), 0x0D (ASCII CR), or 0x20 (ASCII space)</dt>
+
+     <dd>Jump to the step below labeled <i>spaces</i>.</dd>
+
+     <dt>If it is 0x2F (ASCII /) or 0x3E (ASCII >)</dt>
+
+     <dd>Abort the <a href=#concept-get-attributes-when-sniffing title=concept-get-attributes-when-sniffing>get an
+     attribute</a> algorithm. The attribute's name is the value of
+     <var title="">attribute name</var>, its value is the empty
+     string.</dd>
+
+     <dt>If it is in the range 0x41 (ASCII A) to 0x5A (ASCII
+     Z)</dt>
+
+     <dd>Append the Unicode character with code point <span title=""><var title="">b</var>+0x20</span> to <var title="">attribute name</var> (where <var title="">b</var> is
+     the value of the byte at <var title="">position</var>). (This
+     converts the input to lowercase.)</dd>
+
+     <dt>Anything else</dt>
+
+     <dd>Append the Unicode character with the same code point as the
+     value of the byte at <var title="">position</var> to <var title="">attribute name</var>. (It doesn't actually matter how
+     bytes outside the ASCII range are handled here, since only
+     ASCII characters can contribute to the detection of a character
+     encoding.)</dd>
+
+    </dl></li>
+
+   <li><p>Advance <var title="">position</var> to the next byte and
+   return to the previous step.</li>
+
+   <li><p><i>Spaces</i>: If the byte at <var title="">position</var> is one of 0x09 (ASCII TAB), 0x0A (ASCII
+   LF), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space) then
+   advance <var title="">position</var> to the next byte and
+   repeat this step.</li>
+
+   <li><p>If the byte at <var title="">position</var> is <em>not</em>
+   0x3D (ASCII =), abort the <a href=#concept-get-attributes-when-sniffing title=concept-get-attributes-when-sniffing>get an
+   attribute</a> algorithm. The attribute's name is the value of
+   <var title="">attribute name</var>, its value is the empty
+   string.</li>
+
+   <li><p>Advance <var title="">position</var> past the 0x3D (ASCII
+   =) byte.</li>
+
+   <li><p><i>Value</i>: If the byte at <var title="">position</var> is one of 0x09 (ASCII TAB), 0x0A (ASCII
+   LF), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space) then
+   advance <var title="">position</var> to the next byte and
+   repeat this step.</li>
+
+   <li><p>Process the byte at <var title="">position</var> as
+   follows:</p>
+
+    <dl class=switch><dt>If it is 0x22 (ASCII ") or 0x27 (ASCII ')</dt>
+
+     <dd>
+
+      <ol><li>Let <var title="">b</var> be the value of the byte at
+       <var title="">position</var>.</li>
+
+       <li><i>Quote loop</i>: Advance <var title="">position</var> to
+       the next byte.</li>
+
+       <li>If the value of the byte at <var title="">position</var> is
+       the value of <var title="">b</var>, then advance <var title="">position</var> to the next byte and abort the "get an
+       attribute" algorithm. The attribute's name is the value of <var title="">attribute name</var>, and its value is the value of
+       <var title="">attribute value</var>.</li>
+
+       <li>Otherwise, if the value of the byte at <var title="">position</var> is in the range 0x41 (ASCII A) to 0x5A
+       (ASCII Z), then append a Unicode character to <var title="">attribute value</var> whose code point is 0x20 more
+       than the value of the byte at <var title="">position</var>.</li>
+
+       <li>Otherwise, append a Unicode character to <var title="">attribute value</var> whose code point is the same as
+       the value of the byte at <var title="">position</var>.</li>
+
+       <li>Return to the step above labeled <i>quote loop</i>.</li>
+
+      </ol></dd>
+
+     <dt>If it is 0x3E (ASCII >)</dt>
+
+     <dd>Abort the <a href=#concept-get-attributes-when-sniffing title=concept-get-attributes-when-sniffing>get an
+     attribute</a> algorithm. The attribute's name is the value of
+     <var title="">attribute name</var>, its value is the empty
+     string.</dd>
+
+
+     <dt>If it is in the range 0x41 (ASCII A) to 0x5A (ASCII
+     Z)</dt>
+
+     <dd>Append the Unicode character with code point <span title=""><var title="">b</var>+0x20</span> to <var title="">attribute value</var> (where <var title="">b</var> is
+     the value of the byte at <var title="">position</var>). Advance
+     <var title="">position</var> to the next byte.</dd>
+
+     <dt>Anything else</dt>
+
+     <dd>Append the Unicode character with the same code point as the
+     value of the byte at <var title="">position</var> to <var title="">attribute value</var>. Advance <var title="">position</var> to the next byte.</dd>
+
+    </dl></li>
+
+   <li><p>Process the byte at <var title="">position</var> as
+   follows:</p>
+
+    <dl class=switch><dt>If it is 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII
+     FF), 0x0D (ASCII CR), 0x20 (ASCII space), or 0x3E (ASCII
+     >)</dt>
+
+     <dd>Abort the <a href=#concept-get-attributes-when-sniffing title=concept-get-attributes-when-sniffing>get an
+     attribute</a> algorithm. The attribute's name is the value of
+     <var title="">attribute name</var> and its value is the value of
+     <var title="">attribute value</var>.</dd>
+
+     <dt>If it is in the range 0x41 (ASCII A) to 0x5A (ASCII Z)</dt>
+
+     <dd>Append the Unicode character with code point <span title=""><var title="">b</var>+0x20</span> to <var title="">attribute value</var> (where <var title="">b</var> is
+     the value of the byte at <var title="">position</var>).</dd>
+
+     <dt>Anything else</dt>
+
+     <dd>Append the Unicode character with the same code point as the
+     value of the byte at <var title="">position</var> to <var title="">attribute value</var>.</dd>
+
+    </dl></li>
+
+   <li><p>Advance <var title="">position</var> to the next byte and
+   return to the previous step.</li>
+
+  </ol><p>For the sake of interoperability, user agents should not use a
+  prescan algorithm that returns different results than the one
+  described above. (But, if you do, please at least let us know, so
+  that we can improve this algorithm and benefit everyone...)</p>
+
 <!--(removed this since the specs are being changed)
-  <p class="note">This algorithm is a <span>willful violation</span>
-  of the HTTP specification, which requires that the encoding be
-  assumed to be ISO-8859-1 in the absence of a <span>character
-  encoding declaration</span> to the contrary, and of RFC 2046,
-  which requires that the encoding be assumed to be US-ASCII in the
-  absence of a <span>character encoding declaration</span> to the
-  contrary. This specification's third approach is motivated by a
+  <p class="note">These algorithms are a <span>willful
+  violation</span> of the HTTP specification, which requires that the
+  encoding be assumed to be ISO-8859-1 in the absence of a
+  <span>character encoding declaration</span> to the contrary, and of
+  RFC 2046, which requires that the encoding be assumed to be US-ASCII
+  in the absence of a <span>character encoding declaration</span> to
+  the contrary. This specification's third approach is motivated by a
   desire to be maximally compatible with legacy content. <a
   href="#refsHTTP">[HTTP]</a> <a href="#refsRFC2046">[RFC2046]</a></p>
 -->
 
+
+
   <h5 id=character-encodings-0><span class=secno>12.2.2.2 </span>Character encodings</h5>
 
   <p>User agents must at a minimum support the UTF-8 and Windows-1252

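The "get an attribute" steps in the complete.html hunk above can be hard to follow as a chain of labeled jumps. The following is a minimal, non-normative Python sketch of the same control flow. It assumes the input is a `bytes` object. The names `get_an_attribute` and `OutOfBytes` are hypothetical, and so is the `(name, value, position)` return shape; none of them comes from the spec. `OutOfBytes` stands in for the prescan rule that running out of bytes aborts the prescan unsuccessfully.

```python
# Minimal sketch of the spec's "get an attribute" steps (non-normative).
# OutOfBytes and get_an_attribute are hypothetical names chosen for this example.

SPACES = b"\t\n\x0c\r "  # TAB, LF, FF, CR, space


class OutOfBytes(Exception):
    """Stands in for the prescan rule: running out of bytes aborts it unsuccessfully."""


def _char(byte):
    # ASCII A-Z (0x41-0x5A) become lowercase by adding 0x20; any other byte
    # becomes the Unicode character with the same code point.
    return chr(byte + 0x20) if 0x41 <= byte <= 0x5A else chr(byte)


def get_an_attribute(data, position):
    """Return (name, value, position); name and value are None if there is
    no attribute. The returned position is where the caller resumes."""

    def at(pos):
        if pos >= len(data):
            raise OutOfBytes()
        return data[pos]

    # Skip whitespace and 0x2F (/).
    while at(position) in SPACES or at(position) == 0x2F:
        position += 1
    # 0x3E (>) means there is no attribute.
    if at(position) == 0x3E:
        return None, None, position

    name, value = "", ""
    # "Attribute name" steps.
    while True:
        b = at(position)
        if b == 0x3D and name:  # '=' after a non-empty name: go to "value"
            position += 1
            break
        if b in SPACES:
            # "Spaces" steps: skip whitespace, then an '=' must follow.
            while at(position) in SPACES:
                position += 1
            if at(position) != 0x3D:
                return name, "", position
            position += 1
            break
        if b in (0x2F, 0x3E):  # '/' or '>' ends a value-less attribute
            return name, "", position
        name += _char(b)
        position += 1

    # "Value" steps: skip leading whitespace.
    while at(position) in SPACES:
        position += 1
    b = at(position)
    if b in (0x22, 0x27):  # quoted value, ended by the same quote byte
        while True:
            position += 1
            c = at(position)
            if c == b:
                return name, value, position + 1
            value += _char(c)
    if b == 0x3E:  # '>' right after '=' gives an empty value
        return name, "", position
    value += _char(b)
    position += 1
    # Unquoted value, ended by whitespace or '>'.
    while True:
        b = at(position)
        if b in SPACES or b == 0x3E:
            return name, value, position
        value += _char(b)
        position += 1
```

Consider a caller such as the `<meta>` branch of the prescan. It would call this function repeatedly, each time resuming from the returned position, until the returned name is None. For example, `get_an_attribute(b' charset="UTF-8">', 0)` yields `("charset", "utf-8", 16)`, with position 16 being the `>`.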
Modified: source
===================================================================
--- source	2012-02-11 18:45:11 UTC (rev 6989)
+++ source	2012-02-13 21:06:58 UTC (rev 6990)
@@ -94148,10 +94148,10 @@
   parse of the document with the real encoding.</p>
 
   <p id="documentEncoding">User agents must use the following
-  algorithm (the <dfn>encoding sniffing algorithm</dfn>) to determine
-  the character encoding to use when decoding a document in the first
-  pass. This algorithm takes as input any out-of-band metadata
-  available to the user agent (e.g. the <span
+  algorithm, called the <dfn>encoding sniffing algorithm</dfn>, to
+  determine the character encoding to use when decoding a document in
+  the first pass. This algorithm takes as input any out-of-band
+  metadata available to the user agent (e.g. the <span
   title="Content-Type">Content-Type metadata</span> of the document)
   and all the bytes available so far, and returns an encoding and a
   <dfn title="concept-encoding-confidence">confidence</dfn>. The
@@ -94194,9 +94194,9 @@
     <p class="note">The authoring conformance requirements for
     character encoding declarations limit them to only appearing <a
     href="#charset1024">in the first 1024 bytes</a>. User agents are
-    therefore encouraged to use the preparse algorithm below (part of
-    these steps) on the first 1024 bytes, but not to stall beyond
-    that.</p>
+    therefore encouraged to use the prescan algorithm below (as
+    invoked by these steps) on the first 1024 bytes, but not to stall
+    beyond that.</p>
 
    </li>
 
@@ -94243,389 +94243,28 @@
    <p class="note">This step looks for Unicode Byte Order Marks
    (BOMs).</p></li>
 
-   <li><p>Otherwise, the user agent will have to search for explicit
-   character encoding information in the file itself. This should
-   proceed as follows:
+   <li>
 
-    <p>Let <var title="">position</var> be a pointer to a byte in the
-    input stream, initially pointing at the first byte. If at any
-    point during these substeps the user agent either runs out of
-    bytes or decides that scanning further bytes would not be
-    efficient, then skip to the next step of the overall character
-    encoding detection algorithm. User agents may decide that scanning
-    <em>any</em> bytes is not efficient, in which case these substeps
-    are entirely skipped.</p>
+    <p>Otherwise, optionally <span title="prescan a byte stream to
+    determine its encoding">prescan the byte stream to determine its
+    encoding</span>. The <var title="">end condition</var> is that the
+    user agent decides that scanning further bytes would not be
+    efficient. User agents are encouraged to only prescan the first
+    1024 bytes. User agents may decide that scanning <em>any</em>
+    bytes is not efficient, in which case the prescan is skipped
+    entirely.</p>
 
-    <p>Now, repeat the following "two" steps until the algorithm
-    aborts (either because user agent aborts, as described above, or
-    because a character encoding is found):</p>
+    <p>The aforementioned algorithm either aborts unsuccessfully or
+    returns a character encoding. If it returns a character encoding,
+    then this algorithm must be aborted, returning the same encoding,
+    with <span title="concept-encoding-confidence">confidence</span>
+    <i>tentative</i>.</p>
 
-    <ol>
-
-     <li><p>If <var title="">position</var> points to:</p>
-
-      <dl class="switch">
-
-       <dt>A sequence of bytes starting with: 0x3C 0x21 0x2D 0x2D (ASCII '<!--')</dt>
-       <dd>
-
-        <p>Advance the <var title="">position</var> pointer so that it
-        points at the first 0x3E byte which is preceded by two 0x2D
-        bytes (i.e. at the end of an ASCII '-->' sequence) and comes
-        after the 0x3C byte that was found. (The two 0x2D bytes can be
-        the same as the those in the '<!--' sequence.)</p>
-
-       </dd>
-
-       <dt>A sequence of bytes starting with: 0x3C, 0x4D or 0x6D, 0x45 or 0x65, 0x54 or 0x74, 0x41 or 0x61, and one of 0x09, 0x0A, 0x0C, 0x0D, 0x20, 0x2F (case-insensitive ASCII '<meta' followed by a space or slash)</dt>
-       <dd>
-
-        <ol>
-
-         <li><p>Advance the <var title="">position</var> pointer so
-         that it points at the next 0x09, 0x0A, 0x0C, 0x0D, 0x20, or
-         0x2F byte (the one in sequence of characters matched
-         above).</p></li>
-
-         <li><p>Let <var title="">attribute list</var> be an empty
-         list of strings.</p></li> <!-- so long as we only care about
-         http-equiv, content, and charset, this can be a 3-bit
-         bitfield -->
-
-         <li><p>Let <var title="">got pragma</var> be false.</p></li>
-
-         <li><p>Let <var title="">need pragma</var> be null.</p></li>
-
-         <li><p>Let <var title="">charset</var> be the null value
-         (which, for the purposes of this algorithm, is distinct from
-         an unrecognised encoding or the empty string).</p></li>
-
-         <li><p><i>Attributes</i>: <span
-         title="concept-get-attributes-when-sniffing">Get an
-         attribute</span> and its value. If no attribute was sniffed,
-         then jump to the <i>processing</i> step below.</p></li>
-
-         <li><p>If the attribute's name is already in <var
-         title="">attribute list</var>, then return to the step
-         labeled <i>attributes</i>.</p>
-
-         <li><p>Add the attribute's name to <var title="">attribute
-         list</var>.</p>
-
-         <li>
-
-          <p>Run the appropriate step from the following list, if one
-          applies:</p>
-
-          <dl class="switch">
-
-           <dt>If the attribute's name is "<code
-           title="">http-equiv</code>"</dt>
-
-           <dd><p>If the attribute's value is "<code
-           title="">content-type</code>", then set <var title="">got
-           pragma</var> to true.</p></dd>
-
-           <dt>If the attribute's name is "<code
-           title="">content</code>"</dt>
-
-           <dd><p>Apply the <span>algorithm for extracting an encoding
-           from a <code>meta</code> element</span>, giving the
-           attribute's value as the string to parse. If an encoding is
-           returned, and if <var title="">charset</var> is still set
-           to null, let <var title="">charset</var> be the encoding
-           returned, and set <var title="">need pragma</var> to
-           true.</p></dd>
-
-           <dt>If the attribute's name is "<code
-           title="">charset</code>"</dt>
-
-           <dd><p>Let <var title="">charset</var> be the encoding
-           corresponding to the attribute's value, and set <var
-           title="">need pragma</var> to false.</p></dd>
-
-          </dl>
-
-         </li>
-
-         <li><p>Return to the step labeled <i>attributes</i>.</p></li>
-
-         <li><p><i>Processing</i>: If <var title="">need pragma</var>
-         is null, then jump to the second step of the overall "two
-         step" algorithm.</p></li>
-
-         <li><p>If <var title="">need pragma</var> is true but <var
-         title="">got pragma</var> is false, then jump to the second
-         step of the overall "two step" algorithm.</p></li>
-
-         <li><p>If <var title="">charset</var> is <span>a UTF-16
-         encoding</span>, change the value of <var
-         title="">charset</var> to UTF-8.</p></li>
-
-         <li><p>If <var title="">charset</var> is not a supported
-         character encoding, then jump to the second step of the
-         overall "two step" algorithm.</p></li>
-
-         <li><p>Return the encoding given by <var
-         title="">charset</var>, with <span
-         title="concept-encoding-confidence">confidence</span>
-         <i>tentative</i>, and abort all these steps.</p></li>
-
-        </ol>
-
-       </dd>
-
-       <dt>A sequence of bytes starting with a 0x3C byte (ASCII <), optionally a 0x2F byte (ASCII /), and finally a byte in the range 0x41-0x5A or 0x61-0x7A (an ASCII letter)</dt>
-       <dd>
-
-        <ol>
-
-         <li><p>Advance the <var title="">position</var> pointer so
-         that it points at the next 0x09 (ASCII TAB), 0x0A (ASCII LF),
-         0x0C (ASCII FF), 0x0D (ASCII CR), 0x20 (ASCII space), or 0x3E
-         (ASCII >) byte.</p></li>
-
-         <li><p>Repeatedly <span
-         title="concept-get-attributes-when-sniffing">get an
-         attribute</span> until no further attributes can be found,
-         then jump to the second step in the overall "two step"
-         algorithm.</p></li>
-
-        </ol>
-
-       </dd>
-
-       <dt>A sequence of bytes starting with: 0x3C 0x21 (ASCII '<!')</dt>
-       <dt>A sequence of bytes starting with: 0x3C 0x2F (ASCII '</')</dt>
-       <dt>A sequence of bytes starting with: 0x3C 0x3F (ASCII '<?')</dt>
-       <dd>
-
-        <p>Advance the <var title="">position</var> pointer so that it
-        points at the first 0x3E byte (ASCII >) that comes after the
-        0x3C byte that was found.</p>
-
-       </dd>
-
-       <dt>Any other byte</dt>
-       <dd>
-
-        <p>Do nothing with that byte.</p>
-
-       </dd>
-
-      </dl>
-
-     </li>
-
-     <li>Move <var title="">position</var> so it points at the next
-     byte in the input stream, and return to the first step of this
-     "two step" algorithm.</li>
-
-    </ol>
-
-    <p>When the above "two step" algorithm says to <dfn
-    title="concept-get-attributes-when-sniffing">get an
-    attribute</dfn>, it means doing this:</p>
-
-    <ol>
-
-     <li><p>If the byte at <var title="">position</var> is one of 0x09
-     (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR),
-     0x20 (ASCII space), or 0x2F (ASCII /) then advance <var
-     title="">position</var> to the next byte and redo this
-     substep.</p></li>
-
-     <li><p>If the byte at <var title="">position</var> is 0x3E (ASCII
-     >), then abort the "get an attribute" algorithm. There isn't
-     one.</p></li>
-
-     <li><p>Otherwise, the byte at <var title="">position</var> is the
-     start of the attribute name. Let <var title="">attribute
-     name</var> and <var title="">attribute value</var> be the empty
-     string.</p></li>
-
-     <li><p><i>Attribute name</i>: Process the byte at <var
-     title="">position</var> as follows:</p>
-
-      <dl class="switch">
-
-       <dt>If it is 0x3D (ASCII =), and the <var title="">attribute
-       name</var> is longer than the empty string</dt>
-
-       <dd>Advance <var title="">position</var> to the next byte and
-       jump to the step below labeled <i>value</i>.</dd>
-
-       <dt>If it is 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII
-       FF), 0x0D (ASCII CR), or 0x20 (ASCII space)</dt>
-
-       <dd>Jump to the step below labeled <i>spaces</i>.</dd>
-
-       <dt>If it is 0x2F (ASCII /) or 0x3E (ASCII >)</dt>
-
-       <dd>Abort the "get an attribute" algorithm. The attribute's
-       name is the value of <var title="">attribute name</var>, its
-       value is the empty string.</dd>
-
-       <dt>If it is in the range 0x41 (ASCII A) to 0x5A (ASCII
-       Z)</dt>
-
-       <dd>Append the Unicode character with code point <span
-       title=""><var title="">b</var>+0x20</span> to <var
-       title="">attribute name</var> (where <var title="">b</var> is
-       the value of the byte at <var title="">position</var>). (This
-       converts the input to lowercase.)</dd>
-
-       <dt>Anything else</dt>
-
-       <dd>Append the Unicode character with the same code point as the
-       value of the byte at <var title="">position</var>) to <var
-       title="">attribute name</var>. (It doesn't actually matter how
-       bytes outside the ASCII range are handled here, since only
-       ASCII characters can contribute to the detection of a character
-       encoding.)</dd>
-
-      </dl>
-
-     </li>
-
-     <li><p>Advance <var title="">position</var> to the next byte and
-     return to the previous step.</p></li>
-
-     <li><p><i>Spaces</i>: If the byte at <var
-     title="">position</var> is one of 0x09 (ASCII TAB), 0x0A (ASCII
-     LF), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space) then
-     advance <var title="">position</var> to the next byte, then,
-     repeat this step.</p></li>
-
-     <li><p>If the byte at <var title="">position</var> is
-     <em>not</em> 0x3D (ASCII =), abort the "get an attribute"
-     algorithm. The attribute's name is the value of <var
-     title="">attribute name</var>, its value is the empty
-     string.</p></li>
-
-     <li><p>Advance <var title="">position</var> past the 0x3D (ASCII
-     =) byte.</p></li>
-
-     <li><p><i>Value</i>: If the byte at <var
-     title="">position</var> is one of 0x09 (ASCII TAB), 0x0A (ASCII
-     LF), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space) then
-     advance <var title="">position</var> to the next byte, then,
-     repeat this step.</p></li>
-
-     <li><p>Process the byte at <var title="">position</var> as
-     follows:</p>
-
-      <dl class="switch">
-
-       <dt>If it is 0x22 (ASCII ") or 0x27 (ASCII ')</dt>
-
-       <dd>
-
-        <ol>
-
-         <li>Let <var title="">b</var> be the value of the byte at
-         <var title="">position</var>.</li>
-
-         <li>Advance <var title="">position</var> to the next
-         byte.</li>
-
-         <li>If the value of the byte at <var title="">position</var>
-         is the value of <var title="">b</var>, then advance <var
-         title="">position</var> to the next byte and abort the "get
-         an attribute" algorithm. The attribute's name is the value of
-         <var title="">attribute name</var>, and its value is the
-         value of <var title="">attribute value</var>.</li>
-
-         <li>Otherwise, if the value of the byte at <var
-         title="">position</var> is in the range 0x41 (ASCII A) to
-         0x5A (ASCII Z), then append a Unicode character to <var
-         title="">attribute value</var> whose code point is 0x20 more
-         than the value of the byte at <var
-         title="">position</var>.</li>
-
-         <li>Otherwise, append a Unicode character to <var
-         title="">attribute value</var> whose code point is the same as
-         the value of the byte at <var title="">position</var>.</li>
-
-         <li>Return to the second step in these substeps.</li>
-
-        </ol>
-
-       </dd>
-
-       <dt>If it is 0x3E (ASCII >)</dt>
-
-       <dd>Abort the "get an attribute" algorithm. The attribute's
-       name is the value of <var title="">attribute name</var>, its
-       value is the empty string.</dd>
-
-
-       <dt>If it is in the range 0x41 (ASCII A) to 0x5A (ASCII
-       Z)</dt>
-
-       <dd>Append the Unicode character with code point <span title=""><var
-       title="">b</var>+0x20</span> to <var title="">attribute
-       value</var> (where <var title="">b</var> is the value of the
-       byte at <var title="">position</var>). Advance <var
-       title="">position</var> to the next byte.</dd>
-
-       <dt>Anything else</dt>
-
-       <dd>Append the Unicode character with the same code point as the
-       value of the byte at <var title="">position</var>) to <var
-       title="">attribute value</var>. Advance <var
-       title="">position</var> to the next byte.</dd>
-
-      </dl>
-
-     </li>
-
-     <li><p>Process the byte at <var title="">position</var> as
-     follows:</p>
-
-      <dl class="switch">
-
-       <dt>If it is 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII
-       FF), 0x0D (ASCII CR), 0x20 (ASCII space), or 0x3E (ASCII
-       >)</dt>
-
-       <dd>Abort the "get an attribute" algorithm. The attribute's
-       name is the value of <var title="">attribute name</var> and its
-       value is the value of <var title="">attribute value</var>.</dd>
-
-       <dt>If it is in the range 0x41 (ASCII A) to 0x5A (ASCII
-       Z)</dt>
-
-       <dd>Append the Unicode character with code point <span title=""><var
-       title="">b</var>+0x20</span> to <var title="">attribute
-       value</var> (where <var title="">b</var> is the value of the
-       byte at <var title="">position</var>).</dd>
-
-       <dt>Anything else</dt>
-
-       <dd>Append the Unicode character with the same code point as the
-       value of the byte at <var title="">position</var>) to <var
-       title="">attribute value</var>.</dd>
-
-      </dl>
-
-     </li>
-
-     <li><p>Advance <var title="">position</var> to the next byte and
-     return to the previous step.</p></li>
-
-    </ol>
-
-    <p>For the sake of interoperability, user agents should not use a
-    pre-scan algorithm that returns different results than the one
-    described above. (But, if you do, please at least let us know, so
-    that we can improve this algorithm and benefit everyone...)</p>
-
    </li>
 
-   <li><p>If the user agent has information on the likely encoding for
-   this page, e.g. based on the encoding of the page when it was last
-   visited, then return that encoding, with the <span
+   <li><p>Otherwise, if the user agent has information on the likely
+   encoding for this page, e.g. based on the encoding of the page when
+   it was last visited, then return that encoding, with the <span
    title="concept-encoding-confidence">confidence</span>
    <i>tentative</i>, and abort these steps.</p></li>
 
@@ -94814,18 +94453,408 @@
   as the user agent uses the returned value to select the decoder to
   use for the input stream.</p>
 
+  <hr>
+
+  <p>When an algorithm requires a user agent to <dfn>prescan a byte
+  stream to determine its encoding</dfn>, given some defined <var
+  title="">end condition</var>, then it must run the following steps.
+  These steps either abort unsuccessfully or return a character
+  encoding.</p>
+
+  <ol>
+
+   <li>
+
+    <p>Let <var title="">position</var> be a pointer to a byte in the
+    input stream, initially pointing at the first byte. If at any
+    point during these steps the user agent either runs out of bytes
+    or reaches its <var title="">end condition</var>, then abort the
+    <span>prescan a byte stream to determine its encoding</span>
+    algorithm unsuccessfully.</p>
+
+   </li>
+
+   <li>
+
+    <p><i>Loop</i>: If <var title="">position</var> points to:</p>
+
+    <dl class="switch">
+
+     <dt>A sequence of bytes starting with: 0x3C 0x21 0x2D 0x2D (ASCII '<!--')</dt>
+     <dd>
+
+      <p>Advance the <var title="">position</var> pointer so that it
+      points at the first 0x3E byte which is preceded by two 0x2D
+      bytes (i.e. at the end of an ASCII '-->' sequence) and comes
+      after the 0x3C byte that was found. (The two 0x2D bytes can be
+      the same as those in the '<!--' sequence.)</p>
+
+     </dd>
+
+     <dt>A sequence of bytes starting with: 0x3C, 0x4D or 0x6D, 0x45 or 0x65, 0x54 or 0x74, 0x41 or 0x61, and one of 0x09, 0x0A, 0x0C, 0x0D, 0x20, 0x2F (case-insensitive ASCII '<meta' followed by a space or slash)</dt>
+     <dd>
+
+      <ol>
+
+       <li><p>Advance the <var title="">position</var> pointer so
+       that it points at the next 0x09, 0x0A, 0x0C, 0x0D, 0x20, or
+       0x2F byte (the one in the sequence of characters matched
+       above).</p></li>
+
+       <li><p>Let <var title="">attribute list</var> be an empty
+       list of strings.</p></li> <!-- so long as we only care about
+       http-equiv, content, and charset, this can be a 3-bit
+       bitfield -->
+
+       <li><p>Let <var title="">got pragma</var> be false.</p></li>
+
+       <li><p>Let <var title="">need pragma</var> be null.</p></li>
+
+       <li><p>Let <var title="">charset</var> be the null value
+       (which, for the purposes of this algorithm, is distinct from
+       an unrecognised encoding or the empty string).</p></li>
+
+       <li><p><i>Attributes</i>: <span
+       title="concept-get-attributes-when-sniffing">Get an
+       attribute</span> and its value. If no attribute was sniffed,
+       then jump to the <i>processing</i> step below.</p></li>
+
+       <li><p>If the attribute's name is already in <var
+       title="">attribute list</var>, then return to the step
+       labeled <i>attributes</i>.</p>
+
+       <li><p>Add the attribute's name to <var title="">attribute
+       list</var>.</p>
+
+       <li>
+
+        <p>Run the appropriate step from the following list, if one
+        applies:</p>
+
+        <dl class="switch">
+
+         <dt>If the attribute's name is "<code
+         title="">http-equiv</code>"</dt>
+
+         <dd><p>If the attribute's value is "<code
+         title="">content-type</code>", then set <var title="">got
+         pragma</var> to true.</p></dd>
+
+         <dt>If the attribute's name is "<code
+         title="">content</code>"</dt>
+
+         <dd><p>Apply the <span>algorithm for extracting an encoding
+         from a <code>meta</code> element</span>, giving the
+         attribute's value as the string to parse. If an encoding is
+         returned, and if <var title="">charset</var> is still set
+         to null, let <var title="">charset</var> be the encoding
+         returned, and set <var title="">need pragma</var> to
+         true.</p></dd>
+
+         <dt>If the attribute's name is "<code
+         title="">charset</code>"</dt>
+
+         <dd><p>Let <var title="">charset</var> be the encoding
+         corresponding to the attribute's value, and set <var
+         title="">need pragma</var> to false.</p></dd>
+
+        </dl>
+
+       </li>
+
+       <li><p>Return to the step labeled <i>attributes</i>.</p></li>
+
+       <li><p><i>Processing</i>: If <var title="">need pragma</var> is
+       null, then jump to the step below labeled <i>next
+       byte</i>.</p></li>
+
+       <li><p>If <var title="">need pragma</var> is true but <var
+       title="">got pragma</var> is false, then jump to the step below
+       labeled <i>next byte</i>.</p></li>
+
+       <li><p>If <var title="">charset</var> is <span>a UTF-16
+       encoding</span>, change the value of <var
+       title="">charset</var> to UTF-8.</p></li>
+
+       <li><p>If <var title="">charset</var> is not a supported
+       character encoding, then jump to the step below labeled <i>next
+       byte</i>.</p></li>
+
+       <li><p>Abort the <span>prescan a byte stream to determine its
+       encoding</span> algorithm, returning the encoding given by <var
+       title="">charset</var>.</p></li>
+
+      </ol>
+
+     </dd>
+
+     <dt>A sequence of bytes starting with a 0x3C byte (ASCII <), optionally a 0x2F byte (ASCII /), and finally a byte in the range 0x41-0x5A or 0x61-0x7A (an ASCII letter)</dt>
+     <dd>
+
+      <ol>
+
+       <li><p>Advance the <var title="">position</var> pointer so
+       that it points at the next 0x09 (ASCII TAB), 0x0A (ASCII LF),
+       0x0C (ASCII FF), 0x0D (ASCII CR), 0x20 (ASCII space), or 0x3E
+       (ASCII >) byte.</p></li>
+
+       <li><p>Repeatedly <span
+       title="concept-get-attributes-when-sniffing">get an
+       attribute</span> until no further attributes can be found, then
+       jump to the step below labeled <i>next byte</i>.</p></li>
+
+      </ol>
+
+     </dd>
+
+     <dt>A sequence of bytes starting with: 0x3C 0x21 (ASCII '<!')</dt>
+     <dt>A sequence of bytes starting with: 0x3C 0x2F (ASCII '</')</dt>
+     <dt>A sequence of bytes starting with: 0x3C 0x3F (ASCII '<?')</dt>
+     <dd>
+
+      <p>Advance the <var title="">position</var> pointer so that it
+      points at the first 0x3E byte (ASCII >) that comes after the
+      0x3C byte that was found.</p>
+
+     </dd>
+
+     <dt>Any other byte</dt>
+     <dd>
+
+      <p>Do nothing with that byte.</p>
+
+     </dd>
+
+    </dl>
+
+   </li>
+
+   <li><i>Next byte</i>: Move <var title="">position</var> so it
+   points at the next byte in the input stream, and return to the step
+   above labeled <i>loop</i>.</li>
+
+  </ol>
+
+  <p>When the <span>prescan a byte stream to determine its
+  encoding</span> algorithm says to <dfn
+  title="concept-get-attributes-when-sniffing">get an attribute</dfn>,
+  it means doing this:</p>
+
+  <ol>
+
+   <li><p>If the byte at <var title="">position</var> is one of 0x09
+   (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR),
+   0x20 (ASCII space), or 0x2F (ASCII /) then advance <var
+   title="">position</var> to the next byte and redo this
+   step.</p></li>
+
+   <li><p>If the byte at <var title="">position</var> is 0x3E (ASCII
+   >), then abort the <span
+   title="concept-get-attributes-when-sniffing">get an
+   attribute</span> algorithm. There isn't one.</p></li>
+
+   <li><p>Otherwise, the byte at <var title="">position</var> is the
+   start of the attribute name. Let <var title="">attribute name</var>
+   and <var title="">attribute value</var> be the empty
+   string.</p></li>
+
+   <li><p><i>Attribute name</i>: Process the byte at <var
+   title="">position</var> as follows:</p>
+
+    <dl class="switch">
+
+     <dt>If it is 0x3D (ASCII =), and the <var title="">attribute
+     name</var> is longer than the empty string</dt>
+
+     <dd>Advance <var title="">position</var> to the next byte and
+     jump to the step below labeled <i>value</i>.</dd>
+
+     <dt>If it is 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII
+     FF), 0x0D (ASCII CR), or 0x20 (ASCII space)</dt>
+
+     <dd>Jump to the step below labeled <i>spaces</i>.</dd>
+
+     <dt>If it is 0x2F (ASCII /) or 0x3E (ASCII >)</dt>
+
+     <dd>Abort the <span
+     title="concept-get-attributes-when-sniffing">get an
+     attribute</span> algorithm. The attribute's name is the value of
+     <var title="">attribute name</var>, its value is the empty
+     string.</dd>
+
+     <dt>If it is in the range 0x41 (ASCII A) to 0x5A (ASCII
+     Z)</dt>
+
+     <dd>Append the Unicode character with code point <span
+     title=""><var title="">b</var>+0x20</span> to <var
+     title="">attribute name</var> (where <var title="">b</var> is
+     the value of the byte at <var title="">position</var>). (This
+     converts the input to lowercase.)</dd>
+
+     <dt>Anything else</dt>
+
+     <dd>Append the Unicode character with the same code point as the
+     value of the byte at <var title="">position</var> to <var
+     title="">attribute name</var>. (It doesn't actually matter how
+     bytes outside the ASCII range are handled here, since only
+     ASCII characters can contribute to the detection of a character
+     encoding.)</dd>
+
+    </dl>
+
+   </li>
+
+   <li><p>Advance <var title="">position</var> to the next byte and
+   return to the previous step.</p></li>
+
+   <li><p><i>Spaces</i>: If the byte at <var
+   title="">position</var> is one of 0x09 (ASCII TAB), 0x0A (ASCII
+   LF), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space) then
+   advance <var title="">position</var> to the next byte, then,
+   repeat this step.</p></li>
+
+   <li><p>If the byte at <var title="">position</var> is <em>not</em>
+   0x3D (ASCII =), abort the <span
+   title="concept-get-attributes-when-sniffing">get an
+   attribute</span> algorithm. The attribute's name is the value of
+   <var title="">attribute name</var>, its value is the empty
+   string.</p></li>
+
+   <li><p>Advance <var title="">position</var> past the 0x3D (ASCII
+   =) byte.</p></li>
+
+   <li><p><i>Value</i>: If the byte at <var
+   title="">position</var> is one of 0x09 (ASCII TAB), 0x0A (ASCII
+   LF), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space) then
+   advance <var title="">position</var> to the next byte, then,
+   repeat this step.</p></li>
+
+   <li><p>Process the byte at <var title="">position</var> as
+   follows:</p>
+
+    <dl class="switch">
+
+     <dt>If it is 0x22 (ASCII ") or 0x27 (ASCII ')</dt>
+
+     <dd>
+
+      <ol>
+
+       <li>Let <var title="">b</var> be the value of the byte at
+       <var title="">position</var>.</li>
+
+       <li><i>Quote loop</i>: Advance <var title="">position</var> to
+       the next byte.</li>
+
+       <li>If the value of the byte at <var title="">position</var> is
+       the value of <var title="">b</var>, then advance <var
+       title="">position</var> to the next byte and abort the "get an
+       attribute" algorithm. The attribute's name is the value of <var
+       title="">attribute name</var>, and its value is the value of
+       <var title="">attribute value</var>.</li>
+
+       <li>Otherwise, if the value of the byte at <var
+       title="">position</var> is in the range 0x41 (ASCII A) to 0x5A
+       (ASCII Z), then append a Unicode character to <var
+       title="">attribute value</var> whose code point is 0x20 more
+       than the value of the byte at <var
+       title="">position</var>.</li>
+
+       <li>Otherwise, append a Unicode character to <var
+       title="">attribute value</var> whose code point is the same as
+       the value of the byte at <var title="">position</var>.</li>
+
+       <li>Return to the step above labeled <i>quote loop</i>.</li>
+
+      </ol>
+
+     </dd>
+
+     <dt>If it is 0x3E (ASCII >)</dt>
+
+     <dd>Abort the <span
+     title="concept-get-attributes-when-sniffing">get an
+     attribute</span> algorithm. The attribute's name is the value of
+     <var title="">attribute name</var>, and its value is the empty
+     string.</dd>
+
+
+     <dt>If it is in the range 0x41 (ASCII A) to 0x5A (ASCII
+     Z)</dt>
+
+     <dd>Append the Unicode character with code point <span
+     title=""><var title="">b</var>+0x20</span> to <var
+     title="">attribute value</var> (where <var title="">b</var> is
+     the value of the byte at <var title="">position</var>). Advance
+     <var title="">position</var> to the next byte.</dd>
+
+     <dt>Anything else</dt>
+
+     <dd>Append the Unicode character with the same code point as the
+     value of the byte at <var title="">position</var> to <var
+     title="">attribute value</var>. Advance <var
+     title="">position</var> to the next byte.</dd>
+
+    </dl>
+
+   </li>
+
+   <li><p>Process the byte at <var title="">position</var> as
+   follows:</p>
+
+    <dl class="switch">
+
+     <dt>If it is 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII
+     FF), 0x0D (ASCII CR), 0x20 (ASCII space), or 0x3E (ASCII
+     >)</dt>
+
+     <dd>Abort the <span
+     title="concept-get-attributes-when-sniffing">get an
+     attribute</span> algorithm. The attribute's name is the value of
+     <var title="">attribute name</var> and its value is the value of
+     <var title="">attribute value</var>.</dd>
+
+     <dt>If it is in the range 0x41 (ASCII A) to 0x5A (ASCII Z)</dt>
+
+     <dd>Append the Unicode character with code point <span
+     title=""><var title="">b</var>+0x20</span> to <var
+     title="">attribute value</var> (where <var title="">b</var> is
+     the value of the byte at <var title="">position</var>).</dd>
+
+     <dt>Anything else</dt>
+
+     <dd>Append the Unicode character with the same code point as the
+     value of the byte at <var title="">position</var> to <var
+     title="">attribute value</var>.</dd>
+
+    </dl>
+
+   </li>
+
+   <li><p>Advance <var title="">position</var> to the next byte and
+   return to the previous step.</p></li>
+
+  </ol>
+
+  <p>For the sake of interoperability, user agents should not use a
+  pre-scan algorithm that returns different results than the one
+  described above. (But, if you do, please at least let us know, so
+  that we can improve this algorithm and benefit everyone...)</p>
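The steps of the "get an attribute" algorithm shown above can be sketched in Python as follows. They run from skipping the whitespace after the attribute name through the end of the value loop. This is a minimal, non-normative sketch, not the spec's reference implementation. `finish_attribute`, `SPACE_BYTES`, and `_byte_to_char` are hypothetical names. The earlier steps that collect the (already lowercased) attribute name are assumed to have run, so `position` points just past the name. This excerpt does not say what happens when the input runs out. The sketch assumes the algorithm simply gives up and returns `None`.

```python
from typing import Optional, Tuple

# Bytes these steps treat as whitespace: TAB, LF, FF, CR, space.
SPACE_BYTES = frozenset(b"\t\n\x0c\r ")


def _byte_to_char(b: int) -> str:
    """Lowercase ASCII A-Z by adding 0x20; map any other byte to the same code point."""
    if 0x41 <= b <= 0x5A:
        return chr(b + 0x20)
    return chr(b)


def finish_attribute(data: bytes, position: int,
                     name: str) -> Optional[Tuple[str, str, int]]:
    """Hypothetical helper: returns (name, value, position), or None when the
    input runs out (an assumed behaviour, not stated in this excerpt)."""
    n = len(data)

    # "Spaces": skip whitespace after the attribute name.
    while position < n and data[position] in SPACE_BYTES:
        position += 1
    if position >= n:
        return None

    # No "=": the attribute's value is the empty string.
    if data[position] != 0x3D:
        return name, "", position
    position += 1  # advance past the "=" byte

    # "Value": skip whitespace after the "=".
    while position < n and data[position] in SPACE_BYTES:
        position += 1
    if position >= n:
        return None

    b = data[position]
    if b in (0x22, 0x27):
        # Quoted value: collect bytes until the matching quote byte.
        value = []
        position += 1
        while position < n:
            c = data[position]
            if c == b:
                return name, "".join(value), position + 1  # step past the quote
            value.append(_byte_to_char(c))
            position += 1
        return None  # unterminated quote (assumed behaviour)
    if b == 0x3E:
        # ">" straight after "=": empty value; position stays on the ">".
        return name, "", position

    # Unquoted value: take the first byte, then loop until whitespace or ">".
    value = [_byte_to_char(b)]
    position += 1
    while position < n:
        c = data[position]
        if c in SPACE_BYTES or c == 0x3E:
            return name, "".join(value), position
        value.append(_byte_to_char(c))
        position += 1
    return None
```

For example, `finish_attribute(b' = "UTF-8">', 0, "charset")` yields `("charset", "utf-8", 10)`. Returning the updated position, rather than only the name/value pair, lets a caller resume scanning from wherever the steps stopped. That position is on the `>` or whitespace byte for unquoted values, and just past the closing quote for quoted ones.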
+
 <!--(removed this since the specs are being changed)
-  <p class="note">This algorithm is a <span>willful violation</span>
-  of the HTTP specification, which requires that the encoding be
-  assumed to be ISO-8859-1 in the absence of a <span>character
-  encoding declaration</span> to the contrary, and of RFC 2046,
-  which requires that the encoding be assumed to be US-ASCII in the
-  absence of a <span>character encoding declaration</span> to the
-  contrary. This specification's third approach is motivated by a
+  <p class="note">These algorithms are a <span>willful
+  violation</span> of the HTTP specification, which requires that the
+  encoding be assumed to be ISO-8859-1 in the absence of a
+  <span>character encoding declaration</span> to the contrary, and of
+  RFC 2046, which requires that the encoding be assumed to be US-ASCII
+  in the absence of a <span>character encoding declaration</span> to
+  the contrary. This specification's third approach is motivated by a
   desire to be maximally compatible with legacy content. <a
   href="#refsHTTP">[HTTP]</a> <a href="#refsRFC2046">[RFC2046]</a></p>
 -->
 
+
+
   <h5>Character encodings</h5>
 
   <p>User agents must at a minimum support the UTF-8 and Windows-1252