[html5] r6992 - [e] (0) Move a section so that the character encoding requirements are closer to [...]
whatwg at whatwg.org
Mon Feb 13 14:50:12 PST 2012
Author: ianh
Date: 2012-02-13 14:50:10 -0800 (Mon, 13 Feb 2012)
New Revision: 6992
Modified:
complete.html
index
source
Log:
[e] (0) Move a section so that the character encoding requirements are closer together.
Affected topics: HTML Syntax and Parsing
Modified: complete.html
===================================================================
--- complete.html 2012-02-13 22:48:10 UTC (rev 6991)
+++ complete.html 2012-02-13 22:50:10 UTC (rev 6992)
@@ -1119,8 +1119,8 @@
<ol>
<li><a href=#determining-the-character-encoding><span class=secno>12.2.2.1 </span>Determining the character encoding</a></li>
<li><a href=#character-encodings-0><span class=secno>12.2.2.2 </span>Character encodings</a></li>
- <li><a href=#preprocessing-the-input-stream><span class=secno>12.2.2.3 </span>Preprocessing the input stream</a></li>
- <li><a href=#changing-the-encoding-while-parsing><span class=secno>12.2.2.4 </span>Changing the encoding while parsing</a></ol></li>
+ <li><a href=#changing-the-encoding-while-parsing><span class=secno>12.2.2.3 </span>Changing the encoding while parsing</a></li>
+ <li><a href=#preprocessing-the-input-stream><span class=secno>12.2.2.4 </span>Preprocessing the input stream</a></ol></li>
<li><a href=#parse-state><span class=secno>12.2.3 </span>Parse state</a>
<ol>
<li><a href=#the-insertion-mode><span class=secno>12.2.3.1 </span>The insertion mode</a></li>
@@ -81878,8 +81878,60 @@
- <h5 id=preprocessing-the-input-stream><span class=secno>12.2.2.3 </span>Preprocessing the input stream</h5>
+ <h5 id=changing-the-encoding-while-parsing><span class=secno>12.2.2.3 </span>Changing the encoding while parsing</h5>
+ <p>When the parser requires the user agent to <dfn id=change-the-encoding>change the
+ encoding</dfn>, it must run the following steps. This might happen
+ if the <a href=#encoding-sniffing-algorithm>encoding sniffing algorithm</a> described above
+ failed to find an encoding, or if it found an encoding that was not
+ the actual encoding of the file.</p>
+
+ <ol><li>If the encoding that is already being used to interpret the
+ input stream is <a href=#a-utf-16-encoding>a UTF-16 encoding</a>, then set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
+ <i>certain</i> and abort these steps. The new encoding is ignored;
+ if it was anything but the same encoding, then it would be clearly
+ incorrect.</li>
+
+ <li>If the new encoding is <a href=#a-utf-16-encoding>a UTF-16 encoding</a>, change
+ it to UTF-8.</li>
+
+ <li>If the new encoding is identical or equivalent to the encoding
+ that is already being used to interpret the input stream, then set
+ the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
+ <i>certain</i> and abort these steps. This happens when the
+ encoding information found in the file matches what the
+ <a href=#encoding-sniffing-algorithm>encoding sniffing algorithm</a> determined to be the
+ encoding, and in the second pass through the parser if the first
+ pass found that the encoding sniffing algorithm described in the
+ earlier section failed to find the right encoding.</li>
+
+ <li>If all the bytes up to the last byte converted by the current
+ decoder have the same Unicode interpretations in both the current
+ encoding and the new encoding, and if the user agent supports
+ changing the converter on the fly, then the user agent may change
+ to the new converter for the encoding on the fly. Set the
+ <a href="#document's-character-encoding">document's character encoding</a> and the encoding used to
+ convert the input stream to the new encoding, set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
+ <i>certain</i>, and abort these steps.</li>
+
+ <li>Otherwise, <a href=#navigate>navigate</a><!--DONAV reparse--> to the
+ document again, with <a href=#replacement-enabled>replacement enabled</a>, and using
+ the same <a href=#source-browsing-context>source browsing context</a>, but this time skip
+ the <a href=#encoding-sniffing-algorithm>encoding sniffing algorithm</a> and instead just set
+ the encoding to the new encoding and the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
+ <i>certain</i>. Whenever possible, this should be done without
+ actually contacting the network layer (the bytes should be
+ re-parsed from memory), even if, e.g., the document is marked as
+ not being cacheable. If this is not possible and contacting the
+ network layer would involve repeating a request that uses a method
+ other than HTTP GET (<a href=#concept-http-equivalent-get title=concept-http-equivalent-get>or
+ equivalent</a> for non-HTTP URLs), then instead set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
+ <i>certain</i> and ignore the new encoding. The resource will be
+ misinterpreted. User agents may notify the user of the situation,
+ to aid in application development.</li>
+
+ </ol><h5 id=preprocessing-the-input-stream><span class=secno>12.2.2.4 </span>Preprocessing the input stream</h5>
+
<p>The <dfn id=input-stream>input stream</dfn> consists of the characters pushed
into it as the <a href=#the-input-byte-stream>input byte stream</a> is decoded or from the
various APIs that directly manipulate the input stream.</p>
@@ -81936,62 +81988,9 @@
consumed. Otherwise, the "EOF" character is not a real character in
the stream, but rather the lack of any further characters.</p>
+ </div>
- <h5 id=changing-the-encoding-while-parsing><span class=secno>12.2.2.4 </span>Changing the encoding while parsing</h5>
- <p>When the parser requires the user agent to <dfn id=change-the-encoding>change the
- encoding</dfn>, it must run the following steps. This might happen
- if the <a href=#encoding-sniffing-algorithm>encoding sniffing algorithm</a> described above
- failed to find an encoding, or if it found an encoding that was not
- the actual encoding of the file.</p>
-
- <ol><li>If the encoding that is already being used to interpret the
- input stream is <a href=#a-utf-16-encoding>a UTF-16 encoding</a>, then set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
- <i>certain</i> and abort these steps. The new encoding is ignored;
- if it was anything but the same encoding, then it would be clearly
- incorrect.</li>
-
- <li>If the new encoding is <a href=#a-utf-16-encoding>a UTF-16 encoding</a>, change
- it to UTF-8.</li>
-
- <li>If the new encoding is identical or equivalent to the encoding
- that is already being used to interpret the input stream, then set
- the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
- <i>certain</i> and abort these steps. This happens when the
- encoding information found in the file matches what the
- <a href=#encoding-sniffing-algorithm>encoding sniffing algorithm</a> determined to be the
- encoding, and in the second pass through the parser if the first
- pass found that the encoding sniffing algorithm described in the
- earlier section failed to find the right encoding.</li>
-
- <li>If all the bytes up to the last byte converted by the current
- decoder have the same Unicode interpretations in both the current
- encoding and the new encoding, and if the user agent supports
- changing the converter on the fly, then the user agent may change
- to the new converter for the encoding on the fly. Set the
- <a href="#document's-character-encoding">document's character encoding</a> and the encoding used to
- convert the input stream to the new encoding, set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
- <i>certain</i>, and abort these steps.</li>
-
- <li>Otherwise, <a href=#navigate>navigate</a><!--DONAV reparse--> to the
- document again, with <a href=#replacement-enabled>replacement enabled</a>, and using
- the same <a href=#source-browsing-context>source browsing context</a>, but this time skip
- the <a href=#encoding-sniffing-algorithm>encoding sniffing algorithm</a> and instead just set
- the encoding to the new encoding and the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
- <i>certain</i>. Whenever possible, this should be done without
- actually contacting the network layer (the bytes should be
- re-parsed from memory), even if, e.g., the document is marked as
- not being cacheable. If this is not possible and contacting the
- network layer would involve repeating a request that uses a method
- other than HTTP GET (<a href=#concept-http-equivalent-get title=concept-http-equivalent-get>or
- equivalent</a> for non-HTTP URLs), then instead set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
- <i>certain</i> and ignore the new encoding. The resource will be
- misinterpreted. User agents may notify the user of the situation,
- to aid in application development.</li>
-
- </ol></div>
-
-
<div class=impl>
<h4 id=parse-state><span class=secno>12.2.3 </span>Parse state</h4>
Modified: index
===================================================================
--- index 2012-02-13 22:48:10 UTC (rev 6991)
+++ index 2012-02-13 22:50:10 UTC (rev 6992)
@@ -1119,8 +1119,8 @@
<ol>
<li><a href=#determining-the-character-encoding><span class=secno>12.2.2.1 </span>Determining the character encoding</a></li>
<li><a href=#character-encodings-0><span class=secno>12.2.2.2 </span>Character encodings</a></li>
- <li><a href=#preprocessing-the-input-stream><span class=secno>12.2.2.3 </span>Preprocessing the input stream</a></li>
- <li><a href=#changing-the-encoding-while-parsing><span class=secno>12.2.2.4 </span>Changing the encoding while parsing</a></ol></li>
+ <li><a href=#changing-the-encoding-while-parsing><span class=secno>12.2.2.3 </span>Changing the encoding while parsing</a></li>
+ <li><a href=#preprocessing-the-input-stream><span class=secno>12.2.2.4 </span>Preprocessing the input stream</a></ol></li>
<li><a href=#parse-state><span class=secno>12.2.3 </span>Parse state</a>
<ol>
<li><a href=#the-insertion-mode><span class=secno>12.2.3.1 </span>The insertion mode</a></li>
@@ -81878,8 +81878,60 @@
- <h5 id=preprocessing-the-input-stream><span class=secno>12.2.2.3 </span>Preprocessing the input stream</h5>
+ <h5 id=changing-the-encoding-while-parsing><span class=secno>12.2.2.3 </span>Changing the encoding while parsing</h5>
+ <p>When the parser requires the user agent to <dfn id=change-the-encoding>change the
+ encoding</dfn>, it must run the following steps. This might happen
+ if the <a href=#encoding-sniffing-algorithm>encoding sniffing algorithm</a> described above
+ failed to find an encoding, or if it found an encoding that was not
+ the actual encoding of the file.</p>
+
+ <ol><li>If the encoding that is already being used to interpret the
+ input stream is <a href=#a-utf-16-encoding>a UTF-16 encoding</a>, then set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
+ <i>certain</i> and abort these steps. The new encoding is ignored;
+ if it was anything but the same encoding, then it would be clearly
+ incorrect.</li>
+
+ <li>If the new encoding is <a href=#a-utf-16-encoding>a UTF-16 encoding</a>, change
+ it to UTF-8.</li>
+
+ <li>If the new encoding is identical or equivalent to the encoding
+ that is already being used to interpret the input stream, then set
+ the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
+ <i>certain</i> and abort these steps. This happens when the
+ encoding information found in the file matches what the
+ <a href=#encoding-sniffing-algorithm>encoding sniffing algorithm</a> determined to be the
+ encoding, and in the second pass through the parser if the first
+ pass found that the encoding sniffing algorithm described in the
+ earlier section failed to find the right encoding.</li>
+
+ <li>If all the bytes up to the last byte converted by the current
+ decoder have the same Unicode interpretations in both the current
+ encoding and the new encoding, and if the user agent supports
+ changing the converter on the fly, then the user agent may change
+ to the new converter for the encoding on the fly. Set the
+ <a href="#document's-character-encoding">document's character encoding</a> and the encoding used to
+ convert the input stream to the new encoding, set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
+ <i>certain</i>, and abort these steps.</li>
+
+ <li>Otherwise, <a href=#navigate>navigate</a><!--DONAV reparse--> to the
+ document again, with <a href=#replacement-enabled>replacement enabled</a>, and using
+ the same <a href=#source-browsing-context>source browsing context</a>, but this time skip
+ the <a href=#encoding-sniffing-algorithm>encoding sniffing algorithm</a> and instead just set
+ the encoding to the new encoding and the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
+ <i>certain</i>. Whenever possible, this should be done without
+ actually contacting the network layer (the bytes should be
+ re-parsed from memory), even if, e.g., the document is marked as
+ not being cacheable. If this is not possible and contacting the
+ network layer would involve repeating a request that uses a method
+ other than HTTP GET (<a href=#concept-http-equivalent-get title=concept-http-equivalent-get>or
+ equivalent</a> for non-HTTP URLs), then instead set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
+ <i>certain</i> and ignore the new encoding. The resource will be
+ misinterpreted. User agents may notify the user of the situation,
+ to aid in application development.</li>
+
+ </ol><h5 id=preprocessing-the-input-stream><span class=secno>12.2.2.4 </span>Preprocessing the input stream</h5>
+
<p>The <dfn id=input-stream>input stream</dfn> consists of the characters pushed
into it as the <a href=#the-input-byte-stream>input byte stream</a> is decoded or from the
various APIs that directly manipulate the input stream.</p>
@@ -81936,62 +81988,9 @@
consumed. Otherwise, the "EOF" character is not a real character in
the stream, but rather the lack of any further characters.</p>
+ </div>
- <h5 id=changing-the-encoding-while-parsing><span class=secno>12.2.2.4 </span>Changing the encoding while parsing</h5>
- <p>When the parser requires the user agent to <dfn id=change-the-encoding>change the
- encoding</dfn>, it must run the following steps. This might happen
- if the <a href=#encoding-sniffing-algorithm>encoding sniffing algorithm</a> described above
- failed to find an encoding, or if it found an encoding that was not
- the actual encoding of the file.</p>
-
- <ol><li>If the encoding that is already being used to interpret the
- input stream is <a href=#a-utf-16-encoding>a UTF-16 encoding</a>, then set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
- <i>certain</i> and abort these steps. The new encoding is ignored;
- if it was anything but the same encoding, then it would be clearly
- incorrect.</li>
-
- <li>If the new encoding is <a href=#a-utf-16-encoding>a UTF-16 encoding</a>, change
- it to UTF-8.</li>
-
- <li>If the new encoding is identical or equivalent to the encoding
- that is already being used to interpret the input stream, then set
- the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
- <i>certain</i> and abort these steps. This happens when the
- encoding information found in the file matches what the
- <a href=#encoding-sniffing-algorithm>encoding sniffing algorithm</a> determined to be the
- encoding, and in the second pass through the parser if the first
- pass found that the encoding sniffing algorithm described in the
- earlier section failed to find the right encoding.</li>
-
- <li>If all the bytes up to the last byte converted by the current
- decoder have the same Unicode interpretations in both the current
- encoding and the new encoding, and if the user agent supports
- changing the converter on the fly, then the user agent may change
- to the new converter for the encoding on the fly. Set the
- <a href="#document's-character-encoding">document's character encoding</a> and the encoding used to
- convert the input stream to the new encoding, set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
- <i>certain</i>, and abort these steps.</li>
-
- <li>Otherwise, <a href=#navigate>navigate</a><!--DONAV reparse--> to the
- document again, with <a href=#replacement-enabled>replacement enabled</a>, and using
- the same <a href=#source-browsing-context>source browsing context</a>, but this time skip
- the <a href=#encoding-sniffing-algorithm>encoding sniffing algorithm</a> and instead just set
- the encoding to the new encoding and the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
- <i>certain</i>. Whenever possible, this should be done without
- actually contacting the network layer (the bytes should be
- re-parsed from memory), even if, e.g., the document is marked as
- not being cacheable. If this is not possible and contacting the
- network layer would involve repeating a request that uses a method
- other than HTTP GET (<a href=#concept-http-equivalent-get title=concept-http-equivalent-get>or
- equivalent</a> for non-HTTP URLs), then instead set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
- <i>certain</i> and ignore the new encoding. The resource will be
- misinterpreted. User agents may notify the user of the situation,
- to aid in application development.</li>
-
- </ol></div>
-
-
<div class=impl>
<h4 id=parse-state><span class=secno>12.2.3 </span>Parse state</h4>
Modified: source
===================================================================
--- source 2012-02-13 22:48:10 UTC (rev 6991)
+++ source 2012-02-13 22:50:10 UTC (rev 6992)
@@ -94997,6 +94997,67 @@
+ <h5>Changing the encoding while parsing</h5>
+
+ <p>When the parser requires the user agent to <dfn>change the
+ encoding</dfn>, it must run the following steps. This might happen
+ if the <span>encoding sniffing algorithm</span> described above
+ failed to find an encoding, or if it found an encoding that was not
+ the actual encoding of the file.</p>
+
+ <ol>
+
+ <li>If the encoding that is already being used to interpret the
+ input stream is <span>a UTF-16 encoding</span>, then set the <span
+ title="concept-encoding-confidence">confidence</span> to
+ <i>certain</i> and abort these steps. The new encoding is ignored;
+ if it was anything but the same encoding, then it would be clearly
+ incorrect.</li>
+
+ <li>If the new encoding is <span>a UTF-16 encoding</span>, change
+ it to UTF-8.</li>
+
+ <li>If the new encoding is identical or equivalent to the encoding
+ that is already being used to interpret the input stream, then set
+ the <span title="concept-encoding-confidence">confidence</span> to
+ <i>certain</i> and abort these steps. This happens when the
+ encoding information found in the file matches what the
+ <span>encoding sniffing algorithm</span> determined to be the
+ encoding, and in the second pass through the parser if the first
+ pass found that the encoding sniffing algorithm described in the
+ earlier section failed to find the right encoding.</li>
+
+ <li>If all the bytes up to the last byte converted by the current
+ decoder have the same Unicode interpretations in both the current
+ encoding and the new encoding, and if the user agent supports
+ changing the converter on the fly, then the user agent may change
+ to the new converter for the encoding on the fly. Set the
+ <span>document's character encoding</span> and the encoding used to
+ convert the input stream to the new encoding, set the <span
+ title="concept-encoding-confidence">confidence</span> to
+ <i>certain</i>, and abort these steps.</li>
+
+ <li>Otherwise, <span>navigate</span><!--DONAV reparse--> to the
+ document again, with <span>replacement enabled</span>, and using
+ the same <span>source browsing context</span>, but this time skip
+ the <span>encoding sniffing algorithm</span> and instead just set
+ the encoding to the new encoding and the <span
+ title="concept-encoding-confidence">confidence</span> to
+ <i>certain</i>. Whenever possible, this should be done without
+ actually contacting the network layer (the bytes should be
+ re-parsed from memory), even if, e.g., the document is marked as
+ not being cacheable. If this is not possible and contacting the
+ network layer would involve repeating a request that uses a method
+ other than HTTP GET (<span title="concept-http-equivalent-get">or
+ equivalent</span> for non-HTTP URLs), then instead set the <span
+ title="concept-encoding-confidence">confidence</span> to
+ <i>certain</i> and ignore the new encoding. The resource will be
+ misinterpreted. User agents may notify the user of the situation,
+ to aid in application development.</li>
+
+ </ol>
+
+
<h5>Preprocessing the input stream</h5>
<p>The <dfn>input stream</dfn> consists of the characters pushed
@@ -95057,67 +95118,6 @@
consumed. Otherwise, the "EOF" character is not a real character in
the stream, but rather the lack of any further characters.</p>
-
- <h5>Changing the encoding while parsing</h5>
-
- <p>When the parser requires the user agent to <dfn>change the
- encoding</dfn>, it must run the following steps. This might happen
- if the <span>encoding sniffing algorithm</span> described above
- failed to find an encoding, or if it found an encoding that was not
- the actual encoding of the file.</p>
-
- <ol>
-
- <li>If the encoding that is already being used to interpret the
- input stream is <span>a UTF-16 encoding</span>, then set the <span
- title="concept-encoding-confidence">confidence</span> to
- <i>certain</i> and abort these steps. The new encoding is ignored;
- if it was anything but the same encoding, then it would be clearly
- incorrect.</li>
-
- <li>If the new encoding is <span>a UTF-16 encoding</span>, change
- it to UTF-8.</li>
-
- <li>If the new encoding is identical or equivalent to the encoding
- that is already being used to interpret the input stream, then set
- the <span title="concept-encoding-confidence">confidence</span> to
- <i>certain</i> and abort these steps. This happens when the
- encoding information found in the file matches what the
- <span>encoding sniffing algorithm</span> determined to be the
- encoding, and in the second pass through the parser if the first
- pass found that the encoding sniffing algorithm described in the
- earlier section failed to find the right encoding.</li>
-
- <li>If all the bytes up to the last byte converted by the current
- decoder have the same Unicode interpretations in both the current
- encoding and the new encoding, and if the user agent supports
- changing the converter on the fly, then the user agent may change
- to the new converter for the encoding on the fly. Set the
- <span>document's character encoding</span> and the encoding used to
- convert the input stream to the new encoding, set the <span
- title="concept-encoding-confidence">confidence</span> to
- <i>certain</i>, and abort these steps.</li>
-
- <li>Otherwise, <span>navigate</span><!--DONAV reparse--> to the
- document again, with <span>replacement enabled</span>, and using
- the same <span>source browsing context</span>, but this time skip
- the <span>encoding sniffing algorithm</span> and instead just set
- the encoding to the new encoding and the <span
- title="concept-encoding-confidence">confidence</span> to
- <i>certain</i>. Whenever possible, this should be done without
- actually contacting the network layer (the bytes should be
- re-parsed from memory), even if, e.g., the document is marked as
- not being cacheable. If this is not possible and contacting the
- network layer would involve repeating a request that uses a method
- other than HTTP GET (<span title="concept-http-equivalent-get">or
- equivalent</span> for non-HTTP URLs), then instead set the <span
- title="concept-encoding-confidence">confidence</span> to
- <i>certain</i> and ignore the new encoding. The resource will be
- misinterpreted. User agents may notify the user of the situation,
- to aid in application development.</li>
-
- </ol>
-
</div>
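
The five steps of the "change the encoding" algorithm that this revision relocates can be sketched as follows. This is a rough illustration and not part of the commit: `ParserState`, its fields, the returned strings, and the helper functions are all hypothetical names, encoding equivalence is approximated with Python's codec registry, and the optional on-the-fly switch in step 4 (a "may" in the spec text) is always taken when it is possible.

```python
import codecs
from dataclasses import dataclass


def _canonical(name: str) -> str:
    """Normalize an encoding label using Python's codec registry (an approximation)."""
    return codecs.lookup(name).name


def _is_utf16(name: str) -> bool:
    return _canonical(name) in {"utf-16", "utf-16-le", "utf-16-be"}


def _same_interpretation(data: bytes, enc_a: str, enc_b: str) -> bool:
    """True if the bytes decode to the same characters in both encodings."""
    try:
        return data.decode(enc_a) == data.decode(enc_b)
    except UnicodeDecodeError:
        return False


@dataclass
class ParserState:  # hypothetical container, not a spec-defined structure
    encoding: str
    confidence: str                 # "tentative" or "certain"
    consumed: bytes                 # bytes already converted by the current decoder
    can_switch_on_the_fly: bool = True
    can_reparse_from_memory: bool = True
    request_is_get: bool = True


def change_the_encoding(state: ParserState, new_encoding: str) -> str:
    # Step 1: a UTF-16 stream cannot meaningfully declare anything else.
    if _is_utf16(state.encoding):
        state.confidence = "certain"
        return "kept"
    # Step 2: a UTF-16 declaration read from a non-UTF-16 stream becomes UTF-8.
    if _is_utf16(new_encoding):
        new_encoding = "utf-8"
    # Step 3: the declaration matches the encoding already in use.
    if _canonical(new_encoding) == _canonical(state.encoding):
        state.confidence = "certain"
        return "kept"
    # Step 4: switch decoders in place if nothing decoded so far would change.
    if state.can_switch_on_the_fly and _same_interpretation(
        state.consumed, state.encoding, new_encoding
    ):
        state.encoding = _canonical(new_encoding)
        state.confidence = "certain"
        return "switched"
    # Step 5: reparse, unless that would mean repeating a non-GET request.
    if not state.can_reparse_from_memory and not state.request_is_get:
        state.confidence = "certain"
        return "ignored"            # the resource will be misinterpreted
    state.encoding = _canonical(new_encoding)
    state.confidence = "certain"
    return "reparse"                # caller navigates again, skipping sniffing
```

For example, a parser that has consumed only ASCII bytes under windows-1252 and then meets a UTF-8 declaration can switch in place, whereas one that has already decoded a non-ASCII byte has to reparse.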