[html5] r5862 - [t] (0) Remove the requirement that the parser deal with raw surrogates, since t [...]
whatwg at whatwg.org
Tue Feb 8 16:29:13 PST 2011
Author: ianh
Date: 2011-02-08 16:29:12 -0800 (Tue, 08 Feb 2011)
New Revision: 5862
Modified:
complete.html
index
source
Log:
[t] (0) Remove the requirement that the parser deal with raw surrogates, since they can't make it this far.
Fixing http://www.w3.org/Bugs/Public/show_bug.cgi?id=11298
Modified: complete.html
===================================================================
--- complete.html 2011-02-09 00:06:14 UTC (rev 5861)
+++ complete.html 2011-02-09 00:29:12 UTC (rev 5862)
@@ -77607,13 +77607,6 @@
motivated by a desire to increase the resilience of user agents in
the face of naïve transcoders.</p>
- <p>Code points in the range U+D800 to U+DFFF<!-- surrogates are not
- allowed e.g. in UTF-8, and we don't want them to suddenly turn into
- code points when they go through a UTF-16 pipe --> in the input must
- be replaced by U+FFFD REPLACEMENT CHARACTERs. Any occurrences of
- such characters and code points are <a href=#parse-error title="parse error">parse
- errors</a>.</p>
-
<p>Any occurrences of any characters in the ranges U+0001 to U+0008,
<!-- HT, LF allowed --> <!-- U+000B is in the next list --> <!-- FF,
CR allowed --> U+000E to U+001F, <!-- ASCII allowed --> U+007F
@@ -80255,10 +80248,9 @@
<tr><td>0x9E <td>U+017E <td>LATIN SMALL LETTER Z WITH CARON (ž)
<tr><td>0x9F <td>U+0178 <td>LATIN CAPITAL LETTER Y WITH DIAERESIS (Ÿ)
</table><p>Otherwise, if the number is in the range 0xD800 to 0xDFFF<!--
- surrogates not allowed; see the comment in the "preprocessing the
- input stream" section for details --> or is greater than 0x10FFFF,
- then this is a <a href=#parse-error>parse error</a>. Return a U+FFFD
- REPLACEMENT CHARACTER.</p>
+ surrogates --> or is greater than 0x10FFFF, then this is a
+ <a href=#parse-error>parse error</a>. Return a U+FFFD REPLACEMENT
+ CHARACTER.</p>
<p>Otherwise, return a character token for the Unicode character
whose code point is that number.
Modified: index
===================================================================
--- index 2011-02-09 00:06:14 UTC (rev 5861)
+++ index 2011-02-09 00:29:12 UTC (rev 5862)
@@ -73578,13 +73578,6 @@
motivated by a desire to increase the resilience of user agents in
the face of naïve transcoders.</p>
- <p>Code points in the range U+D800 to U+DFFF<!-- surrogates are not
- allowed e.g. in UTF-8, and we don't want them to suddenly turn into
- code points when they go through a UTF-16 pipe --> in the input must
- be replaced by U+FFFD REPLACEMENT CHARACTERs. Any occurrences of
- such characters and code points are <a href=#parse-error title="parse error">parse
- errors</a>.</p>
-
<p>Any occurrences of any characters in the ranges U+0001 to U+0008,
<!-- HT, LF allowed --> <!-- U+000B is in the next list --> <!-- FF,
CR allowed --> U+000E to U+001F, <!-- ASCII allowed --> U+007F
@@ -76226,10 +76219,9 @@
<tr><td>0x9E <td>U+017E <td>LATIN SMALL LETTER Z WITH CARON (ž)
<tr><td>0x9F <td>U+0178 <td>LATIN CAPITAL LETTER Y WITH DIAERESIS (Ÿ)
</table><p>Otherwise, if the number is in the range 0xD800 to 0xDFFF<!--
- surrogates not allowed; see the comment in the "preprocessing the
- input stream" section for details --> or is greater than 0x10FFFF,
- then this is a <a href=#parse-error>parse error</a>. Return a U+FFFD
- REPLACEMENT CHARACTER.</p>
+ surrogates --> or is greater than 0x10FFFF, then this is a
+ <a href=#parse-error>parse error</a>. Return a U+FFFD REPLACEMENT
+ CHARACTER.</p>
<p>Otherwise, return a character token for the Unicode character
whose code point is that number.
Modified: source
===================================================================
--- source 2011-02-09 00:06:14 UTC (rev 5861)
+++ source 2011-02-09 00:29:12 UTC (rev 5862)
@@ -87882,13 +87882,6 @@
motivated by a desire to increase the resilience of user agents in
the face of naïve transcoders.</p>
- <p>Code points in the range U+D800 to U+DFFF<!-- surrogates are not
- allowed e.g. in UTF-8, and we don't want them to suddenly turn into
- code points when they go through a UTF-16 pipe --> in the input must
- be replaced by U+FFFD REPLACEMENT CHARACTERs. Any occurrences of
- such characters and code points are <span title="parse error">parse
- errors</span>.</p>
-
<p>Any occurrences of any characters in the ranges U+0001 to U+0008,
<!-- HT, LF allowed --> <!-- U+000B is in the next list --> <!-- FF,
CR allowed --> U+000E to U+001F, <!-- ASCII allowed --> U+007F
@@ -90948,10 +90941,9 @@
</table>
<p>Otherwise, if the number is in the range 0xD800 to 0xDFFF<!--
- surrogates not allowed; see the comment in the "preprocessing the
- input stream" section for details --> or is greater than 0x10FFFF,
- then this is a <span>parse error</span>. Return a U+FFFD
- REPLACEMENT CHARACTER.</p>
+ surrogates --> or is greater than 0x10FFFF, then this is a
+ <span>parse error</span>. Return a U+FFFD REPLACEMENT
+ CHARACTER.</p>
<p>Otherwise, return a character token for the Unicode character
whose code point is that number.
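
For illustration only, the numeric character reference branch that this revision keeps (surrogates and numbers above 0x10FFFF yield U+FFFD plus a parse error) could be sketched as follows. This is a minimal sketch, not the spec's algorithm in full: the function name and the (character, parse_error) return shape are assumptions made for the example, and the earlier branches of the step (such as the 0x80 to 0x9F table lookup shown in the context lines) are deliberately left out.

```python
from typing import Tuple

REPLACEMENT_CHARACTER = "\uFFFD"


def resolve_numeric_reference(number: int) -> Tuple[str, bool]:
    """Hypothetical helper: map a parsed reference number to a character.

    Returns (character, parse_error). Only the branch touched by this
    revision is modelled; table-based replacements are not handled here.
    """
    # Surrogates (0xD800 to 0xDFFF) and numbers beyond the Unicode range
    # are parse errors and are replaced by U+FFFD.
    if 0xD800 <= number <= 0xDFFF or number > 0x10FFFF:
        return REPLACEMENT_CHARACTER, True
    # Otherwise, return the character whose code point is that number.
    return chr(number), False
```

Under these assumptions, resolve_numeric_reference(0xD800) gives the replacement character with the parse-error flag set, while resolve_numeric_reference(0x41) gives "A" with no error.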