[html5] r943 - /
whatwg at whatwg.org
Thu Jun 21 18:46:43 PDT 2007
Author: ianh
Date: 2007-06-21 18:44:33 -0700 (Thu, 21 Jun 2007)
New Revision: 943
Modified:
index
source
Log:
[eciowt] (2) Be explicit about what an invalid Unicode character is.
Modified: index
===================================================================
--- index 2007-06-21 23:57:48 UTC (rev 942)
+++ index 2007-06-22 01:44:33 UTC (rev 943)
@@ -22,7 +22,7 @@
<h1 id=html-5>HTML 5</h1>
- <h2 class="no-num no-toc" id=working>Working Draft — 21 June 2007</h2>
+ <h2 class="no-num no-toc" id=working>Working Draft — 22 June 2007</h2>
<p>You can take part in this work. <a
href="http://www.whatwg.org/mailing-list">Join the working group's
@@ -35026,12 +35026,14 @@
<td>LATIN CAPITAL LETTER Y WITH DIAERESIS ('Ÿ')
</table>
- <p>Otherwise, if the number is not a valid Unicode character (e.g. if the
- number is higher than 1114111), or if the number is zero, then return a
- character token for the U+FFFD REPLACEMENT CHARACTER character instead.</p>
+ <p>Otherwise, if the number is zero, if the number is higher than
+ 0x10FFFF, or if it's one of the surrogate characters (characters in the
+ range 0xD800 to 0xDFFF), then this is a <a href="#parse">parse
+ error</a>; return a character token for the U+FFFD REPLACEMENT CHARACTER
+ character instead.</p>
<p>Otherwise, return a character token for the Unicode character whose
- code point is that number.
+ code point is that number.</p>
<dt>Anything else
Modified: source
===================================================================
--- source 2007-06-21 23:57:48 UTC (rev 942)
+++ source 2007-06-22 01:44:33 UTC (rev 943)
@@ -32337,13 +32337,14 @@
<tr><td>0x9F <td>U+0178 <td>LATIN CAPITAL LETTER Y WITH DIAERESIS ('Ÿ')
</table>
- <p>Otherwise, if the number is not a valid Unicode character
- (e.g. if the number is higher than 1114111), or if the number is
- zero, then return a character token for the U+FFFD REPLACEMENT
+ <p>Otherwise, if the number is zero, if the number is higher than
+ 0x10FFFF, or if it's one of the surrogate characters (characters
+ in the range 0xD800 to 0xDFFF), then this is a <span>parse
+ error</span>; return a character token for the U+FFFD REPLACEMENT
CHARACTER character instead.</p>
<p>Otherwise, return a character token for the Unicode character
- whose code point is that number.
+ whose code point is that number.</p>
</dd>
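
Illustration only, not part of the commit: the revised rule in r943 could be sketched in Python roughly as follows. The function name resolve_numeric_reference and the (character, parse_error) return shape are hypothetical, chosen just for this example. The 0x80 to 0x9F remapping table that precedes the paragraph in the spec is left out, since only its last row is visible in the diff context.

```python
REPLACEMENT_CHARACTER = "\uFFFD"  # U+FFFD REPLACEMENT CHARACTER


def resolve_numeric_reference(number):
    """Map the numeric value of a character reference to a character.

    Returns a (character, parse_error) pair. This is a hypothetical
    helper: it assumes the caller has already parsed the digits into a
    non-negative integer and has already applied the 0x80-0x9F
    remapping table, which is not reproduced here.
    """
    is_zero = number == 0
    is_out_of_range = number > 0x10FFFF
    is_surrogate = 0xD800 <= number <= 0xDFFF

    if is_zero or is_out_of_range or is_surrogate:
        # r943: these cases are a parse error; emit U+FFFD instead.
        return REPLACEMENT_CHARACTER, True

    # Otherwise, the character whose code point is that number.
    return chr(number), False
```

Under these assumptions, the values 0, 0xD800 and 0x110000 all come back as U+FFFD with the parse-error flag set, while 0x41 comes back as "A" with no error.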