[html5] r943 - /

Thu Jun 21 18:46:43 PDT 2007

Author: ianh
Date: 2007-06-21 18:44:33 -0700 (Thu, 21 Jun 2007)
New Revision: 943

Modified:
   index
   source
Log:
[eciowt] (2) Be explicit about what an invalid Unicode character is.

Modified: index
===================================================================

--- index	2007-06-21 23:57:48 UTC (rev 942)
+++ index	2007-06-22 01:44:33 UTC (rev 943)
@@ -22,7 +22,7 @@
 
    <h1 id=html-5>HTML 5</h1>
 
-   <h2 class="no-num no-toc" id=working>Working Draft — 21 June 2007</h2>
+   <h2 class="no-num no-toc" id=working>Working Draft — 22 June 2007</h2>
 
    <p>You can take part in this work. <a
     href="http://www.whatwg.org/mailing-list">Join the working group's
@@ -35026,12 +35026,14 @@
        <td>LATIN CAPITAL LETTER Y WITH DIAERESIS ('&#x0178')
     </table>
 
-    <p>Otherwise, if the number is not a valid Unicode character (e.g. if the
-     number is higher than 1114111), or if the number is zero, then return a
-     character token for the U+FFFD REPLACEMENT CHARACTER character instead.</p>
+    <p>Otherwise, if the number is zero, if the number is higher than
+     0x10FFFF, or if it's one of the surrogate characters (characters in the
+     range 0xD800 to 0xDFFF), then this is a <a href="#parse">parse
+     error</a>; return a character token for the U+FFFD REPLACEMENT CHARACTER
+     character instead.</p>
 
     <p>Otherwise, return a character token for the Unicode character whose
-     code point is that number.
+     code point is that number.</p>
 
    <dt>Anything else
 

Modified: source
===================================================================
--- source	2007-06-21 23:57:48 UTC (rev 942)
+++ source	2007-06-22 01:44:33 UTC (rev 943)
@@ -32337,13 +32337,14 @@
       <tr><td>0x9F <td>U+0178 <td>LATIN CAPITAL LETTER Y WITH DIAERESIS ('&#x0178')
     </table>
 
-    <p>Otherwise, if the number is not a valid Unicode character
-    (e.g. if the number is higher than 1114111), or if the number is
-    zero, then return a character token for the U+FFFD REPLACEMENT
+    <p>Otherwise, if the number is zero, if the number is higher than
+    0x10FFFF, or if it's one of the surrogate characters (characters
+    in the range 0xD800 to 0xDFFF), then this is a <span>parse
+    error</span>; return a character token for the U+FFFD REPLACEMENT
     CHARACTER character instead.</p>
 
     <p>Otherwise, return a character token for the Unicode character
-    whose code point is that number.
+    whose code point is that number.</p>
 
    </dd>
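
Illustration (not part of the commit): the revised paragraph can be read as a small decision procedure. Below is a minimal Python sketch of that reading. The function name resolve_numeric_reference and the (character, is_parse_error) return shape are hypothetical, chosen only for this example. The table lookup that precedes this step in the spec is left out, since only its last row is visible in the diff context.

```python
REPLACEMENT_CHARACTER = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def resolve_numeric_reference(number):
    """Map the number from a numeric character reference to a character.

    Returns a (character, is_parse_error) pair. This is a hypothetical
    helper sketching the two "Otherwise" paragraphs as of r943; it does
    not handle the table lookup that comes before them.
    """
    is_zero = number == 0
    above_unicode_range = number > 0x10FFFF  # 0x10FFFF == 1114111
    is_surrogate = 0xD800 <= number <= 0xDFFF

    if is_zero or above_unicode_range or is_surrogate:
        # Parse error: return U+FFFD instead of the requested character.
        return REPLACEMENT_CHARACTER, True

    # Otherwise, the character whose code point is that number.
    return chr(number), False


if __name__ == "__main__":
    for n in (0x41, 0, 0xD800, 0x10FFFF, 0x110000):
        char, error = resolve_numeric_reference(n)
        print(hex(n), repr(char), "parse error" if error else "ok")
```

Compared with the rev 942 wording, the surrogate range is now listed explicitly instead of being left to "not a valid Unicode character", and all three cases are flagged as parse errors.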