[html5] r3374 - [acgiowt] (2) Make invalid &#x...; character references not get converted to U+F [...]

whatwg at whatwg.org
Tue Jul 7 18:04:06 PDT 2009


Author: ianh
Date: 2009-07-07 18:04:05 -0700 (Tue, 07 Jul 2009)
New Revision: 3374

Modified:
   index
   source
Log:
[acgiowt] (2) Make invalid &#x...; character references not get converted to U+FFFD, for consistency with literal invalid characters.
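
To illustrate the behaviour change described in the log, here is a minimal Python sketch of how a numeric character reference might be resolved after this revision. It is not taken from the spec or from any implementation. The names OVERRIDES, is_flagged and resolve_numeric_reference are hypothetical. The table holds only some of the rows visible in the diff below; rows elided between hunks are left out. Three points are assumptions: that a table hit is itself reported as a parse error, that the elided part of the noncharacter list follows the same xFFFE/xFFFF pattern as the visible entries, and that a number above 0x10FFFF (which has no Unicode character) falls back to U+FFFD.

```python
# Some override-table rows visible in the diff; rows elided between
# hunks are deliberately omitted from this sketch.
OVERRIDES = {
    0x00: "\uFFFD",  # REPLACEMENT CHARACTER (row added by this revision)
    0x0D: "\u000A",  # LINE FEED (LF)
    0x80: "\u20AC",  # EURO SIGN
    0x81: "\u0081",  # was U+FFFD before this revision
    0x8D: "\u008D",  # was U+FFFD before this revision
    0x8F: "\u008F",  # was U+FFFD before this revision
    0x90: "\u0090",  # was U+FFFD before this revision
    0x9D: "\u009D",  # was U+FFFD before this revision
    0x9F: "\u0178",  # LATIN CAPITAL LETTER Y WITH DIAERESIS
}


def is_flagged(number):
    """True if the number falls in one of the ranges reported as a parse error."""
    return (
        0x0001 <= number <= 0x0008
        or number == 0x000B
        or 0x000E <= number <= 0x001F
        or 0x007F <= number <= 0x009F
        or 0xD800 <= number <= 0xDFFF          # surrogates
        or 0xFDD0 <= number <= 0xFDEF
        or (number & 0xFFFE) == 0xFFFE         # assumed pattern: xFFFE / xFFFF
        or number > 0x10FFFF
    )


def resolve_numeric_reference(number):
    """Return (character, parse_error) for an already-parsed reference number."""
    if number in OVERRIDES:
        # Assumption: a table hit is also reported as a parse error.
        return OVERRIDES[number], True
    parse_error = is_flagged(number)
    if number > 0x10FFFF:
        # Assumption: no such code point exists, so fall back to U+FFFD.
        return "\uFFFD", parse_error
    # After this revision the character is returned as-is; the parse
    # error is only reported and no longer forces U+FFFD.
    return chr(number), parse_error
```

With this sketch, resolve_numeric_reference(0x01) yields U+0001 together with a parse error, where the previous revision would have yielded U+FFFD.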

Modified: index
===================================================================
--- index	2009-07-08 00:56:08 UTC (rev 3373)
+++ index	2009-07-08 01:04:05 UTC (rev 3374)
@@ -61677,9 +61677,10 @@
     row.</p>
 
     <table><thead><tr><th>Number <th colspan=2>Unicode character
-     <tbody><tr><td>0x0D <td>U+000A <td>LINE FEED (LF)
+     <tbody><tr><td>0x00 <td>U+FFFD <td>REPLACEMENT CHARACTER
+      <tr><td>0x0D <td>U+000A <td>LINE FEED (LF)
       <tr><td>0x80 <td>U+20AC <td>EURO SIGN ('€')
-      <tr><td>0x81 <td>U+FFFD <td>REPLACEMENT CHARACTER
+      <tr><td>0x81 <td>U+0081 <td><control>
       <tr><td>0x82 <td>U+201A <td>SINGLE LOW-9 QUOTATION MARK ('‚')
       <tr><td>0x83 <td>U+0192 <td>LATIN SMALL LETTER F WITH HOOK ('ƒ')
       <tr><td>0x84 <td>U+201E <td>DOUBLE LOW-9 QUOTATION MARK ('„')
@@ -61691,10 +61692,10 @@
       <tr><td>0x8A <td>U+0160 <td>LATIN CAPITAL LETTER S WITH CARON ('Š')
       <tr><td>0x8B <td>U+2039 <td>SINGLE LEFT-POINTING ANGLE QUOTATION MARK ('‹')
       <tr><td>0x8C <td>U+0152 <td>LATIN CAPITAL LIGATURE OE ('Œ')
-      <tr><td>0x8D <td>U+FFFD <td>REPLACEMENT CHARACTER
+      <tr><td>0x8D <td>U+008D <td><control>
       <tr><td>0x8E <td>U+017D <td>LATIN CAPITAL LETTER Z WITH CARON ('Ž')
-      <tr><td>0x8F <td>U+FFFD <td>REPLACEMENT CHARACTER
-      <tr><td>0x90 <td>U+FFFD <td>REPLACEMENT CHARACTER
+      <tr><td>0x8F <td>U+008F <td><control>
+      <tr><td>0x90 <td>U+0090 <td><control>
       <tr><td>0x91 <td>U+2018 <td>LEFT SINGLE QUOTATION MARK ('‘')
       <tr><td>0x92 <td>U+2019 <td>RIGHT SINGLE QUOTATION MARK ('’')
       <tr><td>0x93 <td>U+201C <td>LEFT DOUBLE QUOTATION MARK ('“')
@@ -61707,12 +61708,16 @@
       <tr><td>0x9A <td>U+0161 <td>LATIN SMALL LETTER S WITH CARON ('š')
       <tr><td>0x9B <td>U+203A <td>SINGLE RIGHT-POINTING ANGLE QUOTATION MARK ('›')
       <tr><td>0x9C <td>U+0153 <td>LATIN SMALL LIGATURE OE ('œ')
-      <tr><td>0x9D <td>U+FFFD <td>REPLACEMENT CHARACTER
+      <tr><td>0x9D <td>U+009D <td><control>
       <tr><td>0x9E <td>U+017E <td>LATIN SMALL LETTER Z WITH CARON ('ž')
       <tr><td>0x9F <td>U+0178 <td>LATIN CAPITAL LETTER Y WITH DIAERESIS ('Ÿ')
-    </table><!-- this is the same as the equivalent list in the input stream
-    section, except it has 0x0000 included in the first range. --><p>Otherwise, if the number is in the range 0x0000 to 0x0008, <!--
-    HT, LF allowed --> <!-- U+000B is in the next list --> <!-- FF, CR
+    </table><p>Otherwise, return a character token for the Unicode character
+    whose code point is that number.
+
+    <!-- this is the same as the equivalent list in the input stream
+    section -->
+    If the number is in the range 0x0001 to 0x0008, <!-- HT, LF
+    allowed --> <!-- U+000B is in the next list --> <!-- FF, CR
     allowed --> 0x000E to 0x001F, <!-- ASCII allowed --> 0x007F <!--to
     0x0084, (0x0085 NEL not allowed), 0x0086--> to 0x009F, 0xD800 to
     0xDFFF<!-- surrogates not allowed -->, 0xFDD0 to 0xFDEF, or is one
@@ -61722,12 +61727,8 @@
     0xAFFFE, 0xAFFFF, 0xBFFFE, 0xBFFFF, 0xCFFFE, 0xCFFFF, 0xDFFFE,
     0xDFFFF, 0xEFFFE, 0xEFFFF, 0xFFFFE, 0xFFFFF, 0x10FFFE, or
     0x10FFFF, or is higher than 0x10FFFF, then this is a <a href=#parse-error>parse
-    error</a>; return a character token for the U+FFFD REPLACEMENT
-    CHARACTER character instead.</p>
+    error</a>.</p>
 
-    <p>Otherwise, return a character token for the Unicode character
-    whose code point is that number.</p>
-
    </dd>
 
 

Modified: source
===================================================================
--- source	2009-07-08 00:56:08 UTC (rev 3373)
+++ source	2009-07-08 01:04:05 UTC (rev 3374)
@@ -75699,9 +75699,10 @@
      <thead>
       <tr><th>Number <th colspan=2>Unicode character
      <tbody>
+      <tr><td>0x00 <td>U+FFFD <td>REPLACEMENT CHARACTER
       <tr><td>0x0D <td>U+000A <td>LINE FEED (LF)
       <tr><td>0x80 <td>U+20AC <td>EURO SIGN ('&#x20AC;')
-      <tr><td>0x81 <td>U+FFFD <td>REPLACEMENT CHARACTER
+      <tr><td>0x81 <td>U+0081 <td><control>
       <tr><td>0x82 <td>U+201A <td>SINGLE LOW-9 QUOTATION MARK ('&#x201A;')
       <tr><td>0x83 <td>U+0192 <td>LATIN SMALL LETTER F WITH HOOK ('&#x0192;')
       <tr><td>0x84 <td>U+201E <td>DOUBLE LOW-9 QUOTATION MARK ('&#x201E;')
@@ -75713,10 +75714,10 @@
       <tr><td>0x8A <td>U+0160 <td>LATIN CAPITAL LETTER S WITH CARON ('&#x0160;')
       <tr><td>0x8B <td>U+2039 <td>SINGLE LEFT-POINTING ANGLE QUOTATION MARK ('&#x2039;')
       <tr><td>0x8C <td>U+0152 <td>LATIN CAPITAL LIGATURE OE ('&#x0152;')
-      <tr><td>0x8D <td>U+FFFD <td>REPLACEMENT CHARACTER
+      <tr><td>0x8D <td>U+008D <td><control>
       <tr><td>0x8E <td>U+017D <td>LATIN CAPITAL LETTER Z WITH CARON ('&#x017D;')
-      <tr><td>0x8F <td>U+FFFD <td>REPLACEMENT CHARACTER
-      <tr><td>0x90 <td>U+FFFD <td>REPLACEMENT CHARACTER
+      <tr><td>0x8F <td>U+008F <td><control>
+      <tr><td>0x90 <td>U+0090 <td><control>
       <tr><td>0x91 <td>U+2018 <td>LEFT SINGLE QUOTATION MARK ('&#x2018;')
       <tr><td>0x92 <td>U+2019 <td>RIGHT SINGLE QUOTATION MARK ('&#x2019;')
       <tr><td>0x93 <td>U+201C <td>LEFT DOUBLE QUOTATION MARK ('&#x201C;')
@@ -75729,15 +75730,18 @@
       <tr><td>0x9A <td>U+0161 <td>LATIN SMALL LETTER S WITH CARON ('&#x0161;')
       <tr><td>0x9B <td>U+203A <td>SINGLE RIGHT-POINTING ANGLE QUOTATION MARK ('&#x203A;')
       <tr><td>0x9C <td>U+0153 <td>LATIN SMALL LIGATURE OE ('&#x0153;')
-      <tr><td>0x9D <td>U+FFFD <td>REPLACEMENT CHARACTER
+      <tr><td>0x9D <td>U+009D <td><control>
       <tr><td>0x9E <td>U+017E <td>LATIN SMALL LETTER Z WITH CARON ('&#x017E;')
       <tr><td>0x9F <td>U+0178 <td>LATIN CAPITAL LETTER Y WITH DIAERESIS ('&#x0178;')
     </table>
 
+    <p>Otherwise, return a character token for the Unicode character
+    whose code point is that number.
+
     <!-- this is the same as the equivalent list in the input stream
-    section, except it has 0x0000 included in the first range. -->
-    <p>Otherwise, if the number is in the range 0x0000 to 0x0008, <!--
-    HT, LF allowed --> <!-- U+000B is in the next list --> <!-- FF, CR
+    section -->
+    If the number is in the range 0x0001 to 0x0008, <!-- HT, LF
+    allowed --> <!-- U+000B is in the next list --> <!-- FF, CR
     allowed --> 0x000E to 0x001F, <!-- ASCII allowed --> 0x007F <!--to
     0x0084, (0x0085 NEL not allowed), 0x0086--> to 0x009F, 0xD800 to
     0xDFFF<!-- surrogates not allowed -->, 0xFDD0 to 0xFDEF, or is one
@@ -75747,12 +75751,8 @@
     0xAFFFE, 0xAFFFF, 0xBFFFE, 0xBFFFF, 0xCFFFE, 0xCFFFF, 0xDFFFE,
     0xDFFFF, 0xEFFFE, 0xEFFFF, 0xFFFFE, 0xFFFFF, 0x10FFFE, or
     0x10FFFF, or is higher than 0x10FFFF, then this is a <span>parse
-    error</span>; return a character token for the U+FFFD REPLACEMENT
-    CHARACTER character instead.</p>
+    error</span>.</p>
 
-    <p>Otherwise, return a character token for the Unicode character
-    whose code point is that number.</p>
-
    </dd>
 
 



