[html5] r2138 - [ct] (0) Make U+000B into a parse error and have it convert to U+FFFD in NCRs. ( [...]

Tue Sep 2 00:25:11 PDT 2008

Author: ianh
Date: 2008-09-02 00:25:09 -0700 (Tue, 02 Sep 2008)
New Revision: 2138

Modified:
   index
   source
Log:
[ct] (0) Make U+000B into a parse error and have it convert to U+FFFD in NCRs. (credit: hs)

Modified: index
===================================================================
--- index	2008-09-02 07:09:35 UTC (rev 2137)
+++ index	2008-09-02 07:25:09 UTC (rev 2138)
@@ -46824,22 +46824,21 @@
    href="#parse2">parse error</a>.
 
   <p>Any occurrences of any characters in the ranges U+0001 to U+0008,
-   <!-- space characters allowed --> U+000E to U+001F, <!-- ASCII
-  allowed -->
-   U+007F <!--to U+0084, (U+0085 NEL not allowed),
-  U+0086--> to U+009F,
-   U+D800 to U+DFFF <!-- surrogates not allowed
-  -->, U+FDD0 to U+FDDF, and
-   characters U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF, U+3FFFE,
-   U+3FFFF, U+4FFFE, U+4FFFF, U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF, U+7FFFE,
-   U+7FFFF, U+8FFFE, U+8FFFF, U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE,
-   U+BFFFF, U+CFFFE, U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE,
-   U+FFFFF, U+10FFFE, and U+10FFFF are <a href="#parse2" title="parse
-   error">parse errors</a>. (These are all control characters or permanently
-   undefined Unicode characters.)
+   <!-- HT, LF allowed --> U+000B, <!-- FF, CR allowed --> U+000E to U+001F,
+   <!-- ASCII allowed --> U+007F <!--to U+0084, (U+0085 NEL not
+  allowed), U+0086-->
+   to U+009F, U+D800 to U+DFFF <!-- surrogates not
+  allowed -->, U+FDD0 to
+   U+FDDF, and characters U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF,
+   U+3FFFE, U+3FFFF, U+4FFFE, U+4FFFF, U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF,
+   U+7FFFE, U+7FFFF, U+8FFFE, U+8FFFF, U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF,
+   U+BFFFE, U+BFFFF, U+CFFFE, U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF,
+   U+FFFFE, U+FFFFF, U+10FFFE, and U+10FFFF are <a href="#parse2"
+   title="parse error">parse errors</a>. (These are all control characters or
+   permanently undefined Unicode characters.)
 
-  <p>U+000D CARRIAGE RETURN (CR) characters, and U+000A LINE FEED (LF)
-   characters, are treated specially. Any CR characters that are followed by
+  <p>U+000D CARRIAGE RETURN (CR) characters and U+000A LINE FEED (LF)
+   characters are treated specially. Any CR characters that are followed by
    LF characters must be removed, and any CR characters not followed by LF
    characters must be converted to LF characters. Thus, newlines in HTML DOMs
    are represented by LF characters, and there are never any CR characters in
@@ -49140,18 +49139,19 @@
     section, except it has 0x0000 included in the first range. -->
     
     <p>Otherwise, if the number is in the range 0x0000 to 0x0008, <!--
-    space characters allowed -->
-     0x000E to 0x001F, <!-- ASCII allowed
-    --> 0x007F
-     <!--to 0x0084, (0x0085 NEL not allowed), 0x0086--> to 0x009F, 0xD800 to
-     0xDFFF <!-- surrogates not allowed -->, 0xFDD0 to 0xFDDF, or is one of
-     0xFFFE, 0xFFFF, 0x1FFFE, 0x1FFFF, 0x2FFFE, 0x2FFFF, 0x3FFFE, 0x3FFFF,
-     0x4FFFE, 0x4FFFF, 0x5FFFE, 0x5FFFF, 0x6FFFE, 0x6FFFF, 0x7FFFE, 0x7FFFF,
-     0x8FFFE, 0x8FFFF, 0x9FFFE, 0x9FFFF, 0xAFFFE, 0xAFFFF, 0xBFFFE, 0xBFFFF,
-     0xCFFFE, 0xCFFFF, 0xDFFFE, 0xDFFFF, 0xEFFFE, 0xEFFFF, 0xFFFFE, 0xFFFFF,
-     0x10FFFE, or 0x10FFFF, or is higher than 0x10FFFF, then this is a <a
-     href="#parse2">parse error</a>; return a character token for the U+FFFD
-     REPLACEMENT CHARACTER character instead.</p>
+    HT, LF allowed -->
+     U+000B, <!-- FF, CR allowed --> U+000E to 0x001F, <!-- ASCII allowed -->
+     0x007F <!--to 0x0084, (0x0085 NEL
+    not allowed), 0x0086--> to 0x009F,
+     0xD800 to 0xDFFF <!--
+    surrogates not allowed -->, 0xFDD0 to 0xFDDF,
+     or is one of 0xFFFE, 0xFFFF, 0x1FFFE, 0x1FFFF, 0x2FFFE, 0x2FFFF,
+     0x3FFFE, 0x3FFFF, 0x4FFFE, 0x4FFFF, 0x5FFFE, 0x5FFFF, 0x6FFFE, 0x6FFFF,
+     0x7FFFE, 0x7FFFF, 0x8FFFE, 0x8FFFF, 0x9FFFE, 0x9FFFF, 0xAFFFE, 0xAFFFF,
+     0xBFFFE, 0xBFFFF, 0xCFFFE, 0xCFFFF, 0xDFFFE, 0xDFFFF, 0xEFFFE, 0xEFFFF,
+     0xFFFFE, 0xFFFFF, 0x10FFFE, or 0x10FFFF, or is higher than 0x10FFFF,
+     then this is a <a href="#parse2">parse error</a>; return a character
+     token for the U+FFFD REPLACEMENT CHARACTER character instead.</p>
 
     <p>Otherwise, return a character token for the Unicode character whose
      code point is that number.</p>

Modified: source
===================================================================
--- source	2008-09-02 07:09:35 UTC (rev 2137)
+++ source	2008-09-02 07:25:09 UTC (rev 2138)
@@ -44209,20 +44209,20 @@
   a <span>parse error</span>.</p>
 
   <p>Any occurrences of any characters in the ranges U+0001 to U+0008,
-  <!-- space characters allowed --> U+000E to U+001F, <!-- ASCII
-  allowed --> U+007F <!--to U+0084, (U+0085 NEL not allowed),
-  U+0086--> to U+009F, U+D800 to U+DFFF <!-- surrogates not allowed
-  -->, U+FDD0 to U+FDDF, and characters U+FFFE, U+FFFF, U+1FFFE,
-  U+1FFFF, U+2FFFE, U+2FFFF, U+3FFFE, U+3FFFF, U+4FFFE, U+4FFFF,
-  U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF, U+7FFFE, U+7FFFF, U+8FFFE,
-  U+8FFFF, U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF,
-  U+CFFFE, U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE,
-  U+FFFFF, U+10FFFE, and U+10FFFF are <span title="parse error">parse
-  errors</span>. (These are all control characters or permanently
-  undefined Unicode characters.)</p>
+  <!-- HT, LF allowed --> U+000B, <!-- FF, CR allowed --> U+000E to
+  U+001F, <!-- ASCII allowed --> U+007F <!--to U+0084, (U+0085 NEL not
+  allowed), U+0086--> to U+009F, U+D800 to U+DFFF <!-- surrogates not
+  allowed -->, U+FDD0 to U+FDDF, and characters U+FFFE, U+FFFF,
+  U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF, U+3FFFE, U+3FFFF, U+4FFFE,
+  U+4FFFF, U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF, U+7FFFE, U+7FFFF,
+  U+8FFFE, U+8FFFF, U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE,
+  U+BFFFF, U+CFFFE, U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF,
+  U+FFFFE, U+FFFFF, U+10FFFE, and U+10FFFF are <span title="parse
+  error">parse errors</span>. (These are all control characters or
+  permanently undefined Unicode characters.)</p>
 
-  <p>U+000D CARRIAGE RETURN (CR) characters, and U+000A LINE FEED (LF)
-  characters, are treated specially. Any CR characters that are
+  <p>U+000D CARRIAGE RETURN (CR) characters and U+000A LINE FEED (LF)
+  characters are treated specially. Any CR characters that are
   followed by LF characters must be removed, and any CR characters not
   followed by LF characters must be converted to LF characters. Thus,
   newlines in HTML DOMs are represented by LF characters, and there
@@ -46185,17 +46185,18 @@
     <!-- this is the same as the equivalent list in the input stream
     section, except it has 0x0000 included in the first range. -->
     <p>Otherwise, if the number is in the range 0x0000 to 0x0008, <!--
-    space characters allowed --> 0x000E to 0x001F, <!-- ASCII allowed
-    --> 0x007F <!--to 0x0084, (0x0085 NEL not allowed), 0x0086--> to
-    0x009F, 0xD800 to 0xDFFF <!-- surrogates not allowed -->, 0xFDD0
-    to 0xFDDF, or is one of 0xFFFE, 0xFFFF, 0x1FFFE, 0x1FFFF, 0x2FFFE,
-    0x2FFFF, 0x3FFFE, 0x3FFFF, 0x4FFFE, 0x4FFFF, 0x5FFFE, 0x5FFFF,
-    0x6FFFE, 0x6FFFF, 0x7FFFE, 0x7FFFF, 0x8FFFE, 0x8FFFF, 0x9FFFE,
-    0x9FFFF, 0xAFFFE, 0xAFFFF, 0xBFFFE, 0xBFFFF, 0xCFFFE, 0xCFFFF,
-    0xDFFFE, 0xDFFFF, 0xEFFFE, 0xEFFFF, 0xFFFFE, 0xFFFFF, 0x10FFFE, or
-    0x10FFFF, or is higher than 0x10FFFF, then this is a <span>parse
-    error</span>; return a character token for the U+FFFD REPLACEMENT
-    CHARACTER character instead.</p>
+    HT, LF allowed --> U+000B, <!-- FF, CR allowed --> U+000E to
+    0x001F, <!-- ASCII allowed --> 0x007F <!--to 0x0084, (0x0085 NEL
+    not allowed), 0x0086--> to 0x009F, 0xD800 to 0xDFFF <!--
+    surrogates not allowed -->, 0xFDD0 to 0xFDDF, or is one of 0xFFFE,
+    0xFFFF, 0x1FFFE, 0x1FFFF, 0x2FFFE, 0x2FFFF, 0x3FFFE, 0x3FFFF,
+    0x4FFFE, 0x4FFFF, 0x5FFFE, 0x5FFFF, 0x6FFFE, 0x6FFFF, 0x7FFFE,
+    0x7FFFF, 0x8FFFE, 0x8FFFF, 0x9FFFE, 0x9FFFF, 0xAFFFE, 0xAFFFF,
+    0xBFFFE, 0xBFFFF, 0xCFFFE, 0xCFFFF, 0xDFFFE, 0xDFFFF, 0xEFFFE,
+    0xEFFFF, 0xFFFFE, 0xFFFFF, 0x10FFFE, or 0x10FFFF, or is higher
+    than 0x10FFFF, then this is a <span>parse error</span>; return a
+    character token for the U+FFFD REPLACEMENT CHARACTER character
+    instead.</p>
 
     <p>Otherwise, return a character token for the Unicode character
     whose code point is that number.</p>
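
Illustration (not part of the commit): a minimal Python sketch of the numeric character reference check as it reads after r2138, plus the CR/LF rule quoted in the input stream hunk. The names REPLACEMENT, DISALLOWED_RANGES, is_disallowed, resolve_ncr and normalize_newlines are hypothetical and do not come from the specification. The sketch covers only the paragraph shown in the hunk; the paragraph begins with "Otherwise", so any earlier steps that the diff does not show are out of scope. The added lines write U+000B and U+000E where the rest of the paragraph uses 0x notation; the sketch assumes they denote the same numbers.

```python
from typing import Tuple

# Hypothetical names throughout; this is a sketch, not the spec's algorithm text.
REPLACEMENT = "\uFFFD"  # U+FFFD REPLACEMENT CHARACTER

# Inclusive ranges from the r2138 paragraph on numeric character references.
DISALLOWED_RANGES = (
    (0x0000, 0x0008),
    (0x000B, 0x000B),  # the value this revision adds
    (0x000E, 0x001F),
    (0x007F, 0x009F),
    (0xD800, 0xDFFF),  # surrogates
    (0xFDD0, 0xFDDF),
)


def is_disallowed(number: int) -> bool:
    """True if the paragraph treats this number as a parse error."""
    if number > 0x10FFFF:
        return True
    # 0xFFFE, 0xFFFF, 0x1FFFE, 0x1FFFF, ... 0x10FFFE, 0x10FFFF:
    # the last two code points of every plane.
    if (number & 0xFFFF) in (0xFFFE, 0xFFFF):
        return True
    return any(lo <= number <= hi for lo, hi in DISALLOWED_RANGES)


def resolve_ncr(number: int) -> Tuple[str, bool]:
    """Return (character, parse_error) for an already-parsed NCR number."""
    if is_disallowed(number):
        return REPLACEMENT, True
    return chr(number), False


def normalize_newlines(text: str) -> str:
    """Remove CR before LF, then convert any remaining CR to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

Under these assumptions, resolve_ncr(0x0B) reports a parse error and yields U+FFFD, while 0x09 (HT), 0x0A (LF), 0x0C (FF) and 0x0D (CR) pass through unchanged, matching the "HT, LF allowed" and "FF, CR allowed" comments in the diff.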