[html5] r1263 - /

whatwg at whatwg.org
Wed Feb 27 12:43:42 PST 2008


Author: ianh
Date: 2008-02-27 12:43:37 -0800 (Wed, 27 Feb 2008)
New Revision: 1263

Modified:
   index
   source
Log:
[ac] (1) Make control characters and non-Unicode characters be parse errors, for compatibility with XML.

Modified: index
===================================================================
--- index	2008-02-27 19:35:50 UTC (rev 1262)
+++ index	2008-02-27 20:43:37 UTC (rev 1263)
@@ -37154,7 +37154,9 @@
    href="#charset">character encoding declarations</a> are to be serialised,
    as discussed in the section on that topic.
 
-  <p>The U+0000 NULL character must not appear anywhere in a document.
+  <p>The U+0000 NULL character, control characters other than the <a
+   href="#space" title="space character">space characters</a>, and characters
+   that are not defined by Unicode, must not appear anywhere in a document.
 
   <p class=note>Space characters before the root <code><a
    href="#html">html</a></code> element will be dropped when the document is
@@ -38428,6 +38430,21 @@
    REPLACEMENT CHARACTERs. Any occurrences of such characters is a <a
    href="#parse0">parse error</a>.
 
+  <p>Any occurrences of any characters in the ranges U+0001 to U+0008,
+   <!-- space characters allowed --> U+000E to U+001F, <!-- ASCII
+  allowed -->
+   U+007F <!--to U+0084, (U+0085 NEL not allowed),
+  U+0086--> to U+009F,
+   U+D800 to U+DFFF <!-- surrogates not allowed
+  -->, U+FDD0 to U+FDDF, and
+   characters U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF, U+3FFFE,
+   U+3FFFF, U+4FFFE, U+4FFFF, U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF, U+7FFFE,
+   U+7FFFF, U+8FFFE, U+8FFFF, U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE,
+   U+BFFFF, U+CFFFE, U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE,
+   U+FFFFF, U+10FFFE, and U+10FFFF are <a href="#parse0" title="parse
+   error">parse errors</a>. (These are all control characters or permanently
+   undefined Unicode characters.)
+
   <p>U+000D CARRIAGE RETURN (CR) characters, and U+000A LINE FEED (LF)
    characters, are treated specially. Any CR characters that are followed by
    LF characters must be removed, and any CR characters not followed by LF

Modified: source
===================================================================
--- source	2008-02-27 19:35:50 UTC (rev 1262)
+++ source	2008-02-27 20:43:37 UTC (rev 1263)
@@ -34677,8 +34677,10 @@
   href="#charset">character encoding declarations</a> are to be
   serialised, as discussed in the section on that topic.</p>
 
-  <p>The U+0000 NULL character must not appear anywhere in a
-  document.</p>
+  <p>The U+0000 NULL character, control characters other than the
+  <span title="space character">space characters</span>, and
+  characters that are not defined by Unicode, must not appear anywhere
+  in a document.</p>
 
   <p class="note">Space characters before the root <code>html</code>
   element will be dropped when the document is parsed; space
@@ -35997,6 +35999,19 @@
   U+FFFD REPLACEMENT CHARACTERs. Any occurrences of such characters is
   a <span>parse error</span>.</p>
 
+  <p>Any occurrences of any characters in the ranges U+0001 to U+0008,
+  <!-- space characters allowed --> U+000E to U+001F, <!-- ASCII
+  allowed --> U+007F <!--to U+0084, (U+0085 NEL not allowed),
+  U+0086--> to U+009F, U+D800 to U+DFFF <!-- surrogates not allowed
+  -->, U+FDD0 to U+FDDF, and characters U+FFFE, U+FFFF, U+1FFFE,
+  U+1FFFF, U+2FFFE, U+2FFFF, U+3FFFE, U+3FFFF, U+4FFFE, U+4FFFF,
+  U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF, U+7FFFE, U+7FFFF, U+8FFFE,
+  U+8FFFF, U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF,
+  U+CFFFE, U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE,
+  U+FFFFF, U+10FFFE, and U+10FFFF are <span title="parse error">parse
+  errors</span>. (These are all control characters or permanently
+  undefined Unicode characters.)</p>
+
   <p>U+000D CARRIAGE RETURN (CR) characters, and U+000A LINE FEED (LF)
   characters, are treated specially. Any CR characters that are
   followed by LF characters must be removed, and any CR characters not
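The code point ranges added in this revision can be sketched as a small checker. This is a minimal illustration only, not part of the commit or of any parser: the names `PARSE_ERROR_RANGES`, `is_parse_error_code_point`, and `find_parse_errors` are hypothetical, and the ranges are copied as written in r1263 (U+FDD0 to U+FDDF, with the space characters U+0009 to U+000D left out). U+0000 is deliberately not covered, since the surrounding text handles it separately.

```python
# Ranges transcribed from the paragraph added in r1263 (inclusive bounds).
# All names here are illustrative assumptions, not identifiers from the spec.
PARSE_ERROR_RANGES = [
    (0x0001, 0x0008),  # C0 controls below the space characters
    (0x000E, 0x001F),  # C0 controls above the space characters
    (0x007F, 0x009F),  # DELETE and the C1 controls
    (0xD800, 0xDFFF),  # surrogates
    (0xFDD0, 0xFDDF),  # range as stated in this revision
]


def is_parse_error_code_point(cp: int) -> bool:
    """Return True if cp falls in one of the ranges listed in r1263."""
    if any(lo <= cp <= hi for lo, hi in PARSE_ERROR_RANGES):
        return True
    # U+FFFE and U+FFFF, plus their counterparts in planes 1 to 16
    # (U+1FFFE, U+1FFFF, ..., U+10FFFE, U+10FFFF).
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFF) in (0xFFFE, 0xFFFF)


def find_parse_errors(text: str) -> list[tuple[int, str]]:
    """List (index, 'U+XXXX') for every flagged character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if is_parse_error_code_point(ord(ch))
    ]
```

For example, `find_parse_errors("a\x01b\x7f")` flags the characters at indices 1 and 3, while tabs and line feeds pass through unflagged.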



