[html5] r8173 - [c] (0) Define 'control characters' formally. Affected topics: HTML, HTML Syntax [...]

whatwg at whatwg.org
Thu Sep 5 14:34:42 PDT 2013


Author: ianh
Date: 2013-09-05 14:34:39 -0700 (Thu, 05 Sep 2013)
New Revision: 8173

Modified:
   complete.html
   index
   source
Log:
[c] (0) Define 'control characters' formally.
Affected topics: HTML, HTML Syntax and Parsing
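
A minimal sketch (not part of the commit) of what the new definition means in
practice: a character counts as a control character when its Unicode
General_Category is "Cc". The example uses Python's standard unicodedata
module. The function names and the SPACE_CHARACTERS set are illustrative
assumptions; the set reflects the spec's separate definition of "space
characters" (U+0020, U+0009, U+000A, U+000C, U+000D), which this diff does not
show.

```python
import unicodedata
from typing import List


def is_control_character(ch: str) -> bool:
    """Return True if the single code point `ch` has General_Category "Cc"."""
    return unicodedata.category(ch) == "Cc"


# Assumption: the HTML spec's "space characters", defined elsewhere in the spec.
SPACE_CHARACTERS = frozenset("\u0020\u0009\u000a\u000c\u000d")


def disallowed_controls(text: str) -> List[str]:
    """List control characters in `text` that are not space characters.

    This mirrors the rule that Text nodes and attribute values must not
    contain control characters other than space characters.
    """
    return [
        "U+{:04X}".format(ord(ch))
        for ch in text
        if is_control_character(ch) and ch not in SPACE_CHARACTERS
    ]


if __name__ == "__main__":
    # TAB is a control character but also a space character, so it is allowed.
    print(disallowed_controls("a\tb\x01c\x85"))
```

With the Unicode data shipped in current Python versions, "Cc" covers U+0000 to
U+001F and U+007F to U+009F, so U+0020 SPACE itself is not a control character.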

Modified: complete.html
===================================================================
--- complete.html	2013-09-05 01:25:20 UTC (rev 8172)
+++ complete.html	2013-09-05 21:34:39 UTC (rev 8173)
@@ -256,7 +256,7 @@
 
   <header class=head id=head><p><a href=http://www.whatwg.org/ class=logo><img width=101 src=/images/logo alt=WHATWG height=101></a></p>
    <hgroup><h1 class=allcaps>HTML</h1>
-    <h2 class="no-num no-toc">Living Standard — Last Updated 4 September 2013</h2>
+    <h2 class="no-num no-toc">Living Standard — Last Updated 5 September 2013</h2>
    </hgroup><dl><dt><strong>Web developer edition:</strong></dt>
     <dd><strong><a href=http://developers.whatwg.org/>http://developers.whatwg.org/</a></strong></dd>
     <dt>Multiple-page version:</dt>
@@ -2968,8 +2968,8 @@
   it.</p>
 
   <p>The term <dfn title="">empty</dfn>, when used of an attribute value, <code><a href=#text>Text</a></code> node, or
-  string, means that the length of the text is zero (i.e. not even containing spaces or control
-  characters).</p>
+  string, means that the length of the text is zero (i.e. not even containing spaces or <a href=#control-characters>control
+  characters</a>).</p>
 
 
   <h4 id=scripting-0><span class=secno>2.1.4 </span>Scripting</h4>
@@ -3051,7 +3051,7 @@
 
 
   <h4 id=encoding-terminology><span class=secno>2.1.6 </span>Character encodings</h4>
-
+xxxxx
   <p>A <dfn id=encoding title=encoding>character encoding</dfn>, or just <i><a href=#encoding>encoding</a></i> where that is not
   ambiguous, is a defined way to convert between byte streams and Unicode strings, as defined in the
   WHATWG Encoding standard. An <a href=#encoding>encoding</a> has an <dfn id=encoding-name>encoding name</dfn> and one or more
@@ -4159,6 +4159,9 @@
   <p class=note>This should not be confused with the "White_Space" value (abbreviated "WS") of the
   "Bidi_Class" property in the <code title="">Unicode.txt</code> data file.</p>
 
+  <p>The <dfn id=control-characters>control characters</dfn> are those whose Unicode "General_Category" property has the
+  value "Cc" in the Unicode <code title="">UnicodeData.txt</code> data file. <a href=#refsUNICODE>[UNICODE]</a></p>
+
   <p>The <dfn id=uppercase-ascii-letters>uppercase ASCII letters</dfn> are the characters in the range U+0041 LATIN CAPITAL
   LETTER A to U+005A LATIN CAPITAL LETTER Z.</p>
 
@@ -10676,7 +10679,7 @@
 
   <p><code><a href=#text>Text</a></code> nodes and attribute values must consist of <a href=#unicode-character title="Unicode
   character">Unicode characters</a>, must not contain U+0000 characters, must not contain
-  permanently undefined Unicode characters (noncharacters), and must not contain control characters
+  permanently undefined Unicode characters (noncharacters), and must not contain <a href=#control-characters>control characters</a>
   other than <a href=#space-character title="space character">space characters</a>.
 
   <!--<code>Text</code> nodes and attribute values may begin with an <i>isolated combining
@@ -84882,8 +84885,8 @@
   <p>Attributes have a name and a value. <dfn id=syntax-attribute-name title=syntax-attribute-name>Attribute names</dfn>
   must consist of one or more characters other than the <a href=#space-character title="space character">space
   characters</a>, U+0000 NULL, U+0022 QUOTATION MARK ("), U+0027 APOSTROPHE ('), U+003E
-  GREATER-THAN SIGN (>), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the control
-  characters, and any characters that are not defined by Unicode. In the HTML syntax, attribute
+  GREATER-THAN SIGN (>), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the <a href=#control-characters>control
+  characters</a>, and any characters that are not defined by Unicode. In the HTML syntax, attribute
   names, even those for <a href=#foreign-elements>foreign elements</a>, may be written with any mix of lower- and
   uppercase letters that are an <a href=#ascii-case-insensitive>ASCII case-insensitive</a> match for the attribute's
   name.</p>
@@ -85280,7 +85283,7 @@
 
   </dl><p>The numeric character reference forms described above are allowed to reference any Unicode code
   point other than U+0000, U+000D, permanently undefined Unicode characters (noncharacters), and
-  control characters other than <a href=#space-character title="space character">space characters</a>.</p>
+  <a href=#control-characters>control characters</a> other than <a href=#space-character title="space character">space characters</a>.</p>
 
   <p>An <dfn id=syntax-ambiguous-ampersand title=syntax-ambiguous-ampersand>ambiguous ampersand</dfn> is a U+0026 AMPERSAND
   character (&) that is followed by one or more <a href=#alphanumeric-ascii-characters>alphanumeric ASCII characters</a>,
@@ -86373,7 +86376,7 @@
   U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE,
   U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF,
   U+10FFFE, and U+10FFFF are <a href=#parse-error title="parse error">parse
-  errors</a>. These are all control characters or permanently
+  errors</a>. These are all <a href=#control-characters>control characters</a> or permanently
   undefined Unicode characters (noncharacters).</p>
 
   <p>U+000D CARRIAGE RETURN (CR) characters and U+000A LINE FEED (LF)

Modified: index
===================================================================
--- index	2013-09-05 01:25:20 UTC (rev 8172)
+++ index	2013-09-05 21:34:39 UTC (rev 8173)
@@ -256,7 +256,7 @@
 
   <header class=head id=head><p><a href=http://www.whatwg.org/ class=logo><img width=101 src=/images/logo alt=WHATWG height=101></a></p>
    <hgroup><h1 class=allcaps>HTML</h1>
-    <h2 class="no-num no-toc">Living Standard — Last Updated 4 September 2013</h2>
+    <h2 class="no-num no-toc">Living Standard — Last Updated 5 September 2013</h2>
    </hgroup><dl><dt><strong>Web developer edition:</strong></dt>
     <dd><strong><a href=http://developers.whatwg.org/>http://developers.whatwg.org/</a></strong></dd>
     <dt>Multiple-page version:</dt>
@@ -2968,8 +2968,8 @@
   it.</p>
 
   <p>The term <dfn title="">empty</dfn>, when used of an attribute value, <code><a href=#text>Text</a></code> node, or
-  string, means that the length of the text is zero (i.e. not even containing spaces or control
-  characters).</p>
+  string, means that the length of the text is zero (i.e. not even containing spaces or <a href=#control-characters>control
+  characters</a>).</p>
 
 
   <h4 id=scripting-0><span class=secno>2.1.4 </span>Scripting</h4>
@@ -3051,7 +3051,7 @@
 
 
   <h4 id=encoding-terminology><span class=secno>2.1.6 </span>Character encodings</h4>
-
+xxxxx
   <p>A <dfn id=encoding title=encoding>character encoding</dfn>, or just <i><a href=#encoding>encoding</a></i> where that is not
   ambiguous, is a defined way to convert between byte streams and Unicode strings, as defined in the
   WHATWG Encoding standard. An <a href=#encoding>encoding</a> has an <dfn id=encoding-name>encoding name</dfn> and one or more
@@ -4159,6 +4159,9 @@
   <p class=note>This should not be confused with the "White_Space" value (abbreviated "WS") of the
   "Bidi_Class" property in the <code title="">Unicode.txt</code> data file.</p>
 
+  <p>The <dfn id=control-characters>control characters</dfn> are those whose Unicode "General_Category" property has the
+  value "Cc" in the Unicode <code title="">UnicodeData.txt</code> data file. <a href=#refsUNICODE>[UNICODE]</a></p>
+
   <p>The <dfn id=uppercase-ascii-letters>uppercase ASCII letters</dfn> are the characters in the range U+0041 LATIN CAPITAL
   LETTER A to U+005A LATIN CAPITAL LETTER Z.</p>
 
@@ -10676,7 +10679,7 @@
 
   <p><code><a href=#text>Text</a></code> nodes and attribute values must consist of <a href=#unicode-character title="Unicode
   character">Unicode characters</a>, must not contain U+0000 characters, must not contain
-  permanently undefined Unicode characters (noncharacters), and must not contain control characters
+  permanently undefined Unicode characters (noncharacters), and must not contain <a href=#control-characters>control characters</a>
   other than <a href=#space-character title="space character">space characters</a>.
 
   <!--<code>Text</code> nodes and attribute values may begin with an <i>isolated combining
@@ -84882,8 +84885,8 @@
   <p>Attributes have a name and a value. <dfn id=syntax-attribute-name title=syntax-attribute-name>Attribute names</dfn>
   must consist of one or more characters other than the <a href=#space-character title="space character">space
   characters</a>, U+0000 NULL, U+0022 QUOTATION MARK ("), U+0027 APOSTROPHE ('), U+003E
-  GREATER-THAN SIGN (>), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the control
-  characters, and any characters that are not defined by Unicode. In the HTML syntax, attribute
+  GREATER-THAN SIGN (>), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the <a href=#control-characters>control
+  characters</a>, and any characters that are not defined by Unicode. In the HTML syntax, attribute
   names, even those for <a href=#foreign-elements>foreign elements</a>, may be written with any mix of lower- and
   uppercase letters that are an <a href=#ascii-case-insensitive>ASCII case-insensitive</a> match for the attribute's
   name.</p>
@@ -85280,7 +85283,7 @@
 
   </dl><p>The numeric character reference forms described above are allowed to reference any Unicode code
   point other than U+0000, U+000D, permanently undefined Unicode characters (noncharacters), and
-  control characters other than <a href=#space-character title="space character">space characters</a>.</p>
+  <a href=#control-characters>control characters</a> other than <a href=#space-character title="space character">space characters</a>.</p>
 
   <p>An <dfn id=syntax-ambiguous-ampersand title=syntax-ambiguous-ampersand>ambiguous ampersand</dfn> is a U+0026 AMPERSAND
   character (&) that is followed by one or more <a href=#alphanumeric-ascii-characters>alphanumeric ASCII characters</a>,
@@ -86373,7 +86376,7 @@
   U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE,
   U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF,
   U+10FFFE, and U+10FFFF are <a href=#parse-error title="parse error">parse
-  errors</a>. These are all control characters or permanently
+  errors</a>. These are all <a href=#control-characters>control characters</a> or permanently
   undefined Unicode characters (noncharacters).</p>
 
   <p>U+000D CARRIAGE RETURN (CR) characters and U+000A LINE FEED (LF)

Modified: source
===================================================================
--- source	2013-09-05 01:25:20 UTC (rev 8172)
+++ source	2013-09-05 21:34:39 UTC (rev 8173)
@@ -1726,8 +1726,8 @@
   it.</p>
 
   <p>The term <dfn title="">empty</dfn>, when used of an attribute value, <code>Text</code> node, or
-  string, means that the length of the text is zero (i.e. not even containing spaces or control
-  characters).</p>
+  string, means that the length of the text is zero (i.e. not even containing spaces or <span>control
+  characters</span>).</p>
 
 
   <h4>Scripting</h4>
@@ -1811,7 +1811,7 @@
 
 
   <h4 id="encoding-terminology">Character encodings</h4>
-
+xxxxx
   <p>A <dfn title="encoding">character encoding</dfn>, or just <i>encoding</i> where that is not
   ambiguous, is a defined way to convert between byte streams and Unicode strings, as defined in the
   WHATWG Encoding standard. An <span>encoding</span> has an <dfn>encoding name</dfn> and one or more
@@ -3041,6 +3041,10 @@
   <p class="note">This should not be confused with the "White_Space" value (abbreviated "WS") of the
   "Bidi_Class" property in the <code title="">Unicode.txt</code> data file.</p>
 
+  <p>The <dfn>control characters</dfn> are those whose Unicode "General_Category" property has the
+  value "Cc" in the Unicode <code title="">UnicodeData.txt</code> data file. <a
+  href="#refsUNICODE">[UNICODE]</a></p>
+
   <p>The <dfn>uppercase ASCII letters</dfn> are the characters in the range U+0041 LATIN CAPITAL
   LETTER A to U+005A LATIN CAPITAL LETTER Z.</p>
 
@@ -10695,7 +10699,7 @@
 
   <p><code>Text</code> nodes and attribute values must consist of <span title="Unicode
   character">Unicode characters</span>, must not contain U+0000 characters, must not contain
-  permanently undefined Unicode characters (noncharacters), and must not contain control characters
+  permanently undefined Unicode characters (noncharacters), and must not contain <span>control characters</span>
   other than <span title="space character">space characters</span>.
 
   <!--<code>Text</code> nodes and attribute values may begin with an <i>isolated combining
@@ -94715,8 +94719,8 @@
   <p>Attributes have a name and a value. <dfn title="syntax-attribute-name">Attribute names</dfn>
   must consist of one or more characters other than the <span title="space character">space
   characters</span>, U+0000 NULL, U+0022 QUOTATION MARK (&#x22;), U+0027 APOSTROPHE (&#x27;), U+003E
-  GREATER-THAN SIGN (>), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the control
-  characters, and any characters that are not defined by Unicode. In the HTML syntax, attribute
+  GREATER-THAN SIGN (>), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the <span>control
+  characters</span>, and any characters that are not defined by Unicode. In the HTML syntax, attribute
   names, even those for <span>foreign elements</span>, may be written with any mix of lower- and
   uppercase letters that are an <span>ASCII case-insensitive</span> match for the attribute's
   name.</p>
@@ -95144,7 +95148,7 @@
 
   <p>The numeric character reference forms described above are allowed to reference any Unicode code
   point other than U+0000, U+000D, permanently undefined Unicode characters (noncharacters), and
-  control characters other than <span title="space character">space characters</span>.</p>
+  <span>control characters</span> other than <span title="space character">space characters</span>.</p>
 
   <p>An <dfn title="syntax-ambiguous-ampersand">ambiguous ampersand</dfn> is a U+0026 AMPERSAND
   character (&) that is followed by one or more <span>alphanumeric ASCII characters</span>,
@@ -96389,7 +96393,7 @@
   U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE,
   U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF,
   U+10FFFE, and U+10FFFF are <span title="parse error">parse
-  errors</span>. These are all control characters or permanently
+  errors</span>. These are all <span>control characters</span> or permanently
   undefined Unicode characters (noncharacters).</p>
 
   <p>U+000D CARRIAGE RETURN (CR) characters and U+000A LINE FEED (LF)



