[html5] r8173 - [c] (0) Define 'control characters' formally. Affected topics: HTML, HTML Syntax [...]
whatwg at whatwg.org
Thu Sep 5 14:34:42 PDT 2013
Author: ianh
Date: 2013-09-05 14:34:39 -0700 (Thu, 05 Sep 2013)
New Revision: 8173
Modified:
complete.html
index
source
Log:
[c] (0) Define 'control characters' formally.
Affected topics: HTML, HTML Syntax and Parsing
Modified: complete.html
===================================================================
--- complete.html 2013-09-05 01:25:20 UTC (rev 8172)
+++ complete.html 2013-09-05 21:34:39 UTC (rev 8173)
@@ -256,7 +256,7 @@
<header class=head id=head><p><a href=http://www.whatwg.org/ class=logo><img width=101 src=/images/logo alt=WHATWG height=101></a></p>
<hgroup><h1 class=allcaps>HTML</h1>
- <h2 class="no-num no-toc">Living Standard — Last Updated 4 September 2013</h2>
+ <h2 class="no-num no-toc">Living Standard — Last Updated 5 September 2013</h2>
</hgroup><dl><dt><strong>Web developer edition:</strong></dt>
<dd><strong><a href=http://developers.whatwg.org/>http://developers.whatwg.org/</a></strong></dd>
<dt>Multiple-page version:</dt>
@@ -2968,8 +2968,8 @@
it.</p>
<p>The term <dfn title="">empty</dfn>, when used of an attribute value, <code><a href=#text>Text</a></code> node, or
- string, means that the length of the text is zero (i.e. not even containing spaces or control
- characters).</p>
+ string, means that the length of the text is zero (i.e. not even containing spaces or <a href=#control-characters>control
+ characters</a>).</p>
<h4 id=scripting-0><span class=secno>2.1.4 </span>Scripting</h4>
@@ -3051,7 +3051,7 @@
<h4 id=encoding-terminology><span class=secno>2.1.6 </span>Character encodings</h4>
-
+xxxxx
<p>A <dfn id=encoding title=encoding>character encoding</dfn>, or just <i><a href=#encoding>encoding</a></i> where that is not
ambiguous, is a defined way to convert between byte streams and Unicode strings, as defined in the
WHATWG Encoding standard. An <a href=#encoding>encoding</a> has an <dfn id=encoding-name>encoding name</dfn> and one or more
@@ -4159,6 +4159,9 @@
<p class=note>This should not be confused with the "White_Space" value (abbreviated "WS") of the
"Bidi_Class" property in the <code title="">Unicode.txt</code> data file.</p>
+ <p>The <dfn id=control-characters>control characters</dfn> are those whose Unicode "General_Category" property has the
+ value "Cc" in the Unicode <code title="">UnicodeData.txt</code> data file. <a href=#refsUNICODE>[UNICODE]</a></p>
+
<p>The <dfn id=uppercase-ascii-letters>uppercase ASCII letters</dfn> are the characters in the range U+0041 LATIN CAPITAL
LETTER A to U+005A LATIN CAPITAL LETTER Z.</p>
@@ -10676,7 +10679,7 @@
<p><code><a href=#text>Text</a></code> nodes and attribute values must consist of <a href=#unicode-character title="Unicode
character">Unicode characters</a>, must not contain U+0000 characters, must not contain
- permanently undefined Unicode characters (noncharacters), and must not contain control characters
+ permanently undefined Unicode characters (noncharacters), and must not contain <a href=#control-characters>control characters</a>
other than <a href=#space-character title="space character">space characters</a>.
<!--<code>Text</code> nodes and attribute values may begin with an <i>isolated combining
@@ -84882,8 +84885,8 @@
<p>Attributes have a name and a value. <dfn id=syntax-attribute-name title=syntax-attribute-name>Attribute names</dfn>
must consist of one or more characters other than the <a href=#space-character title="space character">space
characters</a>, U+0000 NULL, U+0022 QUOTATION MARK ("), U+0027 APOSTROPHE ('), U+003E
- GREATER-THAN SIGN (>), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the control
- characters, and any characters that are not defined by Unicode. In the HTML syntax, attribute
+ GREATER-THAN SIGN (>), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the <a href=#control-characters>control
+ characters</a>, and any characters that are not defined by Unicode. In the HTML syntax, attribute
names, even those for <a href=#foreign-elements>foreign elements</a>, may be written with any mix of lower- and
uppercase letters that are an <a href=#ascii-case-insensitive>ASCII case-insensitive</a> match for the attribute's
name.</p>
@@ -85280,7 +85283,7 @@
</dl><p>The numeric character reference forms described above are allowed to reference any Unicode code
point other than U+0000, U+000D, permanently undefined Unicode characters (noncharacters), and
- control characters other than <a href=#space-character title="space character">space characters</a>.</p>
+ <a href=#control-characters>control characters</a> other than <a href=#space-character title="space character">space characters</a>.</p>
<p>An <dfn id=syntax-ambiguous-ampersand title=syntax-ambiguous-ampersand>ambiguous ampersand</dfn> is a U+0026 AMPERSAND
character (&) that is followed by one or more <a href=#alphanumeric-ascii-characters>alphanumeric ASCII characters</a>,
@@ -86373,7 +86376,7 @@
U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE,
U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF,
U+10FFFE, and U+10FFFF are <a href=#parse-error title="parse error">parse
- errors</a>. These are all control characters or permanently
+ errors</a>. These are all <a href=#control-characters>control characters</a> or permanently
undefined Unicode characters (noncharacters).</p>
<p>U+000D CARRIAGE RETURN (CR) characters and U+000A LINE FEED (LF)
Modified: index
===================================================================
--- index 2013-09-05 01:25:20 UTC (rev 8172)
+++ index 2013-09-05 21:34:39 UTC (rev 8173)
@@ -256,7 +256,7 @@
<header class=head id=head><p><a href=http://www.whatwg.org/ class=logo><img width=101 src=/images/logo alt=WHATWG height=101></a></p>
<hgroup><h1 class=allcaps>HTML</h1>
- <h2 class="no-num no-toc">Living Standard — Last Updated 4 September 2013</h2>
+ <h2 class="no-num no-toc">Living Standard — Last Updated 5 September 2013</h2>
</hgroup><dl><dt><strong>Web developer edition:</strong></dt>
<dd><strong><a href=http://developers.whatwg.org/>http://developers.whatwg.org/</a></strong></dd>
<dt>Multiple-page version:</dt>
@@ -2968,8 +2968,8 @@
it.</p>
<p>The term <dfn title="">empty</dfn>, when used of an attribute value, <code><a href=#text>Text</a></code> node, or
- string, means that the length of the text is zero (i.e. not even containing spaces or control
- characters).</p>
+ string, means that the length of the text is zero (i.e. not even containing spaces or <a href=#control-characters>control
+ characters</a>).</p>
<h4 id=scripting-0><span class=secno>2.1.4 </span>Scripting</h4>
@@ -3051,7 +3051,7 @@
<h4 id=encoding-terminology><span class=secno>2.1.6 </span>Character encodings</h4>
-
+xxxxx
<p>A <dfn id=encoding title=encoding>character encoding</dfn>, or just <i><a href=#encoding>encoding</a></i> where that is not
ambiguous, is a defined way to convert between byte streams and Unicode strings, as defined in the
WHATWG Encoding standard. An <a href=#encoding>encoding</a> has an <dfn id=encoding-name>encoding name</dfn> and one or more
@@ -4159,6 +4159,9 @@
<p class=note>This should not be confused with the "White_Space" value (abbreviated "WS") of the
"Bidi_Class" property in the <code title="">Unicode.txt</code> data file.</p>
+ <p>The <dfn id=control-characters>control characters</dfn> are those whose Unicode "General_Category" property has the
+ value "Cc" in the Unicode <code title="">UnicodeData.txt</code> data file. <a href=#refsUNICODE>[UNICODE]</a></p>
+
<p>The <dfn id=uppercase-ascii-letters>uppercase ASCII letters</dfn> are the characters in the range U+0041 LATIN CAPITAL
LETTER A to U+005A LATIN CAPITAL LETTER Z.</p>
@@ -10676,7 +10679,7 @@
<p><code><a href=#text>Text</a></code> nodes and attribute values must consist of <a href=#unicode-character title="Unicode
character">Unicode characters</a>, must not contain U+0000 characters, must not contain
- permanently undefined Unicode characters (noncharacters), and must not contain control characters
+ permanently undefined Unicode characters (noncharacters), and must not contain <a href=#control-characters>control characters</a>
other than <a href=#space-character title="space character">space characters</a>.
<!--<code>Text</code> nodes and attribute values may begin with an <i>isolated combining
@@ -84882,8 +84885,8 @@
<p>Attributes have a name and a value. <dfn id=syntax-attribute-name title=syntax-attribute-name>Attribute names</dfn>
must consist of one or more characters other than the <a href=#space-character title="space character">space
characters</a>, U+0000 NULL, U+0022 QUOTATION MARK ("), U+0027 APOSTROPHE ('), U+003E
- GREATER-THAN SIGN (>), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the control
- characters, and any characters that are not defined by Unicode. In the HTML syntax, attribute
+ GREATER-THAN SIGN (>), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the <a href=#control-characters>control
+ characters</a>, and any characters that are not defined by Unicode. In the HTML syntax, attribute
names, even those for <a href=#foreign-elements>foreign elements</a>, may be written with any mix of lower- and
uppercase letters that are an <a href=#ascii-case-insensitive>ASCII case-insensitive</a> match for the attribute's
name.</p>
@@ -85280,7 +85283,7 @@
</dl><p>The numeric character reference forms described above are allowed to reference any Unicode code
point other than U+0000, U+000D, permanently undefined Unicode characters (noncharacters), and
- control characters other than <a href=#space-character title="space character">space characters</a>.</p>
+ <a href=#control-characters>control characters</a> other than <a href=#space-character title="space character">space characters</a>.</p>
<p>An <dfn id=syntax-ambiguous-ampersand title=syntax-ambiguous-ampersand>ambiguous ampersand</dfn> is a U+0026 AMPERSAND
character (&) that is followed by one or more <a href=#alphanumeric-ascii-characters>alphanumeric ASCII characters</a>,
@@ -86373,7 +86376,7 @@
U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE,
U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF,
U+10FFFE, and U+10FFFF are <a href=#parse-error title="parse error">parse
- errors</a>. These are all control characters or permanently
+ errors</a>. These are all <a href=#control-characters>control characters</a> or permanently
undefined Unicode characters (noncharacters).</p>
<p>U+000D CARRIAGE RETURN (CR) characters and U+000A LINE FEED (LF)
Modified: source
===================================================================
--- source 2013-09-05 01:25:20 UTC (rev 8172)
+++ source 2013-09-05 21:34:39 UTC (rev 8173)
@@ -1726,8 +1726,8 @@
it.</p>
<p>The term <dfn title="">empty</dfn>, when used of an attribute value, <code>Text</code> node, or
- string, means that the length of the text is zero (i.e. not even containing spaces or control
- characters).</p>
+ string, means that the length of the text is zero (i.e. not even containing spaces or <span>control
+ characters</span>).</p>
<h4>Scripting</h4>
@@ -1811,7 +1811,7 @@
<h4 id="encoding-terminology">Character encodings</h4>
-
+xxxxx
<p>A <dfn title="encoding">character encoding</dfn>, or just <i>encoding</i> where that is not
ambiguous, is a defined way to convert between byte streams and Unicode strings, as defined in the
WHATWG Encoding standard. An <span>encoding</span> has an <dfn>encoding name</dfn> and one or more
@@ -3041,6 +3041,10 @@
<p class="note">This should not be confused with the "White_Space" value (abbreviated "WS") of the
"Bidi_Class" property in the <code title="">Unicode.txt</code> data file.</p>
+ <p>The <dfn>control characters</dfn> are those whose Unicode "General_Category" property has the
+ value "Cc" in the Unicode <code title="">UnicodeData.txt</code> data file. <a
+ href="#refsUNICODE">[UNICODE]</a></p>
+
<p>The <dfn>uppercase ASCII letters</dfn> are the characters in the range U+0041 LATIN CAPITAL
LETTER A to U+005A LATIN CAPITAL LETTER Z.</p>
@@ -10695,7 +10699,7 @@
<p><code>Text</code> nodes and attribute values must consist of <span title="Unicode
character">Unicode characters</span>, must not contain U+0000 characters, must not contain
- permanently undefined Unicode characters (noncharacters), and must not contain control characters
+ permanently undefined Unicode characters (noncharacters), and must not contain <span>control characters</span>
other than <span title="space character">space characters</span>.
<!--<code>Text</code> nodes and attribute values may begin with an <i>isolated combining
@@ -94715,8 +94719,8 @@
<p>Attributes have a name and a value. <dfn title="syntax-attribute-name">Attribute names</dfn>
must consist of one or more characters other than the <span title="space character">space
characters</span>, U+0000 NULL, U+0022 QUOTATION MARK ("), U+0027 APOSTROPHE ('), U+003E
- GREATER-THAN SIGN (>), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the control
- characters, and any characters that are not defined by Unicode. In the HTML syntax, attribute
+ GREATER-THAN SIGN (>), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the <span>control
+ characters</span>, and any characters that are not defined by Unicode. In the HTML syntax, attribute
names, even those for <span>foreign elements</span>, may be written with any mix of lower- and
uppercase letters that are an <span>ASCII case-insensitive</span> match for the attribute's
name.</p>
@@ -95144,7 +95148,7 @@
<p>The numeric character reference forms described above are allowed to reference any Unicode code
point other than U+0000, U+000D, permanently undefined Unicode characters (noncharacters), and
- control characters other than <span title="space character">space characters</span>.</p>
+ <span>control characters</span> other than <span title="space character">space characters</span>.</p>
<p>An <dfn title="syntax-ambiguous-ampersand">ambiguous ampersand</dfn> is a U+0026 AMPERSAND
character (&) that is followed by one or more <span>alphanumeric ASCII characters</span>,
@@ -96389,7 +96393,7 @@
U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE,
U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF,
U+10FFFE, and U+10FFFF are <span title="parse error">parse
- errors</span>. These are all control characters or permanently
+ errors</span>. These are all <span>control characters</span> or permanently
undefined Unicode characters (noncharacters).</p>
<p>U+000D CARRIAGE RETURN (CR) characters and U+000A LINE FEED (LF)
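
Illustrative note (not part of the commit): the definition added in this revision ties "control characters" to the Unicode General_Category value "Cc". A minimal Python sketch of that check follows, using the standard unicodedata module. The helper names is_control_character and disallowed_controls are hypothetical, and the SPACE_CHARACTERS set is an assumption about the spec's "space characters" (U+0009, U+000A, U+000C, U+000D, U+0020), which this diff references but does not define.

```python
import unicodedata

# Assumption: the HTML "space characters" referenced by the diff
# (tab, line feed, form feed, carriage return, space).
SPACE_CHARACTERS = {"\u0009", "\u000A", "\u000C", "\u000D", "\u0020"}


def is_control_character(ch):
    """Return True if ch has General_Category "Cc"."""
    return unicodedata.category(ch) == "Cc"


def disallowed_controls(text):
    """List the control characters in text that are not space characters.

    This mirrors the phrase "control characters other than space
    characters"; it does not check the other restrictions the spec text
    mentions, such as noncharacters.
    """
    return [
        "U+%04X" % ord(ch)
        for ch in text
        if is_control_character(ch) and ch not in SPACE_CHARACTERS
    ]


if __name__ == "__main__":
    # Tab is a control character but also a space character, so only
    # the U+0001 in this sample is reported.
    print(disallowed_controls("a\tb\u0001c"))
```

Under this definition the set covers the C0 range U+0000 to U+001F and the range U+007F to U+009F, so U+0085 counts as a control character while U+0020 SPACE does not.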