[html5] r6184 - [giow] (0) Try to clean up the stuff about Unicode characters. Fixing http://www [...]
whatwg at whatwg.org
Fri Jun 3 12:40:11 PDT 2011
Author: ianh
Date: 2011-06-03 12:40:10 -0700 (Fri, 03 Jun 2011)
New Revision: 6184
Modified:
complete.html
index
source
Log:
[giow] (0) Try to clean up the stuff about Unicode characters.
Fixing http://www.w3.org/Bugs/Public/show_bug.cgi?id=12100
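
As a reading aid for the diff below: the revision distinguishes a "Unicode character" (defined in the spec text as a Unicode scalar value, i.e. any code point that is not a surrogate) from a plain "Unicode code point". The following is a minimal sketch of that distinction, not spec text. The function name is_scalar_value is hypothetical, and the surrogate range U+D800 to U+DFFF is taken from the Unicode Standard rather than from this commit.

```python
def is_scalar_value(cp: int) -> bool:
    """Return True if cp is a Unicode scalar value.

    A scalar value is any code point in U+0000..U+10FFFF that is not a
    surrogate code point (U+D800..U+DFFF). The name is hypothetical.
    """
    in_code_space = 0 <= cp <= 0x10FFFF
    is_surrogate = 0xD800 <= cp <= 0xDFFF
    return in_code_space and not is_surrogate


if __name__ == "__main__":
    for cp in (0x0041, 0xD800, 0xDFFF, 0xFFFD, 0x10FFFF):
        print(f"U+{cp:04X}", is_scalar_value(cp))
```

Under this reading, the parser input ("a stream of Unicode code points") may carry values for which this check fails, whereas conforming text ("must consist of Unicode characters") may not.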
Modified: complete.html
===================================================================
--- complete.html 2011-06-03 01:21:42 UTC (rev 6183)
+++ complete.html 2011-06-03 19:40:10 UTC (rev 6184)
@@ -2944,9 +2944,8 @@
different <meta charset> elements applying in each case.
-->
- <p>The term <dfn title="">Unicode character</dfn> is used to mean a
- <i title="">Unicode scalar value</i> (i.e. any Unicode code point
- that is not a surrogate code point). <a href=#refsUNICODE>[UNICODE]</a></p>
+ <p>The term <dfn id=unicode-character>Unicode character</dfn> is used to mean a <i title="">Unicode scalar value</i> (i.e. any Unicode code point that
+ is not a surrogate code point). <a href=#refsUNICODE>[UNICODE]</a></p>
@@ -3425,14 +3424,6 @@
is passed an Infinity or Not-a-Number (NaN) value, a
<code><a href=#not_supported_err>NOT_SUPPORTED_ERR</a></code> exception must be raised.</p>
- <p>Except where otherwise specified, if a method has an argument
- of type <code>DOMString</code>, or if an IDL attribute is assigned
- a new value of type <code>DOMString</code>, the user agent must
- <span title=dfn-obtain-unicode>convert the
- <code>DOMString</code> to a sequence of Unicode characters</span>
- to obtain the string on which the algorithms in this specification
- are to operate. <a href=#refsWEBIDL>[WEBIDL]</a></p>
-
</dd>
<dt>JavaScript</dt>
@@ -6380,7 +6371,9 @@
characters as defined by UTF-8.</p>
<p>If any percent-encoded octets in that component are not valid
- UTF-8 sequences, then return an error and abort these steps.</p>
+ UTF-8 sequences (e.g. sequences of percent-encoded octets that
+ expand to surrogate code points), then return an error and abort
+ these steps.</p>
<p>Apply the IDNA ToASCII algorithm to the matching substring,
with both the AllowUnassigned and UseSTD3ASCIIRules flags
@@ -16096,11 +16089,11 @@
<dd>
- <p>The contents of that file, interpreted as string of
- Unicode characters, are the script source.</p>
+ <p>The contents of that file, interpreted as a Unicode
+ string, are the script source.</p>
- <p>To obtain the string of Unicode characters, the user
- agent run the following steps:</p>
+ <p>To obtain the Unicode string, the user agent run the
+ following steps:</p>
<ol><li><p>If the resource's <a href=#content-type title=Content-Type>Content
Type metadata</a>, if any, specifies a character
@@ -16471,11 +16464,11 @@
star = %x002A ; U+002A ASTERISK (*)
slash = %x002F ; U+002F SOLIDUS (/)
not-newline = %x0000-0009 / %x000B-10FFFF
- ; a Unicode character other than U+000A LINE FEED (LF)
+ ; a <a href=#unicode-character>Unicode character</a> other than U+000A LINE FEED (LF)
not-star = %x0000-0029 / %x002B-10FFFF
- ; a Unicode character other than U+002A ASTERISK (*)
+ ; a <a href=#unicode-character>Unicode character</a> other than U+002A ASTERISK (*)
not-slash = %x0000-002E / %x0030-10FFFF
- ; a Unicode character other than U+002F SOLIDUS (/)</pre>
+ ; a <a href=#unicode-character>Unicode character</a> other than U+002F SOLIDUS (/)</pre>
<p class=note>This corresponds to putting the contents of the
element in JavaScript comments.</p>
@@ -32310,18 +32303,13 @@
parsing the provided byte stream. If the stream lacks this WebVTT
file signature, then the parser aborts.</p>
- <p>When converting the bytes into Unicode characters, if the
- encoding used is UTF-8, the bytes must be <a href=#decoded-as-utf-8,-with-error-handling title="decoded as
- UTF-8, with error handling">decoded with the error handling</a>
- defined in this specification, and all U+0000 NULL characters must
- be replaced by U+FFFD REPLACEMENT CHARACTERs.</p>
-
<p>The <dfn id=webvtt-parser-algorithm>WebVTT parser algorithm</dfn> is as follows:</p>
<ol><li><p>Let <var title="">input</var> be the string being parsed,
- after conversion to Unicode and after the replacement of U+0000
- NULL characters described above.</li>
+ after conversion to Unicode.</li>
+ <li><p>Replace all U+0000 NULL characters in <var title="">input</var> by U+FFFD REPLACEMENT CHARACTERs.</li>
+
<li><p>Let <var title="">position</var> be a pointer into <var title="">input</var>, initially pointing at the start of the
string. In an <a href=#incremental-webvtt-parser>incremental WebVTT parser</a>, when this
algorithm (or further algorithms that it uses) moves the <var title="">position</var> pointer, the user agent must wait until
@@ -64072,14 +64060,14 @@
<li><p>Let <var title="">decoded fragid</var> be the result of
expanding any sequences of percent-encoded octets in <var title="">fragid</var> that are valid UTF-8 sequences into Unicode
characters as defined by UTF-8. If any percent-encoded octets in
- that string are not valid UTF-8 sequences, then skip this step and
- the next one.</p>
+ that string are not valid UTF-8 sequences (e.g. they expand to
+ surrogate code points), then skip this step and the next one.</p>
<li><p>If this step was not skipped and there is an element in the
- DOM that has an <a href=#concept-id title=concept-id>ID</a> exactly equal to <var title="">decoded
- fragid</var>, then the first such element in tree order is
- <a href=#the-indicated-part-of-the-document>the indicated part of the document</a>; stop the algorithm
- here.</li>
+ DOM that has an <a href=#concept-id title=concept-id>ID</a> exactly equal to
+ <var title="">decoded fragid</var>, then the first such element in
+ tree order is <a href=#the-indicated-part-of-the-document>the indicated part of the document</a>; stop
+ the algorithm here.</li>
<li><p>If there is an <code><a href=#the-a-element>a</a></code> element in the DOM that has a
<code title=attr-a-name><a href=#attr-a-name>name</a></code> attribute whose value is
@@ -78565,9 +78553,9 @@
colon = %x003A ; U+003A COLON (:)
bom = %xFEFF ; U+FEFF BYTE ORDER MARK
name-char = %x0000-0009 / %x000B-000C / %x000E-0039 / %x003B-10FFFF
- ; a Unicode character other than U+000A LINE FEED (LF), U+000D CARRIAGE RETURN (CR), or U+003A COLON (:)
+ ; a <a href=#unicode-character>Unicode character</a> other than U+000A LINE FEED (LF), U+000D CARRIAGE RETURN (CR), or U+003A COLON (:)
any-char = %x0000-0009 / %x000B-000C / %x000E-10FFFF
- ; a Unicode character other than U+000A LINE FEED (LF) or U+000D CARRIAGE RETURN (CR)</pre>
+ ; a <a href=#unicode-character>Unicode character</a> other than U+000A LINE FEED (LF) or U+000D CARRIAGE RETURN (CR)</pre>
<p>Event streams in this format must always be encoded as
UTF-8. <a href=#refsRFC3629>[RFC3629]</a></p>
@@ -81820,12 +81808,13 @@
<h4 id=text-1><span class=secno>13.1.3 </span>Text</h4>
<p><dfn id=syntax-text title=syntax-text>Text</dfn> is allowed inside elements,
- attribute values, and comments. Text must consist of Unicode
- characters. Text must not contain U+0000 characters. Text must not
- contain permanently undefined Unicode characters (noncharacters).
- Text must not contain control characters other than <a href=#space-character title="space character">space characters</a>. Extra constraints
- are placed on what is and what is not allowed in text based on where
- the text is to be put, as described in the other sections.</p>
+ attribute values, and comments. Text must consist of <a href=#unicode-character title="Unicode character">Unicode characters</a>. Text must not
+ contain U+0000 characters. Text must not contain permanently
+ undefined Unicode characters (noncharacters). Text must not contain
+ control characters other than <a href=#space-character title="space character">space
+ characters</a>. Extra constraints are placed on what is and what
+ is not allowed in text based on where the text is to be put, as
+ described in the other sections.</p>
<h5 id=newlines><span class=secno>13.1.3.1 </span>Newlines</h5>
@@ -82020,7 +82009,7 @@
<h4 id=overview-of-the-parsing-model><span class=secno>13.2.1 </span>Overview of the parsing model</h4>
<p>The input to the HTML parsing process consists of a stream of
- Unicode characters, which is passed through a
+ Unicode code points, which is passed through a
<a href=#tokenization>tokenization</a> stage followed by a <a href=#tree-construction>tree
construction</a> stage. The output is a <code><a href=#document>Document</a></code>
object.</p>
@@ -82069,7 +82058,7 @@
<h4 id=the-input-stream><span class=secno>13.2.2 </span>The <dfn>input stream</dfn></h4>
- <p>The stream of Unicode characters that comprises the input to the
+ <p>The stream of Unicode code points that comprises the input to the
tokenization stage will be initially seen by the user agent as a
stream of bytes (typically coming over the network or from the local
file system). The bytes encode the actual characters according to a
@@ -82107,8 +82096,8 @@
that encoding is <i>tentative</i> or <i>certain</i>, is <a href=#meta-charset-during-parse>used during the parsing</a> to
determine whether to <a href=#change-the-encoding>change the encoding</a>. If no
encoding is necessary, e.g. because the parser is operating on a
- stream of Unicode characters and doesn't have to use an encoding at
- all, then the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> is
+ Unicode stream and doesn't have to use an encoding at all, then the
+ <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> is
<i>irrelevant</i>.</p>
<ol><li><p>If the user has explicitly instructed the user agent to
@@ -82730,7 +82719,7 @@
<h5 id=preprocessing-the-input-stream><span class=secno>13.2.2.3 </span>Preprocessing the input stream</h5>
<p>Given an encoding, the bytes in the input stream must be
- converted to Unicode characters for the tokenizer, as described by
+ converted to Unicode code points for the tokenizer, as described by
the rules for that encoding, except that the leading U+FEFF BYTE
ORDER MARK character, if any, must not be stripped by the encoding
layer (it is stripped by the rule below).</p> <!-- this is to
Modified: index
===================================================================
--- index 2011-06-03 01:21:42 UTC (rev 6183)
+++ index 2011-06-03 19:40:10 UTC (rev 6184)
@@ -2961,9 +2961,8 @@
different <meta charset> elements applying in each case.
-->
- <p>The term <dfn title="">Unicode character</dfn> is used to mean a
- <i title="">Unicode scalar value</i> (i.e. any Unicode code point
- that is not a surrogate code point). <a href=#refsUNICODE>[UNICODE]</a></p>
+ <p>The term <dfn id=unicode-character>Unicode character</dfn> is used to mean a <i title="">Unicode scalar value</i> (i.e. any Unicode code point that
+ is not a surrogate code point). <a href=#refsUNICODE>[UNICODE]</a></p>
@@ -3442,14 +3441,6 @@
is passed an Infinity or Not-a-Number (NaN) value, a
<code><a href=#not_supported_err>NOT_SUPPORTED_ERR</a></code> exception must be raised.</p>
- <p>Except where otherwise specified, if a method has an argument
- of type <code>DOMString</code>, or if an IDL attribute is assigned
- a new value of type <code>DOMString</code>, the user agent must
- <span title=dfn-obtain-unicode>convert the
- <code>DOMString</code> to a sequence of Unicode characters</span>
- to obtain the string on which the algorithms in this specification
- are to operate. <a href=#refsWEBIDL>[WEBIDL]</a></p>
-
</dd>
<dt>JavaScript</dt>
@@ -6366,7 +6357,9 @@
characters as defined by UTF-8.</p>
<p>If any percent-encoded octets in that component are not valid
- UTF-8 sequences, then return an error and abort these steps.</p>
+ UTF-8 sequences (e.g. sequences of percent-encoded octets that
+ expand to surrogate code points), then return an error and abort
+ these steps.</p>
<p>Apply the IDNA ToASCII algorithm to the matching substring,
with both the AllowUnassigned and UseSTD3ASCIIRules flags
@@ -16082,11 +16075,11 @@
<dd>
- <p>The contents of that file, interpreted as string of
- Unicode characters, are the script source.</p>
+ <p>The contents of that file, interpreted as a Unicode
+ string, are the script source.</p>
- <p>To obtain the string of Unicode characters, the user
- agent run the following steps:</p>
+ <p>To obtain the Unicode string, the user agent run the
+ following steps:</p>
<ol><li><p>If the resource's <a href=#content-type title=Content-Type>Content
Type metadata</a>, if any, specifies a character
@@ -16457,11 +16450,11 @@
star = %x002A ; U+002A ASTERISK (*)
slash = %x002F ; U+002F SOLIDUS (/)
not-newline = %x0000-0009 / %x000B-10FFFF
- ; a Unicode character other than U+000A LINE FEED (LF)
+ ; a <a href=#unicode-character>Unicode character</a> other than U+000A LINE FEED (LF)
not-star = %x0000-0029 / %x002B-10FFFF
- ; a Unicode character other than U+002A ASTERISK (*)
+ ; a <a href=#unicode-character>Unicode character</a> other than U+002A ASTERISK (*)
not-slash = %x0000-002E / %x0030-10FFFF
- ; a Unicode character other than U+002F SOLIDUS (/)</pre>
+ ; a <a href=#unicode-character>Unicode character</a> other than U+002F SOLIDUS (/)</pre>
<p class=note>This corresponds to putting the contents of the
element in JavaScript comments.</p>
@@ -32299,18 +32292,13 @@
parsing the provided byte stream. If the stream lacks this WebVTT
file signature, then the parser aborts.</p>
- <p>When converting the bytes into Unicode characters, if the
- encoding used is UTF-8, the bytes must be <a href=#decoded-as-utf-8,-with-error-handling title="decoded as
- UTF-8, with error handling">decoded with the error handling</a>
- defined in this specification, and all U+0000 NULL characters must
- be replaced by U+FFFD REPLACEMENT CHARACTERs.</p>
-
<p>The <dfn id=webvtt-parser-algorithm>WebVTT parser algorithm</dfn> is as follows:</p>
<ol><li><p>Let <var title="">input</var> be the string being parsed,
- after conversion to Unicode and after the replacement of U+0000
- NULL characters described above.</li>
+ after conversion to Unicode.</li>
+ <li><p>Replace all U+0000 NULL characters in <var title="">input</var> by U+FFFD REPLACEMENT CHARACTERs.</li>
+
<li><p>Let <var title="">position</var> be a pointer into <var title="">input</var>, initially pointing at the start of the
string. In an <a href=#incremental-webvtt-parser>incremental WebVTT parser</a>, when this
algorithm (or further algorithms that it uses) moves the <var title="">position</var> pointer, the user agent must wait until
@@ -64061,14 +64049,14 @@
<li><p>Let <var title="">decoded fragid</var> be the result of
expanding any sequences of percent-encoded octets in <var title="">fragid</var> that are valid UTF-8 sequences into Unicode
characters as defined by UTF-8. If any percent-encoded octets in
- that string are not valid UTF-8 sequences, then skip this step and
- the next one.</p>
+ that string are not valid UTF-8 sequences (e.g. they expand to
+ surrogate code points), then skip this step and the next one.</p>
<li><p>If this step was not skipped and there is an element in the
- DOM that has an <a href=#concept-id title=concept-id>ID</a> exactly equal to <var title="">decoded
- fragid</var>, then the first such element in tree order is
- <a href=#the-indicated-part-of-the-document>the indicated part of the document</a>; stop the algorithm
- here.</li>
+ DOM that has an <a href=#concept-id title=concept-id>ID</a> exactly equal to
+ <var title="">decoded fragid</var>, then the first such element in
+ tree order is <a href=#the-indicated-part-of-the-document>the indicated part of the document</a>; stop
+ the algorithm here.</li>
<li><p>If there is an <code><a href=#the-a-element>a</a></code> element in the DOM that has a
<code title=attr-a-name><a href=#attr-a-name>name</a></code> attribute whose value is
@@ -77566,12 +77554,13 @@
<h4 id=text-1><span class=secno>11.1.3 </span>Text</h4>
<p><dfn id=syntax-text title=syntax-text>Text</dfn> is allowed inside elements,
- attribute values, and comments. Text must consist of Unicode
- characters. Text must not contain U+0000 characters. Text must not
- contain permanently undefined Unicode characters (noncharacters).
- Text must not contain control characters other than <a href=#space-character title="space character">space characters</a>. Extra constraints
- are placed on what is and what is not allowed in text based on where
- the text is to be put, as described in the other sections.</p>
+ attribute values, and comments. Text must consist of <a href=#unicode-character title="Unicode character">Unicode characters</a>. Text must not
+ contain U+0000 characters. Text must not contain permanently
+ undefined Unicode characters (noncharacters). Text must not contain
+ control characters other than <a href=#space-character title="space character">space
+ characters</a>. Extra constraints are placed on what is and what
+ is not allowed in text based on where the text is to be put, as
+ described in the other sections.</p>
<h5 id=newlines><span class=secno>11.1.3.1 </span>Newlines</h5>
@@ -77766,7 +77755,7 @@
<h4 id=overview-of-the-parsing-model><span class=secno>11.2.1 </span>Overview of the parsing model</h4>
<p>The input to the HTML parsing process consists of a stream of
- Unicode characters, which is passed through a
+ Unicode code points, which is passed through a
<a href=#tokenization>tokenization</a> stage followed by a <a href=#tree-construction>tree
construction</a> stage. The output is a <code><a href=#document>Document</a></code>
object.</p>
@@ -77815,7 +77804,7 @@
<h4 id=the-input-stream><span class=secno>11.2.2 </span>The <dfn>input stream</dfn></h4>
- <p>The stream of Unicode characters that comprises the input to the
+ <p>The stream of Unicode code points that comprises the input to the
tokenization stage will be initially seen by the user agent as a
stream of bytes (typically coming over the network or from the local
file system). The bytes encode the actual characters according to a
@@ -77853,8 +77842,8 @@
that encoding is <i>tentative</i> or <i>certain</i>, is <a href=#meta-charset-during-parse>used during the parsing</a> to
determine whether to <a href=#change-the-encoding>change the encoding</a>. If no
encoding is necessary, e.g. because the parser is operating on a
- stream of Unicode characters and doesn't have to use an encoding at
- all, then the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> is
+ Unicode stream and doesn't have to use an encoding at all, then the
+ <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> is
<i>irrelevant</i>.</p>
<ol><li><p>If the user has explicitly instructed the user agent to
@@ -78476,7 +78465,7 @@
<h5 id=preprocessing-the-input-stream><span class=secno>11.2.2.3 </span>Preprocessing the input stream</h5>
<p>Given an encoding, the bytes in the input stream must be
- converted to Unicode characters for the tokenizer, as described by
+ converted to Unicode code points for the tokenizer, as described by
the rules for that encoding, except that the leading U+FEFF BYTE
ORDER MARK character, if any, must not be stripped by the encoding
layer (it is stripped by the rule below).</p> <!-- this is to
Modified: source
===================================================================
--- source 2011-06-03 01:21:42 UTC (rev 6183)
+++ source 2011-06-03 19:40:10 UTC (rev 6184)
@@ -1908,9 +1908,9 @@
different <meta charset> elements applying in each case.
-->
- <p>The term <dfn title="">Unicode character</dfn> is used to mean a
- <i title="">Unicode scalar value</i> (i.e. any Unicode code point
- that is not a surrogate code point). <a
+ <p>The term <dfn>Unicode character</dfn> is used to mean a <i
+ title="">Unicode scalar value</i> (i.e. any Unicode code point that
+ is not a surrogate code point). <a
href="#refsUNICODE">[UNICODE]</a></p>
@@ -2448,14 +2448,6 @@
is passed an Infinity or Not-a-Number (NaN) value, a
<code>NOT_SUPPORTED_ERR</code> exception must be raised.</p>
- <p>Except where otherwise specified, if a method has an argument
- of type <code>DOMString</code>, or if an IDL attribute is assigned
- a new value of type <code>DOMString</code>, the user agent must
- <span title="dfn-obtain-unicode">convert the
- <code>DOMString</code> to a sequence of Unicode characters</span>
- to obtain the string on which the algorithms in this specification
- are to operate. <a href="#refsWEBIDL">[WEBIDL]</a></p>
-
</dd>
<dt>JavaScript</dt>
@@ -6100,7 +6092,9 @@
characters as defined by UTF-8.</p>
<p>If any percent-encoded octets in that component are not valid
- UTF-8 sequences, then return an error and abort these steps.</p>
+ UTF-8 sequences (e.g. sequences of percent-encoded octets that
+ expand to surrogate code points), then return an error and abort
+ these steps.</p>
<p>Apply the IDNA ToASCII algorithm to the matching substring,
with both the AllowUnassigned and UseSTD3ASCIIRules flags
@@ -17326,11 +17320,11 @@
<dd>
- <p>The contents of that file, interpreted as string of
- Unicode characters, are the script source.</p>
+ <p>The contents of that file, interpreted as a Unicode
+ string, are the script source.</p>
- <p>To obtain the string of Unicode characters, the user
- agent run the following steps:</p>
+ <p>To obtain the Unicode string, the user agent run the
+ following steps:</p>
<ol>
@@ -17747,11 +17741,11 @@
star = %x002A ; U+002A ASTERISK (*)
slash = %x002F ; U+002F SOLIDUS (/)
not-newline = %x0000-0009 / %x000B-10FFFF
- ; a Unicode character other than U+000A LINE FEED (LF)
+ ; a <span>Unicode character</span> other than U+000A LINE FEED (LF)
not-star = %x0000-0029 / %x002B-10FFFF
- ; a Unicode character other than U+002A ASTERISK (*)
+ ; a <span>Unicode character</span> other than U+002A ASTERISK (*)
not-slash = %x0000-002E / %x0030-10FFFF
- ; a Unicode character other than U+002F SOLIDUS (/)</pre>
+ ; a <span>Unicode character</span> other than U+002F SOLIDUS (/)</pre>
<p class="note">This corresponds to putting the contents of the
element in JavaScript comments.</p>
@@ -35527,20 +35521,16 @@
parsing the provided byte stream. If the stream lacks this WebVTT
file signature, then the parser aborts.</p>
- <p>When converting the bytes into Unicode characters, if the
- encoding used is UTF-8, the bytes must be <span title="decoded as
- UTF-8, with error handling">decoded with the error handling</span>
- defined in this specification, and all U+0000 NULL characters must
- be replaced by U+FFFD REPLACEMENT CHARACTERs.</p>
-
<p>The <dfn>WebVTT parser algorithm</dfn> is as follows:</p>
<ol>
<li><p>Let <var title="">input</var> be the string being parsed,
- after conversion to Unicode and after the replacement of U+0000
- NULL characters described above.</p></li>
+ after conversion to Unicode.</p></li>
+ <li><p>Replace all U+0000 NULL characters in <var
+ title="">input</var> by U+FFFD REPLACEMENT CHARACTERs.</p></li>
+
<li><p>Let <var title="">position</var> be a pointer into <var
title="">input</var>, initially pointing at the start of the
string. In an <span>incremental WebVTT parser</span>, when this
@@ -72991,14 +72981,14 @@
expanding any sequences of percent-encoded octets in <var
title="">fragid</var> that are valid UTF-8 sequences into Unicode
characters as defined by UTF-8. If any percent-encoded octets in
- that string are not valid UTF-8 sequences, then skip this step and
- the next one.</p>
+ that string are not valid UTF-8 sequences (e.g. they expand to
+ surrogate code points), then skip this step and the next one.</p>
<li><p>If this step was not skipped and there is an element in the
- DOM that has an <span title="concept-id">ID</span> exactly equal to <var title="">decoded
- fragid</var>, then the first such element in tree order is
- <span>the indicated part of the document</span>; stop the algorithm
- here.</p></li>
+ DOM that has an <span title="concept-id">ID</span> exactly equal to
+ <var title="">decoded fragid</var>, then the first such element in
+ tree order is <span>the indicated part of the document</span>; stop
+ the algorithm here.</p></li>
<li><p>If there is an <code>a</code> element in the DOM that has a
<code title="attr-a-name">name</code> attribute whose value is
@@ -89195,9 +89185,9 @@
colon = %x003A ; U+003A COLON (:)
bom = %xFEFF ; U+FEFF BYTE ORDER MARK
name-char = %x0000-0009 / %x000B-000C / %x000E-0039 / %x003B-10FFFF
- ; a Unicode character other than U+000A LINE FEED (LF), U+000D CARRIAGE RETURN (CR), or U+003A COLON (:)
+ ; a <span>Unicode character</span> other than U+000A LINE FEED (LF), U+000D CARRIAGE RETURN (CR), or U+003A COLON (:)
any-char = %x0000-0009 / %x000B-000C / %x000E-10FFFF
- ; a Unicode character other than U+000A LINE FEED (LF) or U+000D CARRIAGE RETURN (CR)</pre>
+ ; a <span>Unicode character</span> other than U+000A LINE FEED (LF) or U+000D CARRIAGE RETURN (CR)</pre>
<p>Event streams in this format must always be encoded as
UTF-8. <a href="#refsRFC3629">[RFC3629]</a></p>
@@ -92952,13 +92942,14 @@
<h4>Text</h4>
<p><dfn title="syntax-text">Text</dfn> is allowed inside elements,
- attribute values, and comments. Text must consist of Unicode
- characters. Text must not contain U+0000 characters. Text must not
- contain permanently undefined Unicode characters (noncharacters).
- Text must not contain control characters other than <span
- title="space character">space characters</span>. Extra constraints
- are placed on what is and what is not allowed in text based on where
- the text is to be put, as described in the other sections.</p>
+ attribute values, and comments. Text must consist of <span
+ title="Unicode character">Unicode characters</span>. Text must not
+ contain U+0000 characters. Text must not contain permanently
+ undefined Unicode characters (noncharacters). Text must not contain
+ control characters other than <span title="space character">space
+ characters</span>. Extra constraints are placed on what is and what
+ is not allowed in text based on where the text is to be put, as
+ described in the other sections.</p>
<h5>Newlines</h5>
@@ -93165,7 +93156,7 @@
<h4>Overview of the parsing model</h4>
<p>The input to the HTML parsing process consists of a stream of
- Unicode characters, which is passed through a
+ Unicode code points, which is passed through a
<span>tokenization</span> stage followed by a <span>tree
construction</span> stage. The output is a <code>Document</code>
object.</p>
@@ -93215,7 +93206,7 @@
<h4>The <dfn>input stream</dfn></h4>
- <p>The stream of Unicode characters that comprises the input to the
+ <p>The stream of Unicode code points that comprises the input to the
tokenization stage will be initially seen by the user agent as a
stream of bytes (typically coming over the network or from the local
file system). The bytes encode the actual characters according to a
@@ -93256,9 +93247,8 @@
href="#meta-charset-during-parse">used during the parsing</a> to
determine whether to <span>change the encoding</span>. If no
encoding is necessary, e.g. because the parser is operating on a
- stream of Unicode characters and doesn't have to use an encoding at
- all, then the <span
- title="concept-encoding-confidence">confidence</span> is
+ Unicode stream and doesn't have to use an encoding at all, then the
+ <span title="concept-encoding-confidence">confidence</span> is
<i>irrelevant</i>.</p>
<ol>
@@ -94029,7 +94019,7 @@
<h5>Preprocessing the input stream</h5>
<p>Given an encoding, the bytes in the input stream must be
- converted to Unicode characters for the tokenizer, as described by
+ converted to Unicode code points for the tokenizer, as described by
the rules for that encoding, except that the leading U+FEFF BYTE
ORDER MARK character, if any, must not be stripped by the encoding
layer (it is stripped by the rule below).</p> <!-- this is to
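
Two of the hunks above describe small string operations that may be easier to follow in code: rejecting percent-encoded octets that are not valid UTF-8 (including sequences that would expand to surrogate code points), and replacing U+0000 NULL with U+FFFD as a separate step of the WebVTT parser algorithm. The sketch below is illustrative only and rests on assumptions: the helper names decode_fragid and replace_nulls are hypothetical, a return value of None stands in for "skip this step", and Python's strict UTF-8 decoder is used as the validity check.

```python
from typing import Optional
from urllib.parse import unquote_to_bytes


def decode_fragid(fragid: str) -> Optional[str]:
    """Expand percent-encoded octets and decode them as strict UTF-8.

    Returns None when the octets are not valid UTF-8, which covers
    sequences such as %ED%A0%80 that would expand to a surrogate.
    (Hypothetical helper; None models "skip this step".)
    """
    raw = unquote_to_bytes(fragid)
    try:
        return raw.decode("utf-8")  # strict mode rejects encoded surrogates
    except UnicodeDecodeError:
        return None


def replace_nulls(text: str) -> str:
    """Replace every U+0000 NULL with U+FFFD REPLACEMENT CHARACTER."""
    return text.replace("\u0000", "\ufffd")


if __name__ == "__main__":
    print(decode_fragid("caf%C3%A9"))         # valid UTF-8 sequence
    print(decode_fragid("%ED%A0%80"))         # encoded surrogate, rejected
    print(repr(replace_nulls("a\u0000b")))
```

Keeping the NULL replacement as its own function mirrors how the revision moves it out of the byte-decoding paragraph and into a dedicated step of the algorithm.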