[html5] r6649 - [e] (0) Define 'code unit'. Fixing http://www.w3.org/Bugs/Public/show_bug.cgi?id [...]

Thu Oct 6 16:30:15 PDT 2011

Author: ianh
Date: 2011-10-06 16:30:14 -0700 (Thu, 06 Oct 2011)
New Revision: 6649

Modified:
   complete.html
   index
   source
Log:
[e] (0) Define 'code unit'.
Fixing http://www.w3.org/Bugs/Public/show_bug.cgi?id=13676
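
Illustration (not part of the commit): a minimal Python sketch of what the
new definition implies, assuming a DOMString can be modelled as the UTF-16
encoding of a Python string. The helper names code_units and
code_unit_length are hypothetical and appear in neither the spec nor Web IDL.

```python
def code_units(s: str) -> list[int]:
    """Return the 16-bit unsigned integers (UTF-16 code units) that make up s."""
    # "surrogatepass" lets lone surrogates through, since a DOMString may hold them.
    data = s.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def code_unit_length(s: str) -> int:
    """Number of code units in s (assumed model of the spec's string length)."""
    return len(code_units(s))


if __name__ == "__main__":
    # U+1D11E lies outside the BMP, so it occupies two code units.
    print([hex(u) for u in code_units("\U0001D11E")])
    print(code_unit_length("a\U0001D11Eb"))
```

Under this model, a character outside the Basic Multilingual Plane counts as
two code units, which is the complexity the note in the diff below refers to.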

Modified: complete.html
===================================================================

--- complete.html	2011-10-06 23:24:38 UTC (rev 6648)
+++ complete.html	2011-10-06 23:30:14 UTC (rev 6649)
@@ -3362,6 +3362,11 @@
   UTF-16: self-describing UTF-16 with a BOM, ambiguous UTF-16 without
   a BOM, raw UTF-16LE, and raw UTF-16BE. <a href=#refsRFC2781>[RFC2781]</a></p>
 
+  <p>The term <dfn id=code-unit>code unit</dfn> is used as defined in the Web IDL
+  specification: a 16 bit unsigned integer, the smallest atomic
+  component of a <code>DOMString</code>. (This is a narrower
+  definition than the one used in Unicode.) <a href=#refsWEBIDL>[WEBIDL]</a></p>
+
   <p>The term <dfn id=unicode-character>Unicode character</dfn> is used to mean a <i title="">Unicode scalar value</i> (i.e. any Unicode code point that
   is not a surrogate code point). <a href=#refsUNICODE>[UNICODE]</a></p>
 
@@ -3369,17 +3374,17 @@
   <em>Unicode</em> character, means a <a href=#unicode-character>Unicode character</a>
   where possible, or a surrogate code point when not: when an
   algorithm that processes strings is defined in terms of characters,
-  a pair of <span title="code unit">code units</span> consisting of a
+  a pair of <a href=#code-unit title="code unit">code units</a> consisting of a
   high surrogate followed by a low surrogate must be treated as a
   single character, but isolated surrogates must each be treated as a
   single character also.</p>
 
   <p>The <dfn id=code-point-length>code-point length</dfn> of a string is the number of
-  <span title="code unit">code units</span> in that string. <a href=#refsWEBIDL>[WEBIDL]</a></p>
+  <a href=#code-unit title="code unit">code units</a> in that string.</p>
 
   <p class=note>This complexity results from the historical decision
-  to define the DOM API in terms of 16 bit (UTF-16) <span title="code
-  unit">code units</span>, rather than in terms of <a href=#unicode-character title="Unicode character">Unicode characters</a>.</p>
+  to define the DOM API in terms of 16 bit (UTF-16) <a href=#code-unit title="code
+  unit">code units</a>, rather than in terms of <a href=#unicode-character title="Unicode character">Unicode characters</a>.</p>
 
 
 

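Illustration (not part of the commit): the hunk above says that a high
surrogate followed by a low surrogate is treated as a single character,
while isolated surrogates are each treated as a single character. A minimal
Python sketch of that grouping rule, assuming the input is a list of 16-bit
code units; the function name split_characters is hypothetical.

```python
def split_characters(units: list[int]) -> list[int]:
    """Group 16-bit code units into characters, returned as code points."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        next_is_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and next_is_low:
            # A well-formed surrogate pair is combined into one character.
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            # A BMP code unit or an isolated surrogate is one character by itself.
            out.append(u)
            i += 1
    return out


if __name__ == "__main__":
    print([hex(c) for c in split_characters([0x41, 0xD834, 0xDD1E, 0xDC00])])
```

The sketch never rejects input: an unpaired surrogate is passed through as
its own character instead of raising an error, matching the wording of the
paragraph in the hunk.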
Modified: index
===================================================================
--- index	2011-10-06 23:24:38 UTC (rev 6648)
+++ index	2011-10-06 23:30:14 UTC (rev 6649)
@@ -3362,6 +3362,11 @@
   UTF-16: self-describing UTF-16 with a BOM, ambiguous UTF-16 without
   a BOM, raw UTF-16LE, and raw UTF-16BE. <a href=#refsRFC2781>[RFC2781]</a></p>
 
+  <p>The term <dfn id=code-unit>code unit</dfn> is used as defined in the Web IDL
+  specification: a 16 bit unsigned integer, the smallest atomic
+  component of a <code>DOMString</code>. (This is a narrower
+  definition than the one used in Unicode.) <a href=#refsWEBIDL>[WEBIDL]</a></p>
+
   <p>The term <dfn id=unicode-character>Unicode character</dfn> is used to mean a <i title="">Unicode scalar value</i> (i.e. any Unicode code point that
   is not a surrogate code point). <a href=#refsUNICODE>[UNICODE]</a></p>
 
@@ -3369,17 +3374,17 @@
   <em>Unicode</em> character, means a <a href=#unicode-character>Unicode character</a>
   where possible, or a surrogate code point when not: when an
   algorithm that processes strings is defined in terms of characters,
-  a pair of <span title="code unit">code units</span> consisting of a
+  a pair of <a href=#code-unit title="code unit">code units</a> consisting of a
   high surrogate followed by a low surrogate must be treated as a
   single character, but isolated surrogates must each be treated as a
   single character also.</p>
 
   <p>The <dfn id=code-point-length>code-point length</dfn> of a string is the number of
-  <span title="code unit">code units</span> in that string. <a href=#refsWEBIDL>[WEBIDL]</a></p>
+  <a href=#code-unit title="code unit">code units</a> in that string.</p>
 
   <p class=note>This complexity results from the historical decision
-  to define the DOM API in terms of 16 bit (UTF-16) <span title="code
-  unit">code units</span>, rather than in terms of <a href=#unicode-character title="Unicode character">Unicode characters</a>.</p>
+  to define the DOM API in terms of 16 bit (UTF-16) <a href=#code-unit title="code
+  unit">code units</a>, rather than in terms of <a href=#unicode-character title="Unicode character">Unicode characters</a>.</p>
 
 
 

Modified: source
===================================================================
--- source	2011-10-06 23:24:38 UTC (rev 6648)
+++ source	2011-10-06 23:30:14 UTC (rev 6649)
@@ -2237,6 +2237,12 @@
   a BOM, raw UTF-16LE, and raw UTF-16BE. <a
   href="#refsRFC2781">[RFC2781]</a></p>
 
+  <p>The term <dfn>code unit</dfn> is used as defined in the Web IDL
+  specification: a 16 bit unsigned integer, the smallest atomic
+  component of a <code>DOMString</code>. (This is a narrower
+  definition than the one used in Unicode.) <a
+  href="#refsWEBIDL">[WEBIDL]</a></p>
+
   <p>The term <dfn>Unicode character</dfn> is used to mean a <i
   title="">Unicode scalar value</i> (i.e. any Unicode code point that
   is not a surrogate code point). <a
@@ -2252,8 +2258,7 @@
   single character also.</p>
 
   <p>The <dfn>code-point length</dfn> of a string is the number of
-  <span title="code unit">code units</span> in that string. <a
-  href="#refsWEBIDL">[WEBIDL]</a></p>
+  <span title="code unit">code units</span> in that string.</p>
 
   <p class="note">This complexity results from the historical decision
   to define the DOM API in terms of 16 bit (UTF-16) <span title="code