[html5] r8182 - [c] (2) Disallow surrogates in the input stream; make the syntax section match t [...]

whatwg at whatwg.org
Fri Sep 13 14:27:14 PDT 2013


Author: ianh
Date: 2013-09-13 14:27:11 -0700 (Fri, 13 Sep 2013)
New Revision: 8182

Modified:
   complete.html
   index
   source
Log:
[c] (2) Disallow surrogates in the input stream; make the syntax section match the parser for character references to surrogates; add a redundant paragraph regarding namespaces
Affected topics: HTML Syntax and Parsing
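
Editorial sketch (not part of the commit): the syntax rule touched by this
revision says which code points a numeric character reference may name. A
minimal Python illustration follows. The function names are hypothetical, and
the definitions of "control characters", "space characters", and
"noncharacters" are assumptions taken from the usual HTML and Unicode
definitions, since the diff itself does not spell them out.

```python
# Assumed definition: HTML "space characters" are tab, LF, FF, CR and space.
SPACE_CHARACTERS = {0x09, 0x0A, 0x0C, 0x0D, 0x20}


def is_noncharacter(cp):
    """Assumed Unicode definition: U+FDD0..U+FDEF plus the last two
    code points of every plane (U+xFFFE and U+xFFFF)."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_control(cp):
    """Assumed definition: C0 controls, DELETE, and C1 controls."""
    return cp <= 0x1F or 0x7F <= cp <= 0x9F


def is_surrogate(cp):
    """The range added by this revision: U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def may_be_referenced(cp):
    """Return True if a numeric character reference may name `cp`
    under the rule as worded after this revision."""
    if not 0 <= cp <= 0x10FFFF:
        return False  # not a Unicode code point at all
    if cp in (0x0000, 0x000D):
        return False  # explicitly excluded
    if is_noncharacter(cp) or is_surrogate(cp):
        return False
    if is_control(cp) and cp not in SPACE_CHARACTERS:
        return False  # controls are only allowed if they are space characters
    return True
```

Under these assumptions, may_be_referenced(0xD800) is False after this
revision, while an ordinary code point such as 0x41 is still accepted.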

Modified: complete.html
===================================================================
--- complete.html	2013-09-12 20:36:49 UTC (rev 8181)
+++ complete.html	2013-09-13 21:27:11 UTC (rev 8182)
@@ -256,7 +256,7 @@
 
   <header class=head id=head><p><a href=http://www.whatwg.org/ class=logo><img width=101 src=/images/logo alt=WHATWG height=101></a></p>
    <hgroup><h1 class=allcaps>HTML</h1>
-    <h2 class="no-num no-toc">Living Standard — Last Updated 12 September 2013</h2>
+    <h2 class="no-num no-toc">Living Standard — Last Updated 13 September 2013</h2>
    </hgroup><dl><dt><strong>Web developer edition:</strong></dt>
     <dd><strong><a href=http://developers.whatwg.org/>http://developers.whatwg.org/</a></strong></dd>
     <dt>Multiple-page version:</dt>
@@ -85313,7 +85313,7 @@
    character (;).</dd>
 
   </dl><p>The numeric character reference forms described above are allowed to reference any Unicode code
-  point other than U+0000, U+000D, permanently undefined Unicode characters (noncharacters), and
+  point other than U+0000, U+000D, permanently undefined Unicode characters (noncharacters), surrogates (U+D800–U+DFFF), and
   <a href=#control-characters>control characters</a> other than <a href=#space-character title="space character">space characters</a>.</p>
 
   <p>An <dfn id=syntax-ambiguous-ampersand title=syntax-ambiguous-ampersand>ambiguous ampersand</dfn> is a U+0026 AMPERSAND
@@ -85416,6 +85416,13 @@
   <p>For the purposes of conformance checkers, if a resource is determined to be in <a href=#syntax>the HTML
   syntax</a>, then it is an <a href=#html-documents title="HTML documents">HTML document</a>.</p>
 
+  <p class=note>As stated <a href=#html-elements class=no-backref title="HTML elements">in the terminology
+  section</a>, references to <a href=#element-type title="element type">element types</a> that do not
+  explicitly specify a namespace always refer to elements in the <a href=#html-namespace-0>HTML namespace</a>. For
+  example, if the spec talks about "a <code><a href=#the-menuitem-element>menuitem</a></code> element", then that is an element with
+  the local name "<code title="">menuitem</code>", the namespace "<code title="">http://www.w3.org/1999/xhtml</code>", and the interface <code><a href=#htmlmenuitemelement>HTMLMenuItemElement</a></code>.
+  Where possible, references to such elements are hyperlinked to their definition.</p>
+
   </div>
 
 
@@ -86410,6 +86417,10 @@
   errors</a>. These are all <a href=#control-characters>control characters</a> or permanently
   undefined Unicode characters (noncharacters).</p>
 
+  <p>Any <a href=#character>character</a> that is not a <a href=#unicode-character>Unicode character</a>, i.e. any isolated
+  surrogates, is a <a href=#parse-error>parse error</a>. (These can only find their way into the input stream
+  via script APIs such as <code title=dom-document-write><a href=#dom-document-write>document.write()</a></code>.)</p>
+
   <p>U+000D CARRIAGE RETURN (CR) characters and U+000A LINE FEED (LF)
   characters are treated specially. All CR characters must be
   converted to LF characters, and any LF characters that immediately

Modified: index
===================================================================
--- index	2013-09-12 20:36:49 UTC (rev 8181)
+++ index	2013-09-13 21:27:11 UTC (rev 8182)
@@ -256,7 +256,7 @@
 
   <header class=head id=head><p><a href=http://www.whatwg.org/ class=logo><img width=101 src=/images/logo alt=WHATWG height=101></a></p>
    <hgroup><h1 class=allcaps>HTML</h1>
-    <h2 class="no-num no-toc">Living Standard — Last Updated 12 September 2013</h2>
+    <h2 class="no-num no-toc">Living Standard — Last Updated 13 September 2013</h2>
    </hgroup><dl><dt><strong>Web developer edition:</strong></dt>
     <dd><strong><a href=http://developers.whatwg.org/>http://developers.whatwg.org/</a></strong></dd>
     <dt>Multiple-page version:</dt>
@@ -85313,7 +85313,7 @@
    character (;).</dd>
 
   </dl><p>The numeric character reference forms described above are allowed to reference any Unicode code
-  point other than U+0000, U+000D, permanently undefined Unicode characters (noncharacters), and
+  point other than U+0000, U+000D, permanently undefined Unicode characters (noncharacters), surrogates (U+D800–U+DFFF), and
   <a href=#control-characters>control characters</a> other than <a href=#space-character title="space character">space characters</a>.</p>
 
   <p>An <dfn id=syntax-ambiguous-ampersand title=syntax-ambiguous-ampersand>ambiguous ampersand</dfn> is a U+0026 AMPERSAND
@@ -85416,6 +85416,13 @@
   <p>For the purposes of conformance checkers, if a resource is determined to be in <a href=#syntax>the HTML
   syntax</a>, then it is an <a href=#html-documents title="HTML documents">HTML document</a>.</p>
 
+  <p class=note>As stated <a href=#html-elements class=no-backref title="HTML elements">in the terminology
+  section</a>, references to <a href=#element-type title="element type">element types</a> that do not
+  explicitly specify a namespace always refer to elements in the <a href=#html-namespace-0>HTML namespace</a>. For
+  example, if the spec talks about "a <code><a href=#the-menuitem-element>menuitem</a></code> element", then that is an element with
+  the local name "<code title="">menuitem</code>", the namespace "<code title="">http://www.w3.org/1999/xhtml</code>", and the interface <code><a href=#htmlmenuitemelement>HTMLMenuItemElement</a></code>.
+  Where possible, references to such elements are hyperlinked to their definition.</p>
+
   </div>
 
 
@@ -86410,6 +86417,10 @@
   errors</a>. These are all <a href=#control-characters>control characters</a> or permanently
   undefined Unicode characters (noncharacters).</p>
 
+  <p>Any <a href=#character>character</a> that is not a <a href=#unicode-character>Unicode character</a>, i.e. any isolated
+  surrogates, is a <a href=#parse-error>parse error</a>. (These can only find their way into the input stream
+  via script APIs such as <code title=dom-document-write><a href=#dom-document-write>document.write()</a></code>.)</p>
+
   <p>U+000D CARRIAGE RETURN (CR) characters and U+000A LINE FEED (LF)
   characters are treated specially. All CR characters must be
   converted to LF characters, and any LF characters that immediately

Modified: source
===================================================================
--- source	2013-09-12 20:36:49 UTC (rev 8181)
+++ source	2013-09-13 21:27:11 UTC (rev 8182)
@@ -95187,7 +95187,7 @@
   </dl>
 
   <p>The numeric character reference forms described above are allowed to reference any Unicode code
-  point other than U+0000, U+000D, permanently undefined Unicode characters (noncharacters), and
+  point other than U+0000, U+000D, permanently undefined Unicode characters (noncharacters), surrogates (U+D800–U+DFFF), and
   <span>control characters</span> other than <span title="space character">space characters</span>.</p>
 
   <p>An <dfn title="syntax-ambiguous-ampersand">ambiguous ampersand</dfn> is a U+0026 AMPERSAND
@@ -95296,6 +95296,14 @@
   <p>For the purposes of conformance checkers, if a resource is determined to be in <span>the HTML
   syntax</span>, then it is an <span title="HTML documents">HTML document</span>.</p>
 
+  <p class="note">As stated <span class="no-backref" title="HTML elements">in the terminology
+  section</span>, references to <span title="element type">element types</span> that do not
+  explicitly specify a namespace always refer to elements in the <span>HTML namespace</span>. For
+  example, if the spec talks about "a <code>menuitem</code> element", then that is an element with
+  the local name "<code title="">menuitem</code>", the namespace "<code
+  title="">http://www.w3.org/1999/xhtml</code>", and the interface <code>HTMLMenuItemElement</code>.
+  Where possible, references to such elements are hyperlinked to their definition.</p>
+
   </div>
 
 
@@ -96436,6 +96444,10 @@
   errors</span>. These are all <span>control characters</span> or permanently
   undefined Unicode characters (noncharacters).</p>
 
+  <p>Any <span>character</span> that is not a <span>Unicode character</span>, i.e. any isolated
+  surrogates, is a <span>parse error</span>. (These can only find their way into the input stream
+  via script APIs such as <code title="dom-document-write">document.write()</code>.)</p>
+
   <p>U+000D CARRIAGE RETURN (CR) characters and U+000A LINE FEED (LF)
   characters are treated specially. All CR characters must be
   converted to LF characters, and any LF characters that immediately
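
Editorial sketch (not part of the commit): the paragraph added to the input
stream section treats isolated surrogates as parse errors. Assuming the input
is modelled as a sequence of 16-bit code units (as a script string passed to
document.write() would be), the check might look like the Python below. The
function name and the list-of-integers representation are illustrative
assumptions, not anything defined by the spec.

```python
def find_isolated_surrogates(units):
    """Return the indexes of code units that are isolated surrogates.

    `units` is a sequence of 16-bit code unit values. A high surrogate
    (U+D800..U+DBFF) immediately followed by a low surrogate
    (U+DC00..U+DFFF) forms a valid pair; any other surrogate is isolated.
    """
    isolated = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        next_is_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and next_is_low:
            i += 2  # well-formed pair: skip both halves
            continue
        if 0xD800 <= unit <= 0xDFFF:
            isolated.append(i)  # each of these would be a parse error
        i += 1
    return isolated


# "a", the pair D83D DE00 (U+1F600), a lone D800, then "b"
sample = [0x61, 0xD83D, 0xDE00, 0xD800, 0x62]
errors = find_isolated_surrogates(sample)
```

Each index returned corresponds to one parse error under the new paragraph;
well-formed pairs are left alone because together they encode a single
Unicode character.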
