[html5] r944 - /

Thu Jun 21 19:05:35 PDT 2007

Author: ianh
Date: 2007-06-21 19:03:13 -0700 (Thu, 21 Jun 2007)
New Revision: 944

Modified:
   index
   source
Log:
[t] (2) Strip whitespace outside the root element from the DOM
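
A rough illustration of the behaviour this revision describes, as a toy model rather than the spec's actual algorithm. The function name `build_tree`, the phase labels, and the token representation (tag strings plus single characters) are all assumptions made for this sketch. It only handles `<html>`, `<body>`, their end tags, and character tokens.

```python
# Space characters as listed in the diff (CR is commented out in the spec text).
SPACE_CHARS = set("\t\n\x0b\x0c ")

def build_tree(tokens):
    """Toy model: returns (doc_text, html_text, body_text)."""
    doc_text = ""      # text attached directly to the Document; stays empty after r944
    html_text = None   # text directly inside <html>; None until the root exists
    body_text = None   # text inside <body>; None if no body was seen
    phase = "before-root"  # hypothetical phase label
    for tok in tokens:
        if phase == "before-root":
            if tok in SPACE_CHARS:
                continue  # r944: "Ignore the token." (was: append to the Document)
            if tok != "<html>":
                raise ValueError("outside the scope of this sketch: %r" % tok)
            html_text = ""
            phase = "main"
        elif tok == "<body>" and phase == "main":
            body_text = ""
        elif tok == "</body>" and phase == "main":
            pass  # nothing to record in this simplified model
        elif tok == "</html>" and phase == "main":
            phase = "trailing-end"
        elif len(tok) == 1 and (phase == "main" or tok in SPACE_CHARS):
            # After </html>, space characters are processed as in the main
            # phase: they land in <body> if there is one, otherwise in <html>.
            if body_text is not None:
                body_text += tok
            else:
                html_text += tok
        else:
            raise ValueError("outside the scope of this sketch: %r" % tok)
    return doc_text, html_text, body_text

with_body = [" ", "\n", "<html>", "<body>"] + list("foo") + ["</body>", "</html>", " ", "\n"]
no_body = ["\n", "<html>", "</html>", " "]
print(build_tree(with_body))
print(build_tree(no_body))
```

In this model the leading whitespace never reaches the tree and the trailing whitespace ends up inside the root element, which is why serialising and reparsing does not reproduce the original spacing around the root.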

Modified: index
===================================================================
--- index	2007-06-22 01:44:33 UTC (rev 943)
+++ index	2007-06-22 02:03:13 UTC (rev 944)
@@ -32123,6 +32123,15 @@
    title=attr-meta-charset>character encoding declarations</a> are to be
    serialised, as discussed in the section on that topic.
 
+  <p class=note>Space characters before the root <code><a
+   href="#html">html</a></code> element will be dropped when the document is
+   parsed; space characters <em>after</em> the root <code><a
+   href="#html">html</a></code> element will be parsed as if they were at the
+   end of the <code><a href="#html">html</a></code> element. Thus, space
+   characters around the root element do not round-trip. It is suggested that
+   newlines be inserted after the DOCTYPE and any comments that aren't in the
+   root element.
+
   <h4 id=the-doctype><span class=secno>8.1.1. </span>The DOCTYPE</h4>
 
   <p>A <dfn id=doctype title=syntax-doctype>DOCTYPE</dfn> is a mostly
@@ -35114,13 +35123,12 @@
    from the <a href="#tokenisation0">tokenisation</a> stage as follows:
 
   <dl class=switch>
-   <dt>A character token that <em>is</em> one of one of U+0009 CHARACTER
-    TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C FORM
-    FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE
+   <dt>A character token that is one of one of U+0009 CHARACTER TABULATION,
+    U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C FORM FEED (FF),
+    <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE
 
    <dd>
-    <p><a href="#append" title="append a character">Append that character</a>
-     to the <code>Document</code> node.</p>
+    <p>Ignore the token.</p>
 
    <dt>A comment token
 
@@ -35451,8 +35459,7 @@
     <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE
 
    <dd>
-    <p><a href="#append" title="append a character">Append that character</a>
-     to the <code>Document</code> node.</p>
+    <p>Ignore the token.</p>
 
    <dt>A character token that is <em>not</em> one of U+0009 CHARACTER
     TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C FORM
@@ -38314,6 +38321,11 @@
    <dd>
     <p>Process the token as it would be processed in <a href="#the-main0">the
      main phase</a>.</p>
+    <!-- if there was a <body>, the space will go
+    into it, otherwise (e.g. if there was a <frameset>) it'll go into
+    the <html> node (this is important in case we have "foo</html>
+    bar", as we don't want that to become one word) -->
+    
 
    <dt>A character token that is <em>not</em> one of U+0009 CHARACTER
     TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C FORM

Modified: source
===================================================================
--- source	2007-06-22 01:44:33 UTC (rev 943)
+++ source	2007-06-22 02:03:13 UTC (rev 944)
@@ -29618,7 +29618,15 @@
   title="attr-meta-charset">character encoding declarations</span> are
   to be serialised, as discussed in the section on that topic.</p>
 
+  <p class="note">Space characters before the root <code>html</code>
+  element will be dropped when the document is parsed; space
+  characters <em>after</em> the root <code>html</code> element will be
+  parsed as if they were at the end of the <code>html</code>
+  element. Thus, space characters around the root element do not
+  round-trip. It is suggested that newlines be inserted after the
+  DOCTYPE and any comments that aren't in the root element.</p>
 
+
   <h4>The DOCTYPE</h4>
 
   <p>A <dfn title="syntax-doctype">DOCTYPE</dfn> is a mostly useless,
@@ -32438,13 +32446,12 @@
 
   <dl class="switch">
 
-   <dt>A character token that <em>is</em> one of one of U+0009
-   CHARACTER TABULATION, U+000A LINE FEED (LF), U+000B LINE
-   TABULATION, U+000C FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or
-   U+0020 SPACE</dt>
+   <dt>A character token that is one of one of U+0009 CHARACTER
+   TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C
+   FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or U+0020
+   SPACE</dt>
    <dd>
-    <p><span title="append a character">Append that character</span>
-    to the <code>Document</code> node.</p>
+    <p>Ignore the token.</p>
    </dd>
 
    <dt>A comment token</dt>
@@ -32625,10 +32632,10 @@
 
    <dt>A character token that is one of one of U+0009 CHARACTER
    TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C
-   FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE</dt>
+   FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or U+0020
+   SPACE</dt>
    <dd>
-    <p><span title="append a character">Append that character</span>
-    to the <code>Document</code> node.</p>
+    <p>Ignore the token.</p>
    </dd>
 
    <dt>A character token that is <em>not</em> one of U+0009 CHARACTER
@@ -35622,10 +35629,14 @@
 
    <dt>A character token that is one of one of U+0009 CHARACTER
    TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C
-   FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE</dt>
+   FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or U+0020
+   SPACE</dt>
    <dd>
     <p>Process the token as it would be processed in <span>the main
-    phase</span>.</p>
+    phase</span>.</p> <!-- if there was a <body>, the space will go
+    into it, otherwise (e.g. if there was a <frameset>) it'll go into
+    the <html> node (this is important in case we have "foo</html>
+    bar", as we don't want that to become one word) -->
    </dd>
 
    <dt>A character token that is <em>not</em> one of U+0009 CHARACTER