[html5] r944 - /
whatwg at whatwg.org
Thu Jun 21 19:05:35 PDT 2007
Author: ianh
Date: 2007-06-21 19:03:13 -0700 (Thu, 21 Jun 2007)
New Revision: 944
Modified:
index
source
Log:
[t] (2) Strip whitespace outside the root element from the DOM
Modified: index
===================================================================
--- index 2007-06-22 01:44:33 UTC (rev 943)
+++ index 2007-06-22 02:03:13 UTC (rev 944)
@@ -32123,6 +32123,15 @@
title=attr-meta-charset>character encoding declarations</a> are to be
serialised, as discussed in the section on that topic.
+ <p class=note>Space characters before the root <code><a
+ href="#html">html</a></code> element will be dropped when the document is
+ parsed; space characters <em>after</em> the root <code><a
+ href="#html">html</a></code> element will be parsed as if they were at the
+ end of the <code><a href="#html">html</a></code> element. Thus, space
+ characters around the root element do not round-trip. It is suggested that
+ newlines be inserted after the DOCTYPE and any comments that aren't in the
+ root element.
+
<h4 id=the-doctype><span class=secno>8.1.1. </span>The DOCTYPE</h4>
<p>A <dfn id=doctype title=syntax-doctype>DOCTYPE</dfn> is a mostly
@@ -35114,13 +35123,12 @@
from the <a href="#tokenisation0">tokenisation</a> stage as follows:
<dl class=switch>
- <dt>A character token that <em>is</em> one of one of U+0009 CHARACTER
- TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C FORM
- FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE
+ <dt>A character token that is one of one of U+0009 CHARACTER TABULATION,
+ U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C FORM FEED (FF),
+ <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE
<dd>
- <p><a href="#append" title="append a character">Append that character</a>
- to the <code>Document</code> node.</p>
+ <p>Ignore the token.</p>
<dt>A comment token
@@ -35451,8 +35459,7 @@
<!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE
<dd>
- <p><a href="#append" title="append a character">Append that character</a>
- to the <code>Document</code> node.</p>
+ <p>Ignore the token.</p>
<dt>A character token that is <em>not</em> one of U+0009 CHARACTER
TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C FORM
@@ -38314,6 +38321,11 @@
<dd>
<p>Process the token as it would be processed in <a href="#the-main0">the
main phase</a>.</p>
+ <!-- if there was a <body>, the space will go
+ into it, otherwise (e.g. if there was a <frameset>) it'll go into
+ the <html> node (this is important in case we have "foo</html>
+ bar", as we don't want that to become one word) -->
+
<dt>A character token that is <em>not</em> one of U+0009 CHARACTER
TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C FORM
Modified: source
===================================================================
--- source 2007-06-22 01:44:33 UTC (rev 943)
+++ source 2007-06-22 02:03:13 UTC (rev 944)
@@ -29618,7 +29618,15 @@
title="attr-meta-charset">character encoding declarations</span> are
to be serialised, as discussed in the section on that topic.</p>
+ <p class="note">Space characters before the root <code>html</code>
+ element will be dropped when the document is parsed; space
+ characters <em>after</em> the root <code>html</code> element will be
+ parsed as if they were at the end of the <code>html</code>
+ element. Thus, space characters around the root element do not
+ round-trip. It is suggested that newlines be inserted after the
+ DOCTYPE and any comments that aren't in the root element.</p>
+
<h4>The DOCTYPE</h4>
<p>A <dfn title="syntax-doctype">DOCTYPE</dfn> is a mostly useless,
@@ -32438,13 +32446,12 @@
<dl class="switch">
- <dt>A character token that <em>is</em> one of one of U+0009
- CHARACTER TABULATION, U+000A LINE FEED (LF), U+000B LINE
- TABULATION, U+000C FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or
- U+0020 SPACE</dt>
+ <dt>A character token that is one of one of U+0009 CHARACTER
+ TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C
+ FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or U+0020
+ SPACE</dt>
<dd>
- <p><span title="append a character">Append that character</span>
- to the <code>Document</code> node.</p>
+ <p>Ignore the token.</p>
</dd>
<dt>A comment token</dt>
@@ -32625,10 +32632,10 @@
<dt>A character token that is one of one of U+0009 CHARACTER
TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C
- FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE</dt>
+ FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or U+0020
+ SPACE</dt>
<dd>
- <p><span title="append a character">Append that character</span>
- to the <code>Document</code> node.</p>
+ <p>Ignore the token.</p>
</dd>
<dt>A character token that is <em>not</em> one of U+0009 CHARACTER
@@ -35622,10 +35629,14 @@
<dt>A character token that is one of one of U+0009 CHARACTER
TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C
- FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE</dt>
+ FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or U+0020
+ SPACE</dt>
<dd>
<p>Process the token as it would be processed in <span>the main
- phase</span>.</p>
+ phase</span>.</p> <!-- if there was a <body>, the space will go
+ into it, otherwise (e.g. if there was a <frameset>) it'll go into
+ the <html> node (this is important in case we have "foo</html>
+ bar", as we don't want that to become one word) -->
</dd>
<dt>A character token that is <em>not</em> one of U+0009 CHARACTER
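
The round-trip consequence described in the new note can be illustrated with a toy model. This is only a sketch, not the parser algorithm from the spec: the function names `root_text_after_parse` and `serialise` are hypothetical, the input is assumed to be a bare `<html>...</html>` string with nothing but space characters around it, and the `<body>` versus `<html>` distinction mentioned in the source comment is deliberately ignored.

```python
# Space characters as listed in this revision (CR is commented out in the spec text).
SPACE_CHARS = "\t\n\x0b\x0c "


def root_text_after_parse(source: str) -> str:
    """Toy model of r944: return the text that ends up inside the root element.

    Assumes `source` looks like [spaces]<html>text</html>[spaces].
    """
    before, _, rest = source.partition("<html>")
    inside, _, after = rest.partition("</html>")
    if before.strip(SPACE_CHARS) or after.strip(SPACE_CHARS):
        raise ValueError("toy model only handles space characters outside the root")
    # Space characters before the root are ignored (dropped from the DOM);
    # space characters after the root are treated as if they were at its end.
    return inside + after


def serialise(text: str) -> str:
    """Hypothetical serialiser for the toy model: wrap the text in the root element."""
    return "<html>" + text + "</html>"


original = "\n<html>foo</html>\n"
reparsed = serialise(root_text_after_parse(original))
print(repr(reparsed))
```

In this model the leading newline disappears and the trailing newline moves inside the root element, so the serialised result differs from the input, which is what "do not round-trip" means in the note.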