[html5] r931 - /

whatwg at whatwg.org whatwg at whatwg.org
Tue Jun 19 16:49:50 PDT 2007


Author: ianh
Date: 2007-06-19 16:49:49 -0700 (Tue, 19 Jun 2007)
New Revision: 931

Modified:
   index
   source
Log:
[e] (0) Abstract out the innerHTML serialisation algorithm.

Modified: index
===================================================================
--- index	2007-06-19 23:27:35 UTC (rev 930)
+++ index	2007-06-19 23:49:49 UTC (rev 931)
@@ -1518,7 +1518,10 @@
 
      <li><a href="#namespaces"><span class=secno>8.3. </span>Namespaces</a>
 
-     <li><a href="#entities"><span class=secno>8.4. </span>Entities</a>
+     <li><a href="#serialising"><span class=secno>8.4. </span>Serialising
+      HTML fragments</a>
+
+     <li><a href="#entities"><span class=secno>8.5. </span>Entities</a>
     </ul>
 
    <li><a href="#wysiwyg"><span class=secno>9. </span>WYSIWYG editors</a>
@@ -3710,166 +3713,9 @@
 
   <p>On getting, the <code title=dom-innerHTML-HTML><a
    href="#innerhtml0">innerHTML</a></code> DOM attribute must return the
-   result of running the following algorithm:
+   result of running the <a href="#html-fragment">HTML fragment serialisation
+   algorithm</a> on the node.
 
-  <ol>
-   <li>
-    <p>Let <var title="">s</var> be a string, and initialise it to the empty
-     string.
-
-   <li>
-    <p>For each child node <var title="">child</var>, in <a
-     href="#tree-order">tree order</a>, append the appropriate string from
-     the following list to <var title="">s</var>:</p>
-
-    <dl class=switch>
-     <dt>If the child node is an <code title="">Element</code>
-
-     <dd>
-      <p>Append a U+003C LESS-THAN SIGN (<code title="">&lt;</code>)
-       character, followed by the element's tag name. (For nodes created by
-       the <a href="#html-0">HTML parser</a>, <code
-       title="">Document.createElement()</code>, or <code
-       title="">Document.renameNode()</code>, the tag name will be
-       lowercase.)</p>
-
-      <p>For each attribute that the element has, append a U+0020 SPACE
-       character, the attribute's name (which, for attributes set by the <a
-       href="#html-0">HTML parser</a> or by <code
-       title="">Element.setAttributeNode()</code> or <code
-       title="">Element.setAttribute()</code>, will be lowercase), a U+003D
-       EQUALS SIGN (<code title="">=</code>) character, a U+0022 QUOTATION
-       MARK (<code title="">"</code>) character, the attribute's value,
-       <a href="#escapingString" title="escaping a string">escaped as
-       described below</a>, and a second U+0022 QUOTATION MARK (<code
-       title="">"</code>) character.</p>
-
-      <p>While the exact order of attributes is UA-defined, and may depend on
-       factors such as the order that the attributes were given in the
-       original markup, the sort order must be stable, such that consecutive
-       calls to <code title=dom-innerHTML-HTML><a
-       href="#innerhtml0">innerHTML</a></code> serialise an element's
-       attributes in the same order.</p>
-
-      <p>Append a U+003E GREATER-THAN SIGN (<code title="">&gt;</code>)
-       character.</p>
-
-      <p>If the child node is an <code><a href="#area">area</a></code>,
-       <code><a href="#base">base</a></code>, <code>basefont</code>,
-       <code>bgsound</code>, <code><a href="#br">br</a></code>, <code><a
-       href="#col">col</a></code>, <code><a href="#embed">embed</a></code>,
-       <code>frame</code>, <code><a href="#hr">hr</a></code>, <code><a
-       href="#img">img</a></code>, <code>input</code>, <code><a
-       href="#link">link</a></code>, <code><a href="#meta0">meta</a></code>,
-       <code><a href="#param">param</a></code>, <code>spacer</code>, or
-       <code>wbr</code> element, then continue on to the next child node at
-       this point.</p>
-      <!-- also, i guess:
-      image, isindex, and keygen, but we don't list those because we
-      don't consider those "elements", more "macros", and thus we
-      should never serialise them -->
-      <!-- XXX when we get around to
-      it, add event-source -->
-      <p>If the child node is a <code><a href="#pre">pre</a></code> or
-       <code>textarea</code> element, append a U+000A LINE FEED (LF)
-       character.</p>
-
-      <p>Append the value of the <var title="">child</var> element's <code
-       title=dom-innerHTML-HTML><a href="#innerhtml0">innerHTML</a></code>
-       DOM attribute (thus recursing into this algorithm for that element),
-       followed by a U+003C LESS-THAN SIGN (<code title="">&lt;</code>)
-       character, a U+002F SOLIDUS (<code title="">/</code>) character, the
-       element's tag name again, and finally a U+003E GREATER-THAN SIGN
-       (<code title="">&gt;</code>) character.</p>
-
-     <dt>If the child node is a <code title="">Text</code> or <code
-      title="">CDATASection</code> node
-
-     <dd>
-      <p>If one of the ancestors of the child node is a <code><a
-       href="#style">style</a></code>, <code><a
-       href="#script0">script</a></code>, <code>xmp</code>, <code><a
-       href="#iframe">iframe</a></code>, <code>noembed</code>,
-       <code>noframes</code>, or <code><a
-       href="#noscript">noscript</a></code> element, then append the value of
-       the <var title="">child</var> node's <code title="">data</code> DOM
-       attribute literally.</p>
-      <!-- note about noscript: because this is defining an API, it
-      can assume that scripting is enabled, and that thus the
-      <noscript> element in the DOM will have been parsed in the
-      scripting-enabled mode, and that thus the text node is raw
-      markup -->
-      
-      <p>Otherwise, append the value of the <var title="">child</var> node's
-       <code title="">data</code> DOM attribute, <a href="#escapingString"
-       title="escaping a string">escaped as described below</a>.</p>
-
-     <dt>If the child node is a <code title="">Comment</code>
-
-     <dd>
-      <p>Append the literal string <code>&lt;!--</code> (U+003C LESS-THAN
-       SIGN, U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS, U+002D
-       HYPHEN-MINUS), followed by the value of the <var title="">child</var>
-       node's <code title="">data</code> DOM attribute, followed by the
-       literal string <code>--&gt;</code> (U+002D HYPHEN-MINUS, U+002D
-       HYPHEN-MINUS, U+003E GREATER-THAN SIGN).</p>
-
-     <dt>If the child node is a <code title="">DocumentType</code>
-
-     <dd>
-      <p>Append the literal string <code>&lt;!DOCTYPE</code> (U+003C
-       LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+0044 LATIN CAPITAL LETTER
-       D, U+004F LATIN CAPITAL LETTER O, U+0043 LATIN CAPITAL LETTER C,
-       U+0054 LATIN CAPITAL LETTER T, U+0059 LATIN CAPITAL LETTER Y, U+0050
-       LATIN CAPITAL LETTER P, U+0045 LATIN CAPITAL LETTER E), followed by a
-       space (U+0020 SPACE), followed by the value of the <var
-       title="">child</var> node's <code title="">name</code> DOM attribute,
-       followed by the literal string <code>&gt;</code> (U+003E GREATER-THAN
-       SIGN).</p>
-    </dl>
-
-    <p>Other node types (e.g. <code title="">Attr</code>) cannot occur as
-     children of elements. If they do, the <code title=dom-innerHTML-HTML><a
-     href="#innerhtml0">innerHTML</a></code> attribute must raise an
-     <code>INVALID_STATE_ERR</code> exception.</p>
-
-   <li>
-    <p>The result of the algorithm is the string <var title="">s</var>.
-  </ol>
-
-  <p><dfn id=escapingString>Escaping a string</dfn> (for the purposes of the
-   algorithm above) consists of replacing any occurrences of the "<code
-   title="">&amp;</code>" character by the string "<code
-   title="">&amp;amp;</code>", any occurrences of the "<code
-   title="">&lt;</code>" character by the string "<code
-   title="">&amp;lt;</code>", any occurrences of the "<code
-   title="">&gt;</code>" character by the string "<code
-   title="">&amp;gt;</code>", and any occurrences of the "<code
-   title="">&quot;</code>" character by the string "<code
-   title="">&amp;quot;</code>".
-
-  <p class=note>Entity reference nodes are <a
-   href="#entity-references">assumed to be expanded</a> by the user agent,
-   and are therefore not covered in the algorithm above.
-
-  <p class=note>It is possible that the roundtripping through <code
-   title=dom-innerHTML-HTML><a href="#innerhtml0">innerHTML</a></code> will
-   not work. For instance, if the element is a <code>textarea</code> element
-   to which a <code title="">Comment</code> node has been appended, then
-   assigning <code title=dom-innerHTML-HTML><a
-   href="#innerhtml0">innerHTML</a></code> to itself will result in the
-   comment being displayed in the text field. Similarly, if, as a result of
-   DOM manipulation, the element contains a comment that contains the literal
-   string "<code title="">--&gt;</code>", then when the result of serialising
-   the element is parsed, the comment will be truncated at that point and the
-   rest of the comment will be interpreted as markup. More examples would be
-   making a <code><a href="#script0">script</a></code> element contain a text
-   node with the text string "<code>&lt;/script&gt;</code>", or having a
-   <code><a href="#p">p</a></code> element that contains a <code><a
-   href="#ul">ul</a></code> element (as the <code><a href="#ul">ul</a></code>
-   element's <span title=syntax-start-tag>start tag</span> would imply the
-   end tag for the <code><a href="#p">p</a></code>).
-
   <p>On setting, if the node is a document, the <code
    title=dom-innerHTML-HTML><a href="#innerhtml0">innerHTML</a></code> DOM
    attribute must run the following algorithm:
@@ -38631,7 +38477,170 @@
   <p>The <dfn id=html-namespace0>HTML namespace</dfn> is:
    <code>http://www.w3.org/1999/xhtml</code>
 
-  <h3 id=entities><span class=secno>8.4. </span><dfn
+  <h3 id=serialising><span class=secno>8.4. </span>Serialising HTML fragments</h3>
+
+  <p>The following steps form the <dfn id=html-fragment>HTML fragment
+   serialisation algorithm</dfn>. The algorithm takes as input a DOM
+   <code>Element</code> or <code>Document</code>, referred to as <var
+   title="">the node</var>, and either returns a string or raises an
+   exception.
+
+  <ol>
+   <li>
+    <p>Let <var title="">s</var> be a string, and initialise it to the empty
+     string.
+
+   <li>
+    <p>For each child node <var title="">child</var> of <var title="">the
+     node</var>, in <a href="#tree-order">tree order</a>, append the
+     appropriate string from the following list to <var title="">s</var>:</p>
+
+    <dl class=switch>
+     <dt>If the child node is an <code title="">Element</code>
+
+     <dd>
+      <p>Append a U+003C LESS-THAN SIGN (<code title="">&lt;</code>)
+       character, followed by the element's tag name. (For nodes created by
+       the <a href="#html-0">HTML parser</a>, <code
+       title="">Document.createElement()</code>, or <code
+       title="">Document.renameNode()</code>, the tag name will be
+       lowercase.)</p>
+
+      <p>For each attribute that the element has, append a U+0020 SPACE
+       character, the attribute's name (which, for attributes set by the <a
+       href="#html-0">HTML parser</a> or by <code
+       title="">Element.setAttributeNode()</code> or <code
+       title="">Element.setAttribute()</code>, will be lowercase), a U+003D
+       EQUALS SIGN (<code title="">=</code>) character, a U+0022 QUOTATION
+       MARK (<code title="">"</code>) character, the attribute's value,
+       <a href="#escapingString" title="escaping a string">escaped as
+       described below</a>, and a second U+0022 QUOTATION MARK (<code
+       title="">"</code>) character.</p>
+
+      <p>While the exact order of attributes is UA-defined, and may depend on
+       factors such as the order that the attributes were given in the
+       original markup, the sort order must be stable, such that consecutive
+       invocations of this algorithm serialise an element's attributes in the
+       same order.</p>
+
+      <p>Append a U+003E GREATER-THAN SIGN (<code title="">&gt;</code>)
+       character.</p>
+
+      <p>If the child node is an <code><a href="#area">area</a></code>,
+       <code><a href="#base">base</a></code>, <code>basefont</code>,
+       <code>bgsound</code>, <code><a href="#br">br</a></code>, <code><a
+       href="#col">col</a></code>, <code><a href="#embed">embed</a></code>,
+       <code>frame</code>, <code><a href="#hr">hr</a></code>, <code><a
+       href="#img">img</a></code>, <code>input</code>, <code><a
+       href="#link">link</a></code>, <code><a href="#meta0">meta</a></code>,
+       <code><a href="#param">param</a></code>, <code>spacer</code>, or
+       <code>wbr</code> element, then continue on to the next child node at
+       this point.</p>
+      <!-- also, i guess:
+      image, isindex, and keygen, but we don't list those because we
+      don't consider those "elements", more "macros", and thus we
+      should never serialise them -->
+      <!-- XXX when we get around to
+      it, add event-source -->
+      <p>If the child node is a <code><a href="#pre">pre</a></code> or
+       <code>textarea</code> element, append a U+000A LINE FEED (LF)
+       character.</p>
+
+      <p>Append the value of running the <a href="#html-fragment">HTML
+       fragment serialisation algorithm</a> on the <var title="">child</var>
+       element (thus recursing into this algorithm for that element),
+       followed by a U+003C LESS-THAN SIGN (<code title="">&lt;</code>)
+       character, a U+002F SOLIDUS (<code title="">/</code>) character, the
+       element's tag name again, and finally a U+003E GREATER-THAN SIGN
+       (<code title="">&gt;</code>) character.</p>
+
+     <dt>If the child node is a <code title="">Text</code> or <code
+      title="">CDATASection</code> node
+
+     <dd>
+      <p>If one of the ancestors of the child node is a <code><a
+       href="#style">style</a></code>, <code><a
+       href="#script0">script</a></code>, <code>xmp</code>, <code><a
+       href="#iframe">iframe</a></code>, <code>noembed</code>,
+       <code>noframes</code>, or <code><a
+       href="#noscript">noscript</a></code> element, then append the value of
+       the <var title="">child</var> node's <code title="">data</code> DOM
+       attribute literally.</p>
+      <!-- note about noscript: because this is defining an API, it
+      can assume that scripting is enabled, and that thus the
+      <noscript> element in the DOM will have been parsed in the
+      scripting-enabled mode, and that thus the text node is raw
+      markup -->
+      
+      <p>Otherwise, append the value of the <var title="">child</var> node's
+       <code title="">data</code> DOM attribute, <a href="#escapingString"
+       title="escaping a string">escaped as described below</a>.</p>
+
+     <dt>If the child node is a <code title="">Comment</code>
+
+     <dd>
+      <p>Append the literal string <code>&lt;!--</code> (U+003C LESS-THAN
+       SIGN, U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS, U+002D
+       HYPHEN-MINUS), followed by the value of the <var title="">child</var>
+       node's <code title="">data</code> DOM attribute, followed by the
+       literal string <code>--&gt;</code> (U+002D HYPHEN-MINUS, U+002D
+       HYPHEN-MINUS, U+003E GREATER-THAN SIGN).</p>
+
+     <dt>If the child node is a <code title="">DocumentType</code>
+
+     <dd>
+      <p>Append the literal string <code>&lt;!DOCTYPE</code> (U+003C
+       LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+0044 LATIN CAPITAL LETTER
+       D, U+004F LATIN CAPITAL LETTER O, U+0043 LATIN CAPITAL LETTER C,
+       U+0054 LATIN CAPITAL LETTER T, U+0059 LATIN CAPITAL LETTER Y, U+0050
+       LATIN CAPITAL LETTER P, U+0045 LATIN CAPITAL LETTER E), followed by a
+       space (U+0020 SPACE), followed by the value of the <var
+       title="">child</var> node's <code title="">name</code> DOM attribute,
+       followed by the literal string <code>&gt;</code> (U+003E GREATER-THAN
+       SIGN).</p>
+    </dl>
+
+    <p>Other node types (e.g. <code title="">Attr</code>) cannot occur as
+     children of elements. If they do, this algorithm must raise an
+     <code>INVALID_STATE_ERR</code> exception.</p>
+
+   <li>
+    <p>The result of the algorithm is the string <var title="">s</var>.
+  </ol>
+
+  <p><dfn id=escapingString>Escaping a string</dfn> (for the purposes of the
+   algorithm above) consists of replacing any occurrences of the "<code
+   title="">&amp;</code>" character by the string "<code
+   title="">&amp;amp;</code>", any occurrences of the "<code
+   title="">&lt;</code>" character by the string "<code
+   title="">&amp;lt;</code>", any occurrences of the "<code
+   title="">&gt;</code>" character by the string "<code
+   title="">&amp;gt;</code>", and any occurrences of the "<code
+   title="">&quot;</code>" character by the string "<code
+   title="">&amp;quot;</code>".
+
+  <p class=note>Entity reference nodes are <a
+   href="#entity-references">assumed to be expanded</a> by the user agent,
+   and are therefore not covered in the algorithm above.
+
+  <p class=note>It is possible that the output of this algorithm, if parsed
+   with an <a href="#html-0">HTML parser</a>, will not return the original
+   tree structure. For instance, if a <code>textarea</code> element to which
+   a <code title="">Comment</code> node has been appended is serialised and
+   the output is then reparsed, the comment will end up being displayed in
+   the text field. Similarly, if, as a result of DOM manipulation, an element
+   contains a comment that contains the literal string "<code
+   title="">--&gt;</code>", then when the result of serialising the element
+   is parsed, the comment will be truncated at that point and the rest of the
+   comment will be interpreted as markup. More examples would be making a
+   <code><a href="#script0">script</a></code> element contain a text node
+   with the text string "<code>&lt;/script&gt;</code>", or having a <code><a
+   href="#p">p</a></code> element that contains a <code><a
+   href="#ul">ul</a></code> element (as the <code><a href="#ul">ul</a></code>
+   element's <span title=syntax-start-tag>start tag</span> would imply the
+   end tag for the <code><a href="#p">p</a></code>).
+
+  <h3 id=entities><span class=secno>8.5. </span><dfn
    id=entities0>Entities</dfn></h3>
 
   <p>This table lists the entity names that are supported by HTML, and the

Modified: source
===================================================================
--- source	2007-06-19 23:27:35 UTC (rev 930)
+++ source	2007-06-19 23:49:49 UTC (rev 931)
@@ -2280,183 +2280,9 @@
   follow.</p>
 
   <p>On getting, the <code title="dom-innerHTML-HTML">innerHTML</code>
-  DOM attribute must return the result of running the following
-  algorithm:</p>
+  DOM attribute must return the result of running the <span>HTML
+  fragment serialisation algorithm</span> on the node.</p>
 
-  <ol>
-
-   <li><p>Let <var title="">s</var> be a string, and initialise it to
-   the empty string.</p></li>
-
-   <li>
-
-    <p>For each child node <var title="">child</var>, in <span>tree order</span>,
-    append the appropriate string from the following list to <var
-    title="">s</var>:</p>
-
-    <dl class="switch">
-
-     <dt>If the child node is an <code title="">Element</code></dt>
-
-     <dd>
-
-      <p>Append a U+003C LESS-THAN SIGN (<code title="">&lt;</code>)
-      character, followed by the element's tag name. (For nodes
-      created by the <span>HTML parser</span>, <code
-      title="">Document.createElement()</code>, or <code
-      title="">Document.renameNode()</code>, the tag name will be
-      lowercase.)</p>
-
-      <p>For each attribute that the element has, append a U+0020
-      SPACE character, the attribute's name (which, for attributes set
-      by the <span>HTML parser</span> or by <code
-      title="">Element.setAttributeNode()</code> or <code
-      title="">Element.setAttribute()</code>, will be lowercase), a
-      U+003D EQUALS SIGN (<code title="">=</code>) character, a U+0022
-      QUOTATION MARK (<code title="">"</code>) character, the
-      attribute's value, <span title="escaping a string">escaped as
-      described below</span>, and a second U+0022 QUOTATION MARK
-      (<code title="">"</code>) character.</p>
-
-      <p>While the exact order of attributes is UA-defined, and may
-      depend on factors such as the order that the attributes were
-      given in the original markup, the sort order must be stable,
-      such that consecutive calls to <code
-      title="dom-innerHTML-HTML">innerHTML</code> serialise an
-      element's attributes in the same order.</p>
-
-      <p>Append a U+003E GREATER-THAN SIGN (<code title="">&gt;</code>)
-      character.</p>
-
-      <p>If the child node is an <code>area</code>, <code>base</code>,
-      <code>basefont</code>, <code>bgsound</code>, <code>br</code>,
-      <code>col</code>, <code>embed</code>, <code>frame</code>,
-      <code>hr</code>, <code>img</code>, <code>input</code>,
-      <code>link</code>, <code>meta</code>, <code>param</code>,
-      <code>spacer</code>, or <code>wbr</code> element, then continue
-      on to the next child node at this point.</p> <!-- also, i guess:
-      image, isindex, and keygen, but we don't list those because we
-      don't consider those "elements", more "macros", and thus we
-      should never serialise them --> <!-- XXX when we get around to
-      it, add event-source -->
-
-      <p>If the child node is a <code>pre</code> or
-      <code>textarea</code> element, append a U+000A LINE FEED (LF)
-      character.</p>
-
-      <p>Append the value of the <var title="">child</var> element's
-      <code title="dom-innerHTML-HTML">innerHTML</code> DOM attribute
-      (thus recursing into this algorithm for that element), followed
-      by a U+003C LESS-THAN SIGN (<code title="">&lt;</code>)
-      character, a U+002F SOLIDUS (<code title="">/</code>) character,
-      the element's tag name again, and finally a U+003E GREATER-THAN
-      SIGN (<code title="">&gt;</code>) character.</p>
-
-     </dd>
-
-
-     <dt>If the child node is a <code title="">Text</code> or <code
-     title="">CDATASection</code> node</dt>
-
-     <dd>
-
-      <p>If one of the ancestors of the child node is a
-      <code>style</code>, <code>script</code>, <code>xmp</code>,
-      <code>iframe</code>, <code>noembed</code>,
-      <code>noframes</code>, or <code>noscript</code> element, then
-      append the value of the <var title="">child</var> node's <code
-      title="">data</code> DOM attribute literally.</p>
-      <!-- note about noscript: because this is defining an API, it
-      can assume that scripting is enabled, and that thus the
-      <noscript> element in the DOM will have been parsed in the
-      scripting-enabled mode, and that thus the text node is raw
-      markup -->
-
-      <p>Otherwise, append the value of the <var title="">child</var>
-      node's <code title="">data</code> DOM attribute, <span
-      title="escaping a string">escaped as described below</span>.</p>
-
-     </dd>
-
-
-     <dt>If the child node is a <code title="">Comment</code></dt>
-
-     <dd>
-
-      <p>Append the literal string <code>&lt;!--</code> (U+003C
-      LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS,
-      U+002D HYPHEN-MINUS), followed by the value of the <var
-      title="">child</var> node's <code title="">data</code> DOM
-      attribute, followed by the literal string <code>--&gt;</code>
-      (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN
-      SIGN).</p>
-
-     </dd>
-
-
-     <dt>If the child node is a <code title="">DocumentType</code></dt>
-
-     <dd>
-
-      <p>Append the literal string <code>&lt;!DOCTYPE</code> (U+003C
-      LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+0044 LATIN CAPITAL
-      LETTER D, U+004F LATIN CAPITAL LETTER O, U+0043 LATIN CAPITAL
-      LETTER C, U+0054 LATIN CAPITAL LETTER T, U+0059 LATIN CAPITAL
-      LETTER Y, U+0050 LATIN CAPITAL LETTER P, U+0045 LATIN CAPITAL
-      LETTER E), followed by a space (U+0020 SPACE), followed by the
-      value of the <var title="">child</var> node's <code
-      title="">name</code> DOM attribute, followed by the literal
-      string <code>&gt;</code> (U+003E GREATER-THAN SIGN).</p>
-
-     </dd>
-
-
-    </dl>
-
-    <p>Other node types (e.g. <code title="">Attr</code>) cannot
-    occur as children of elements. If they do, the <code
-    title="dom-innerHTML-HTML">innerHTML</code> attribute must raise
-    an <code>INVALID_STATE_ERR</code> exception.</p>
-
-   </li>
-
-   <li><p>The result of the algorithm is the string <var
-   title="">s</var>.</p></li>
-
-  </ol>
-
-  <p><dfn id="escapingString">Escaping a string</dfn> (for the
-  purposes of the algorithm above) consists of replacing any
-  occurrences of the "<code title="">&amp;</code>" character by the
-  string "<code title="">&amp;amp;</code>", any occurrences of the
-  "<code title="">&lt;</code>" character by the string "<code
-  title="">&amp;lt;</code>", any occurrences of the "<code
-  title="">&gt;</code>" character by the string "<code
-  title="">&amp;gt;</code>", and any occurrences of the "<code
-  title="">&quot;</code>" character by the string "<code
-  title="">&amp;quot;</code>".</p>
-
-  <p class="note">Entity reference nodes are <a
-  href="#entity-references">assumed to be expanded</a> by the user
-  agent, and are therefore not covered in the algorithm above.</p>
-
-  <p class="note">It is possible that the roundtripping through <code
-  title="dom-innerHTML-HTML">innerHTML</code> will not work. For
-  instance, if the element is a <code>textarea</code> element to which
-  a <code title="">Comment</code> node has been appended, then
-  assigning <code title="dom-innerHTML-HTML">innerHTML</code> to
-  itself will result in the comment being displayed in the text
-  field. Similarly, if, as a result of DOM manipulation, the element
-  contains a comment that contains the literal string "<code
-  title="">--&gt;</code>", then when the result of serialising the
-  element is parsed, the comment will be truncated at that point and
-  the rest of the comment will be interpreted as markup. More examples
-  would be making a <code>script</code> element contain a text node
-  with the text string "<code>&lt;/script&gt;</code>", or having a
-  <code>p</code> element that contains a <code>ul</code> element (as
-  the <code>ul</code> element's <span title="syntax-start-tag">start
-  tag</span> would imply the end tag for the <code>p</code>).</p>
-
   <p>On setting, if the node is a document, the <code
   title="dom-innerHTML-HTML">innerHTML</code> DOM attribute must run
   the following algorithm:</p>
@@ -36017,6 +35843,190 @@
 
   <p>The <dfn>HTML namespace</dfn> is: <code>http://www.w3.org/1999/xhtml</code></p>
 
+
+
+  <h3>Serialising HTML fragments</h3>
+
+  <p>The following steps form the <dfn>HTML fragment serialisation
+  algorithm</dfn>. The algorithm takes as input a DOM
+  <code>Element</code> or <code>Document</code>, referred to as <var
+  title="">the node</var>, and either returns a string or raises an
+  exception.</p>
+
+  <ol>
+
+   <li><p>Let <var title="">s</var> be a string, and initialise it to
+   the empty string.</p></li>
+
+   <li>
+
+    <p>For each child node <var title="">child</var> of <var
+    title="">the node</var>, in <span>tree order</span>, append the
+    appropriate string from the following list to <var
+    title="">s</var>:</p>
+
+    <dl class="switch">
+
+     <dt>If the child node is an <code title="">Element</code></dt>
+
+     <dd>
+
+      <p>Append a U+003C LESS-THAN SIGN (<code title="">&lt;</code>)
+      character, followed by the element's tag name. (For nodes
+      created by the <span>HTML parser</span>, <code
+      title="">Document.createElement()</code>, or <code
+      title="">Document.renameNode()</code>, the tag name will be
+      lowercase.)</p>
+
+      <p>For each attribute that the element has, append a U+0020
+      SPACE character, the attribute's name (which, for attributes set
+      by the <span>HTML parser</span> or by <code
+      title="">Element.setAttributeNode()</code> or <code
+      title="">Element.setAttribute()</code>, will be lowercase), a
+      U+003D EQUALS SIGN (<code title="">=</code>) character, a U+0022
+      QUOTATION MARK (<code title="">"</code>) character, the
+      attribute's value, <span title="escaping a string">escaped as
+      described below</span>, and a second U+0022 QUOTATION MARK
+      (<code title="">"</code>) character.</p>
+
+      <p>While the exact order of attributes is UA-defined, and may
+      depend on factors such as the order that the attributes were
+      given in the original markup, the sort order must be stable,
+      such that consecutive invocations of this algorithm serialise an
+      element's attributes in the same order.</p>
+
+      <p>Append a U+003E GREATER-THAN SIGN (<code title="">&gt;</code>)
+      character.</p>
+
+      <p>If the child node is an <code>area</code>, <code>base</code>,
+      <code>basefont</code>, <code>bgsound</code>, <code>br</code>,
+      <code>col</code>, <code>embed</code>, <code>frame</code>,
+      <code>hr</code>, <code>img</code>, <code>input</code>,
+      <code>link</code>, <code>meta</code>, <code>param</code>,
+      <code>spacer</code>, or <code>wbr</code> element, then continue
+      on to the next child node at this point.</p> <!-- also, i guess:
+      image, isindex, and keygen, but we don't list those because we
+      don't consider those "elements", more "macros", and thus we
+      should never serialise them --> <!-- XXX when we get around to
+      it, add event-source -->
+
+      <p>If the child node is a <code>pre</code> or
+      <code>textarea</code> element, append a U+000A LINE FEED (LF)
+      character.</p>
+
+      <p>Append the value of running the <span>HTML fragment
+      serialisation algorithm</span> on the <var title="">child</var>
+      element (thus recursing into this algorithm for that element),
+      followed by a U+003C LESS-THAN SIGN (<code title="">&lt;</code>)
+      character, a U+002F SOLIDUS (<code title="">/</code>) character,
+      the element's tag name again, and finally a U+003E GREATER-THAN
+      SIGN (<code title="">&gt;</code>) character.</p>
+
+     </dd>
+
+
+     <dt>If the child node is a <code title="">Text</code> or <code
+     title="">CDATASection</code> node</dt>
+
+     <dd>
+
+      <p>If one of the ancestors of the child node is a
+      <code>style</code>, <code>script</code>, <code>xmp</code>,
+      <code>iframe</code>, <code>noembed</code>,
+      <code>noframes</code>, or <code>noscript</code> element, then
+      append the value of the <var title="">child</var> node's <code
+      title="">data</code> DOM attribute literally.</p>
+      <!-- note about noscript: because this is defining an API, it
+      can assume that scripting is enabled, and that thus the
+      <noscript> element in the DOM will have been parsed in the
+      scripting-enabled mode, and that thus the text node is raw
+      markup -->
+
+      <p>Otherwise, append the value of the <var title="">child</var>
+      node's <code title="">data</code> DOM attribute, <span
+      title="escaping a string">escaped as described below</span>.</p>
+
+     </dd>
+
+
+     <dt>If the child node is a <code title="">Comment</code></dt>
+
+     <dd>
+
+      <p>Append the literal string <code>&lt;!--</code> (U+003C
+      LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS,
+      U+002D HYPHEN-MINUS), followed by the value of the <var
+      title="">child</var> node's <code title="">data</code> DOM
+      attribute, followed by the literal string <code>--&gt;</code>
+      (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN
+      SIGN).</p>
+
+     </dd>
+
+
+     <dt>If the child node is a <code title="">DocumentType</code></dt>
+
+     <dd>
+
+      <p>Append the literal string <code>&lt;!DOCTYPE</code> (U+003C
+      LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+0044 LATIN CAPITAL
+      LETTER D, U+004F LATIN CAPITAL LETTER O, U+0043 LATIN CAPITAL
+      LETTER C, U+0054 LATIN CAPITAL LETTER T, U+0059 LATIN CAPITAL
+      LETTER Y, U+0050 LATIN CAPITAL LETTER P, U+0045 LATIN CAPITAL
+      LETTER E), followed by a space (U+0020 SPACE), followed by the
+      value of the <var title="">child</var> node's <code
+      title="">name</code> DOM attribute, followed by the literal
+      string <code>&gt;</code> (U+003E GREATER-THAN SIGN).</p>
+
+     </dd>
+
+
+    </dl>
+
+    <p>Other node types (e.g. <code title="">Attr</code>) cannot
+    occur as children of elements. If they do, this algorithm must
+    raise an <code>INVALID_STATE_ERR</code> exception.</p>
+
+   </li>
+
+   <li><p>The result of the algorithm is the string <var
+   title="">s</var>.</p></li>
+
+  </ol>
+
+  <p><dfn id="escapingString">Escaping a string</dfn> (for the
+  purposes of the algorithm above) consists of replacing any
+  occurrences of the "<code title="">&amp;</code>" character by the
+  string "<code title="">&amp;amp;</code>", any occurrences of the
+  "<code title="">&lt;</code>" character by the string "<code
+  title="">&amp;lt;</code>", any occurrences of the "<code
+  title="">&gt;</code>" character by the string "<code
+  title="">&amp;gt;</code>", and any occurrences of the "<code
+  title="">&quot;</code>" character by the string "<code
+  title="">&amp;quot;</code>".</p>
+
+  <p class="note">Entity reference nodes are <a
+  href="#entity-references">assumed to be expanded</a> by the user
+  agent, and are therefore not covered in the algorithm above.</p>
+
+  <p class="note">It is possible that the output of this algorithm, if
+  parsed with an <span>HTML parser</span>, will not return the
+  original tree structure. For instance, if a <code>textarea</code>
+  element to which a <code title="">Comment</code> node has been
+  appended is serialised and the output is then reparsed, the comment
+  will end up being displayed in the text field. Similarly, if, as a
+  result of DOM manipulation, an element contains a comment that
+  contains the literal string "<code title="">--&gt;</code>", then
+  when the result of serialising the element is parsed, the comment
+  will be truncated at that point and the rest of the comment will be
+  interpreted as markup. More examples would be making a
+  <code>script</code> element contain a text node with the text string
+  "<code>&lt;/script&gt;</code>", or having a <code>p</code> element that
+  contains a <code>ul</code> element (as the <code>ul</code> element's
+  <span title="syntax-start-tag">start tag</span> would imply the end
+  tag for the <code>p</code>).</p>
+
+
   <h3><dfn>Entities</dfn></h3>
 
   <p>This table lists the entity names that are supported by HTML, and
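
Illustrative note (not part of the commit): the HTML fragment serialisation
algorithm that this revision moves into section 8.4 can be sketched roughly as
follows. This is a minimal Python illustration under stated assumptions, not
the spec text and not any user agent's implementation. The Element, Text,
Comment and DocumentType classes, the serialise() and escape() functions, and
InvalidStateError are hypothetical names standing in for the DOM interfaces and
the INVALID_STATE_ERR exception. The sketch keeps attributes in a list so their
order is stable, and it only tracks raw-text ancestors (style, script, etc.)
from the node it is called on downwards, which is a simplification of "one of
the ancestors of the child node".

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

# Elements serialised as a start tag only (list taken from the algorithm text).
VOID_ELEMENTS = {"area", "base", "basefont", "bgsound", "br", "col", "embed",
                 "frame", "hr", "img", "input", "link", "meta", "param",
                 "spacer", "wbr"}
# Elements whose descendant text is appended literally, without escaping.
RAW_TEXT_ELEMENTS = {"style", "script", "xmp", "iframe", "noembed",
                     "noframes", "noscript"}


class InvalidStateError(Exception):
    """Hypothetical stand-in for the DOM INVALID_STATE_ERR exception."""


@dataclass
class Text:
    data: str


@dataclass
class Comment:
    data: str


@dataclass
class DocumentType:
    name: str


@dataclass
class Element:
    tag: str
    attrs: List[Tuple[str, str]] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


Node = Union[Text, Comment, DocumentType, Element]


def escape(s: str) -> str:
    """Escape a string; '&' must be replaced first to avoid double-escaping."""
    return (s.replace("&", "&amp;").replace("<", "&lt;")
             .replace(">", "&gt;").replace('"', "&quot;"))


def serialise(node: Element, raw_text: bool = False) -> str:
    """Serialise the children of `node` (not `node` itself) to a string."""
    raw_text = raw_text or node.tag in RAW_TEXT_ELEMENTS
    s = ""
    for child in node.children:
        if isinstance(child, Element):
            s += "<" + child.tag
            for name, value in child.attrs:
                s += " " + name + '="' + escape(value) + '"'
            s += ">"
            if child.tag in VOID_ELEMENTS:
                continue  # no content and no end tag
            if child.tag in ("pre", "textarea"):
                s += "\n"
            s += serialise(child, raw_text) + "</" + child.tag + ">"
        elif isinstance(child, Text):
            s += child.data if raw_text else escape(child.data)
        elif isinstance(child, Comment):
            s += "<!--" + child.data + "-->"
        elif isinstance(child, DocumentType):
            s += "<!DOCTYPE " + child.name + ">"
        else:
            raise InvalidStateError("unexpected child node: %r" % (child,))
    return s


if __name__ == "__main__":
    root = Element("div", [], [
        Element("p", [("title", 'a "b" & c')], [Text("1 < 2")]),
        Element("br"),
        Comment(" note "),
    ])
    print(serialise(root))
```

Run as a script, the example prints
<p title="a &quot;b&quot; &amp; c">1 &lt; 2</p><br><!-- note -->
which shows the start tag with escaped attribute values, escaped text, the
void br element with no end tag, and the comment passed through unescaped.
Because comment data is not escaped, a comment containing "-->" would not
survive a reparse, as the note in the diff points out.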



