[html5] r1910 - [] (0) Make the coercions section not invent a new syntax. (Bug 5808) (credit: hs)

whatwg at whatwg.org
Wed Jul 23 01:40:07 PDT 2008


Author: ianh
Date: 2008-07-23 01:40:07 -0700 (Wed, 23 Jul 2008)
New Revision: 1910

Modified:
   index
   source
Log:
[] (0) Make the coercions section not invent a new syntax. (Bug 5808) (credit: hs)

Modified: index
===================================================================
--- index	2008-07-23 07:41:48 UTC (rev 1909)
+++ index	2008-07-23 08:40:07 UTC (rev 1910)
@@ -51014,122 +51014,80 @@
    is not compatible with the XML tool chain in certain subtle ways. For
    example, an XML toolchain might not be able to represent attributes with
    the name <code title="">xmlns</code>, since they conflict with the
-   Namespaces in XML syntax. <a href="#refsXMLNS">[XMLNS]</a>
+   Namespaces in XML syntax. There is also some data that the <a
+   href="#html-0">HTML parser</a> generates that isn't included in the DOM
+   itself. This section specifies some rules for handling these issues.
 
-  <p>There is also some data that the <a href="#html-0">HTML parser</a>
-   generates that isn't included in the DOM itself.
+  <p>If the XML API being used doesn't support DOCTYPEs, tools may drop
+   DOCTYPEs altogether.
 
-  <p>To allow tools to apply a consistent set of adjustments to the output of
-   their <a href="#html-0">HTML parser</a> to allow for compatibility with
-   the rest of their XML toolchain, this section documents a set of mutations
-   and conventions that will convert the output of the <a href="#html-0">HTML
-   parser</a> for any arbitrary input into an XML Infoset that doesn't have
-   any problematic characteristics.
+  <p>If the XML API doesn't support attributes in no namespace that are named
+   "<code title="">xmlns</code>", attributes whose names start with "<code
+   title="">xmlns:</code>", or attributes in the <a href="#xmlns">XMLNS
+   namespace</a>, then the tool may drop such attributes.
 
-  <p>Tools that cannot convey the out-of-band information using out-of-band
-   mechanisms, or that cannot convey the DOM exactly as prescribed by this
-   specification, may either ignore the offending information or DOM feature,
-   or may represent it internally in the DOM using the conventions described
-   below.
+  <p>The tool may annotate the output with any namespace declarations
+   required for proper operation.
 
-  <p>These conventions are not conforming HTML, and user agents must not
-   output such syntax outside of their XML pipeline.
+  <p>If the XML API being used restricts the allowable characters in the
+   local names of elements and attributes, then the tool may map all element
+   and attribute local names that the API wouldn't support to a set of names
+   that <em>are</em> allowed, by replacing any character that isn't supported
+   with the upper case letter U and the five digits of the character's
+   Unicode codepoint when expressed in hexadecimal.
 
-  <dl>
-   <dt>The <code>DocumentType</code> node's <code title="">name</code>, <code
-    title="">publicId</code>, and <code title="">systemId</code> attributes
+  <p class=example>For example, the element name <code
+   title="">.foo<bar</code>, which can be output by the <a
+   href="#html-0">HTML parser</a>, though it is neither a legal HTML element
+   name nor a well-formed XML element name, would be converted into <code
+   title="">U0002EfooU0003Cbar</code>, which <em>is</em> a well-formed XML
+   element name (though it's still not legal in HTML by any means).
 
-   <dd>If the XML API being used doesn't support DOCTYPEs, tools may drop
-    DOCTYPEs altogether or create a set of three attributes on the root
-    element, named <code title="">__doctype_name__</code>, <code
-    title="">__doctype_publicid__</code>, and <code
-    title="">__doctype_systemid__</code>, respectively, whose values are the
-    values that would have been put on the <code>DocumentType</code> node.
+  <p class=example>As another example, consider the attribute
+   <code>xlink:href</code>. Used on a MathML element, it becomes, after being
+   <a href="#adjust" title="adjust foreign attributes">adjusted</a>, an
+   attribute with a prefix "<code title="">xlink</code>" and a local name
+   "<code title="">href</code>". However, used on an HTML element, it becomes
+   an attribute with no prefix and the local name "<code
+   title="">xlink:href</code>", which is not a valid NCName, and thus might
+   not be accepted by an XML API. It could thus get converted, becoming
+   "<code title="">xlinkU0003Ahref</code>".
 
-   <dt>The document being set to <i><a href="#no-quirks">no quirks
-    mode</a></i>, <i><a href="#limited1">limited quirks mode</a></i>, or
-    <i><a href="#quirks">quirks mode</a></i>
+  <p class=note>The resulting names from this conversion conveniently can't
+   clash with any attribute generated by the <a href="#html-0">HTML
+   parser</a>, since those are all either lowercase or those listed in the <a
+   href="#adjust">adjust foreign attributes</a> algorithm's table.
 
-   <dd>To convey this information, create an attribute <code
-    title="">__mode__</code> on the root element, with values "noquirks",
-    "limitedquirks", or "quirks" respectively.
+  <p>If the XML API restricts comments from having two consecutive U+002D
+   HYPHEN-MINUS characters (--), the tool may insert a single U+0020 SPACE
+   character between any such offending characters.
 
-   <dt>Elements that have a namespace without appropriate <code
-    title="">xmlns</code> attributes being in scope
+  <p>If the XML API restricts allowed characters in character data, the tool
+   may replace any U+000C FORM FEED (FF) character with a U+0020 SPACE
+   character, and any other literal non-XML character with a U+FFFD
+   REPLACEMENT CHARACTER.
 
-   <dd>Construct the DOM as if appropriate namespace declarations were in
-    scope.
+  <p>If the tool has no way to convey out-of-band information, then the tool
+   may drop the following information:
 
-   <dt>Elements whose names contain U+003A COLON (:) characters or characters
-    that cannot be represented in XML element names
+  <ul>
+   <li>Whether the document is set to <i><a href="#no-quirks">no quirks
+    mode</a></i>, <i><a href="#limited1">limited quirks mode</a></i>, or
+    <i><a href="#quirks">quirks mode</a></i>
 
-   <dd>Drop the element and all its children, or replace any offending
-    characters with a U+005F LOW LINE (_) character.
+   <li>The association between form controls and forms that aren't their
+    nearest <code>form</code> element ancestor (use of the <a
+    href="#form-element"><code>form</code> element pointer</a> in the parser)
+  </ul>
 
-   <dt>Attributes named <code title="">xmlns</code> whose values match the
-    namespace of the element node
+  <p class=note>The mutations allowed by this section apply <em>after</em>
+   the <a href="#html-0">HTML parser</a>'s rules have been applied. For
+   example, a <code title=""><a::></code> start tag will be closed by a
+   <code title=""></a::></code> end tag, and never by a <code
+   title=""></aU0003AU0003A></code> end tag, even if the user agent is
+   using the rules above to then generate an actual element in the DOM with
+   the name <code title="">aU0003AU0003A</code> for that start tag.
 
-   <dd>Construct the DOM as if these were default namespace declarations.
-
-   <dt>Attributes named <code title="">xmlns:xlink</code> whose values match
-    the <a href="#xlink">XLink namespace</a>, on elements whose namespace is
-    not the <a href="#html-namespace0">HTML namespace</a>
-
-   <dd>Construct the DOM as if these were namespace prefix declarations.
-
-   <dt>Other attributes whose names are <code title="">xmlns</code> or start
-    with <code title="">xmlns:</code>
-
-   <dd>Drop the attributes or add two U+005F LOW LINE (_) characters to the
-    start of the attributes' names and replace any U+003A COLON (:)
-    characters with a U+005F LOW LINE (_) character.
-
-   <dt>Other attributes in no namespace whose names contain U+003A COLON (:)
-    characters
-
-   <dt>Attributes whose names contain characters that cannot be represented
-    in XML attribute names
-
-   <dd>Drop the attributes or replace any offending characters with a U+005F
-    LOW LINE (_) character, dropping any attributes where doing this would
-    cause an attribute name clash.
-
-   <dt>Form controls associated with forms that aren't their nearest ancestor
-    (use of the <a href="#form-element"><code>form</code> element
-    pointer</a>)
-
-   <dd>Create an attribute <code title="">__formid__</code> on the form, with
-    a value unique amongst <code title="">__formid__</code> attributes in the
-    document, and create an attribute <code title="">__form__</code> on the
-    form control, whose value matches the unique identifier given to the
-    form.
-
-   <dt>Any U+000C FORM FEED (FF) character
-
-   <dd>Replace the character with a U+0020 SPACE character.
-
-   <dt>Any other literal non-XML character
-
-   <dd>Replace the character with a U+FFFD REPLACEMENT CHARACTER.
-
-   <dt>A comment that contains two adjacent U+002D HYPHEN-MINUS characters
-    (--).
-
-   <dd>Insert a U+0020 SPACE character between them.
-  </dl>
-
-  <p>Tools that use these conventions should guard against documents that
-   include markup that clashes with them by always dropping all attributes in
-   the document that start with two U+005F LOW LINE (_) characters.
-
-  <p class=note>These conventions apply <em>after</em> the <a
-   href="#html-0">HTML parser</a>'s rules have been applied. For example, a
-   <code title=""><a::></code> start tag will be closed by a <code
-   title=""></a::></code> end tag, and never by a <code
-   title=""></a__></code> end tag, even if the user agent is using the
-   rules above to then generate an actual element in the DOM with the name
-   <code title="">a__</code> for that start tag.
-
   <h3 id=namespaces><span class=secno>8.3 </span>Namespaces</h3>
 
   <p>The <dfn id=html-namespace0>HTML namespace</dfn> is:

Modified: source
===================================================================
--- source	2008-07-23 07:41:48 UTC (rev 1909)
+++ source	2008-07-23 08:40:07 UTC (rev 1910)
@@ -48102,141 +48102,89 @@
   constructed DOM is not compatible with the XML tool chain in certain
   subtle ways. For example, an XML toolchain might not be able to
   represent attributes with the name <code title="">xmlns</code>,
-  since they conflict with the Namespaces in XML syntax. <a
-  href="#refsXMLNS">[XMLNS]</a></p>
+  since they conflict with the Namespaces in XML syntax. There is also
+  some data that the <span>HTML parser</span> generates that isn't
+  included in the DOM itself. This section specifies some rules for
+  handling these issues.</p>
 
-  <p>There is also some data that the <span>HTML parser</span>
-  generates that isn't included in the DOM itself.</p>
+  <p>If the XML API being used doesn't support DOCTYPEs, tools may
+  drop DOCTYPEs altogether.</p>
 
-  <p>To allow tools to apply a consistent set of adjustments to the
-  output of their <span>HTML parser</span> to allow for compatibility
-  with the rest of their XML toolchain, this section documents a set
-  of mutations and conventions that will convert the output of the
-  <span>HTML parser</span> for any arbitrary input into an XML Infoset
-  that doesn't have any problematic characteristics.</p>
+  <p>If the XML API doesn't support attributes in no namespace that
+  are named "<code title="">xmlns</code>", attributes whose names
+  start with "<code title="">xmlns:</code>", or attributes in the
+  <span>XMLNS namespace</span>, then the tool may drop such
+  attributes.</p>
 
-  <p>Tools that cannot convey the out-of-band information using
-  out-of-band mechanisms, or that cannot convey the DOM exactly as
-  prescribed by this specification, may either ignore the offending
-  information or DOM feature, or may represent it internally in the
-  DOM using the conventions described below.</p>
+  <p>The tool may annotate the output with any namespace declarations
+  required for proper operation.</p>
 
-  <p>These conventions are not conforming HTML, and user agents must
-  not output such syntax outside of their XML pipeline.</p>
+  <p>If the XML API being used restricts the allowable characters in
+  the local names of elements and attributes, then the tool may map
+  all element and attribute local names that the API wouldn't support
+  to a set of names that <em>are</em> allowed, by replacing any
+  character that isn't supported with the upper case letter U and the
+  five digits of the character's Unicode codepoint when expressed in
+  hexadecimal.</p>
 
-  <dl>
+  <p class="example">For example, the element name <code
+  title="">.foo<bar</code>, which can be output by the <span>HTML
+  parser</span>, though it is neither a legal HTML element name nor a
+  well-formed XML element name, would be converted into <code
+  title="">U0002EfooU0003Cbar</code>, which <em>is</em> a well-formed
+  XML element name (though it's still not legal in HTML by any
+  means).</p>
 
-   <dt>The <code>DocumentType</code> node's <code
-   title="">name</code>, <code title="">publicId</code>, and <code
-   title="">systemId</code> attributes</dt>
+  <p class="example">As another example, consider the attribute
+  <code>xlink:href</code>. Used on a MathML element, it becomes, after
+  being <span title="adjust foreign attributes">adjusted</span>, an attribute
+  with a prefix "<code title="">xlink</code>" and a local name "<code
+  title="">href</code>". However, used on an HTML element, it becomes
+  an attribute with no prefix and the local name "<code
+  title="">xlink:href</code>", which is not a valid NCName, and thus
+  might not be accepted by an XML API. It could thus get converted,
+  becoming "<code title="">xlinkU0003Ahref</code>".</p>
 
-   <dd>If the XML API being used doesn't support DOCTYPEs, tools may
-   drop DOCTYPEs altogether or create a set of three attributes on the
-   root element, named <code title="">__doctype_name__</code>, <code
-   title="">__doctype_publicid__</code>, and <code
-   title="">__doctype_systemid__</code>, respectively, whose values
-   are the values that would have been put on the
-   <code>DocumentType</code> node.</dd>
+  <p class="note">The resulting names from this conversion
+  conveniently can't clash with any attribute generated by the
+  <span>HTML parser</span>, since those are all either lowercase or
+  those listed in the <span>adjust foreign attributes</span>
+  algorithm's table.</p>
 
+  <p>If the XML API restricts comments from having two consecutive
+  U+002D HYPHEN-MINUS characters (--), the tool may insert a single
+  U+0020 SPACE character between any such offending characters.</p>
 
-   <dt>The document being set to <i>no quirks mode</i>, <i>limited
-   quirks mode</i>, or <i>quirks mode</i></dt>
+  <p>If the XML API restricts allowed characters in character data,
+  the tool may replace any U+000C FORM FEED (FF) character with a
+  U+0020 SPACE character, and any other literal non-XML character with
+  a U+FFFD REPLACEMENT CHARACTER.</p>
 
-   <dd>To convey this information, create an attribute <code
-   title="">__mode__</code> on the root element, with values
-   "noquirks", "limitedquirks", or "quirks" respectively.</dd>
+  <p>If the tool has no way to convey out-of-band information, then
+  the tool may drop the following information:</p>
 
+  <ul>
 
-   <dt>Elements that have a namespace without appropriate <code
-   title="">xmlns</code> attributes being in scope</dt>
+   <li>Whether the document is set to <i>no quirks mode</i>,
+   <i>limited quirks mode</i>, or <i>quirks mode</i></li>
 
-   <dd>Construct the DOM as if appropriate namespace declarations were
-   in scope.</dd>
+   <li>The association between form controls and forms that aren't
+   their nearest <code>form</code> element ancestor (use of the
+   <span><code>form</code> element pointer</span> in the parser)</li>
 
+  </ul>
 
-   <dt>Elements whose names contain U+003A COLON (:) characters or
-   characters that cannot be represented in XML element names</dt>
+  <p class="note">The mutations allowed by this section apply
+  <em>after</em> the <span>HTML parser</span>'s rules have been
+  applied. For example, a <code title=""><a::></code> start tag
+  will be closed by a <code title=""></a::></code> end tag, and
+  never by a <code title=""></aU0003AU0003A></code> end tag, even
+  if the user agent is using the rules above to then generate an
+  actual element in the DOM with the name <code
+  title="">aU0003AU0003A</code> for that start tag.</p>
 
-   <dd>Drop the element and all its children, or replace any offending
-   characters with a U+005F LOW LINE (_) character.</dd>
 
 
-   <dt>Attributes named <code title="">xmlns</code> whose values match
-   the namespace of the element node</dt>
-
-   <dd>Construct the DOM as if these were default namespace
-   declarations.</dd>
-
-
-   <dt>Attributes named <code title="">xmlns:xlink</code> whose values
-   match the <span>XLink namespace</span>, on elements whose namespace
-   is not the <span>HTML namespace</span></dt>
-
-   <dd>Construct the DOM as if these were namespace prefix
-   declarations.</dd>
-
-
-   <dt>Other attributes whose names are <code title="">xmlns</code> or
-   start with <code title="">xmlns:</code></dt>
-
-   <dd>Drop the attributes or add two U+005F LOW LINE (_) characters
-   to the start of the attributes' names and replace any U+003A COLON
-   (:) characters with a U+005F LOW LINE (_) character.</dd>
-
-
-   <dt>Other attributes in no namespace whose names contain U+003A
-   COLON (:) characters</dt>
-   <dt>Attributes whose names contain characters that cannot be
-   represented in XML attribute names</dt>
-
-   <dd>Drop the attributes or replace any offending characters with a
-   U+005F LOW LINE (_) character, dropping any attributes where doing
-   this would cause an attribute name clash.</dd>
-
-
-   <dt>Form controls associated with forms that aren't their
-   nearest ancestor (use of the <span><code>form</code> element
-   pointer</span>)</dt>
-
-   <dd>Create an attribute <code title="">__formid__</code> on the
-   form, with a value unique amongst <code title="">__formid__</code>
-   attributes in the document, and create an attribute <code
-   title="">__form__</code> on the form control, whose value matches
-   the unique identifier given to the form.</dd>
-
-
-   <dt>Any U+000C FORM FEED (FF) character</dt>
-
-   <dd>Replace the character with a U+0020 SPACE character.</dd>
-
-
-   <dt>Any other literal non-XML character</dt>
-
-   <dd>Replace the character with a U+FFFD REPLACEMENT CHARACTER.</dd>
-
-
-   <dt>A comment that contains two adjacent U+002D HYPHEN-MINUS
-   characters (--).</dt>
-
-   <dd>Insert a U+0020 SPACE character between them.</dd>
-
-  </dl>
-
-  <p>Tools that use these conventions should guard against documents
-  that include markup that clashes with them by always dropping all
-  attributes in the document that start with two U+005F LOW LINE (_)
-  characters.</p>
-
-  <p class="note">These conventions apply <em>after</em> the
-  <span>HTML parser</span>'s rules have been applied. For example, a
-  <code title=""><a::></code> start tag will be closed by a <code
-  title=""></a::></code> end tag, and never by a <code
-  title=""></a__></code> end tag, even if the user agent is using
-  the rules above to then generate an actual element in the DOM with
-  the name <code title="">a__</code> for that start tag.</p>
-
-
-
   <h3>Namespaces</h3>
 
   <p>The <dfn>HTML namespace</dfn> is: <code>http://www.w3.org/1999/xhtml</code></p>
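
The coercions added in this revision (local names, comments, character data) can be sketched in a few lines of Python. This is only an illustration, not part of the commit: the function names are hypothetical, and the set of "supported" name characters is an assumption (an XML API that accepts only ASCII NCName characters). A real tool would use whatever its own XML API actually allows.

```python
import re

# Assumption: the XML API accepts only ASCII NCName characters.
# A real API may accept a much larger (non-ASCII) set.
_NAME_START = re.compile(r"[A-Za-z_]")
_NAME_CHAR = re.compile(r"[A-Za-z0-9_.\-]")

# Characters outside the XML 1.0 Char production.
_NON_XML = re.compile("[^\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]")


def coerce_local_name(name: str) -> str:
    """Replace each unsupported character with 'U' plus its code point
    as five uppercase hexadecimal digits (zero-padded)."""
    out = []
    for i, ch in enumerate(name):
        allowed = _NAME_START if i == 0 else _NAME_CHAR
        if allowed.fullmatch(ch):
            out.append(ch)
        else:
            # Code points above U+FFFFF would need six digits here.
            out.append("U%05X" % ord(ch))
    return "".join(out)


def coerce_comment(data: str) -> str:
    """Insert a single space between consecutive hyphen-minus characters."""
    return re.sub(r"-(?=-)", "- ", data)


def coerce_text(data: str) -> str:
    """Map FORM FEED to SPACE and other non-XML characters to U+FFFD."""
    return _NON_XML.sub("\ufffd", data.replace("\f", " "))


print(coerce_local_name(".foo<bar"))    # U0002EfooU0003Cbar
print(coerce_local_name("xlink:href"))  # xlinkU0003Ahref
print(coerce_comment("a--b"))           # a- -b
```

Per the note in the diff, such a mapping would run on the parser's output only; tag matching during parsing still uses the original names (so "a::" is closed by "a::", not by "aU0003AU0003A").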
