[html5] r2057 - [] (0) Define the Content-Language pragma, since apparently ~1% of sites use it [...]

Tue Aug 12 03:02:04 PDT 2008

Author: ianh
Date: 2008-08-12 03:02:04 -0700 (Tue, 12 Aug 2008)
New Revision: 2057

Modified:
   index
   source
Log:
[] (0) Define the Content-Language pragma, since apparently ~1% of sites use it in some way or another.

Modified: index
===================================================================

--- index	2008-08-12 09:32:29 UTC (rev 2056)
+++ index	2008-08-12 10:02:04 UTC (rev 2057)
@@ -7966,15 +7966,15 @@
   <!-- technically this is redundant
   with the XML spec -->
 
+  <hr>
+
   <p>To determine the language of a node, user agents must look at the
    nearest ancestor element (including the element itself if the node is an
    element) that has an <code title=attr-xml-lang><a
    href="#xmllang">xml:lang</a></code> attribute set or is an <a
    href="#html-elements" title="HTML elements">HTML element</a> and has a
    <code title=attr-lang><a href="#lang">lang</a></code> attribute set. That
-   attribute specifies the language of the node. If that attribute's value is
-   not a recognised language code, then it must be treated as an unknown
-   language (as if the value was the empty string).
+   attribute specifies the language of the node.
 
   <p>If both the <code title=attr-xml-lang><a
    href="#xmllang">xml:lang</a></code> attribute and the <code
@@ -7986,11 +7986,20 @@
    the element's language.
 
   <p>If no explicit language is given for the <a href="#root-element">root
-   element</a>, then language information from a higher-level protocol (such
+   element</a>, but there is a <a href="#document-wide">document-wide default
+   language</a> set, then that is the language of the node.
+
+  <p>If there is no <a href="#document-wide">document-wide default
+   language</a>, then language information from a higher-level protocol (such
    as HTTP), if any, must be used as the final fallback language. In the
    absence of any language information, the default value is unknown (the
    empty string).
 
+  <p>If the resulting value is not a recognised language code, then it must
+   be treated as an unknown language (as if the value was the empty string).
+
+  <hr>
+
   <p>User agents may use the element's language to determine proper
    processing or rendering (e.g. in the selection of appropriate fonts or
    pronunciations, or for dictionary selection). <!--User
@@ -8873,7 +8882,7 @@
      tokeniser had emitted a start tag token with the tag name "pre", then
      set the <a href="#html-0">HTML parser</a>'s <a
      href="#tokenization0">tokenization</a> stage's <a
-     href="#content3">content model flag</a> to <em>PLAINTEXT</em>.
+     href="#content4">content model flag</a> to <em>PLAINTEXT</em>.
 
    <li>
     <p>If <var title="">replace</var> is false, then:
@@ -10200,7 +10209,8 @@
    keywords defined for this attribute. The states given in the first cell of
    the rows with keywords give the states to which those keywords
    map.<!-- Some of the keywords are non-conforming, as
-  noted in the last column.-->
+  noted in the last column.--></p>
+  <!-- things that are neither conforming nor do anything are commented out -->
 
   <table>
    <thead>
@@ -10209,12 +10219,13 @@
 
      <th>Keywords <!--     <th>Notes-->
 
-   <tbody><!-- things that are neither conforming nor do anything are commented out
+   <tbody>
     <tr>
-     <td><span title="attr-meta-http-equiv-content-language">Content-Language</span>
+     <td><a href="#content3"
+      title=attr-meta-http-equiv-content-language>Content Language</a>
+
      <td><code title="">Content-Language</code>
-     <td>Non-conforming [ XXX but maybe we should make this an alternative to <html lang="">? ]
--->
+      <!--     <td>Non-conforming -->
 
     <tr>
      <td><a href="#encoding" title=attr-meta-http-equiv-content-type>Encoding
@@ -10291,6 +10302,62 @@
    algorithm appropriate for that state, as described in the following list:
 
   <dl>
+   <dt><dfn id=content3 title=attr-meta-http-equiv-content-language>Content
+    language</dfn>
+
+   <dd>
+    <p>This pragma sets the <dfn id=document-wide>document-wide default
+     language</dfn>. Until the pragma is successfully processed, there is no
+     <a href="#document-wide">document-wide default language</a>.</p>
+
+    <ol>
+     <li>
+      <p>If another <code><a href="#meta0">meta</a></code> element in the <a
+       href="#content3" title=attr-meta-http-equiv-content-language>Content
+       Language state</a> has already been successfully processed (i.e. when
+       it was inserted the user agent processed it and reached the last step
+       of this list of steps), then abort these steps.
+
+     <li>
+      <p>If the <code><a href="#meta0">meta</a></code> element has no <code
+       title=attr-meta-content><a href="#content1">content</a></code>
+       attribute, or if that attribute's value is the empty string, then
+       abort these steps.
+
+     <li>
+      <p>Let <var title="">input</var> be the value of the element's <code
+       title=attr-meta-content><a href="#content1">content</a></code>
+       attribute.
+
+     <li>
+      <p>Let <var title="">position</var> point at the first character of
+       <var title="">input</var>.
+
+     <li>
+      <p><a href="#skip-whitespace">Skip whitespace</a>.
+
+     <li>
+      <p><a href="#collect" title="collect a sequence of characters">Collect
+       a sequence of characters</a> that are neither <a href="#space"
+       title="space character">space characters</a> nor a U+002C COMMA
+       character (",").
+
+     <li>
+      <p>Let the <a href="#document-wide">document-wide default language</a>
+       be the string that resulted from the previous step.
+    </ol>
+
+    <p>For <code><a href="#meta0">meta</a></code> elements in the <a
+     href="#content3" title=attr-meta-http-equiv-content-language>Content
+     Language state</a>, the <code title=attr-meta-content><a
+     href="#content1">content</a></code> attribute must have a value
+     consisting of a valid RFC 3066 language code. <a
+     href="#refsRFC3066">[RFC3066]</a></p>
+
+    <p class=note>This pragma is not exactly equivalent to the HTTP
+     <code>Content-Language</code> header; for instance, it only supports
+     one language. <a href="#refsRFC2616">[RFC2616]</a></p>
+
    <dt><dfn id=encoding title=attr-meta-http-equiv-content-type>Encoding
     declaration state</dfn>
 
@@ -36432,7 +36499,7 @@
    title="HTML documents">HTML document</a>, create an <a href="#html-0">HTML
    parser</a>, associate it with the document, act as if the tokeniser had
    emitted a start tag token with the tag name "pre", set the <a
-   href="#tokenization0">tokenization</a> stage's <a href="#content3">content
+   href="#tokenization0">tokenization</a> stage's <a href="#content4">content
    model flag</a> to <i>PLAINTEXT</i>, and begin to pass the stream of
    characters in the plain text document to that tokeniser.
 
@@ -46624,7 +46691,7 @@
    to another state.
 
   <p>The exact behavior of certain states depends on a <dfn
-   id=content3>content model flag</dfn> that is set after certain tokens are
+   id=content4>content model flag</dfn> that is set after certain tokens are
    emitted. The flag has several states: <i title="">PCDATA</i>, <i
    title="">RCDATA</i>, <i title="">CDATA</i>, and <i title="">PLAINTEXT</i>.
    Initially it must be in the PCDATA state. In the RCDATA and CDATA states,
@@ -46648,7 +46715,7 @@
 
   <p>When a token is emitted, it must immediately be handled by the <a
    href="#tree-construction0">tree construction</a> stage. The tree
-   construction stage can affect the state of the <a href="#content3">content
+   construction stage can affect the state of the <a href="#content4">content
    model flag</a>, and can insert additional characters into the stream. (For
    example, the <code><a href="#script1">script</a></code> element can result
    in scripts executing and using the <a href="#dynamic3">dynamic markup
@@ -46659,7 +46726,7 @@
    flag">acknowledged</dfn> when it is processed by the tree construction
    stage, that is a <a href="#parse2">parse error</a>.
 
-  <p>When an end tag token is emitted, the <a href="#content3">content model
+  <p>When an end tag token is emitted, the <a href="#content4">content model
    flag</a> must be switched to the PCDATA state.
 
   <p>When an end tag token is emitted with attributes, that is a <a
@@ -46690,7 +46757,7 @@
   <dl class=switch>
    <dt>U+0026 AMPERSAND (&)
 
-   <dd>When the <a href="#content3">content model flag</a> is set to one of
+   <dd>When the <a href="#content4">content model flag</a> is set to one of
     the PCDATA or RCDATA states and the <a href="#escape">escape flag</a> is
     false: switch to the <a href="#character6">character reference data
     state</a>.
@@ -46700,7 +46767,7 @@
    <dt>U+002D HYPHEN-MINUS (-)
 
    <dd>
-    <p>If the <a href="#content3">content model flag</a> is set to either the
+    <p>If the <a href="#content4">content model flag</a> is set to either the
      RCDATA state or the CDATA state, and the <a href="#escape">escape
      flag</a> is false, and there are at least three characters before this
      one in the input stream, and the last four characters in the input
@@ -46713,10 +46780,10 @@
 
    <dt>U+003C LESS-THAN SIGN (<)
 
-   <dd>When the <a href="#content3">content model flag</a> is set to the
+   <dd>When the <a href="#content4">content model flag</a> is set to the
     PCDATA state: switch to the <a href="#tag-open0">tag open state</a>.
 
-   <dd>When the <a href="#content3">content model flag</a> is set to either
+   <dd>When the <a href="#content4">content model flag</a> is set to either
     the RCDATA state or the CDATA state and the <a href="#escape">escape
     flag</a> is false: switch to the <a href="#tag-open0">tag open state</a>.
 
@@ -46725,7 +46792,7 @@
    <dt>U+003E GREATER-THAN SIGN (>)
 
    <dd>
-    <p>If the <a href="#content3">content model flag</a> is set to either the
+    <p>If the <a href="#content4">content model flag</a> is set to either the
      RCDATA state or the CDATA state, and the <a href="#escape">escape
      flag</a> is true, and the last three characters in the input stream
      including this one are U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E
@@ -46752,7 +46819,7 @@
   <h5 id=character1><span class=secno>8.2.4.2. </span><dfn
    id=character6>Character reference data state</dfn></h5>
 
-  <p><em>(This cannot happen if the <a href="#content3">content model
+  <p><em>(This cannot happen if the <a href="#content4">content model
    flag</a> is set to the CDATA state.)</em>
 
   <p>Attempt to <a href="#consume">consume a character reference</a>, with no
@@ -46767,11 +46834,11 @@
   <h5 id=tag-open><span class=secno>8.2.4.3. </span><dfn id=tag-open0>Tag
    open state</dfn></h5>
 
-  <p>The behavior of this state depends on the <a href="#content3">content
+  <p>The behavior of this state depends on the <a href="#content4">content
    model flag</a>.
 
   <dl>
-   <dt>If the <a href="#content3">content model flag</a> is set to the RCDATA
+   <dt>If the <a href="#content4">content model flag</a> is set to the RCDATA
     or CDATA states
 
    <dd>
@@ -46781,7 +46848,7 @@
      and reconsume the current input character in the <a
      href="#data-state0">data state</a>.</p>
 
-   <dt>If the <a href="#content3">content model flag</a> is set to the PCDATA
+   <dt>If the <a href="#content4">content model flag</a> is set to the PCDATA
     state
 
    <dd>
@@ -46834,10 +46901,10 @@
   <h5 id=close><span class=secno>8.2.4.4. </span><dfn id=close4>Close tag
    open state</dfn></h5>
 
-  <p>If the <a href="#content3">content model flag</a> is set to the RCDATA
+  <p>If the <a href="#content4">content model flag</a> is set to the RCDATA
    or CDATA states but no start tag token has ever been emitted by this
    instance of the tokeniser (<a href="#fragment">fragment case</a>), or, if
-   the <a href="#content3">content model flag</a> is set to the RCDATA or
+   the <a href="#content4">content model flag</a> is set to the RCDATA or
    CDATA states and the next few characters do not match the tag name of the
    last start tag token emitted (compared in an <span>ASCII case
    insensitive</span> manner), or if they do but they are not immediately
@@ -46864,7 +46931,7 @@
    character token, and switch to the <a href="#data-state0">data state</a>
    to process the <a href="#next-input">next input character</a>.
 
-  <p>Otherwise, if the <a href="#content3">content model flag</a> is set to
+  <p>Otherwise, if the <a href="#content4">content model flag</a> is set to
    the PCDATA state, or if the next few characters <em>do</em> match that tag
    name, consume the <a href="#next-input">next input character</a>:
 
@@ -47346,7 +47413,7 @@
   <h5 id=bogus><span class=secno>8.2.4.16. </span><dfn id=bogus1>Bogus
    comment state</dfn></h5>
 
-  <p><em>(This can only happen if the <a href="#content3">content model
+  <p><em>(This can only happen if the <a href="#content4">content model
    flag</a> is set to the PCDATA state.)</em>
 
   <p>Consume every character up to and including the first U+003E
@@ -47365,7 +47432,7 @@
   <h5 id=markup><span class=secno>8.2.4.17. </span><dfn id=markup0>Markup
    declaration open state</dfn></h5>
 
-  <p><em>(This can only happen if the <a href="#content3">content model
+  <p><em>(This can only happen if the <a href="#content4">content model
    flag</a> is set to the PCDATA state.)</em>
 
   <p>If the next two characters are both U+002D HYPHEN-MINUS (-) characters,
@@ -47385,7 +47452,7 @@
    (the five uppercase letters "CDATA" with a U+005B LEFT SQUARE BRACKET
    character before and after), then consume those characters and switch to
    the <a href="#cdata2">CDATA section state</a> (which is unrelated to the
-   <a href="#content3">content model flag</a>'s CDATA state).
+   <a href="#content4">content model flag</a>'s CDATA state).
 
   <p>Otherwise, this is a <a href="#parse2">parse error</a>. Switch to the <a
    href="#bogus1">bogus comment state</a>. The next character that is
@@ -47995,9 +48062,9 @@
   <h5 id=cdata0><span class=secno>8.2.4.36. </span><dfn id=cdata2>CDATA
    section state</dfn></h5>
 
-  <p><em>(This can only happen if the <a href="#content3">content model
+  <p><em>(This can only happen if the <a href="#content4">content model
    flag</a> is set to the PCDATA state, and is unrelated to the <a
-   href="#content3">content model flag</a>'s CDATA state.)</em>
+   href="#content4">content model flag</a>'s CDATA state.)</em>
 
   <p>Consume every character up to the next occurrence of the three character
    sequence U+005D RIGHT SQUARE BRACKET U+005D RIGHT SQUARE BRACKET U+003E
@@ -48710,10 +48777,10 @@
    <li>
     <p>If the algorithm that was invoked is the <a href="#generic">generic
      CDATA element parsing algorithm</a>, switch the tokeniser's <a
-     href="#content3">content model flag</a> to the CDATA state; otherwise
+     href="#content4">content model flag</a> to the CDATA state; otherwise
      the algorithm invoked was the <a href="#generic0">generic RCDATA element
      parsing algorithm</a>, switch the tokeniser's <a
-     href="#content3">content model flag</a> to the RCDATA state.
+     href="#content4">content model flag</a> to the RCDATA state.
 
    <li>
     <p>Then, collect all the character tokens that the tokeniser returns
@@ -48726,7 +48793,7 @@
      all those tokens' characters, to the new element node.
 
    <li>
-    <p>The tokeniser's <a href="#content3">content model flag</a> will have
+    <p>The tokeniser's <a href="#content4">content model flag</a> will have
      switched back to the PCDATA state.
 
    <li>
@@ -49358,7 +49425,7 @@
      script will execute in-line, instead of blowing the document away, as
      would happen in most other cases.</p>
 
-    <p>Switch the tokeniser's <a href="#content3">content model flag</a> to
+    <p>Switch the tokeniser's <a href="#content4">content model flag</a> to
      the CDATA state.</p>
 
     <p>Then, collect all the character tokens that the tokeniser returns
@@ -49370,7 +49437,7 @@
      href="#script1">script</a></code> element node whose contents is the
      concatenation of all those tokens' characters.</p>
 
-    <p>The tokeniser's <a href="#content3">content model flag</a> will have
+    <p>The tokeniser's <a href="#content4">content model flag</a> will have
      switched back to the PCDATA state.</p>
 
     <p>If the next token is not an end tag token with the tag name "script",
@@ -49941,13 +50008,13 @@
 
     <p><a href="#insert0">Insert an HTML element</a> for the token.</p>
 
-    <p>Switch the <a href="#content3">content model flag</a> to the PLAINTEXT
+    <p>Switch the <a href="#content4">content model flag</a> to the PLAINTEXT
      state.</p>
 
     <p class=note>Once a start tag with the tag name "plaintext" has been
      seen, that will be the last token ever seen other than character tokens
      (and the end-of-file token), because there is no way to switch the <a
-     href="#content3">content model flag</a> out of the PLAINTEXT state.</p>
+     href="#content4">content model flag</a> out of the PLAINTEXT state.</p>
    </dd>
    <!-- end tags for non-phrasing flow content elements -->
    <!-- the normal ones -->
@@ -50576,7 +50643,7 @@
      <code>form</code> element pointed to by the <a
      href="#form-element"><code title="">form</code> element pointer</a>.</p>
 
-    <p>Switch the tokeniser's <a href="#content3">content model flag</a> to
+    <p>Switch the tokeniser's <a href="#content4">content model flag</a> to
      the RCDATA state.</p>
 
     <p>If the next token is a U+000A LINE FEED (LF) character token, then
@@ -50591,7 +50658,7 @@
      single <code>Text</code> node, whose contents is the concatenation of
      all those tokens' characters, to the new element node.</p>
 
-    <p>The tokeniser's <a href="#content3">content model flag</a> will have
+    <p>The tokeniser's <a href="#content4">content model flag</a> will have
      switched back to the PCDATA state.</p>
 
     <p>If the next token is an end tag token with the tag name "textarea",
@@ -52504,14 +52571,14 @@
    <li>
     <p>Set the <a href="#html-0">HTML parser</a>'s <a
      href="#tokenization0">tokenization</a> stage's <a
-     href="#content3">content model flag</a> according to the <var
+     href="#content4">content model flag</a> according to the <var
      title="">context</var> element, as follows:</p>
 
     <dl class=switch>
      <dt>If it is a <code><a href="#title1">title</a></code> or
       <code>textarea</code> element
 
-     <dd>Set the <a href="#content3">content model flag</a> to the RCDATA
+     <dd>Set the <a href="#content4">content model flag</a> to the RCDATA
       state.
 
      <dt>If it is a <code><a href="#style1">style</a></code>, <code><a
@@ -52519,23 +52586,23 @@
       href="#iframe">iframe</a></code>, <code>noembed</code>, or
       <code>noframes</code> element
 
-     <dd>Set the <a href="#content3">content model flag</a> to the CDATA
+     <dd>Set the <a href="#content4">content model flag</a> to the CDATA
       state.
 
      <dt>If it is a <code><a href="#noscript">noscript</a></code> element
 
      <dd>If the <a href="#scripting3">scripting flag</a> is enabled, set the
-      <a href="#content3">content model flag</a> to the CDATA state.
-      Otherwise, set the <a href="#content3">content model flag</a> to the
+      <a href="#content4">content model flag</a> to the CDATA state.
+      Otherwise, set the <a href="#content4">content model flag</a> to the
       PCDATA state.
 
      <dt>If it is a <code>plaintext</code> element
 
-     <dd>Set the <a href="#content3">content model flag</a> to PLAINTEXT.
+     <dd>Set the <a href="#content4">content model flag</a> to PLAINTEXT.
 
      <dt>Otherwise
 
-     <dd>Set the <a href="#content3">content model flag</a> to the PCDATA
+     <dd>Set the <a href="#content4">content model flag</a> to the PCDATA
       state.
     </dl>
 

Modified: source
===================================================================
--- source	2008-08-12 09:32:29 UTC (rev 2056)
+++ source	2008-08-12 10:02:04 UTC (rev 2057)
@@ -5814,15 +5814,15 @@
   <span>HTML documents</span>.</p> <!-- technically this is redundant
   with the XML spec -->
 
+  <hr>
+
   <p>To determine the language of a node, user agents must look at the
   nearest ancestor element (including the element itself if the node
   is an element) that has an <code
   title="attr-xml-lang">xml:lang</code> attribute set or is an <span
   title="HTML elements">HTML element</span> and has a <code
   title="attr-lang">lang</code> attribute set. That attribute
-  specifies the language of the node. If that attribute's value is not
-  a recognised language code, then it must be treated as an unknown
-  language (as if the value was the empty string).</p>
+  specifies the language of the node.</p>
 
   <p>If both the <code title="attr-xml-lang">xml:lang</code> attribute
   and the <code title="attr-lang">lang</code> attribute are set on an
@@ -5833,11 +5833,21 @@
   element's language.</p>
 
   <p>If no explicit language is given for the <span>root
-  element</span>, then language information from a higher-level
-  protocol (such as HTTP), if any, must be used as the final
-  fallback language. In the absence of any language information, the
-  default value is unknown (the empty string).</p>
+  element</span>, but there is a <span>document-wide default
+  language</span> set, then that is the language of the node.</p>
 
+  <p>If there is no <span>document-wide default language</span>, then
+  language information from a higher-level protocol (such as HTTP), if
+  any, must be used as the final fallback language. In the absence of
+  any language information, the default value is unknown (the empty
+  string).</p>
+
+  <p>If the resulting value is not a recognised language code, then it
+  must be treated as an unknown language (as if the value was the
+  empty string).</p>
+
+  <hr>
+
   <p>User agents may use the element's language to determine proper
   processing or rendering (e.g. in the selection of appropriate
   fonts or pronunciations, or for dictionary selection). <!--User
@@ -8169,6 +8179,7 @@
   those keywords map.<!-- Some of the keywords are non-conforming, as
   noted in the last column.--></p>
 
+<!-- things that are neither conforming nor do anything are commented out -->
   <table>
    <thead>
     <tr>
@@ -8176,12 +8187,10 @@
      <th>Keywords
 <!--     <th>Notes-->
    <tbody>
-<!-- things that are neither conforming nor do anything are commented out
     <tr>
-     <td><span title="attr-meta-http-equiv-content-language">Content-Language</span>
+     <td><span title="attr-meta-http-equiv-content-language">Content Language</span>
      <td><code title="">Content-Language</code>
-     <td>Non-conforming [ XXX but maybe we should make this an alternative to <html lang="">? ]
--->
+<!--     <td>Non-conforming -->
     <tr>
      <td><span title="attr-meta-http-equiv-content-type">Encoding declaration</span>
      <td><code title="">Content-Type</code>
@@ -8254,6 +8263,58 @@
 
   <dl>
 
+   <dt><dfn title="attr-meta-http-equiv-content-language">Content language</dfn>
+
+   <dd>
+
+    <p>This pragma sets the <dfn>document-wide default
+    language</dfn>. Until the pragma is successfully processed, there
+    is no <span>document-wide default language</span>.</p>
+
+    <ol>
+
+     <li><p>If another <code>meta</code> element in the <span
+     title="attr-meta-http-equiv-content-language">Content Language
+     state</span> has already been successfully processed (i.e. when
+     it was inserted the user agent processed it and reached the last
+     step of this list of steps), then abort these steps.</p></li>
+
+     <li><p>If the <code>meta</code> element has no <code
+     title="attr-meta-content">content</code> attribute, or if that
+     attribute's value is the empty string, then abort these
+     steps.</p></li>
+
+     <li><p>Let <var title="">input</var> be the value of the
+     element's <code title="attr-meta-content">content</code>
+     attribute.</p></li>
+
+     <li><p>Let <var title="">position</var> point at the first
+     character of <var title="">input</var>.</p></li>
+
+     <li><p><span>Skip whitespace</span>.</p></li>
+
+     <li><p><span title="collect a sequence of characters">Collect a
+     sequence of characters</span> that are neither <span title="space
+     character">space characters</span> nor a U+002C COMMA character
+     (",").</p></li>
+
+     <li><p>Let the <span>document-wide default language</span> be the
+     string that resulted from the previous step.</p></li>
+
+    </ol>
+
+    <p>For <code>meta</code> elements in the <span
+    title="attr-meta-http-equiv-content-language">Content Language
+    state</span>, the <code title="attr-meta-content">content</code>
+    attribute must have a value consisting of a valid RFC 3066
+    language code. <a href="#refsRFC3066">[RFC3066]</a></p>
+
+    <p class="note">This pragma is not exactly equivalent to the HTTP
+    <code>Content-Language</code> header; for instance, it only
+    supports one language. <a href="#refsRFC2616">[RFC2616]</a></p>
+
+   </dd>
+
    <dt><dfn title="attr-meta-http-equiv-content-type">Encoding declaration state</dfn>
 
    <dd>
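
For reference, the pragma-processing steps added in this revision can be sketched in Python roughly as follows. This is an illustrative sketch only and is not part of the commit: the names SPACE_CHARACTERS, extract_content_language and DocumentLanguageState are hypothetical, and the set of space characters is an assumption, since this diff does not list them.

```python
# Assumed set of HTML "space characters"; the diff itself does not enumerate them.
SPACE_CHARACTERS = " \t\n\f\r"


def extract_content_language(content):
    """Run steps 2-6 of the pragma; return None if the steps abort."""
    # Step 2: abort if the content attribute is missing or empty.
    if content is None or content == "":
        return None
    # Steps 3-4: input is the attribute value, position starts at its first character.
    input_value = content
    position = 0
    # Step 5: skip whitespace.
    while position < len(input_value) and input_value[position] in SPACE_CHARACTERS:
        position += 1
    # Step 6: collect characters that are neither space characters nor a comma.
    start = position
    while (position < len(input_value)
           and input_value[position] not in SPACE_CHARACTERS
           and input_value[position] != ","):
        position += 1
    return input_value[start:position]


class DocumentLanguageState:
    """Hypothetical holder for the document-wide default language."""

    def __init__(self):
        self.default_language = None  # None models "no document-wide default language"
        self.processed = False        # True once a pragma reached the last step

    def process_meta(self, content):
        # Step 1: only the first successfully processed pragma counts.
        if self.processed:
            return
        language = extract_content_language(content)
        if language is None:
            return
        # Step 7: record the collected string as the default language.
        self.default_language = language
        self.processed = True
```

Note that, following the steps as written, a value such as ",en" yields the empty string rather than aborting; the separate rule about unrecognised language codes would then treat that result as an unknown language.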