[html5] r2057 - [] (0) Define the Content-Language pragma, since apparently ~1% of sites use it [...]
whatwg at whatwg.org
Tue Aug 12 03:02:04 PDT 2008
Author: ianh
Date: 2008-08-12 03:02:04 -0700 (Tue, 12 Aug 2008)
New Revision: 2057
Modified:
index
source
Log:
[] (0) Define the Content-Language pragma, since apparently ~1% of sites use it in some way or another.
Modified: index
===================================================================
--- index 2008-08-12 09:32:29 UTC (rev 2056)
+++ index 2008-08-12 10:02:04 UTC (rev 2057)
@@ -7966,15 +7966,15 @@
<!-- technically this is redundant
with the XML spec -->
+ <hr>
+
<p>To determine the language of a node, user agents must look at the
nearest ancestor element (including the element itself if the node is an
element) that has an <code title=attr-xml-lang><a
href="#xmllang">xml:lang</a></code> attribute set or is an <a
href="#html-elements" title="HTML elements">HTML element</a> and has a
<code title=attr-lang><a href="#lang">lang</a></code> attribute set. That
- attribute specifies the language of the node. If that attribute's value is
- not a recognised language code, then it must be treated as an unknown
- language (as if the value was the empty string).
+ attribute specifies the language of the node.
<p>If both the <code title=attr-xml-lang><a
href="#xmllang">xml:lang</a></code> attribute and the <code
@@ -7986,11 +7986,20 @@
the element's language.
<p>If no explicit language is given for the <a href="#root-element">root
- element</a>, then language information from a higher-level protocol (such
+ element</a>, but there is a <a href="#document-wide">document-wide default
+ language</a> set, then that is the language of the node.
+
+ <p>If there is no <a href="#document-wide">document-wide default
+ language</a>, then language information from a higher-level protocol (such
as HTTP), if any, must be used as the final fallback language. In the
absence of any language information, the default value is unknown (the
empty string).
+ <p>If the resulting value is not a recognised language code, then it must
+ be treated as an unknown language (as if the value was the empty string).
+
+ <hr>
+
<p>User agents may use the element's language to determine proper
processing or rendering (e.g. in the selection of appropriate fonts or
pronunciations, or for dictionary selection). <!--User
@@ -8873,7 +8882,7 @@
tokeniser had emitted a start tag token with the tag name "pre", then
set the <a href="#html-0">HTML parser</a>'s <a
href="#tokenization0">tokenization</a> stage's <a
- href="#content3">content model flag</a> to <em>PLAINTEXT</em>.
+ href="#content4">content model flag</a> to <em>PLAINTEXT</em>.
<li>
<p>If <var title="">replace</var> is false, then:
@@ -10200,7 +10209,8 @@
keywords defined for this attribute. The states given in the first cell of
the rows with keywords give the states to which those keywords
map.<!-- Some of the keywords are non-conforming, as
- noted in the last column.-->
+ noted in the last column.--></p>
+ <!-- things that are neither conforming nor do anything are commented out -->
<table>
<thead>
@@ -10209,12 +10219,13 @@
<th>Keywords <!-- <th>Notes-->
- <tbody><!-- things that are neither conforming nor do anything are commented out
+ <tbody>
<tr>
- <td><span title="attr-meta-http-equiv-content-language">Content-Language</span>
+ <td><a href="#content3"
+ title=attr-meta-http-equiv-content-language>Content Language</a>
+
<td><code title="">Content-Language</code>
- <td>Non-conforming [ XXX but maybe we should make this an alternative to <html lang="">? ]
--->
+ <!-- <td>Non-conforming -->
<tr>
<td><a href="#encoding" title=attr-meta-http-equiv-content-type>Encoding
@@ -10291,6 +10302,62 @@
algorithm appropriate for that state, as described in the following list:
<dl>
+ <dt><dfn id=content3 title=attr-meta-http-equiv-content-language>Content
+ language</dfn>
+
+ <dd>
+ <p>This pragma sets the <dfn id=document-wide>document-wide default
+ language</dfn>. Until the pragma is successfully processed, there is no
+ <a href="#document-wide">document-wide default language</a>.</p>
+
+ <ol>
+ <li>
+ <p>If another <code><a href="#meta0">meta</a></code> element in the <a
+ href="#content3" title=attr-meta-http-equiv-content-language>Content
+ Language state</a> has already been successfully processed (i.e. when
+ it was inserted the user agent processed it and reached the last step
+ of this list of steps), then abort these steps.
+
+ <li>
+ <p>If the <code><a href="#meta0">meta</a></code> element has no <code
+ title=attr-meta-content><a href="#content1">content</a></code>
+ attribute, or if that attribute's value is the empty string, then
+ abort these steps.
+
+ <li>
+ <p>Let <var title="">input</var> be the value of the element's <code
+ title=attr-meta-content><a href="#content1">content</a></code>
+ attribute.
+
+ <li>
+ <p>Let <var title="">position</var> point at the first character of
+ <var title="">input</var>.
+
+ <li>
+ <p><a href="#skip-whitespace">Skip whitespace</a>.
+
+ <li>
+ <p><a href="#collect" title="collect a sequence of characters">Collect
+ a sequence of characters</a> that are neither <a href="#space"
+ title="space character">space characters</a> nor a U+002C COMMA
+ character (",").
+
+ <li>
+ <p>Let the <a href="#document-wide">document-wide default language</a>
+ be the string that resulted from the previous step.
+ </ol>
+
+ <p>For <code><a href="#meta0">meta</a></code> elements in the <a
+ href="#content3" title=attr-meta-http-equiv-content-language>Content
+ Language state</a>, the <code title=attr-meta-content><a
+ href="#content1">content</a></code> attribute must have a value
+ consisting of a valid RFC 3066 language code. <a
+ href="#refsRFC3066">[RFC3066]</a></p>
+
+ <p class=note>This pragma is not exactly equivalent to the HTTP
+ <code>Content-Language</code> header; for instance, it only supports one
+ language. <a href="#refsRFC2616">[RFC2616]</a></p>
+
<dt><dfn id=encoding title=attr-meta-http-equiv-content-type>Encoding
declaration state</dfn>
@@ -36432,7 +36499,7 @@
title="HTML documents">HTML document</a>, create an <a href="#html-0">HTML
parser</a>, associate it with the document, act as if the tokeniser had
emitted a start tag token with the tag name "pre", set the <a
- href="#tokenization0">tokenization</a> stage's <a href="#content3">content
+ href="#tokenization0">tokenization</a> stage's <a href="#content4">content
model flag</a> to <i>PLAINTEXT</i>, and begin to pass the stream of
characters in the plain text document to that tokeniser.
@@ -46624,7 +46691,7 @@
to another state.
<p>The exact behavior of certain states depends on a <dfn
- id=content3>content model flag</dfn> that is set after certain tokens are
+ id=content4>content model flag</dfn> that is set after certain tokens are
emitted. The flag has several states: <i title="">PCDATA</i>, <i
title="">RCDATA</i>, <i title="">CDATA</i>, and <i title="">PLAINTEXT</i>.
Initially it must be in the PCDATA state. In the RCDATA and CDATA states,
@@ -46648,7 +46715,7 @@
<p>When a token is emitted, it must immediately be handled by the <a
href="#tree-construction0">tree construction</a> stage. The tree
- construction stage can affect the state of the <a href="#content3">content
+ construction stage can affect the state of the <a href="#content4">content
model flag</a>, and can insert additional characters into the stream. (For
example, the <code><a href="#script1">script</a></code> element can result
in scripts executing and using the <a href="#dynamic3">dynamic markup
@@ -46659,7 +46726,7 @@
flag">acknowledged</dfn> when it is processed by the tree construction
stage, that is a <a href="#parse2">parse error</a>.
- <p>When an end tag token is emitted, the <a href="#content3">content model
+ <p>When an end tag token is emitted, the <a href="#content4">content model
flag</a> must be switched to the PCDATA state.
<p>When an end tag token is emitted with attributes, that is a <a
@@ -46690,7 +46757,7 @@
<dl class=switch>
<dt>U+0026 AMPERSAND (&)
- <dd>When the <a href="#content3">content model flag</a> is set to one of
+ <dd>When the <a href="#content4">content model flag</a> is set to one of
the PCDATA or RCDATA states and the <a href="#escape">escape flag</a> is
false: switch to the <a href="#character6">character reference data
state</a>.
@@ -46700,7 +46767,7 @@
<dt>U+002D HYPHEN-MINUS (-)
<dd>
- <p>If the <a href="#content3">content model flag</a> is set to either the
+ <p>If the <a href="#content4">content model flag</a> is set to either the
RCDATA state or the CDATA state, and the <a href="#escape">escape
flag</a> is false, and there are at least three characters before this
one in the input stream, and the last four characters in the input
@@ -46713,10 +46780,10 @@
<dt>U+003C LESS-THAN SIGN (<)
- <dd>When the <a href="#content3">content model flag</a> is set to the
+ <dd>When the <a href="#content4">content model flag</a> is set to the
PCDATA state: switch to the <a href="#tag-open0">tag open state</a>.
- <dd>When the <a href="#content3">content model flag</a> is set to either
+ <dd>When the <a href="#content4">content model flag</a> is set to either
the RCDATA state or the CDATA state and the <a href="#escape">escape
flag</a> is false: switch to the <a href="#tag-open0">tag open state</a>.
@@ -46725,7 +46792,7 @@
<dt>U+003E GREATER-THAN SIGN (>)
<dd>
- <p>If the <a href="#content3">content model flag</a> is set to either the
+ <p>If the <a href="#content4">content model flag</a> is set to either the
RCDATA state or the CDATA state, and the <a href="#escape">escape
flag</a> is true, and the last three characters in the input stream
including this one are U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E
@@ -46752,7 +46819,7 @@
<h5 id=character1><span class=secno>8.2.4.2. </span><dfn
id=character6>Character reference data state</dfn></h5>
- <p><em>(This cannot happen if the <a href="#content3">content model
+ <p><em>(This cannot happen if the <a href="#content4">content model
flag</a> is set to the CDATA state.)</em>
<p>Attempt to <a href="#consume">consume a character reference</a>, with no
@@ -46767,11 +46834,11 @@
<h5 id=tag-open><span class=secno>8.2.4.3. </span><dfn id=tag-open0>Tag
open state</dfn></h5>
- <p>The behavior of this state depends on the <a href="#content3">content
+ <p>The behavior of this state depends on the <a href="#content4">content
model flag</a>.
<dl>
- <dt>If the <a href="#content3">content model flag</a> is set to the RCDATA
+ <dt>If the <a href="#content4">content model flag</a> is set to the RCDATA
or CDATA states
<dd>
@@ -46781,7 +46848,7 @@
and reconsume the current input character in the <a
href="#data-state0">data state</a>.</p>
- <dt>If the <a href="#content3">content model flag</a> is set to the PCDATA
+ <dt>If the <a href="#content4">content model flag</a> is set to the PCDATA
state
<dd>
@@ -46834,10 +46901,10 @@
<h5 id=close><span class=secno>8.2.4.4. </span><dfn id=close4>Close tag
open state</dfn></h5>
- <p>If the <a href="#content3">content model flag</a> is set to the RCDATA
+ <p>If the <a href="#content4">content model flag</a> is set to the RCDATA
or CDATA states but no start tag token has ever been emitted by this
instance of the tokeniser (<a href="#fragment">fragment case</a>), or, if
- the <a href="#content3">content model flag</a> is set to the RCDATA or
+ the <a href="#content4">content model flag</a> is set to the RCDATA or
CDATA states and the next few characters do not match the tag name of the
last start tag token emitted (compared in an <span>ASCII case
insensitive</span> manner), or if they do but they are not immediately
@@ -46864,7 +46931,7 @@
character token, and switch to the <a href="#data-state0">data state</a>
to process the <a href="#next-input">next input character</a>.
- <p>Otherwise, if the <a href="#content3">content model flag</a> is set to
+ <p>Otherwise, if the <a href="#content4">content model flag</a> is set to
the PCDATA state, or if the next few characters <em>do</em> match that tag
name, consume the <a href="#next-input">next input character</a>:
@@ -47346,7 +47413,7 @@
<h5 id=bogus><span class=secno>8.2.4.16. </span><dfn id=bogus1>Bogus
comment state</dfn></h5>
- <p><em>(This can only happen if the <a href="#content3">content model
+ <p><em>(This can only happen if the <a href="#content4">content model
flag</a> is set to the PCDATA state.)</em>
<p>Consume every character up to and including the first U+003E
@@ -47365,7 +47432,7 @@
<h5 id=markup><span class=secno>8.2.4.17. </span><dfn id=markup0>Markup
declaration open state</dfn></h5>
- <p><em>(This can only happen if the <a href="#content3">content model
+ <p><em>(This can only happen if the <a href="#content4">content model
flag</a> is set to the PCDATA state.)</em>
<p>If the next two characters are both U+002D HYPHEN-MINUS (-) characters,
@@ -47385,7 +47452,7 @@
(the five uppercase letters "CDATA" with a U+005B LEFT SQUARE BRACKET
character before and after), then consume those characters and switch to
the <a href="#cdata2">CDATA section state</a> (which is unrelated to the
- <a href="#content3">content model flag</a>'s CDATA state).
+ <a href="#content4">content model flag</a>'s CDATA state).
<p>Otherwise, this is a <a href="#parse2">parse error</a>. Switch to the <a
href="#bogus1">bogus comment state</a>. The next character that is
@@ -47995,9 +48062,9 @@
<h5 id=cdata0><span class=secno>8.2.4.36. </span><dfn id=cdata2>CDATA
section state</dfn></h5>
- <p><em>(This can only happen if the <a href="#content3">content model
+ <p><em>(This can only happen if the <a href="#content4">content model
flag</a> is set to the PCDATA state, and is unrelated to the <a
- href="#content3">content model flag</a>'s CDATA state.)</em>
+ href="#content4">content model flag</a>'s CDATA state.)</em>
<p>Consume every character up to the next occurrence of the three character
sequence U+005D RIGHT SQUARE BRACKET U+005D RIGHT SQUARE BRACKET U+003E
@@ -48710,10 +48777,10 @@
<li>
<p>If the algorithm that was invoked is the <a href="#generic">generic
CDATA element parsing algorithm</a>, switch the tokeniser's <a
- href="#content3">content model flag</a> to the CDATA state; otherwise
+ href="#content4">content model flag</a> to the CDATA state; otherwise
the algorithm invoked was the <a href="#generic0">generic RCDATA element
parsing algorithm</a>, switch the tokeniser's <a
- href="#content3">content model flag</a> to the RCDATA state.
+ href="#content4">content model flag</a> to the RCDATA state.
<li>
<p>Then, collect all the character tokens that the tokeniser returns
@@ -48726,7 +48793,7 @@
all those tokens' characters, to the new element node.
<li>
- <p>The tokeniser's <a href="#content3">content model flag</a> will have
+ <p>The tokeniser's <a href="#content4">content model flag</a> will have
switched back to the PCDATA state.
<li>
@@ -49358,7 +49425,7 @@
script will execute in-line, instead of blowing the document away, as
would happen in most other cases.</p>
- <p>Switch the tokeniser's <a href="#content3">content model flag</a> to
+ <p>Switch the tokeniser's <a href="#content4">content model flag</a> to
the CDATA state.</p>
<p>Then, collect all the character tokens that the tokeniser returns
@@ -49370,7 +49437,7 @@
href="#script1">script</a></code> element node whose contents is the
concatenation of all those tokens' characters.</p>
- <p>The tokeniser's <a href="#content3">content model flag</a> will have
+ <p>The tokeniser's <a href="#content4">content model flag</a> will have
switched back to the PCDATA state.</p>
<p>If the next token is not an end tag token with the tag name "script",
@@ -49941,13 +50008,13 @@
<p><a href="#insert0">Insert an HTML element</a> for the token.</p>
- <p>Switch the <a href="#content3">content model flag</a> to the PLAINTEXT
+ <p>Switch the <a href="#content4">content model flag</a> to the PLAINTEXT
state.</p>
<p class=note>Once a start tag with the tag name "plaintext" has been
seen, that will be the last token ever seen other than character tokens
(and the end-of-file token), because there is no way to switch the <a
- href="#content3">content model flag</a> out of the PLAINTEXT state.</p>
+ href="#content4">content model flag</a> out of the PLAINTEXT state.</p>
</dd>
<!-- end tags for non-phrasing flow content elements -->
<!-- the normal ones -->
@@ -50576,7 +50643,7 @@
<code>form</code> element pointed to by the <a
href="#form-element"><code title="">form</code> element pointer</a>.</p>
- <p>Switch the tokeniser's <a href="#content3">content model flag</a> to
+ <p>Switch the tokeniser's <a href="#content4">content model flag</a> to
the RCDATA state.</p>
<p>If the next token is a U+000A LINE FEED (LF) character token, then
@@ -50591,7 +50658,7 @@
single <code>Text</code> node, whose contents is the concatenation of
all those tokens' characters, to the new element node.</p>
- <p>The tokeniser's <a href="#content3">content model flag</a> will have
+ <p>The tokeniser's <a href="#content4">content model flag</a> will have
switched back to the PCDATA state.</p>
<p>If the next token is an end tag token with the tag name "textarea",
@@ -52504,14 +52571,14 @@
<li>
<p>Set the <a href="#html-0">HTML parser</a>'s <a
href="#tokenization0">tokenization</a> stage's <a
- href="#content3">content model flag</a> according to the <var
+ href="#content4">content model flag</a> according to the <var
title="">context</var> element, as follows:</p>
<dl class=switch>
<dt>If it is a <code><a href="#title1">title</a></code> or
<code>textarea</code> element
- <dd>Set the <a href="#content3">content model flag</a> to the RCDATA
+ <dd>Set the <a href="#content4">content model flag</a> to the RCDATA
state.
<dt>If it is a <code><a href="#style1">style</a></code>, <code><a
@@ -52519,23 +52586,23 @@
href="#iframe">iframe</a></code>, <code>noembed</code>, or
<code>noframes</code> element
- <dd>Set the <a href="#content3">content model flag</a> to the CDATA
+ <dd>Set the <a href="#content4">content model flag</a> to the CDATA
state.
<dt>If it is a <code><a href="#noscript">noscript</a></code> element
<dd>If the <a href="#scripting3">scripting flag</a> is enabled, set the
- <a href="#content3">content model flag</a> to the CDATA state.
- Otherwise, set the <a href="#content3">content model flag</a> to the
+ <a href="#content4">content model flag</a> to the CDATA state.
+ Otherwise, set the <a href="#content4">content model flag</a> to the
PCDATA state.
<dt>If it is a <code>plaintext</code> element
- <dd>Set the <a href="#content3">content model flag</a> to PLAINTEXT.
+ <dd>Set the <a href="#content4">content model flag</a> to PLAINTEXT.
<dt>Otherwise
- <dd>Set the <a href="#content3">content model flag</a> to the PCDATA
+ <dd>Set the <a href="#content4">content model flag</a> to the PCDATA
state.
</dl>
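
The revised language-determination rules in the hunk at -7966/-7986 above amount to a fallback chain: nearest ancestor attribute, then the document-wide default language, then a higher-level protocol, then unknown. The following is a minimal Python sketch of that chain, not the specification's own algorithm. The `Node` class, the `node_language` function, and the `recognised` set are hypothetical names introduced for illustration. Two simplifications are assumed: every node is treated as an HTML element, and `xml:lang` is taken to win over `lang` when both are set (the diff elides that sentence, so this is an assumption).

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Node:
    """Hypothetical stand-in for a DOM node; assumes every node is an HTML element."""
    lang: Optional[str] = None       # value of the lang attribute, if set
    xml_lang: Optional[str] = None   # value of the xml:lang attribute, if set
    parent: Optional["Node"] = None


def node_language(node: Node,
                  document_default: Optional[str],
                  protocol_language: Optional[str],
                  recognised: Set[str]) -> str:
    """Return the language of a node, or "" when it is unknown."""
    value = None
    current: Optional[Node] = node
    # Walk from the node itself up through its ancestors.
    while current is not None:
        if current.xml_lang is not None:   # assumption: xml:lang takes precedence
            value = current.xml_lang
            break
        if current.lang is not None:
            value = current.lang
            break
        current = current.parent
    if value is None:
        if document_default is not None:     # set by the Content-Language pragma
            value = document_default
        elif protocol_language is not None:  # e.g. taken from HTTP
            value = protocol_language
        else:
            value = ""                       # no language information at all
    # Unrecognised codes are treated like the empty string. The check here is
    # an exact string match; real language tags compare case-insensitively.
    return value if value in recognised else ""
```

In this sketch the recognised-code check runs once on the final value, which mirrors how revision 2057 moves that sentence from the attribute paragraph to the end of the chain.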
Modified: source
===================================================================
--- source 2008-08-12 09:32:29 UTC (rev 2056)
+++ source 2008-08-12 10:02:04 UTC (rev 2057)
@@ -5814,15 +5814,15 @@
<span>HTML documents</span>.</p> <!-- technically this is redundant
with the XML spec -->
+ <hr>
+
<p>To determine the language of a node, user agents must look at the
nearest ancestor element (including the element itself if the node
is an element) that has an <code
title="attr-xml-lang">xml:lang</code> attribute set or is an <span
title="HTML elements">HTML element</span> and has a <code
title="attr-lang">lang</code> attribute set. That attribute
- specifies the language of the node. If that attribute's value is not
- a recognised language code, then it must be treated as an unknown
- language (as if the value was the empty string).</p>
+ specifies the language of the node.</p>
<p>If both the <code title="attr-xml-lang">xml:lang</code> attribute
and the <code title="attr-lang">lang</code> attribute are set on an
@@ -5833,11 +5833,21 @@
element's language.</p>
<p>If no explicit language is given for the <span>root
- element</span>, then language information from a higher-level
- protocol (such as HTTP), if any, must be used as the final
- fallback language. In the absence of any language information, the
- default value is unknown (the empty string).</p>
+ element</span>, but there is a <span>document-wide default
+ language</span> set, then that is the language of the node.</p>
+ <p>If there is no <span>document-wide default language</span>, then
+ language information from a higher-level protocol (such as HTTP), if
+ any, must be used as the final fallback language. In the absence of
+ any language information, the default value is unknown (the empty
+ string).</p>
+
+ <p>If the resulting value is not a recognised language code, then it
+ must be treated as an unknown language (as if the value was the
+ empty string).</p>
+
+ <hr>
+
<p>User agents may use the element's language to determine proper
processing or rendering (e.g. in the selection of appropriate
fonts or pronunciations, or for dictionary selection). <!--User
@@ -8169,6 +8179,7 @@
those keywords map.<!-- Some of the keywords are non-conforming, as
noted in the last column.--></p>
+<!-- things that are neither conforming nor do anything are commented out -->
<table>
<thead>
<tr>
@@ -8176,12 +8187,10 @@
<th>Keywords
<!-- <th>Notes-->
<tbody>
-<!-- things that are neither conforming nor do anything are commented out
<tr>
- <td><span title="attr-meta-http-equiv-content-language">Content-Language</span>
+ <td><span title="attr-meta-http-equiv-content-language">Content Language</span>
<td><code title="">Content-Language</code>
- <td>Non-conforming [ XXX but maybe we should make this an alternative to <html lang="">? ]
--->
+<!-- <td>Non-conforming -->
<tr>
<td><span title="attr-meta-http-equiv-content-type">Encoding declaration</span>
<td><code title="">Content-Type</code>
@@ -8254,6 +8263,58 @@
<dl>
+ <dt><dfn title="attr-meta-http-equiv-content-language">Content language</dfn>
+
+ <dd>
+
+ <p>This pragma sets the <dfn>document-wide default
+ language</dfn>. Until the pragma is successfully processed, there
+ is no <span>document-wide default language</span>.</p>
+
+ <ol>
+
+ <li><p>If another <code>meta</code> element in the <span
+ title="attr-meta-http-equiv-content-language">Content Language
+ state</span> has already been successfully processed (i.e. when
+ it was inserted the user agent processed it and reached the last
+ step of this list of steps), then abort these steps.</p></li>
+
+ <li><p>If the <code>meta</code> element has no <code
+ title="attr-meta-content">content</code> attribute, or if that
+ attribute's value is the empty string, then abort these
+ steps.</p></li>
+
+ <li><p>Let <var title="">input</var> be the value of the
+ element's <code title="attr-meta-content">content</code>
+ attribute.</p></li>
+
+ <li><p>Let <var title="">position</var> point at the first
+ character of <var title="">input</var>.</p></li>
+
+ <li><p><span>Skip whitespace</span>.</p></li>
+
+ <li><p><span title="collect a sequence of characters">Collect a
+ sequence of characters</span> that are neither <span title="space
+ character">space characters</span> nor a U+002C COMMA character
+ (",").</p></li>
+
+ <li><p>Let the <span>document-wide default language</span> be the
+ string that resulted from the previous step.</p></li>
+
+ </ol>
+
+ <p>For <code>meta</code> elements in the <span
+ title="attr-meta-http-equiv-content-language">Content Language
+ state</span>, the <code title="attr-meta-content">content</code>
+ attribute must have a value consisting of a valid RFC 3066
+ language code. <a href="#refsRFC3066">[RFC3066]</a></p>
+
+ <p class="note">This pragma is not exactly equivalent to the HTTP
+ <code>Content-Language</code> header; for instance, it only
+ supports one language. <a href="#refsRFC2616">[RFC2616]</a></p>
+
+ </dd>
+
<dt><dfn title="attr-meta-http-equiv-content-type">Encoding declaration state</dfn>
<dd>
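
The seven processing steps this revision adds for the Content Language state can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference implementation. `Document`, `default_language`, `pragma_processed`, and `process_content_language` are hypothetical names, and the set of space characters is assumed from later versions of the HTML specification, since the diff does not list them.

```python
from typing import Optional

# Assumption: space characters as listed in later HTML specifications.
SPACE_CHARACTERS = " \t\n\f\r"


class Document:
    """Hypothetical holder for the document-wide default language."""

    def __init__(self) -> None:
        self.default_language: Optional[str] = None  # None: no default set yet
        self.pragma_processed = False                # True once step 7 has run


def process_content_language(doc: Document, content: Optional[str]) -> None:
    # Step 1: only the first successfully processed pragma counts.
    if doc.pragma_processed:
        return
    # Step 2: a missing or empty content attribute aborts the steps.
    if content is None or content == "":
        return
    # Steps 3-4: input is the attribute value; position starts at its first character.
    position = 0
    # Step 5: skip whitespace.
    while position < len(content) and content[position] in SPACE_CHARACTERS:
        position += 1
    # Step 6: collect characters that are neither space characters nor commas.
    start = position
    while (position < len(content)
           and content[position] not in SPACE_CHARACTERS
           and content[position] != ","):
        position += 1
    # Step 7: the collected string becomes the document-wide default language.
    doc.default_language = content[start:position]
    doc.pragma_processed = True
```

Because collection stops at the first comma or space, a value such as `en-US, fr` yields only `en-US`, which is the single-language behaviour the note in the diff contrasts with the HTTP header.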