[whatwg] [WA1] Specifying Character Encoding
Lachlan Hunt
lachlan.hunt at lachy.id.au
Sat Apr 9 01:29:26 PDT 2005
Anne van Kesteren wrote:
> Lachlan Hunt wrote:
>
>> | In XHTML, the XML declaration should be used for inline character
>> | encoding information.
>> |
>> | Authors should avoid including inline character encoding information.
>> | Character encoding information should instead be included at the
>> | transport level (e.g. using the HTTP Content-Type header).
>>
>> The second paragraph should only apply to HTML using the meta element,
>> not XHTML using the XML declaration.
>
> Why? If people are still using text/xml for example you really want them
> to use the HTTP Content-Type header. Otherwise its US-ASCII.
I didn't consider text/xml because the current draft states in the
conformance requirements:
| XML documents [...] that are served over the wire (e.g. by HTTP) must
| be sent using an XML MIME type such as application/xml or
| application/xhtml+xml...
I had initially interpreted that as meaning authors must use
application/*+xml and must not use text/xml; however, that
interpretation may be incorrect. Perhaps it should be explicitly stated
that text/xml should not be used, with a reference to the webarch
recommendation.
In any case, my statement about the second paragraph still stands for
XML served as application/*+xml, though it should probably apply to XML
served as text/xml too. It is unclear whether a document served as
text/xml;charset=whatever should also include the XML encoding
declaration, but probably not, because "Transcoding may make the
self-description false..." (as described in webarch).
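To illustrate the transcoding concern, here is a minimal Python sketch. The sample document and all variable names are made up for illustration; it assumes an intermediary that re-encodes the body to UTF-8 and updates the HTTP charset parameter, but leaves the inline XML declaration untouched.

```python
# Hypothetical document, originally encoded and labelled as ISO-8859-1.
original = '<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\u00e9</p>'.encode("iso-8859-1")

# Assumed intermediary: transcodes the bytes to UTF-8 (and would send
# charset=utf-8 in Content-Type), without rewriting the XML declaration.
transcoded = original.decode("iso-8859-1").encode("utf-8")

# A consumer that trusts the transport-level charset reads the text correctly.
as_sent = transcoded.decode("utf-8")

# A consumer that trusts the now-stale inline declaration gets mojibake.
as_declared = transcoded.decode("iso-8859-1")

print(as_sent)
print(as_declared)
```

After transcoding, the inline label no longer describes the bytes, which is the sense in which the self-description becomes false.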
>> I think it should also be noted that authors who omit the XML
>> declaration (or include it but don't specify the encoding attribute)
>> *must* use UTF-8 or UTF-16, as described in the XML recommendation.
>
> Where did you read that in the XML specification?
Appendix F.1 states [1]:
| Because each XML entity not accompanied by external encoding
| information and not in UTF-8 or UTF-16 encoding must begin with an XML
| encoding declaration
> You can always specify encoding using the 'charset' parameter.
...although I had forgotten it was acceptable to use an encoding other
than UTF-8 or UTF-16 without the xml declaration when "accompanied by
external encoding information", as well as being somewhat misinformed by
the statement in XHTML 1.0 Appendix C [2]:
| Remember, however, that when the XML declaration is not included in a
| document, the document can only use the default character encodings
| UTF-8 or UTF-16.
That statement fails to mention the condition of external encoding information.
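The precedence discussed in this thread can be sketched roughly as follows. This is a simplified illustration in Python, not a conforming implementation: the function name guess_xml_encoding and the regular expression are my own assumptions, the US-ASCII default for text/* without a charset parameter follows RFC 3023, and the declaration check only works for ASCII-compatible encodings (a UTF-16 document without a byte order mark is not handled).

```python
import re

# Hypothetical helper for illustration only. Matches an encoding
# pseudo-attribute in an XML declaration at the very start of the body.
_DECL = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def guess_xml_encoding(content_type, body):
    """Return a lowercase encoding name for an XML entity received over HTTP."""
    media_type, _, params = content_type.partition(";")
    media_type = media_type.strip().lower()

    # 1. External encoding information (the charset parameter) wins.
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            return value.strip().strip('"').lower()

    # 2. text/xml (and other text/* types) without charset: US-ASCII,
    #    regardless of what the document says about itself.
    if media_type.startswith("text/"):
        return "us-ascii"

    # 3. No external information: look for a byte order mark.
    if body.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if body.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "utf-16"

    # 4. Then the XML encoding declaration, if there is one.
    match = _DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()

    # 5. No declaration and no external information: only UTF-8 or
    #    UTF-16 are allowed; without a byte order mark, assume UTF-8.
    return "utf-8"
```

For example, guess_xml_encoding("text/xml", body) yields "us-ascii" even if body declares ISO-8859-1, while the same body served as application/xhtml+xml yields "iso-8859-1".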
[1] http://www.w3.org/TR/REC-xml/#sec-guessing
[2] http://www.w3.org/TR/xhtml1/#C_1
--
Lachlan Hunt
http://lachy.id.au/
http://GetFirefox.com/ Rediscover the Web
http://GetThunderbird.com/ Reclaim your Inbox