[whatwg] [WA1] Specifying Character Encoding

Lachlan Hunt lachlan.hunt at lachy.id.au
Sat Apr 9 01:29:26 PDT 2005


Anne van Kesteren wrote:
> Lachlan Hunt wrote:
> 
>> | In XHTML, the XML declaration should be used for inline character
>> | encoding information.
>> |
>> | Authors should avoid including inline character encoding information.
>> | Character encoding information should instead be included at the
>> | transport level (e.g. using the HTTP Content-Type header).
>>
>> The second paragraph should only apply to HTML using the meta element, 
>> not XHTML using the XML declaration.
> 
> Why? If people are still using text/xml for example you really want them 
> to use the HTTP Content-Type header. Otherwise it's US-ASCII.

I didn't consider text/xml because the current draft states, in the 
conformance requirements:

| XML documents [...] that are served over the wire (e.g. by HTTP) must
| be sent using an XML MIME type such as application/xml or
| application/xhtml+xml...

I had initially interpreted that as meaning authors must use 
application/*+xml and must not use text/xml; however, that 
interpretation may be incorrect.  Perhaps it should be explicitly stated 
that text/xml should not be used, with a reference to the webarch 
recommendation.

In any case, my statement about the second paragraph still stands for 
XML served as application/*+xml, though it should probably apply to XML 
served as text/xml too.  It is unclear whether a document served 
as text/xml;charset=whatever should include the XML encoding 
declaration, but probably not, because "Transcoding may make the 
self-description false..." (as described in webarch).
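
To make the precedence concrete, here is a rough Python sketch of how a 
recipient might pick the encoding, assuming the RFC 3023 rules as I 
understand them: a charset parameter wins, text/xml without one falls 
back to US-ASCII, and application/*+xml without one falls back to the 
XML rules (BOM, then encoding declaration, then UTF-8).  The function 
names are hypothetical and the declaration matching is deliberately 
simplistic (ASCII-compatible input only).

```python
import re
from email.message import Message

# Matches an XML declaration and captures the optional encoding name.
# Assumption: the bytes are ASCII-compatible at this point.
_XML_DECL = re.compile(
    rb'^<\?xml\s+version\s*=\s*["\'][^"\']*["\']'
    rb'(?:\s+encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\'])?'
)


def declared_encoding(body):
    """Return the encoding named in the XML declaration, or None."""
    if body.startswith(b"\xef\xbb\xbf"):  # skip a UTF-8 byte order mark
        body = body[3:]
    match = _XML_DECL.match(body)
    if match and match.group(1):
        return match.group(1).decode("ascii").lower()
    return None


def effective_encoding(content_type, body):
    """Hypothetical helper: pick the encoding for an XML entity."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset")
    if isinstance(charset, str) and charset:
        # Transport-level information takes precedence.
        return charset.lower()
    if msg.get_content_type() == "text/xml":
        # No charset on text/xml: the inline declaration is ignored.
        return "us-ascii"
    if body[:2] in (b"\xff\xfe", b"\xfe\xff"):
        return "utf-16"
    return declared_encoding(body) or "utf-8"
```

Note that for text/xml the inline declaration never gets a say, which 
is why a declaration that disagrees with the header (for example after 
transcoding) is simply a false self-description.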

>> I think it should also be noted that authors who omit the XML 
>> declaration (or include it but don't specify the encoding attribute) 
>> *must* use UTF-8 or UTF-16, as described in the XML recommendation.
> 
> Where did you read that in the XML specification?

Appendix F.1. states [1]:

| Because each XML entity not accompanied by external encoding
| information and not in UTF-8 or UTF-16 encoding must begin with an XML
| encoding declaration

> You can always specify encoding using the 'charset' parameter.

...although I had forgotten that an encoding other than UTF-8 or UTF-16 
is acceptable without the XML declaration when "accompanied by external 
encoding information". I was also somewhat misled by the statement in 
XHTML 1.0 Appendix C [2]:

| Remember, however, that when the XML declaration is not included in a
| document, the document can only use the default character encodings
| UTF-8 or UTF-16.

This fails to mention the condition about external encoding information.
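
Put as a small Python sketch, the rule from Appendix F.1, including the 
condition that Appendix C leaves out, would look something like this. 
The function name and parameters are made up for illustration, and 
encoding names are compared naively rather than through any alias 
registry.

```python
def must_declare_encoding(encoding, external_charset=None):
    """Hypothetical check: does this XML entity need an encoding declaration?

    encoding         -- the encoding the entity is actually stored in
    external_charset -- charset supplied by the transport, or None
    """
    if external_charset is not None:
        # Accompanied by external encoding information: no declaration needed.
        return False
    # Without external information, only UTF-8 and UTF-16 may go undeclared.
    return encoding.strip().lower() not in {"utf-8", "utf-16"}
```

So ISO-8859-1 with no charset parameter needs the declaration, while the 
same document sent with charset=ISO-8859-1 does not.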

[1] http://www.w3.org/TR/REC-xml/#sec-guessing
[2] http://www.w3.org/TR/xhtml1/#C_1
-- 
Lachlan Hunt
http://lachy.id.au/
http://GetFirefox.com/     Rediscover the Web
http://GetThunderbird.com/ Reclaim your Inbox



