[whatwg] [WA1] Specifying Character Encoding
Ian Hickson
ian at hixie.ch
Wed Feb 28 17:58:31 PST 2007
On Sat, 9 Apr 2005, Lachlan Hunt wrote:
>
> In the current draft, for specifying the character encoding [1], it is
> stated:
>
> | In XHTML, the XML declaration should be used for inline character
> | encoding information.
> |
> | Authors should avoid including inline character encoding information.
> | Character encoding information should instead be included at the
> | transport level (e.g. using the HTTP Content-Type header).
>
> The second paragraph should only apply to HTML using the meta element,
> not XHTML using the XML declaration.
I don't understand why it would be ok for one and not the other.
> For X(HT)ML, according to the Architecture of the World Wide Web, Volume
> One - Media types for XML [2]:
> [2] http://www.w3.org/TR/2004/REC-webarch-20041215/#xml-media-types
>
> | In general, a representation provider SHOULD NOT specify the character
> | encoding for XML data in protocol headers since the data is
> | self-describing.
I personally disagree with the arguments above (transcoding proxies mean
that a document can't really know what encoding it will arrive in, and
therefore it shouldn't be declaring its encoding). I could see an argument for
removing the advice from the HTML5 spec altogether, though. What do you
think?
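
To make the transcoding-proxy point concrete, here is a minimal sketch in
Python. The transcode() helper and the sample document are hypothetical,
made up for illustration; the only assumption is a proxy that re-encodes
the bytes without rewriting the XML declaration.

```python
# Sample document whose inline declaration matches its bytes at the origin.
# "caf\u00e9" contains one non-ASCII character (e with acute accent).
original = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\u00e9</p>'
).encode("iso-8859-1")


def transcode(body: bytes, src: str, dst: str) -> bytes:
    """Hypothetical proxy step: re-encode the bytes, leave the markup alone."""
    return body.decode(src).encode(dst)


# The proxy converts to UTF-8 (and would update the HTTP charset to match),
# but the inline declaration still claims ISO-8859-1.
proxied = transcode(original, "iso-8859-1", "utf-8")

by_declaration = proxied.decode("iso-8859-1")  # trusts the stale inline label
by_header = proxied.decode("utf-8")            # trusts the transport level

print(by_declaration)
print(by_header)
```

Only the decode that follows the transport-level label recovers the
original text; the inline label is stale once the bytes have been
re-encoded.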
> I think it should also be noted that authors who omit the XML
> declaration (or include it but don't specify the encoding attribute)
> *must* use UTF-8 or UTF-16, as described in the XML recommendation.
If you specify the HTTP headers, you could use anything, even, say,
GSM03.38 or UTF-EBCDIC.
On Sat, 9 Apr 2005, Anne van Kesteren wrote:
>
> Why? If people are still using text/xml, for example, you really want them
> to use the HTTP Content-Type header. Otherwise it's US-ASCII.
Right.
> > I think it should also be noted that authors who omit the XML
> > declaration (or include it but don't specify the encoding attribute)
> > *must* use UTF-8 or UTF-16, as described in the XML recommendation.
>
> Where did you read that in the XML specification? You can always specify
> encoding using the 'charset' parameter. That it is not recommended
> because "webarch" thinks documents should be self-describing doesn't
> matter. Also note that when the document is served using text/xml they
> could use UTF-8 but it wouldn't work.
Exactly.
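
The precedence being discussed can be sketched roughly as follows, in
Python. This is an illustration of the rules as I understand them from
RFC 3023, not a conforming implementation: effective_xml_encoding() is a
hypothetical helper, the BOM and declaration sniffing is deliberately
simplified, and RFC 2231-encoded charset parameters are ignored.

```python
import re
from email.message import Message


def effective_xml_encoding(content_type: str, body: bytes) -> str:
    """Guess the encoding a consumer would use for an XML response."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset")

    # 1. An explicit charset parameter wins over anything in the document.
    if isinstance(charset, str) and charset:
        return charset.lower()

    # 2. text/xml with no charset defaults to US-ASCII, whatever the
    #    XML declaration says.
    if msg.get_content_type() == "text/xml":
        return "us-ascii"

    # 3. Otherwise (e.g. application/xml) fall back to the document
    #    itself: byte order mark first, then the XML declaration.
    if body.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if body.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    match = re.match(
        rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']',
        body,
    )
    if match:
        return match.group(1).decode("ascii").lower()

    # 4. No declaration and no BOM: UTF-8.
    return "utf-8"


doc = b'<?xml version="1.0" encoding="UTF-8"?><a/>'
print(effective_xml_encoding("text/xml", doc))
print(effective_xml_encoding("application/xml", doc))
```

The first call shows the case Anne describes: the document is labelled
UTF-8 inline, but served as text/xml without a charset it is treated as
US-ASCII.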
On Sat, 9 Apr 2005, Lachlan Hunt wrote:
>
> I didn't consider text/xml because the current draft states in the
> conformance requirements:
>
> | XML documents [...] that are served over the wire (e.g. by HTTP) must
> | be sent using an XML MIME type such as application/xml or
> | application/xhtml+xml...
>
> I had initially interpreted that as meaning authors must use
> application/*+xml and must not use text/xml; however, that
> interpretation may be incorrect. Perhaps it should be explicitly stated
> that text/xml should not be used, with a reference to the webarch
> recommendation.
I never did understand why people don't like text/*. It's nice and short
and all these types are text, so...
I've made no changes to the spec, but let me know if you think something
should change.
--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'