[whatwg] Default encoding to UTF-8?
Leif Halvard Silli
xn--mlform-iua at xn--mlform-iua.no
Tue Dec 6 16:36:26 PST 2011
Jukka K. Korpela Tue Dec 6 13:27:11 PST 2011
> 2011-12-06 22:58, Leif Halvard Silli wrote:
>
>> There is now a bug, and the editor says the outcome depends on "a
>> browser vendor to ship it":
>> https://www.w3.org/Bugs/Public/show_bug.cgi?id=15076
>>
>> Jukka K. Korpela Tue Dec 6 00:39:45 PST 2011
>>
>>> what is this proposed change to defaults supposed to achieve. […]
>>
>> I'd say the same as in XML: UTF-8 as a reliable, common default.
>
> The "bug" was created so that the argument given was:
> "It would be nice to minimize number of declarations a page needs to
> include."
I just wanted to cite Kornel's original statement. But just because
Kornel cited an authoring use case does not mean that there aren't
other use cases. This entire thread started with a user problem. Also,
the HTML5 spec as a whole argues in favour of UTF-8, so it did not seem
necessary to justify the change further.
> That is, author convenience - so that authors could work sloppily and
> produce documents that could fail on user agents that haven't
> implemented this change.
There already are locales where UTF-8 is the default, and the fact that
this could benefit some sloppy authors within those locales is not a
relevant argument against it. In the Western-European locales, one can
make documents that fail on UAs which don't operate within those
locales. Thus, either way, some sloppy authors will "benefit" ... But
with the proposed change, even users *outside* the locales that share
the sloppy author's default encoding would benefit.
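To make the locale point concrete, here is a rough sketch (in Python,
with an assumed, abbreviated subset of the spec's suggested-default
table) of how a UA ends up with a locale-dependent fallback when a page
declares nothing; the mapping and names are illustrative, not the
spec's:

    # Illustrative sketch: a few assumed entries from the spec's
    # locale-to-fallback table; not the complete or normative list.
    LOCALE_FALLBACKS = {
        "en": "windows-1252",   # Western-European legacy fallback
        "fr": "windows-1252",
        "ru": "windows-1251",   # Cyrillic locales get a different legacy default
        "ja": "shift_jis",
    }

    def fallback_encoding(ua_locale: str) -> str:
        """Fallback a UA in this locale assumes when nothing is declared."""
        return LOCALE_FALLBACKS.get(ua_locale.split("-")[0].lower(), "windows-1252")

    # A page authored (sloppily, undeclared) in a windows-1251 locale is
    # readable for users whose UA shares that fallback, and mojibake for
    # everyone else -- which a common UTF-8 default would avoid.
    page = "Привет".encode("windows-1251")
    print(page.decode(fallback_encoding("ru-RU")))  # Привет
    print(page.decode(fallback_encoding("en-US")))  # Ïðèâåò (mojibake)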
> This sounds more absurd than I can describe.
>
> XML was created as a new data format; it was an entirely different issue.
HTML5 includes some features that are meant to ease "jumping" back
and forth between HTML and XML, and this would be one more such
feature.
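As a small illustration of the XML side of that, assuming a Python
parser: XML without any encoding declaration is, per the XML spec, read
as UTF-8, so the same bytes mean the same thing everywhere; undeclared
HTML instead falls back to the locale-dependent legacy defaults
sketched above.

    import xml.etree.ElementTree as ET

    # No <?xml?> declaration and no BOM: the XML rules say UTF-8.
    xml_bytes = b"<p>caf\xc3\xa9</p>"
    print(ET.fromstring(xml_bytes).text)   # 'café' on any conforming parser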
>>> If there's something that should be added to or modified in the
>>> algorithm for determining character encoding, then I'd say it's error
>>> processing. I mean user agent behavior when it detects, [...]
>>
>> There is already an (optional) detection step in the algorithm - but UAs
>> treat that step differently, it seems.
>
> I'm afraid I can't find it - I mean the treatment of a document for
> which some encoding has been deduced (say, directly from HTTP headers)
> and which then turns out to violate the rules of the encoding.
Sorry, I thought you meant a document where there was no metadata
about the encoding available (as described in step 7, 'attempt to
auto-detect', etc.).
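To separate the two cases: step 7 ('attempt to auto-detect') only
applies when no encoding metadata exists, while the situation you
describe is a *declared* encoding violated by the actual bytes. A
minimal sketch, assuming Python-style decoding and illustrative names
(UAs don't abort in that case; they roughly substitute U+FFFD for the
offending bytes):

    def decode_with_declared(body: bytes, declared: str) -> str:
        """What a UA roughly does when the declared encoding is violated."""
        try:
            return body.decode(declared)              # bytes match the declaration
        except UnicodeDecodeError:
            # Declaration violated: carry on, replacing the bad sequences
            return body.decode(declared, errors="replace")

    print(decode_with_declared(b"ok \xff\xfe bytes", "utf-8"))
    # -> 'ok \ufffd\ufffd bytes'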
Leif H Silli