[whatwg] Internal character encoding declaration

Henri Sivonen hsivonen at iki.fi
Thu Mar 16 03:09:42 PST 2006

On Mar 14, 2006, at 15:07, Peter Karlsson wrote:

> Henri Sivonen on 2006-03-14:

>>> Transcoding is very popular, especially in Russia.
>> In *proxies* *today*? What's the point considering that browsers  
>> have supported the Cyrillic encoding soup *and* UTF-8 for years?
> The mod_charset is not proxying, it's on the server level.

Right. So, as a data point, it neither proves nor disproves the  
legends about transcoding *proxies* around Russia and Japan.

>> How could proxies properly transcode form submissions coming back  
>> without messing everything up spectacularly?
> That's why the "hidden-string" technique was invented. Introduce a  
> hidden <input> with a character string that will get encoded  
> differently depending on the encoding used. When data comes in, use  
> this character string to determine what encoding was used.

I thought that method was for detecting broken browsers and users  
meddling with the encoding menu, and I thought using that method was  
relatively rare.
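
For concreteness, here is a minimal sketch of how a form handler might use
the hidden-string technique described in the quoted text. It is only an
illustration under stated assumptions: the field name "enc_probe", the
sentinel word, and the list of candidate encodings are all hypothetical,
and the handler is assumed to receive the percent-decoded raw bytes of
each field.

```python
# Hypothetical sentinel: a Cyrillic word whose bytes differ in every
# candidate encoding. The page would carry it in a hidden <input>.
SENTINEL = "проверка"

# Assumed candidate encodings (Python codec names); adjust as needed.
CANDIDATES = ["utf-8", "cp1251", "koi8_r", "iso8859_5", "cp866"]

# Map each possible byte form of the sentinel back to its encoding.
SENTINEL_BYTES = {SENTINEL.encode(enc): enc for enc in CANDIDATES}


def detect_encoding(raw_sentinel):
    """Return the codec name that produced raw_sentinel, or None."""
    return SENTINEL_BYTES.get(raw_sentinel)


def decode_form(raw_fields, sentinel_field="enc_probe"):
    """Decode all submitted fields using the encoding the sentinel reveals.

    raw_fields maps field names (str) to raw submitted bytes.
    Raises ValueError if the sentinel is missing or unrecognized.
    """
    encoding = detect_encoding(raw_fields.get(sentinel_field, b""))
    if encoding is None:
        raise ValueError("cannot determine submission encoding")
    return {
        name: value.decode(encoding)
        for name, value in raw_fields.items()
        if name != sentinel_field
    }
```

A submission that was transcoded to KOI8-R on the way back would still
decode correctly, because the sentinel was transcoded along with the
rest of the fields. A single-character sentinel would not be enough for
this candidate list, since some letters share a byte value in KOI8-R
and ISO-8859-5.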

In order for deploying a transcoding proxy to be safe for a Russian  
ISP, virtually every form handler in Russia would have to take  
countermeasures against the adverse effects of transcoding proxies.  
Are the countermeasures ubiquitous?

>> Easy parse errors are not fatal in browsers. Surely it is OK for a  
>> conformance checker to complain that much at server operators  
>> whose HTTP layer and meta do not match.
> I just reacted at the notion of calling such documents invalid. It  
> is the transport layer that defines the encoding, whatever the  
> document says or how it looks like is irrelevant, and is just  
> something that you can look at if the transport layer neglects to  
> say anything.

If two layers disagree, it suggests there is a problem and, in my  
opinion, it should be flagged as an error. (Especially considering  
Ruby's Postulate[1].) Operators of transcoding origin servers (or  
reverse proxies, which count as origin servers when viewed from the  
Web) are free not to send a disagreeing charset meta.
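
As a rough illustration of the kind of check I mean, the following
sketch compares the charset in the HTTP Content-Type header with the
one declared in a meta element and reports a disagreement. The function
names are hypothetical, and the regular expression is a crude stand-in
for a real meta prescan, not how a conformance checker would actually
parse the document.

```python
import codecs
import re
from email.message import Message

# Crude pattern for a charset declared inside a <meta> tag (assumption:
# good enough for a sketch, not a substitute for real HTML parsing).
META_CHARSET_RE = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9._:-]+)""", re.IGNORECASE
)


def http_charset(content_type):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased label, or None


def meta_charset(body):
    """Find a charset label in a <meta> near the start of the body."""
    match = META_CHARSET_RE.search(body[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def normalize(label):
    """Collapse aliases such as windows-1251 and cp1251 to one name."""
    try:
        return codecs.lookup(label).name
    except LookupError:
        return label.lower()


def check_encoding_agreement(content_type, body):
    """Return an error message if HTTP and meta disagree, else None."""
    http, meta = http_charset(content_type), meta_charset(body)
    if http and meta and normalize(http) != normalize(meta):
        return "HTTP says %s but meta says %s" % (http, meta)
    return None
```

When only one of the two layers declares an encoding, the sketch
reports nothing, since there is no disagreement to flag.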

[1] http://intertwingly.net/slides/2004/devcon/69.html

Henri Sivonen
hsivonen at iki.fi
