[whatwg] Null characters

Boris Zbarsky bzbarsky at MIT.EDU
Tue Oct 9 07:21:40 PDT 2012

On 10/9/12 12:09 AM, Cameron Zemek wrote:
> How is it not web-compatible?

Because shipping it "breaks" sites.  As in, makes them render 
differently than they do in current browsers, sufficiently so that it's 
a problem.

> Yeah I don't have any numbers to see if this is the case or not.

As Anne said, we tried shipping this and got user feedback indicating 
that sufficiently many sites are broken that it was not acceptable to us.

> But just thinking about it logically what issues would there be in showing Null character as
> the replacement character instead? Visually would see some extra
> characters if the document author had Null characters. What is the big
> deal with doing that?

It makes text unreadable.  Consider text that's actually UTF-16 but
is declared as ISO-8859-1.  If you strip the nulls, it all works out.
But if you don't, every other character is a replacement character.

This is not a rare situation on the web, unfortunately.
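
To make the effect concrete, here is a minimal sketch in Python.  It is 
only an illustration, not what any browser actually runs: the sample 
string and the variable names are made up, and it assumes the text is 
plain ASCII stored as little-endian UTF-16 with no byte order mark, so 
the high byte of every code unit is zero.

```python
# Hypothetical example: ASCII text stored as UTF-16LE, then served
# with a charset declaration of ISO-8859-1.
text = "Hello"
raw = text.encode("utf-16-le")         # each character becomes 2 bytes, the second is 0x00
misdecoded = raw.decode("iso-8859-1")  # every byte becomes one character, so half are U+0000

# Option 1: drop the null characters.
stripped = misdecoded.replace("\x00", "")

# Option 2: show each null character as U+FFFD REPLACEMENT CHARACTER.
replaced = misdecoded.replace("\x00", "\ufffd")

print(repr(stripped))
print(repr(replaced))
```

With the nulls dropped the original text comes back intact.  With the 
nulls replaced, a U+FFFD follows every letter.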

> Why do authors even have null characters in
> their HTML documents?

Because they have UTF-16 text in their database that they dump into an 
ISO-8859-1 document.  They have no idea there are any "null characters".

> I assume I'm probably missing some historical reason for this

Yes, that reason is "the browsers all do it this way, so web sites 
depend on it".

