[whatwg] Ampersands not followed by ASCII letters or #

Sat Sep 1 10:18:52 PDT 2007

On Tue, 19 Jun 2007 02:55:20 +0200, Ian Hickson <ian at hixie.ch> wrote:

> On Wed, 27 Dec 2006, Henri Sivonen wrote:
>>
>> I noticed that the Web Apps spec itself contains script samples with
>> unescaped JavaScript && operators in <pre> blocks.
>>
>> Considering that this is not an error in HTML 4.01 as SGML and
>> considering that it is harmless in browsers, I think the top-level
>> "Anything else" case under "8.2.3.1. Tokenising entities" should be
>> split in two so that there is also an error-free case for the ASCII
>> characters that aren't '#', aren't ASCII letters and that weren't in
>> error in SGML-based HTML. I don't have The Handbook at my disposal right
>> now, but the error-free case should cover at least '&', '<' and space
>> characters.
>
> I've allowed:
>
>    U+0009 CHARACTER TABULATION
>    U+000A LINE FEED (LF)
>    U+000B LINE TABULATION
>    U+000C FORM FEED (FF)
>    U+0020 SPACE
>    U+003C LESS-THAN SIGN
>    U+0026 AMPERSAND
>    EOF
>
> Let me know if you want any more added to the list.
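
As a rough illustration of the rule quoted above, the check can be sketched as a lookup on the character that follows the ampersand. The function name and the modelling of EOF as "no next character" are assumptions made for this sketch, not wording from the spec. A False result only means the character is not on the quoted list; letters and '#' take the entity-parsing path instead.

```python
# Characters that, per the quoted list, may follow a bare "&"
# without triggering a parse error (EOF is handled separately).
ERROR_FREE_AFTER_AMP = {
    "\t",    # U+0009 CHARACTER TABULATION
    "\n",    # U+000A LINE FEED (LF)
    "\x0b",  # U+000B LINE TABULATION
    "\x0c",  # U+000C FORM FEED (FF)
    " ",     # U+0020 SPACE
    "<",     # U+003C LESS-THAN SIGN
    "&",     # U+0026 AMPERSAND
}


def in_error_free_list(text: str, pos: int) -> bool:
    """Return True if the "&" at text[pos] is followed by a listed character or EOF."""
    if text[pos] != "&":
        raise ValueError("text[pos] is not an ampersand")
    if pos + 1 >= len(text):
        return True  # EOF is on the quoted list
    return text[pos + 1] in ERROR_FREE_AFTER_AMP
```

With this sketch, both ampersands in "a && b" are on the list (the first is followed by "&", the second by a space), while the ampersand in "x &= y" is not.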

I'm not really fond of this change. It complicates things and makes HTML
harder to teach. It might also let authoring mistakes slip through. I can
imagine that this is something that many authors would refer to as "sloppy
coding".

Moreover, if we are to do this, then the < character should get the same
treatment, and we might want to allow ' and " too (e.g. the spec uses "<'"
in some places). That complicates things even further: the "Writing HTML
documents" section would need to handle a lot more cases, including the
case where the character is the last character of an attribute value.
(Actually, thinking about it, this is already the case for the unquoted
attribute syntax.)
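
To make the attribute-value case concrete, here is a small sketch that lists the ampersands in a snippet that the quoted list does not cover. It assumes the concern applies to "&" as well as "<", and it skips ampersands followed by an ASCII letter or '#', since those go down the entity path. The helper name and the sample markup are hypothetical.

```python
import string

# The quoted error-free list (EOF is handled separately below).
ALLOWED_AFTER_AMP = {"\t", "\n", "\x0b", "\x0c", " ", "<", "&"}
# Characters that start entity parsing rather than the "anything else" case.
ENTITY_START = set(string.ascii_letters) | {"#"}


def uncovered_ampersands(markup: str) -> list[tuple[int, str]]:
    """Return (index, next character) for each "&" the quoted list does not cover."""
    hits = []
    for i, ch in enumerate(markup):
        if ch != "&":
            continue
        nxt = markup[i + 1 : i + 2]  # empty string at EOF
        if nxt == "" or nxt in ALLOWED_AFTER_AMP or nxt in ENTITY_START:
            continue
        hits.append((i, nxt))
    return hits
```

Under these assumptions, an ampersand that ends a quoted attribute value is followed by ' or ", and one that ends an unquoted value can be followed by >, so none of these is covered by the list as quoted.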

I'd rather this change was reverted.

-- 
Simon Pieters