[whatwg] Ampersands not followed by ASCII letters or #
zcorpan at gmail.com
Sat Sep 1 10:18:52 PDT 2007
On Tue, 19 Jun 2007 02:55:20 +0200, Ian Hickson <ian at hixie.ch> wrote:
> On Wed, 27 Dec 2006, Henri Sivonen wrote:
>> I noticed that the Web Apps spec itself contains script samples with
>> Considering that this is not an error in HTML 4.01 as SGML and
>> considering that it is harmless in browsers, I think the top-level
>> "Anything else" case under "Tokenising entities" should be
>> split in two so that there is also an error-free case for the ASCII
>> characters that aren't '#', aren't ASCII letters and that weren't in
>> error in SGML-based HTML. I don't have The Handbook at my disposal right
>> now, but the error-free case should cover at least '&', '<' and space
> I've allowed:
> U+0009 CHARACTER TABULATION
> U+000A LINE FEED (LF)
> U+000B LINE TABULATION
> U+000C FORM FEED (FF)
> U+0020 SPACE
> U+003C LESS-THAN SIGN
> U+0026 AMPERSAND
> Let me know if you want any more added to the list.
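To make the proposal concrete, here is a rough sketch of how the top-level "Anything else" case splits once the characters listed above are allowed. It is not the spec's tokenizer algorithm. The enum, function names, and the choice to skip a trailing '&' at end of input are hypothetical and exist only for illustration. Whether a character reference that follows '&' is actually valid is also out of scope here.

```python
from enum import Enum


class AmpersandCase(Enum):
    # Hypothetical labels, not spec terminology.
    ENTITY = "try to consume a character reference"
    LITERAL_OK = "emit '&' literally, no parse error"
    LITERAL_ERROR = "emit '&' literally, parse error"


# The characters listed above as no longer being an error after '&'.
ALLOWED_AFTER_AMPERSAND = frozenset({
    "\u0009",  # CHARACTER TABULATION
    "\u000A",  # LINE FEED (LF)
    "\u000B",  # LINE TABULATION
    "\u000C",  # FORM FEED (FF)
    "\u0020",  # SPACE
    "\u003C",  # LESS-THAN SIGN
    "\u0026",  # AMPERSAND
})


def classify_after_ampersand(next_char: str) -> AmpersandCase:
    """Classify the single character that follows '&'."""
    if next_char == "#" or (next_char.isascii() and next_char.isalpha()):
        return AmpersandCase.ENTITY
    if next_char in ALLOWED_AFTER_AMPERSAND:
        return AmpersandCase.LITERAL_OK
    # What remains of the old "Anything else" case is still an error.
    return AmpersandCase.LITERAL_ERROR


def ampersand_errors(text: str) -> list[int]:
    """Return the indices of '&' characters this sketch would flag.

    A '&' at the very end of the input is skipped, because the thread
    does not say how that case is handled.
    """
    errors = []
    for i, ch in enumerate(text):
        if ch == "&" and i + 1 < len(text):
            if classify_after_ampersand(text[i + 1]) is AmpersandCase.LITERAL_ERROR:
                errors.append(i)
    return errors
```

With this list, a script-like `a && b` produces no error. `x&'y` is still flagged at index 1, which is why ' and " come up in the reply below.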
I'm not really fond of this change. It complicates things and makes HTML
harder to teach. It might also let authoring mistakes slip through. I can
imagine that this is something that many authors would refer to as "sloppy
Moreover, if we are to do this, then the < character should get the same
treatment, and we might want to allow ' and " too (e.g. the spec uses "<'"
in some places), which complicates things even further; the "Writing HTML
documents" section needs to handle a lot more cases, including, e.g., the
case when the character is the last character of an attribute value...
actually, thinking about it, this is already the case for the unquoted
I'd rather this change was reverted.