[whatwg] Entity parsing

Thu May 22 19:50:23 PDT 2008

On Thu, 28 Jun 2007, Øistein E. Andersen wrote:
> 
> 1) Is it useful to handle unterminated entities followed by an 
> alphanumerical character like IE does? The number of documents for which 
> this actually helps might be small compared to the number of documents 
> that contain other, incorrigible errors. The process also introduces 
> errors, albeit not in conforming documents. Is the gain worth the added 
> complexity?
> 
> If so, then should this apply to all entities? (Probably not.) Would it 
> be useful to add to/remove from the set supported by IE7? (This may seem 
> insane, but we should try to avoid premature decisions.)
> 
> 2) HTML 4.01 allows the semicolon to be omitted in certain cases. Does 
> this cause problems? Firefox and Safari both support this, and it would 
> seem meaningless to change the way conforming documents are parsed 
> unless it can be shown that, e.g., "&ndash " actually is supposed to 
> mean "&amp;ndash " more often than "– ". (Conformance is a 
> separate issue.)
> 
> 3) Will new entities ever be needed? If yes, can new entities adopt 
> existing conformance criteria and parsing rules?
> 
> 4) Similar considerations for entities in attribute values.

New entities have since been added, and the rules for parsing entities 
(sorry, "named character references") have been changed a bit. However, I 
am reluctant to change the current rules further, since they work well. 
How strongly do you feel about this?
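
To make points 1 and 2 above concrete, here is a rough Python sketch of the 
two behaviours under discussion. It is not the algorithm from the spec or 
from any browser. The tables are tiny and hypothetical, and the names 
ENTITIES, SEMICOLON_OPTIONAL, decode and allow_before_alnum are invented 
for illustration only.

```python
import re

# Hypothetical, deliberately tiny tables; a real parser uses far larger ones.
ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "copy": "\u00a9", "ndash": "\u2013"}
# Assumption: only these names are accepted without a trailing semicolon.
SEMICOLON_OPTIONAL = {"amp", "lt", "gt", "copy"}

_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*)(;?)")


def decode(text, allow_before_alnum=True):
    """Decode named references in text.

    allow_before_alnum=True mimics the IE-like behaviour from point 1:
    an unterminated name is decoded even when more alphanumerics follow.
    """
    def replace(match):
        name, semicolon = match.group(1), match.group(2)
        if semicolon:
            # Terminated reference: decode if known, else leave untouched.
            return ENTITIES.get(name, match.group(0))
        # Unterminated: look for the longest prefix that may omit ";".
        for end in range(len(name), 0, -1):
            prefix, rest = name[:end], name[end:]
            if prefix in SEMICOLON_OPTIONAL:
                if rest and not allow_before_alnum:
                    break  # alphanumerics follow, so leave the text as-is
                return ENTITIES[prefix] + rest
        return match.group(0)

    return _REF.sub(replace, text)


print(decode("&copy 2008"))                            # decoded
print(decode("&ndash "))                               # left alone in this sketch
print(decode("&copyright"))                            # IE-like: decoded
print(decode("&copyright", allow_before_alnum=False))  # left alone
```

Whether a name such as "ndash" belongs in the semicolon-optional set is 
exactly the kind of decision question 2 is asking about; moving it into 
SEMICOLON_OPTIONAL changes how "&ndash " is treated.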

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'