[whatwg] Entity parsing

Ian Hickson ian at hixie.ch
Thu Jun 21 21:08:49 PDT 2007


On Thu, 14 Jun 2007, Michel Fortin wrote:
> On 2007-06-14 at 21:05, Ian Hickson wrote:
> 
> > I've defined the parsing and conformance requirements in a way that 
> > matches IE. As a side-effect, this has made things like "na&iumlve" 
> > actually conforming. I don't know if we want this.
> 
> I'd make it non-conforming for the sake of readability.

Done.


On Fri, 15 Jun 2007, Simon Pieters wrote:
>
> Firefox, Opera and Safari treat "na&iumlve" as equivalent to 
> "na&amp;iumlve". So for compat with them, the semicolon should be made 
> required.

Agreed.


On Fri, 15 Jun 2007, Křištof Želechovski wrote:
>
> Aside: I know that it can be changed but "iuml" is a very unfortunate 
> name for "i tréma".  How about deprecating "iuml" in favor of "itrema"?

We're not deprecating anything, and just introducing a new name for i-uml 
would be a dangerous slippery slope to start down. Anyway, i-umlaut is 
fine, and easier to spell than i-diaeresis; why would you call it "itrema"? 
Trema doesn't seem any more common than "umlaut"...


On Fri, 15 Jun 2007, Kornel Lesinski wrote:
> > 
> > I've defined the parsing and conformance requirements in a way that 
> > matches IE. As a side-effect, this has made things like "na&iumlve" 
> > actually conforming. I don't know if we want this.
> 
> Rather not. This would break unencoded URLs:
> 
> ?foo=bar&region=baz → ?foo=bar®ion=baz

On Fri, 15 Jun 2007, Anne van Kesteren wrote:
> 
> You mean that Internet Explorer breaks them already? That doesn't make 
> much sense to me.

On Fri, 15 Jun 2007, Kornel Lesinski wrote:
> 
> No, IE doesn't break them, and that's the point.
> 
> Section 8.2.3.1. states "This definition is used when parsing entities 
> in text and in attributes." - if I understand this correctly, this makes 
> semicolon optional for entities in both attributes and text and 
> "&region" in attribute would be interpreted as "®ion".
>
> If that's the case, it is not compatible with IE, because it parses 
> entities differently in attributes and text. Semicolon (or any 
> non-alphanumeric character actually) is required in attributes, but in 
> text it is not.
> 
> In IE6 <a href="&region">&region</a> is equivalent to <a 
> href="&amp;region">®ion</a>

On Sat, 16 Jun 2007, Anne van Kesteren wrote:
> 
> Awesome. Guess we have to reverse engineer that too then...

On Mon, 18 Jun 2007, Simon Pieters wrote:
> 
> Entity parsing works the same in different attributes (tested <img alt> and <a
> href>).
> 
> Any character that is not in the range [a-zA-Z0-9] ends an entity -- i.e., the
> following are equivalent:
> 
>   <img alt="&AElig.">
>   <img alt="Æ.">
> 
> ...and the following are equivalent:
> 
>   <img alt="&AElig1">
>   <img alt="&amp;AElig1">

Fixed. Sigh.


> This means that the semi-colon is not part of the entity name, and we 
> need to revert to the old entity table and instead have a third column 
> that says which entities always require a semi-colon.

Actually no, some of the entities, even in an attribute, require a 
semicolon. Compare, for instance, these:

   <span title="&DaggerA">  <span title="&degA">
   <span title="&Dagger@">  <span title="&deg@">
   <span title="&Dagger;">  <span title="&deg;">
                &DaggerA                 &degA
                &Dagger@                 &deg@
                &Dagger;                 &deg;
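
The rules discussed in this thread can be sketched roughly as follows. This
is only an illustration under stated assumptions, not the spec's algorithm:
the function name expand(), the five-entry ENTITIES table and its
"semicolon optional" flags are hypothetical, and the behaviour at the very
end of a string is a guess. It assumes that legacy names such as "deg" may
omit the semicolon, that names such as "Dagger" may not, and that inside an
attribute a semicolon-less name is only expanded when the next character is
outside [a-zA-Z0-9].

```python
import re

# Hypothetical miniature entity table, for illustration only.
# Each value is (character, semicolon_may_be_omitted).
ENTITIES = {
    "AElig": ("\u00C6", True),
    "deg": ("\u00B0", True),
    "iuml": ("\u00EF", True),
    "reg": ("\u00AE", True),
    "Dagger": ("\u2021", False),  # assumed to always need ';'
}

ALNUM = re.compile(r"[A-Za-z0-9]")


def expand(text, in_attribute):
    """Expand entity references in text, IE-style (sketch only)."""
    out = []
    i = 0
    while i < len(text):
        if text[i] != "&":
            out.append(text[i])
            i += 1
            continue
        # Find the longest known entity name starting right after '&'.
        match = None
        for name in sorted(ENTITIES, key=len, reverse=True):
            if text.startswith(name, i + 1):
                match = name
                break
        if match is None:
            out.append("&")
            i += 1
            continue
        char, semicolon_optional = ENTITIES[match]
        end = i + 1 + len(match)
        following = text[end:end + 1]  # '' at end of string
        if following == ";":
            # Properly terminated reference: always expanded.
            out.append(char)
            i = end + 1
        elif semicolon_optional and not (in_attribute and ALNUM.match(following)):
            # Semicolon-less legacy name: expanded in text, and in
            # attributes only when a non-alphanumeric character follows.
            out.append(char)
            i = end
        else:
            # Left as literal text.
            out.append("&")
            i += 1
    return "".join(out)
```

With this sketch, expand("?foo=bar&region=baz", in_attribute=True) leaves
the URL untouched, while expand("&region", in_attribute=False) expands the
"&reg" prefix, which matches the IE6 behaviour reported above.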

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

