[whatwg] Entity parsing
Ian Hickson
ian at hixie.ch
Thu Jun 21 21:08:49 PDT 2007
On Thu, 14 Jun 2007, Michel Fortin wrote:
> Le 2007-06-14 à 21:05, Ian Hickson a écrit :
>
> > I've defined the parsing and conformance requirements in a way that
> > matches IE. As a side-effect, this has made things like "na&iumlve"
> > actually conforming. I don't know if we want this.
>
> I'd make it non-conforming for the sake of readability.
Done.
On Fri, 15 Jun 2007, Simon Pieters wrote:
>
> Firefox, Opera and Safari treat "na&iumlve" as equivalent to
> "na&amp;iumlve". So for compat with them, the semicolon should be made
> required.
Agreed.
On Fri, 15 Jun 2007, Křištof Želechovski wrote:
>
> Aside: I know that it can be changed but "iuml" is a very unfortunate
> name for "i tréma". How about deprecating "iuml" in favor of "itrema"?
We're not deprecating anything, and just introducing a new name for i-uml
would be a dangerous slippery slope to start down. Anyway, i-umlaut is
fine, and easier to spell than i-diaeresis; why would you call it "itrema"?
Trema doesn't seem any more common than "umlaut"...
On Fri, 15 Jun 2007, Kornel Lesinski wrote:
> >
> > I've defined the parsing and conformance requirements in a way that
> > matches IE. As a side-effect, this has made things like "na&iumlve"
> > actually conforming. I don't know if we want this.
>
> Rather not. This would break unencoded URLs:
>
> ?foo=bar&region=baz → ?foo=bar®ion=baz
On Fri, 15 Jun 2007, Anne van Kesteren wrote:
>
> You mean that Internet Explorer breaks them already? That doesn't make
> much sense to me.
On Fri, 15 Jun 2007, Kornel Lesinski wrote:
>
> No, IE doesn't break them, and that's the point.
>
> Section 8.2.3.1. states "This definition is used when parsing entities
> in text and in attributes." - if I understand this correctly, this makes
> semicolon optional for entities in both attributes and text and
> "&region" in attribute would be interpreted as "®ion".
>
> If that's the case, it is not compatible with IE, because it parses
> entities differently in attributes and text. Semicolon (or any
> non-alphanumeric character actually) is required in attributes, but in
> text it is not.
>
> In IE6 <a href="&region">&region</a> is equivalent to <a
> href="&amp;region">&reg;ion</a>
On Sat, 16 Jun 2007, Anne van Kesteren wrote:
>
> Awesome. Guess we have to reverse engineer that too then...
On Mon, 18 Jun 2007, Simon Pieters wrote:
>
> Entity parsing works the same in different attributes (tested <img alt> and <a
> href>).
>
> Any character that is not in the range [a-zA-Z0-9] ends an entity -- i.e., the
> following are equivalent:
>
> <img alt="&AElig.">
> <img alt="&AElig;.">
>
> ...and the following are equivalent:
>
> <img alt="&AElig1">
> <img alt="&amp;AElig1">
Fixed. Sigh.
> This means that the semi-colon is not part of the entity name, and we
> need to revert to the old entity table and instead have a third column
> that says which entities always require a semi-colon.
Actually no, some of the entities, even in an attribute, require a
semicolon. Compare, for instance, these:
   <span title="&DaggerA">   <span title="&degA">
   <span title="&Dagger@">   <span title="&deg@">
   <span title="&Dagger;">   <span title="&deg;">
   &DaggerA                  &degA
   &Dagger@                  &deg@
   &Dagger;                  &deg;
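
The behaviour reported in this thread can be summarised in a rough sketch. This is only an illustration of the rules as described above, not the spec's algorithm: the function name decode_entities, the six-entry table, and the per-entity "semicolon optional" flag are all assumptions made for the example. A name followed by a semicolon is always decoded; a name without one is decoded only if its flag allows it, and in an attribute only if the next character is outside [a-zA-Z0-9].

```python
import string

# Characters that, per the thread, stop a semicolon-less entity from being
# decoded inside an attribute value.
ASCII_ALNUM = set(string.ascii_letters + string.digits)

# Hypothetical mini table: name -> (character, semicolon_optional).
# The flags are assumptions chosen to mirror the examples in the thread.
ENTITIES = {
    "amp": ("&", True),
    "reg": ("\u00ae", True),
    "deg": ("\u00b0", True),
    "AElig": ("\u00c6", True),
    "iuml": ("\u00ef", True),
    "Dagger": ("\u2021", False),
}


def decode_entities(source, in_attribute):
    """Decode named entities using the rules sketched in the lead-in."""
    names = sorted(ENTITIES, key=len, reverse=True)  # longest name first
    out = []
    i = 0
    while i < len(source):
        consumed = 0
        if source[i] == "&":
            for name in names:
                if not source.startswith(name, i + 1):
                    continue
                char, semicolon_optional = ENTITIES[name]
                end = i + 1 + len(name)
                following = source[end:end + 1]  # "" at end of input
                if following == ";":
                    # An explicit semicolon always terminates the entity.
                    out.append(char)
                    consumed = end + 1 - i
                elif semicolon_optional and not (
                    in_attribute and following in ASCII_ALNUM
                ):
                    # No semicolon: allowed in text, and in attributes only
                    # when the next character is not alphanumeric.
                    out.append(char)
                    consumed = end - i
                if consumed:
                    break
        if not consumed:
            # Not an entity we decode: copy the character through.
            out.append(source[i])
            consumed = 1
        i += consumed
    return "".join(out)
```

Under these assumptions, decode_entities("&region", in_attribute=True) leaves the string untouched, while the same input with in_attribute=False yields "®ion"; "&Dagger@" is left alone in both contexts because its flag requires the semicolon.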
--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'