[whatwg] Entity parsing
zcorpan at gmail.com
Mon Jun 18 03:47:57 PDT 2007
On Sat, 16 Jun 2007 15:30:07 +0200, Anne van Kesteren <annevk at opera.com> wrote:
>> No, IE doesn't break them, and that's the point.
>> Section 126.96.36.199. states "This definition is used when parsing entities
>> in text and in attributes." - if I understand this correctly, this
>> makes semicolon optional for entities in both attributes and text and
>> "&region" in attribute would be interpreted as "®ion".
>> If that's the case, it is not compatible with IE, because it parses
>> entities differently in attributes and text. In attributes semicolon
>> (any non-alphanumeric character actually) is required, but in text it
>> is not.
>> In IE6 <a href="&region">&region</a> is equivalent to <a
> Awesome. Guess we have to reverse engineer that too then...
The tests aren't really digestible in their current state unless you know
what they're doing, but well, I'll just say what the results are below. I
might create proper test cases on this later when this is specced.
Entity parsing works the same in different attributes (tested <img alt>
and <a href>).
Any character that is not in the range [a-zA-Z0-9] ends an entity -- i.e.,
the following are equivalent:
...and the following are equivalent:
This means that the semi-colon is not part of the entity name, and we need
to revert to the old entity table and instead have a third column that
says which entities always require a semi-colon.
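A minimal Python sketch of what such a three-column table could look like. `EntityEntry`, `ENTITY_TABLE` and `lookup` are hypothetical names, the table is only an illustrative subset, and the `requires_semicolon` values are assumptions rather than results from the IE tests:

```python
from typing import NamedTuple, Optional


class EntityEntry(NamedTuple):
    """One row of a hypothetical three-column entity table."""
    name: str                 # entity name, without '&' and without ';'
    codepoint: int            # Unicode code point the entity expands to
    requires_semicolon: bool  # third column: only match when followed by ';'


# Illustrative subset; the requires_semicolon values are assumptions,
# not results from the IE tests.
ENTITY_TABLE = {
    e.name: e
    for e in (
        EntityEntry("amp", 0x26, False),
        EntityEntry("lt", 0x3C, False),
        EntityEntry("reg", 0xAE, False),
        EntityEntry("hellip", 0x2026, True),
    )
}


def lookup(name: str, followed_by_semicolon: bool) -> Optional[str]:
    """Return the expansion of `name`, or None if it should not expand here."""
    entry = ENTITY_TABLE.get(name)
    if entry is None:
        return None
    if entry.requires_semicolon and not followed_by_semicolon:
        return None
    return chr(entry.codepoint)
```

Since the key never includes the ';', a single row for "reg" covers both "&reg;" and "&reg", and the third column decides whether the bare form is allowed at all.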
You consume as many characters as possible that match the entity table,
and for the longest match, check if the next character is in the
above-mentioned range. If yes, emit the consumed characters literally; otherwise emit
the entity, or something along those lines.
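A rough Python sketch of that longest-match rule, assuming a tiny hypothetical table (`ENTITIES`) and a hypothetical function `decode_entities`. The `in_attribute` flag models the attribute/text difference reported for IE earlier in the thread, and consuming a terminating ';' is an assumption, not a tested result:

```python
import string

# Hypothetical subset of the entity table: name -> (expansion, requires_semicolon).
# Names do not include the ';'.
ENTITIES = {
    "amp": ("&", False),
    "lt": ("<", False),
    "reg": ("\u00ae", False),
    "hellip": ("\u2026", True),
}
MAX_NAME_LEN = max(len(name) for name in ENTITIES)

# In attributes, a match followed by one of these is not expanded.
ALNUM = set(string.ascii_letters + string.digits)


def decode_entities(s: str, in_attribute: bool) -> str:
    """Expand named entities in s using a longest-match rule (sketch)."""
    out = []
    i = 0
    while i < len(s):
        if s[i] != "&":
            out.append(s[i])
            i += 1
            continue
        start = i + 1
        # Consume as many characters as possible that match the table:
        # remember the end of the longest prefix that is an entity name.
        best = None
        for end in range(start + 1, min(len(s), start + MAX_NAME_LEN) + 1):
            if s[start:end] in ENTITIES:
                best = end
        if best is None:
            out.append("&")  # nothing matched; keep the '&' as text
            i += 1
            continue
        name = s[start:best]
        expansion, needs_semicolon = ENTITIES[name]
        nxt = s[best] if best < len(s) else ""
        if nxt == ";":
            out.append(expansion)
            i = best + 1  # assumption: a terminating ';' is consumed
        elif needs_semicolon or (in_attribute and nxt in ALNUM):
            out.append("&" + name)  # emit the consumed characters literally
            i = best
        else:
            out.append(expansion)
            i = best
    return "".join(out)
```

With this sketch, `decode_entities("&region", in_attribute=True)` leaves the string alone, while the same input with `in_attribute=False` gives "®ion", and "&reg-ion" in an attribute expands because '-' is outside [a-zA-Z0-9].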