[whatwg] Entity definitions in XHTML
Leif Halvard Silli
xn--mlform-iua at xn--mlform-iua.no
Thu Jan 17 19:01:55 PST 2013
David Carlisle on Fri, 18 Jan 2013 00:03:12 +0000:
> To: Ian Hickson <ian at hixie.ch>
> On 17/01/2013 23:31, Ian Hickson wrote:
>> On Thu, 17 Jan 2013, David Carlisle wrote:
>>>>> that documents will be interpreted differently by an XHTML
>>>>> user agent and a standard XML toolchain.
>>>>
>>>> I do not understand what this means. Can you give an example?
Though it did not involve XML, the trouble Anolis had with outputting the
correct characters for the &rang; and &lang; entities was caused by a part
of Anolis that interpreted those entities in the old, HTML5-*in*compatible
way. This in turn produced the wrong characters when the entities were
converted to plain characters before being output to the HTML5 spec:
https://www.w3.org/Bugs/Public/show_bug.cgi?id=14430
This was a surprisingly long-lasting bug (and perhaps not fully solved
yet). It had probably existed since HTML5 included named entities in
the spec. As the reporter of the bug, I was asked again and again
whether it had been fixed.
In this case, Anolis output "polyglot" character references, since it
converted the named references to numeric references. (Please ignore
HTML5's current shortcut:
https://www.w3.org/Bugs/Public/show_bug.cgi?id=20702) But since the bug
was actually in Anolis' list of named character references, the result
was nevertheless a misrepresentation of the named entities.
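To illustrate the mechanism, here is a minimal Python sketch. It is not
Anolis' actual code, and the function names are hypothetical. It converts
named references to numeric ones using either Python's HTML 4.01 table
(html.entities.name2codepoint) or its HTML5 table (html.entities.html5).
The same input yields different numeric references depending on which
list the converter consults.

```python
import re
from html.entities import html5, name2codepoint

# Matches a named character reference such as "&lang;".
NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def to_numeric_html4(text: str) -> str:
    """Hypothetical converter using the legacy HTML 4.01 entity table."""
    def repl(m: re.Match) -> str:
        cp = name2codepoint.get(m.group(1))
        # Leave unknown names untouched.
        return m.group(0) if cp is None else f"&#x{cp:X};"
    return NAMED_REF.sub(repl, text)

def to_numeric_html5(text: str) -> str:
    """Hypothetical converter using the HTML5 named character reference table."""
    def repl(m: re.Match) -> str:
        # The html5 table keys include the trailing semicolon.
        chars = html5.get(m.group(1) + ";")
        if chars is None:
            return m.group(0)
        # Some HTML5 references expand to more than one code point.
        return "".join(f"&#x{ord(c):X};" for c in chars)
    return NAMED_REF.sub(repl, text)

sample = "&lang;x&rang;"
print(to_numeric_html4(sample))  # &#x2329;x&#x232A;
print(to_numeric_html5(sample))  # &#x27E8;x&#x27E9;
```

Both outputs are "polyglot" numeric references, but only the second
matches HTML5's definitions of &lang; and &rang;.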
>>> There is more to compatibility than compatibility between the
>>> browsers. For XHTML there needs to be compatibility between
>>> browsers and XML tools (otherwise why use XML at all? I know you
>>> would rather people didn't, but so long as the spec allows them to,
>>> it should not mandate a situation that makes document corruption so
>>> likely).
>>
>> There is no such mandate. The spec merely provides a catalogue of
>> public identifiers and their modern meaning. Nothing stops XML users
>> from using any other identifier, in particular SYSTEM identifiers.
>> The spec discourages people from using DTDs in general, because of
>> precisely the kinds of issues that are being discussed here, but the
>> XML spec allows it, and that's what controls this at the end of the
>> day (especially in the case of software that isn't using the HTML
>> spec's catalogue).
>>
> As I note above, there are many existing systems using the public
> identifiers of XHTML1 to refer to the XHTML1 DTD and using validating
> parsers. They cannot simply switch in a catalog that makes their
> existing document collections invalid. So they cannot make documents
> using the XHTML1 public identifier load a DTD other than the XHTML1 DTD.
1) If the legacy XHTML DTDs are so risky, shouldn't the spec
explicitly warn against using them when authoring XHTML5
documents?
2) David, have you considered the possibility of linking this named
entity magic to the legacy-compat variant of the HTML5 doctype?
http://www.w3.org/TR/html5/syntax.html#doctype-legacy-string
The advantage of doing so would be that nothing new would need to be
introduced.
The disadvantage (but perhaps an advantage in Ian's eyes) ;-)
would be the name of this doctype variant: "legacy".
--
leif halvard silli