[whatwg] Entity definitions in XHTML

Thu Jan 17 19:01:55 PST 2013

David Carlisle on Fri, 18 Jan 2013 00:03:12 +0000:
> To: Ian Hickson <ian at hixie.ch>
> On 17/01/2013 23:31, Ian Hickson wrote:
>> On Thu, 17 Jan 2013, David Carlisle wrote:

>>>>> that documents will be interpreted differently by an XHTML
>>>>> user agent and a standard XML toolchain.
>>>> 
>>>> I do not understand what this means. Can you give an example?

Though not XML, the trouble Anolis had with outputting the correct
characters for the &rang; and &lang; entities was caused by a part
of Anolis that interpreted those entities in the old, HTML5
*in*compatible, way. This in turn resulted in the wrong characters when
the entities were converted to normal characters before being output to
the HTML5 spec:
    https://www.w3.org/Bugs/Public/show_bug.cgi?id=14430
This was a surprisingly long-lasting bug. (And perhaps not fully solved
yet ...) It had probably existed ever since HTML5 included named
entities in the spec. And, as the reporter of the bug, I was asked time
and again whether the bug had been fixed or not ...

In this case, Anolis output "polyglot" character references, since
it converted the named references to numeric references. (Please ignore
HTML5's current shortcut:
https://www.w3.org/Bugs/Public/show_bug.cgi?id=20702) But since the bug
was actually in Anolis’ list of named character references, the numeric
output nevertheless misrepresented the named entities.
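
To make the mismatch concrete, here is a minimal sketch (not Anolis'
actual code) that uses the two entity tables shipped in Python's
standard library: html.entities.name2codepoint (the HTML 4 / XHTML 1
definitions) and html.entities.html5 (the HTML5 definitions). The
helper to_numeric and the LEGACY / HTML5 dictionaries are names made
up for this illustration. The same named reference turns into a
different numeric reference depending on which table the tool trusts:

```python
import re
from html.entities import html5, name2codepoint

# Hypothetical lookup tables for this sketch: entity name -> characters.
# LEGACY follows the HTML 4 / XHTML 1 entity sets, HTML5 the HTML5 table.
LEGACY = {name: chr(cp) for name, cp in name2codepoint.items()}
HTML5 = {name[:-1]: chars for name, chars in html5.items()
         if name.endswith(";")}

NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric(text, table):
    """Replace named character references with hexadecimal numeric ones.

    References whose name is missing from `table` are left untouched.
    """
    def repl(match):
        name = match.group(1)
        if name not in table:
            return match.group(0)
        return "".join(f"&#x{ord(ch):X};" for ch in table[name])

    return NAMED_REF.sub(repl, text)


if __name__ == "__main__":
    source = "&lang;x&rang;"
    print(to_numeric(source, LEGACY))  # &#x2329;x&#x232A;
    print(to_numeric(source, HTML5))   # &#x27E8;x&#x27E9;
```

The numeric output is "polyglot" in both cases, but only the second
line matches what an HTML5 user agent means by those two names.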

>>> There is more to compatibility than compatibility between the
>>> browsers. For XHTML there needs to be compatibility between
>>> Browsers and XML tools (otherwise why use XML at all, I know you
>>> would rather people didn't but so long as the spec allows them to
>>> it should not mandate a situation that makes document corruption so
>>> likely).
>> 
>> There is no such mandate. The spec merely provides a catalogue of
>> public identifiers and their modern meaning. Nothing stops XML users
>> from using any other identifier, in particular SYSTEM identifiers.
>> The spec discourages people from using DTDs in general, because of
>> precisely the kinds of issues that are being discussed here, but the
>> XML spec allows it, and that's what controls this at the end of the
>> day (especially in the case of software that isn't using the HTML
>> spec's catalogue).
>> 
> As I note above there are many existing systems using the Public
> identifiers of XHTML1 to refer to the XHTML1 DTD and using validating
> parsers. They can not simply switch in a catalog that makes their
> existing document collections invalid. So they can not make documents
> using the XHTML1 public identifier load a DTD other than XHTML1 DTD.

1) If the legacy XHTML DTDs are so risky, shouldn't the spec
   explicitly warn against using them when authoring XHTML5
   documents?

2) David, have you considered the possibility of linking this named
   entity magic to the legacy-compat variant of the HTML5 doctype?

   http://www.w3.org/TR/html5/syntax.html#doctype-legacy-string

   The advantage of doing so would be that nothing new needs to be
   introduced.
   The disadvantage (but perhaps advantage in Ian's eyes) ;-)
   would be the name of this doctype variant - "legacy".
-- 
leif halvard silli