[whatwg] Entity definitions in XHTML

Ian Hickson ian at hixie.ch
Wed Jul 31 11:22:48 PDT 2013

On Fri, 18 Jan 2013, David Carlisle wrote:
> On 17/01/2013 23:31, Ian Hickson wrote:
> > On Thu, 17 Jan 2013, David Carlisle wrote:
> > > 
> > > http://www.w3.org/2003/entities/2007doc/xhtmlpubid.html
> > > 
> > > But basically it solves the problem that the existing list leads to 
> > > a situation where data corruption and user confusion are both 
> > > inevitable as the only way to enable entities to be loaded into an 
> > > xhtml agent is to reference a DTD that defines a different 
> > > incompatible set of entities.
> > 
> > This seems to be predicated on the assumption that the proposed new 
> > identifier would identify a different DTD than the existing 
> > identifiers.
> The proposed identifier _by definition_ identifies the list that is in 
> the HTML spec. Not surprising since you extract the list from the same 
> place.
> > This is false. They would all identify the same DTD.
> No, they don't. That is the trouble.

I think we're disagreeing about different things.

If I understand correctly, you're saying that DTD A, in legacy UAs (XML 
processors implementing the DTDs as defined by XHTML 1.x specs and 
company), maps to different characters than you are proposing DTD B should 
map to in new UAs.

I'm saying that in new UAs, those that implement the HTML spec, both A and 
B will map to the same set, because we've only got one set in the HTML spec.

> Only the proposed one identifies that list. The others are all 
> pre-existing identifiers that identify incompatible sets. It is fine in 
> a browser context that you over-ride that and load the HTML5 set in all 
> cases but while you may control the browser you can't control existing 
> workflows that already use these identifiers for the purposes for which 
> they were defined, to identify the XHTML and MathML2 DTD.
> Browsers do not validate so can effectively use an implicit catalog that 
> switches in the data URL with the HTML entities but since that contains 
> no element definitions it would completely break any XML tools that rely 
> on validation.

The way the HTML spec is written, it does not override the XML spec. All 
it does is provide a catalogue that maps the following public identifiers 
to that one set of entities:

   -//W3C//DTD XHTML 1.0 Transitional//EN
   -//W3C//DTD XHTML 1.1//EN
   -//W3C//DTD XHTML 1.0 Strict//EN
   -//W3C//DTD XHTML 1.0 Frameset//EN
   -//W3C//DTD XHTML Basic 1.0//EN
   -//W3C//DTD XHTML 1.1 plus MathML 2.0//EN
   -//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN
   -//W3C//DTD MathML 2.0//EN
   -//WAPFORUM//DTD XHTML Mobile 1.0//EN

Nothing stops a UA from supporting other DTDs; in particular, system 
identifiers are still supported as per the XML spec.
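
To make the catalogue behaviour concrete, here is a rough sketch of the 
lookup described above. It is an illustration only, not text from the 
spec: the function name, the HTML_ENTITY_SET placeholder and the 
example.com identifiers are all made up for the example, and a real XML 
processor would do considerably more than this.

```python
# Public identifiers from the list above. In a UA implementing the HTML spec,
# every one of them resolves to the same single entity set.
HTML_ENTITY_PUBLIC_IDS = frozenset({
    "-//W3C//DTD XHTML 1.0 Transitional//EN",
    "-//W3C//DTD XHTML 1.1//EN",
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "-//W3C//DTD XHTML 1.0 Frameset//EN",
    "-//W3C//DTD XHTML Basic 1.0//EN",
    "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN",
    "-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN",
    "-//W3C//DTD MathML 2.0//EN",
    "-//WAPFORUM//DTD XHTML Mobile 1.0//EN",
})

# Hypothetical placeholder standing in for the one entity set in the HTML spec.
HTML_ENTITY_SET = "html-spec-entity-set"


def resolve_external_subset(public_id, system_id=None):
    """Return what a UA in this sketch would load for a given DOCTYPE.

    Listed public identifiers go through the catalogue; anything else falls
    back to the system identifier, as the XML spec already allows.
    """
    if public_id in HTML_ENTITY_PUBLIC_IDS:
        return HTML_ENTITY_SET
    return system_id  # may be None if the DOCTYPE gave no system identifier


if __name__ == "__main__":
    # Two different legacy identifiers end up at the same entity set.
    print(resolve_external_subset("-//W3C//DTD XHTML 1.1//EN"))
    print(resolve_external_subset("-//W3C//DTD MathML 2.0//EN"))
    # An identifier outside the list is resolved by its system identifier.
    print(resolve_external_subset("-//EXAMPLE//DTD Other//EN",
                                  "http://example.com/other.dtd"))
```

The point of the fallback branch is that the catalogue only adds a 
mapping for the listed identifiers; it takes nothing away from how other 
DTDs are located.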

> If you specify a DTD that defines the HTML entity set, no entities are 
> defined. If you specify a DTD which does not define them, they are all 
> defined. This is so obviously sub-optimal I honestly can't understand 
> how the bug can remain open for years after having been reported.

The list is designed to make legacy XML documents keep working. If you 
think we should remove specific entries from this list, I'm happy to do so 
(but it'll cause pages to not work in browsers, since that's the only list 
they use). I see no value to adding to this list, since the list is purely 
for legacy purposes.

> > The list in the spec was based on what browsers implemented.
> No. It is a subset of what Mozilla did but bears no relation to what IE 
> did, for example.

If it's a subset of what a browser did, then it was clearly "based on what 
browsers implemented".

> As I note above there are many existing systems using the Public 
> identifiers of XHTML1 to refer to the XHTML1 DTD and using validating 
> parsers. They cannot simply switch in a catalog that makes their 
> existing document collections invalid. So they cannot make documents 
> using the XHTML1 public identifier load a DTD other than XHTML1 DTD.

They don't need to implement the HTML spec either, right?

I really don't understand the problem here.

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
