[whatwg] External document subset support

Brett Zamir brettz9 at yahoo.com
Mon May 18 01:50:44 PDT 2009


Henri Sivonen wrote:
> On May 18, 2009, at 09:36, Brett Zamir wrote:
>
>> Section 10.1, "Writing XHTML documents" observes: "According to the 
>> XML specification, XML processors are not guaranteed to process the 
>> external DTD subset referenced in the DOCTYPE."
>>
>> While this is true, since no doubt the majority of web browsers are 
>> already able to process external stylesheets or scripts, might the 
>> very useful feature of external entity files, be employed by XHTML 5 
>> as a stricter subset of XML (similar to how XML Namespaces re-annexed 
>> the colon character) in order to allow this useful feature to work 
>> for XHTML (to have access to HTML entities or other useful entities 
>> for one, as well as enable a poor man's localization, etc.)?
>
> See http://hsivonen.iki.fi/no-dtd/ explains why DTDs don't work for 
> the Web in the general case.
>
While that is a thoughtful and helpful article, your arguments there 
mostly relate to validation against a central spec. Also, as for heavy 
server loads from frequently requested DTDs, entities could deliberately 
be defined at a URL that does not resolve. The same denial-of-service 
problems could arise with stylesheet requests, script requests, etc. 
Some sites, like Yahoo, have even encouraged referring to their 
frequently accessed external files to take advantage of caching. The 
spec could even insist on same-domain, though I don't see any need for 
that. If I give my website out to Slashdot, I shouldn't be surprised 
when I get "slashdotted", and if I do, that's my fault, not the web's 
fault. A DTD doesn't need to reference a central location, and it is 
unlikely that major browsers would fail to use the PUBLIC identifier to 
avoid fetching the SYSTEM file.
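
To make the PUBLIC-identifier point concrete, here is a rough sketch of 
the kind of lookup I have in mind, written in Python purely for 
illustration. The catalog contents, the bundled path, and the function 
name resolve_dtd are all made up; I am not claiming any browser 
implements it this way.

```python
from typing import Optional, Tuple

# Hypothetical built-in catalog: PUBLIC identifier -> locally bundled copy.
# The path below is invented for this example.
LOCAL_CATALOG = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "bundled/xhtml1-strict.dtd",
}


def resolve_dtd(public_id: Optional[str], system_id: str) -> Tuple[str, bool]:
    """Return (location, needs_network) for a DOCTYPE's external subset.

    A known PUBLIC identifier is served from the local catalog, so the
    SYSTEM URL is never requested. Anything else falls back to the
    SYSTEM identifier and would require a fetch.
    """
    local = LOCAL_CATALOG.get(public_id) if public_id else None
    if local is not None:
        return local, False
    return system_id, True
```

With such a table, a frequently used DTD costs the central server 
nothing, and only site-specific DTDs would ever be fetched.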

I also disagree with throwing our hands up in the air about character 
entities (or thinking that the (English-based) HTML ones are 
sufficient). As I said, just because the original spec defined external 
DTD processing as optional does not mean we must perpetually remain 
stuck in the past, especially in the case of XML-on-the-web, which is 
not going to break a whole lot of browsing uses at all if external DTDs 
are suddenly made possible. Moreover, the browser with the largest 
market share offers such support already, and those who depend on it 
may already view other browsers not supporting the standard as "broken".
> Loading same-origin DTDs for the purpose of localization is a 
> semi-defensible case, but it's a lot of complexity for a use case that 
> is way on the wrong side of 80/20 on the Web scale. 
How so? And besides localization, there are many other uses, such as 
giving editors a convenient way to insert a copyright symbol, etc., 
without having to hunt for it. Not everyone uses an IDE which makes 
these available, or knows how to use one. I'm assisting a project which 
has this very issue. And I really don't buy the web/non-web dichotomy 
which some people make. If there's an offline use, there's an online 
use, pure and simple. And a client-side-only use as well: to be able to 
read my own documents, I'd like to do so in a browser, and many others 
besides me like to "live in" their browsers.

Even if it is a niche group which uses TEI, DocBook, etc., or which 
wants to be able to build, say, a browser extension which can take 
advantage of their rich semantics, this is still a use for citizens of 
the web. If people can push forward with backwards-incompatible 
technologies like the video element, 3D animation, or whatever, it 
seems not much to ask to support the humble external entity file... :)
> Besides, if the use case for DTDs is localization within an origin, 
> the server can perform the XML parse and reserialize into DTDless XML. 
> (That's how I've implemented this pattern in the past without 
> client-side support.)
>
That is assuming people are aware of scripting and have access to such 
resources. Wasn't it one of the aims of the likes of XSL, XQuery, and 
XForms to use a syntax which doesn't require knowledge of an unrelated 
scripting language (and those are pretty complex examples unlike entities)?
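
For comparison, this is roughly what the server-side pattern you 
describe seems to require. It is only a minimal sketch, assuming 
Python's standard xml.etree.ElementTree; the document, the entity names 
(copy, greeting) and the function name strip_dtd are made up for the 
example. The entities are declared in the internal subset here because, 
as far as I know, that parser does not fetch external DTD files by 
default.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical source document: entities live in the internal DTD subset.
SOURCE = """<!DOCTYPE html [
  <!ENTITY copy "&#169;">
  <!ENTITY greeting "Bonjour">
]>
<html xmlns="http://www.w3.org/1999/xhtml">
  <body><p>&greeting; &copy; 2009</p></body>
</html>"""


def strip_dtd(xml_text: str) -> str:
    """Parse the document, letting the parser expand declared entities,
    then reserialize the tree. The serializer writes no DOCTYPE, so the
    output is DTD-less XML with the entity text inlined."""
    # Serialize the XHTML namespace as the default (unprefixed) namespace.
    ET.register_namespace("", XHTML_NS)
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(strip_dtd(SOURCE))
```

Even this short version presumes a scripting environment on the server 
and someone who knows how to wire it in, which is exactly the burden an 
entity file by itself would not impose.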

(Btw, you and I discussed this before, though I didn't get a response 
from you to my last post: 
https://bugzilla.mozilla.org/show_bug.cgi?id=22942#c109 ; I don't mean 
to go off-topic but you might wish to consider or respond to some of its 
points as well...)

best wishes,
Brett

