[whatwg] External document subset support
brettz9 at yahoo.com
Mon May 18 01:50:44 PDT 2009
Henri Sivonen wrote:
> On May 18, 2009, at 09:36, Brett Zamir wrote:
>> Section 10.1, "Writing XHTML documents" observes: "According to the
>> XML specification, XML processors are not guaranteed to process the
>> external DTD subset referenced in the DOCTYPE."
>> While this is true, since no doubt the majority of web browsers are
>> already able to process external stylesheets or scripts, might the
>> very useful feature of external entity files be employed by XHTML 5
>> as a stricter subset of XML (similar to how XML Namespaces re-annexed
>> the colon character) in order to allow this useful feature to work
>> for XHTML (to have access to HTML entities or other useful entities
>> for one, as well as enable a poor man's localization, etc.)?
> See http://hsivonen.iki.fi/no-dtd/ which explains why DTDs don't work
> for the Web in the general case.
While that is a thoughtful and helpful article, your arguments there
mostly relate to validation from a central spec. Also, as for heavy
server loads from frequently requested DTDs, entities could deliberately
be defined at a URL that does not resolve. The same problems of denial-of-service
could exist with stylesheet requests, script requests, etc. Even some
sites, like Yahoo, have encouraged referring to their frequently
accessed external files to take advantage of caching. The spec could
even insist on same-domain, though I don't see any need for that. If I
give my website out to Slashdot, I shouldn't be surprised when I get
"slashdotted", and if I do, that's my fault, not the web's fault. A DTD
doesn't need to reference a central location, nor would it be likely
that major browsers would fail to use the PUBLIC identifier to avoid
checking for the SYSTEM file.
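To illustrate the point about PUBLIC identifiers, here is a minimal sketch
of a catalog-style lookup in Python. It assumes a client that keeps local
copies of well-known DTDs keyed by public identifier; the names
LOCAL_CATALOG and resolve_dtd and the local path are hypothetical, not part
of any browser or spec.

```python
from typing import Optional, Tuple

# Hypothetical local catalog: public identifier -> locally stored copy.
# The public identifier is the real XHTML 1.0 Strict one; the path is made up.
LOCAL_CATALOG = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "catalog/xhtml1-strict.dtd",
}


def resolve_dtd(public_id: Optional[str], system_id: str) -> Tuple[str, bool]:
    """Return (location, needs_network) for a DOCTYPE's external subset.

    If the PUBLIC identifier is known, the local copy is used and the
    SYSTEM URL is never fetched. Otherwise fall back to the SYSTEM URL.
    """
    local = LOCAL_CATALOG.get(public_id) if public_id else None
    if local is not None:
        return local, False
    return system_id, True
```

With a lookup like this, a frequently referenced DTD costs the hosting
server nothing, while an unknown DTD is fetched like any other external
resource.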
I also disagree with throwing our hands up in the air about character
entities (or thinking that the (English-based) HTML ones are
sufficient). As I said, just because the original spec defined it as
optional does not mean we must perpetually remain stuck in the past,
especially in the case of XML-on-the-web, which is not going to break a
whole lot of browsing uses at all if external DTDs are suddenly made
possible. Moreover, the browser with the largest market share offers
such support already, and those who depend on it may already view other
browsers not supporting the standard as "broken".
> Loading same-origin DTDs for the purpose of localization is a
> semi-defensible case, but it's a lot of complexity for a use case that
> is way on the wrong side of 80/20 on the Web scale.
How so? And besides localization, there are many other uses such as
providing a convenient tool for editors to avoid having to hunt for a
copyright symbol, etc. Not everyone uses an IDE which makes these available or
knows how to use it. I'm assisting such a project which has this issue.
And I really don't buy the web/non-web dichotomy which some people make.
If there's an offline use, there's an online use, pure and simple. And a
client-side-only use as well--to be able to read my own documents, I'd
like to do so in a browser--many others besides me like to "live in"
the browser.
Even if it is a niche group which uses TEI, Docbook, etc. or who wants
to be able to build say a browser extension which can take advantage of
their rich semantics, this is still a use for citizens of the web. If
people can push forward with backwards-incompatible technologies like
the video element, 3d-animation, or whatever, it seems not much to ask
to support the humble external entity file... :)
> Besides, if the use case for DTDs is localization within an origin,
> the server can perform the XML parse and reserialize into DTDless XML.
> (That's how I've implemented this pattern in the past without
> client-side support.)
That is assuming people are aware of scripting and have access to such
resources. Wasn't it one of the aims of the likes of XSL, XQuery, and
XForms to use a syntax which doesn't require knowledge of an unrelated
scripting language (and those are pretty complex examples unlike entities)?
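For readers unfamiliar with the server-side pattern quoted above, a minimal
sketch in Python follows. It is not Henri's implementation; it assumes the
entity declarations are available as a string (here the hypothetical
ENTITIES_EN, standing in for a per-language entity file) and inlines them as
an internal DTD subset, since the standard-library parser does not fetch
external DTDs by default.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical contents of a per-language entity file.
ENTITIES_EN = '<!ENTITY greeting "Hello"><!ENTITY copy "&#169;">'

# Hypothetical source document that refers to the entities by name.
SOURCE_BODY = (
    f'<html xmlns="{XHTML_NS}">'
    "<body><p>&greeting; &copy; 2009</p></body></html>"
)


def expand_entities(entity_decls: str, body: str) -> str:
    """Parse body with entity_decls as its internal subset and
    reserialize it; the output carries no DOCTYPE and no entity references."""
    # Serialize XHTML elements without an ns0: prefix (process-wide setting).
    ET.register_namespace("", XHTML_NS)
    doc = f"<!DOCTYPE html [{entity_decls}]>{body}"
    root = ET.fromstring(doc)  # the parser expands the declared entities
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(expand_entities(ENTITIES_EN, SOURCE_BODY))
```

Swapping ENTITIES_EN for another language's declarations gives the "poor
man's localization" discussed earlier, at the cost of requiring a script on
the server.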
(Btw, you and I discussed this before, though I didn't get a response
from you to my last post:
https://bugzilla.mozilla.org/show_bug.cgi?id=22942#c109 ; I don't mean
to go off-topic but you might wish to consider or respond to some of its
points as well...)