[whatwg] External document subset support
brettz9 at yahoo.com
Sun May 24 22:35:57 PDT 2009
Henri Sivonen wrote:
> On May 18, 2009, at 11:50, Brett Zamir wrote:
>> Henri Sivonen wrote:
>>> On May 18, 2009, at 09:36, Brett Zamir wrote:
>> Also, as far as heavy server loads for frequent DTDs, entities could
>> be deliberately not defined at a resolvable URL.
> There are existing XML doctypes out there with resolvable URIs, so
> you'd need a blacklist to bootstrap such a solution.
As you suggest on your site, "If, for legacy reasons, you must process
some well-known DTDs, please make your entity resolver retrieve those
DTDs from a local catalog." I would think the big browsers are fully
capable of doing this (XML allows for it by distinguishing public and
system identifiers), and for any DTD which exploded in popularity before
obtaining a public identifier, I would imagine a blacklist could work.
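To make the catalog-plus-blacklist idea concrete, here is a rough Python
sketch of the lookup order I have in mind. The catalog contents, the
blacklisted URL and the `resolve_dtd` name are all hypothetical; a real
browser would plug this into its own entity-resolver machinery.

```python
from typing import Optional, Tuple

# Hypothetical local catalog: public identifiers -> DTD files bundled with the browser.
LOCAL_CATALOG = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "catalog/xhtml1-strict.dtd",
}

# Hypothetical blacklist of system identifiers that must never be fetched,
# e.g. a DTD that became popular before it got a public identifier.
BLACKLISTED_SYSTEM_IDS = {
    "http://example.org/popular.dtd",
}


def resolve_dtd(public_id: Optional[str],
                system_id: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return ("local", path), ("skip", None) or ("fetch", system_id)."""
    # 1. A known public identifier is always served from the local catalog,
    #    so the publisher's server is never contacted.
    if public_id is not None and public_id in LOCAL_CATALOG:
        return ("local", LOCAL_CATALOG[public_id])
    # 2. Blacklisted (or missing) system identifiers are not loaded at all.
    if system_id is None or system_id in BLACKLISTED_SYSTEM_IDS:
        return ("skip", None)
    # 3. Anything else may be fetched (subject to whatever origin policy applies).
    return ("fetch", system_id)
```

So an XHTML 1.0 Strict doctype never triggers a request to w3.org, while
a system-only DTD such as `strings.dtd` would still be fetched.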
>> The same problems of denial-of-service could exist with stylesheet
>> requests, script requests, etc.
> No, styles and scripts are commonly site-specific, so there isn't a
> Web-wide single point of failure whose URI gets copied around as
Well, again, as mentioned below, they can be of wider use, but I see
your point that the effects on other sites would most likely be
stronger if the source site went down. I think that's a risk publishers
should be free to take (just as people are free to share or rely on
external scripts), but if there's enough feeling against it, the issue
could be addressed by requiring browsers to load only same-domain DTDs.
>> Even some sites, like Yahoo, have encouraged referring to their
>> frequently accessed external files to take advantage of caching.
> At least the serving infrastructure for those URIs has been designed
> for high load unlike the server for many existing DTD URIs out there.
Again, I'd say either let publishers take the risk if they actually make
a likely-popular DTD available, allow a blacklist, or, if really
necessary, limit loading to the same domain.
> Furthermore, JS libraries have obvious functionality in existing
> browsers, so it's unlikely that authors would reference JS libraries
> as part of boilerplate without actually intending to take the perf hit
> of loading the library.
Presumably most XML users will be using doctypes which include a
public identifier. Users of lesser-known XML dialects will probably
have some knowledge of what is happening, and even then, the official
provider of the dialect will probably know not to make their DTD
directly referenceable.
>> The spec could even insist on same-domain, though I don't see any
>> need for that.
> Without same-origin (as in not even performing a CORS GET), you'd need
> to blacklist at least w3.org due to existing references out there.
Sounds fine, though I am assuming w3.org references already have a
PUBLIC identifier for their DTDs.
> (Note that for security, same-origin/CORS is must-have anyway.)
A must-have if you don't trust the origin, yes. But plenty of sites
include scripts from other sites for ads or traffic analysis.
Restricting DTDs to the same domain would not be such a big loss,
however.
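For the same-domain option, the check itself is simple. A minimal Python
sketch, assuming an origin is just the (scheme, host, port) triple and
ignoring CORS entirely; `may_load_dtd` is a hypothetical name, not any
browser's API, and real browsers follow the URL/HTML specs' origin rules.

```python
from urllib.parse import urljoin, urlsplit

# Default ports so that "https://example.com" and "https://example.com:443" match.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Approximate a URL's origin as (scheme, host, port)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, parts.hostname, port)  # hostname is already lowercased


def may_load_dtd(document_url: str, system_id: str) -> bool:
    """Strict same-origin policy: only DTDs from the document's own origin."""
    # Resolve relative (and scheme-relative) system identifiers first.
    dtd_url = urljoin(document_url, system_id)
    return origin(dtd_url) == origin(document_url)
```

Under this policy a TEI file can pull in its own `tei-entities.dtd`, but
a reference to a DTD on w3.org is simply never fetched, which also takes
care of the existing w3.org references without a separate blacklist.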
>> I also disagree with throwing our hands up in the air about character
>> entities (or thinking that the (English-based) HTML ones are
> That's a text input method issue that needs to be solved on the
> authoring side for text input of all kind--not just text input for
> writing XML in a text editor.
So, what's wrong with doing it in XML? If you're saying that text
editors need better Unicode support, then sure, but that's not a
complete solution, given how cumbersome it is to find obscure
characters, which can more simply be defined once in a DTD and then
forgotten. It's a nice feature for a text format which can be created
in a variety of editors.
>> Moreover, the browser with the largest market share offers such
>> support already, and those who depend on it may already view other
>> browsers not supporting the standard as "broken".
> IE doesn't support XHTML or SVG which are the popular XML formats one
> might want to load into a browsing context.
Again, if there is an offline use, there is a browsing use. Just because
not everyone is rushing to use XML in this way does not mean that many
people would not like to share their XML, especially document-centric
(and even data-centric) XML, in such a fashion.
Yes, a Firefox/Opera/Safari user who tries XHTML in IE will find it
"broken", just as a Firefox (etc.) user visiting an XML file that
depends on an external DTD will find it broken. Firefox/Opera/Safari
should be free to offer this positive feature to their users even if IE
doesn't come on board (to IE's eventual detriment, I would think), and I
would hope Firefox et al. implement this one feature on top of their
existing support for showing XML as a tree. As I said, IE offers
functionality which users of other browsers will consider broken in
their own browser. I think that is because those browsers haven't gone
far enough, rather than because IE has gone too far. Just because the
spec technically makes it optional doesn't mean that entity resolution,
at least for same-domain DTDs identified only by a system identifier,
shouldn't become the de facto standard, given the features it offers.
>>> Loading same-origin DTDs for the purpose of localization is a
>>> semi-defensible case, but it's a lot of complexity for a use case
>>> that is way on the wrong side of 80/20 on the Web scale.
>> How so?
> Localized sites are a minority on the Web, and chances that localized
> Web apps would switch to a client-side localization method that relies
> on server-side negotiation of the localization and requires XML to
> work seem dim.
Maybe, but it is also very easy to use. I would hope browsers (and the
specs guiding their collective behavior) could consider the convenience
for document authors. Firefox developers, for example, are very familiar
with DTD-based localization, and some are eager to use it for remote
XUL. I've already seen an increasing number of .xhtml documents in the
wild, despite the lack of support in IE, and even though that switch
(without customizable external DTDs) arguably offers fewer benefits to
the document creator than easy localization would (though XHTML could
also benefit from such DTD-based localization).
>> Even if it is a niche group which uses TEI, Docbook, etc. or who
>> wants to be able to build say a browser extension which can take
>> advantage of their rich semantics, this is still a use for citizens
>> of the web.
> If you need a browser extension for content, you shut out users of
> browsers that don't have the particular extension available. It's like
> using Flash.
While I agree that having to use an extension would limit the usefulness
(that's why I'm so passionate about seeing browsers implement it), I'm
talking about extensions that build interesting optional interfaces to
that content--for example to perform an XQuery on the content (I've made
a Firefox extension which does this) or to give a simple interface
allowing users to highlight content or search only within special
semantic tags (e.g., <date/>, <said/>, <bibl/>, etc. tags in TEI). But I
very much agree that browsers should all implement the basic
infrastructure: 1) XML tree for non-formatted XML, 2) CSS rendering of
pure XML, 3) External DTD support, 4) Recognition of dialects like XHTML
within larger XML fragments, and they're already almost there.
Beyond this being about open technologies, it is also about being able
to innovate. Even Flash can be supplanted over time by open standards,
not to mention specialized languages with a much smaller audience. But
using TEI isn't really going to break anything as long as you can at
least load and view the document. Yes, there is a concern about the
babelization of semantics, but that is only a concern for document
authors, and again I don't think XHTML can or should fill all semantic
>> If people can push forward with backwards-incompatible technologies
>> like the video element, 3d-animation, or whatever, it seems not much
>> to ask to support the humble external entity file... :)
> The upside of video and 3D is much more significant than the upside of
> supporting external DTDs.
So animation is more important than Shakespeare? A lot of classical
literature is richly encoded in XML languages like TEI. No doubt the
readers of Shakespeare are fewer than the viewers of video and 3D, but I
don't think that makes them less important, especially since
implementing external DTD support should, I would imagine, be quite a
bit easier as well.
>>> Besides, if the use case for DTDs is localization within an origin,
>>> the server can perform the XML parse and reserialize into DTDless
>>> XML. (That's how I've implemented this pattern in the past without
>>> client-side support.)
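For reference, the server-side workaround described above might look
roughly like this minimal Python sketch. It assumes the server has
already picked a locale (e.g. via content negotiation) and has that
locale's entity declarations in hand; `LOCALE_DTDS` and `localize` are
hypothetical, and the DOCTYPE regex is deliberately naive (SYSTEM-only,
no existing internal subset).

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical per-locale entity files, held in memory instead of on disk.
LOCALE_DTDS = {
    "en": '<!ENTITY greeting "Hello"><!ENTITY farewell "Goodbye">',
    "fr": '<!ENTITY greeting "Bonjour"><!ENTITY farewell "Au revoir">',
}

# Matches a simple external-only doctype such as <!DOCTYPE page SYSTEM "strings.dtd">.
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+(\w+)\s+SYSTEM\s+"[^"]*"\s*>')


def localize(source: str, locale: str) -> str:
    """Expand the locale's entities server-side and emit DTD-less XML."""
    dtd = LOCALE_DTDS[locale]  # KeyError for an unsupported locale
    # Replace the external DTD reference with an internal subset
    # containing the chosen locale's entity declarations.
    inlined = DOCTYPE_RE.sub(
        lambda m: f"<!DOCTYPE {m.group(1)} [{dtd}]>", source, count=1)
    # The parser expands internal entities, and ElementTree keeps no
    # DOCTYPE, so serializing the root element yields plain XML.
    root = ET.fromstring(inlined)
    return ET.tostring(root, encoding="unicode")
```

With `<!DOCTYPE page SYSTEM "strings.dtd"><page><h>&greeting;</h></page>`
and the "fr" locale, the client receives `<page><h>Bonjour</h></page>`
and never sees a DTD. It works, but it does require server-side
scripting, which is exactly the barrier I'm pointing at.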
>> That is assuming people are aware of scripting and have access to
>> such resources.
> Localization with DTDs but without scripting is already tricky, since
> one would need to tweak conneg.
support cases of dynamic localization if DOM methods like
document.createEntityReference() were implemented along with external
>> Wasn't it one of the aims of the likes of XSL, XQuery, and XForms to
>> use a syntax which doesn't require knowledge of an unrelated
>> scripting language (and those are pretty complex examples unlike
> Web browsers don't support XSL-FO, XQuery or XForms.
I for one hope they will. There seems to be a fair amount of interest in
XForms at the very least. But my point is that it seems to be a W3C goal
(and a good one) to make technologies which avoid a need for specialized
scripting knowledge or services.
> (XSLT support isn't something that can be generalized to feature
> triage policy applicable to new features today.)
Sorry, I don't follow.
>> (Btw, you and I discussed this before, though I didn't get a response
>> from you to my last post:
>> https://bugzilla.mozilla.org/show_bug.cgi?id=22942#c109 ; I don't
>> mean to go off-topic but you might wish to consider or respond to
>> some of its points as well...)
> Oh. I didn't make the connection. I didn't reply there, because using
> Bugzilla as a discussion forum--particularly when the discussion turns
> to advocacy--is frowned upon.
I thought we were discussing whether implementing the feature requested
in that bug was justified, but all right.
> Are there some particular points that I haven't addressed here that
> you'd like to re-raise?
I think we're mostly rehashing it anyways. :)