[whatwg] External document subset support

Henri Sivonen hsivonen at iki.fi
Mon May 18 03:56:55 PDT 2009

On May 18, 2009, at 11:50, Brett Zamir wrote:

> Henri Sivonen wrote:
>> On May 18, 2009, at 09:36, Brett Zamir wrote:
>>> Section 10.1, "Writing XHTML documents" observes: "According to  
>>> the XML specification, XML processors are not guaranteed to  
>>> process the external DTD subset referenced in the DOCTYPE."
>>> While this is true, since no doubt the majority of web browsers  
>>> are already able to process external stylesheets or scripts, might  
>>> the very useful feature of external entity files be employed by  
>>> XHTML 5 as a stricter subset of XML (similar to how XML Namespaces  
>>> re-annexed the colon character) in order to allow this useful  
>>> feature to work for XHTML (to have access to HTML entities or  
>>> other useful entities for one, as well as enable a poor man's  
>>> localization, etc.)?
>> See http://hsivonen.iki.fi/no-dtd/, which explains why DTDs don't
>> work for the Web in the general case.
> While that is a thoughtful and helpful article, your arguments there  
> mostly relate to validation from a central spec.

No, my arguments don't relate to validation but to having to  
dereference a URI that isn't under the author's control and that gets  
copied around as boilerplate.

> Also, as far as heavy server loads for frequent DTDs, entities could  
> be deliberately not defined at a resolvable URL.

There are existing XML doctypes out there with resolvable URIs, so  
you'd need a blacklist to bootstrap such a solution.

> The same problems of denial-of-service could exist with stylesheet  
> requests, script requests, etc.

No, styles and scripts are commonly site-specific, so there isn't a
Web-wide single point of failure whose URI gets copied around as
boilerplate.

> Even some sites, like Yahoo, have encouraged referring to their  
> frequently accessed external files to take advantage of caching.

At least the serving infrastructure for those URIs has been designed  
for high load unlike the server for many existing DTD URIs out there.  
Furthermore, JS libraries have obvious functionality in existing  
browsers, so it's unlikely that authors would reference JS libraries  
as part of boilerplate without actually intending to take the perf hit  
of loading the library.

> The spec could even insist on same-domain, though I don't see any  
> need for that.

Without same-origin (as in not even performing a CORS GET), you'd need  
to blacklist at least w3.org due to existing references out there.  
(Note that for security, same-origin/CORS is must-have anyway.)
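
As a rough illustration of the same-origin restriction discussed above,
the check could be sketched as below. This is a simplification that
compares only scheme, host and port; it is not the origin algorithm of
any specification or browser, and the names origin() and may_load_dtd()
and the example.org URLs are hypothetical.

```python
from urllib.parse import urlsplit

# Default ports, so that http://example.org and http://example.org:80 match.
DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url):
    """Return a simplified (scheme, host, port) origin tuple for a URL."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)

def may_load_dtd(document_url, dtd_url):
    """Allow the external subset only if it shares the document's origin."""
    return origin(document_url) == origin(dtd_url)
```

Under this sketch, a DOCTYPE that points at a w3.org DTD from a page on
another host is never dereferenced, so no blacklist is needed.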

> I also disagree with throwing our hands up in the air about  
> character entities (or thinking that the (English-based) HTML ones  
> are sufficient).

That's a text input method issue that needs to be solved on the
authoring side for text input of all kinds--not just text input for
writing XML in a text editor.

> Moreover, the browser with the largest market share offers such  
> support already, and those who depend on it may already view other  
> browsers not supporting the standard as "broken".

IE doesn't support XHTML or SVG which are the popular XML formats one  
might want to load into a browsing context.

>> Loading same-origin DTDs for the purpose of localization is a
>> semi-defensible case, but it's a lot of complexity for a use case that
>> is way on the wrong side of 80/20 on the Web scale.
> How so?

Localized sites are a minority on the Web, and chances that localized  
Web apps would switch to a client-side localization method that relies  
on server-side negotiation of the localization and requires XML to  
work seem dim.

> Even if it is a niche group which uses TEI, Docbook, etc. or who  
> wants to be able to build say a browser extension which can take  
> advantage of their rich semantics, this is still a use for citizens  
> of the web.

If you need a browser extension for content, you shut out users of  
browsers that don't have the particular extension available. It's like  
using Flash.

> If people can push forward with backwards-incompatible technologies  
> like the video element, 3d-animation, or whatever, it seems not much  
> to ask to support the humble external entity file... :)

The upside of video and 3D is much more significant than the upside of  
supporting external DTDs.

>> Besides, if the use case for DTDs is localization within an origin,  
>> the server can perform the XML parse and reserialize into DTDless  
>> XML. (That's how I've implemented this pattern in the past without  
>> client-side support.)
> That is assuming people are aware of scripting and have access to  
> such resources.

Localization with DTDs but without scripting is already tricky, since  
one would need to tweak conneg. Furthermore, localization with DTDs  
makes more sense for Web app UIs than static content, and Web apps  
typically have server-side program code anyway.
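
A minimal sketch of the server-side pattern mentioned above (parse with
the entities, then reserialize without a DTD), assuming Python's
standard library. It is not taken from any implementation referred to in
this thread. The names localize(), internal_subset(), TEMPLATE and
STRINGS are hypothetical, entity names are assumed to be valid XML
names, and the template is assumed to be trusted, author-controlled
markup without namespaces.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def internal_subset(entities):
    """Build a DOCTYPE whose internal subset declares one entity per string."""
    decls = "".join(
        '<!ENTITY {} "{}">'.format(
            name, escape(value, {'"': "&quot;", "%": "&#37;"})
        )
        for name, value in entities.items()
    )
    return "<!DOCTYPE html [{}]>".format(decls)

def localize(template, entities):
    """Expand entity references on the server and return DTD-less XML."""
    # The parser expands the internal entities while building the tree.
    root = ET.fromstring(internal_subset(entities) + template)
    # Serializing the tree emits neither the DOCTYPE nor entity references.
    return ET.tostring(root, encoding="unicode")

# Hypothetical template and per-locale string tables.
TEMPLATE = "<html><body><p>&greeting;</p></body></html>"
STRINGS = {"fi": {"greeting": "Tervetuloa"}, "en": {"greeting": "Welcome"}}

print(localize(TEMPLATE, STRINGS["fi"]))
```

The client then receives plain XML with the localized strings already in
place, so it does not need to process any DTD.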

> Wasn't it one of the aims of the likes of XSL, XQuery, and XForms to  
> use a syntax which doesn't require knowledge of an unrelated  
> scripting language (and those are pretty complex examples unlike  
> entities)?

Web browsers don't support XSL-FO, XQuery or XForms. (XSLT support  
isn't something that can be generalized to feature triage policy  
applicable to new features today.)

> (Btw, you and I discussed this before, though I didn't get a
> response from you to my last post:
> https://bugzilla.mozilla.org/show_bug.cgi?id=22942#c109 ; I don't mean
> to go off-topic but you might wish to consider or respond to some of
> its points as well...)

Oh. I didn't make the connection. I didn't reply there, because using  
Bugzilla as a discussion forum--particularly when the discussion turns  
to advocacy--is frowned upon. Are there some particular points that I  
haven't addressed here that you'd like to re-raise?

Henri Sivonen
hsivonen at iki.fi
