[whatwg] Distinguishing XML and HTML by content sniffing

Bjoern Hoehrmann derhoermi at gmx.net
Sat Mar 3 22:58:44 PST 2007


* Michael Day wrote:
>For user agents like Prince that support XML and HTML content it is 
>sometimes necessary to distinguish whether a .html file is actually XML 
>or HTML in order for it to be processed correctly.
>
>I've written an article for XML.com explaining exactly how Prince 
>performs content sniffing to distinguish XML and HTML documents:
>
>     What Does XML Smell Like?
>     http://www.xml.com/pub/a/2007/02/28/what-does-xml-smell-like.html
>
>Any feedback would be greatly appreciated.

Well, the article would be more interesting if you had explained why you
took this particular approach instead of, say, parsing the first 8K with
an XML parser and treating the document as XML if that succeeds and as
HTML otherwise, and what your implementation would classify the article
itself as.
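
To make the alternative concrete, here is a minimal sketch in Python of
the prefix-parsing idea. It is not how Prince works, and the details are
assumptions made for illustration: the function name sniff_xml_or_html
is hypothetical, "8K" is read as 8192 bytes, and input with no start tag
at all is treated as HTML. It uses the standard library's expat binding
in non-final mode, so that a prefix cut off in the middle of a document
is not rejected merely for being incomplete.

```python
from xml.parsers import expat


def sniff_xml_or_html(data: bytes, limit: int = 8192) -> str:
    """Return "xml" if the first `limit` bytes contain no
    well-formedness error, otherwise "html"."""
    seen_element = False

    def on_start(name, attrs):
        # Record that at least one start tag was parsed.
        nonlocal seen_element
        seen_element = True

    parser = expat.ParserCreate()
    parser.StartElementHandler = on_start
    try:
        # isfinal=False: the prefix may stop mid-document, so missing
        # end tags at the cut-off point are not reported as errors.
        parser.Parse(data[:limit], False)
    except expat.ExpatError:
        return "html"
    # Assumption: input without any start tag is not called XML.
    return "xml" if seen_element else "html"
```

A tag-soup page with an unclosed <br> inside the first 8K comes back as
"html", while a well-formed XHTML page comes back as "xml". The obvious
weakness is that an HTML page whose first 8K happens to be well-formed
is also reported as "xml".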
-- 
Björn Höhrmann · mailto:bjoern at hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


