[whatwg] 9.1.2.1: trailing slash and atheism
Christian Schmidt
whatwg.org at chsc.dk
Sat Dec 2 18:17:39 PST 2006
Charles Iliya Krempeaux wrote:
> Sometimes web developers parse (non-XML) HTML with an XML parser
> because it's the tool they have on hand.
>
> Consider a PHP developer trying to analyse an HTML page.
>
> If a PHP developer wants to analyse an HTML page, that developer may
> try to use SimpleXML <http://php.net/simplexml> because that's what
> they have on hand and know how to use. There's no SimpleHTML
> available in PHP.
>
> And while none of this is our fault, this is a situation
> some web developers are going to run into. (What else are they going
> to use?)
PHP developers can parse HTML using DOMDocument::loadHTML(). If they
want, they can then convert the DOMDocument to SimpleXML:
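$doc = new DOMDocument();
// loadHTML() uses a lenient HTML parser, so unclosed tags such as <br> are fine.
$doc->loadHTML(
    '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
    . '"http://www.w3.org/TR/html4/loose.dtd">'
    . '<title>Foo</title><body>Foo<br>bar'
);
// Convert the DOM tree to a SimpleXMLElement rooted at <html>.
$simpleXml = simplexml_import_dom($doc);
print $simpleXml->head->title; // the parser supplies the implied <head>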
Christian