[whatwg] HTML 5 and PHP

Ian Hickson ian at hixie.ch
Thu Feb 15 14:37:01 PST 2007

On Thu, 15 Feb 2007, Keryx Web wrote:
> 1. PHP has a useful nl2br-function that takes a string and inserts a 
> <br> tag before every newline. http://se.php.net/nl2br
> If HTML 5 in its HTML serialization actually forbids the self closing 
> slash in the <br> element it will be impossible to use this function for 
> anything but the XML serialization.

In HTML5, the string "<br/>" is valid. So the PHP function works fine. :-)
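
As a rough illustration, here is a Python sketch (not PHP's actual implementation) that mimics nl2br by inserting "<br />" before each line break, then feeds the result to Python's standard html.parser to show that a lenient HTML tokenizer still reports a br element. The names nl2br_sketch and TagCollector are made up for this example, and html.parser is not an HTML5-conformant parser, so treat this only as a plausibility check.

```python
import re
from html.parser import HTMLParser

# Line-break sequences handled; two-character forms come first so that
# "\r\n" is treated as a single break rather than two.
_NEWLINE = re.compile(r"\r\n|\n\r|\n|\r")


def nl2br_sketch(text):
    """Insert '<br />' before every line break, keeping the break itself."""
    return _NEWLINE.sub(lambda m: "<br />" + m.group(0), text)


class TagCollector(HTMLParser):
    """Record the name of every start tag the tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <br /> also end up here, because the
        # default handle_startendtag() delegates to handle_starttag().
        self.tags.append(tag)


markup = nl2br_sketch("first line\nsecond line\r\nthird line")
collector = TagCollector()
collector.feed(markup)
collector.close()
print(markup)
print(collector.tags)
```

The trailing slash does not stop the tokenizer from seeing two br start tags in the sample input.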

> 2. Speaking of XML, as of PHP 5 there is a plethora of XML tools 
> available for manipulation of content: A really good DOM implementation 
> (with many convenience shortcuts I miss when scripting JS), Simple XML, 
> XSLT, XML Reader, SAX, XML Writer, etc. Server side it makes very much 
> sense to use the XML serialization and not the HTML one.

Now that the HTML5 specification has a very clear HTML parser 
specification, it would be relatively simple for someone to write an HTML5 
parser in PHP which can then be used with the XML pipeline. This has 
already been done with Python, for instance:


The above project also provides a number of test cases:


...that can be a huge help to any parser implementation project.

> As the spec stands today, I think the discouragement from using "XML on 
> the web" is way too strongly worded. Client support may be faltering, but 
> on the server side XML technologies are very mature and very useful.

While it is true that tools are widely available, they aren't widely used. 
For example, WordPress doesn't support XML natively -- it actually outputs 
strings; it doesn't have an internal tree representation. This can easily 
lead to content that is not well-formed XML.
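
To make the difference concrete, here is a small Python sketch (not WordPress code; the function names and the sample comment are invented for illustration). Output built by string concatenation lets an unescaped "&" or a stray "<" through, while output serialized from a tree has its text escaped automatically.

```python
import xml.etree.ElementTree as ET


def render_with_strings(comment):
    # Naive string templating: the comment is pasted in verbatim.
    return "<p>" + comment + "</p>"


def render_with_tree(comment):
    # Tree-based output: the serializer escapes the text content.
    p = ET.Element("p")
    p.text = comment
    return ET.tostring(p, encoding="unicode")


def is_well_formed(xml_text):
    # An XML parser rejects anything that is not well-formed.
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


comment = "Fish & chips <b>today"
print(is_well_formed(render_with_strings(comment)))
print(is_well_formed(render_with_tree(comment)))
print(render_with_tree(comment))
```

With this input the string-built version fails the well-formedness check, while the tree-built version passes because "&" and "<" are escaped on serialization.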

> I would suggest rephrasing:
>| Generally speaking, authors are discouraged from trying to use XML on
>| the Web, because XML has much stricter syntax rules than the "HTML5"
>| variant described above, and is relatively newer and therefore less 
>| mature.
> To something like:
>| Authors must be aware that XML has much stricter syntax rules than the 
>| "HTML5" variant described above and that a true XML parser will choke on 
>| even the slightest error.

Unfortunately, a statement that authors "must" be aware of something is 
rarely heeded. Authors usually claim to be aware of many things that they 
aren't aware of, or don't really understand. This is why the spec is 
currently phrased as just a discouragement. (The word "must" also has much 
stronger implications than the word "discourage" in spec terminology.)


> I have a few questions on how HTML 5 might not play nice with PHP. 
> Considering that maybe 90% of all content on the web is dynamic and 
> that PHP has perhaps 50% of that, this one is a biggie.

These numbers don't match the research I have done -- could you elaborate 
on the methodology you used to obtain these numbers? Based on my own 
research I would say that 45% is far over-estimating the number of pages 
on the Web that are generated using PHP.

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
