[whatwg] HTML 5 and PHP

Keryx Web webmaster at keryx.se
Sat Feb 17 09:31:27 PST 2007

Ian Hickson wrote:
> In HTML5, the string "<br/>" is valid. So the PHP function works fine. :-)

OK, I've re-read the discussion from November. My memory was incorrect. 
_Singleton_ tags are allowed to be "self-closing", it seems...

> Now that the HTML5 specification has a very clear HTML parser 
> specification, it would be relatively simple for someone to write an HTML5 
> parser in PHP which can then be used with the XML pipeline. This has 
> already been done with Python, for instance:
>    http://code.google.com/p/html5lib/
> The above project also provides a number of test cases:
>    http://html5lib.googlecode.com/svn/trunk/tests/
> ...that can be a huge help to any parser implementation project.

Although I am quite sure we are going to see some activity concerning 
(X)HTML 5 on PECL soon, the option of using native XML methods still 
appeals very much to me.

>> As the spec stands today, I think the discouragement from using "XML on 
>> the web" is way too strongly worded. Client support may be faltering, but 
>> on the server side XML technologies are very mature and very useful.
> While it is true that tools are widely available, they aren't widely used. 
> For example, WordPress doesn't support XML natively -- it actually outputs 
> strings, it doesn't have an internal tree representation. This can easily 
> lead to non-XML-well-formed content.

Perhaps going off topic here:

WordPress is written in PHP 4. It cannot benefit from the better Tidylib 
integration, the real DOM extension, XMLReader, XMLWriter or SimpleXML. All 
those extensions require PHP 5. Nor can WordPress interact professionally 
with the DBMS through mysqli or PDO, since those also require PHP 5. 
WordPress may be spectacularly successful, but IMHO its design is a 
really crappy one. It was in the right place at the right time, but it 
is not a good example of how to build an enterprise PHP application.
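Ian's point that emitting markup as plain strings can easily produce non-well-formed content, while building a tree first cannot, can be sketched as follows. This is an illustrative sketch only: it uses Python's standard xml.etree.ElementTree rather than any PHP extension, and the function names are hypothetical. In PHP 5 the DOM extension would play roughly the role of the tree builder.

```python
import xml.etree.ElementTree as ET


def build_with_strings(title: str) -> str:
    # Hypothetical string-based output: user text is pasted in unescaped,
    # and nothing checks that every opened tag is closed.
    return "<p>" + title + "<br></p>"


def build_with_tree(title: str) -> str:
    # Hypothetical tree-based output: the serializer escapes text content
    # and always closes every element it opens.
    p = ET.Element("p")
    p.text = title
    ET.SubElement(p, "br")
    return ET.tostring(p, encoding="unicode")


def is_well_formed(markup: str) -> bool:
    # A strict XML parser rejects anything that is not well-formed.
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    title = "Fish & Chips"
    for build in (build_with_strings, build_with_tree):
        markup = build(title)
        print(markup, is_well_formed(markup))
```

With the title "Fish & Chips", the string version emits a raw "&" and an unclosed <br>, so the XML parser rejects it; the tree version serializes to <p>Fish &amp; Chips<br /></p>, which parses cleanly.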

There are tons and tons of legacy applications still on the web, written 
in PHP 4, not to mention bad hosts that are afraid to upgrade, but 
according to stats from Zend most new development is now done in PHP 5, 
especially if one considers enterprise level apps.

Joomla is a better example of a PHP app that is slowly moving in the 
right direction.

>> I would suggest rephrasing ---
>> To something like:
>> | Authors must be aware that XML has much stricter syntax rules than the 
>> | "HTML5" variant described above and that a true XML parser will choke on 
>> | even the slightest error.
> Unfortunately, saying that authors "must" be aware of something is not 
> often heeded. Authors usually claim to be aware of many things that they 
> aren't aware of, or don't really understand. This is why the spec is 
> currently phrased as just a discouragement. (The word "must" also has much 
> stronger implications than the word "discourage" in spec terminology.)

Since I am (among other things) a PHP developer I know all there is to 
know about sloppy coding... ;-)

OTOH I do not believe it's a good idea to patronize developers either. 
And why would they heed the current "discouragement"? There are, once 
again IMHO, two major reasons why new sites still have ugly, invalid, 
unsemantic tag soup code: WYSIWYG tools and CMS software.

I'd suggest that most developers of such software do not consider XML to 
be an immature technology. They are doing XUL, XAML, SOAP, RSS, OOXML, 
ODF, etc, all day long...

Lars Gunther

More off topic:

> Incidentally:
>> I have a few questions on how HTML 5 might not play nice with PHP. 
>> Considering that maybe 90 % of all content on the web is dynamic and 
>> that PHP has perhaps 50 % of that, this one is a big one.
> These numbers don't match the research I have done -- could you elaborate 
> on the methodology you used to obtain these numbers? Based on my own 
> research I would say that 45% is far over-estimating the number of pages 
> on the Web that are generated using PHP.

There are lies, damned lies and statistics. This is one of those 
questions that are impossible to answer exactly. One might consider:

Netcraft stats: How many websites report PHP in their server string?
That would be more than 50 %.

But then, on shared hosts, do people actually use it, and if they do, to 
what extent? So Netcraft stats are too high.

One might consider file extensions. Well, that might work well for ASP, 
but PHP-developers just love mod_rewrite way too much for this to be 
reliable. So stats based on file extensions are too low.

Where did I get my numbers?

I went the middle road, and then exaggerated some for effect ;-)

These facts, however, are __undeniable__:

1. PHP is the most used server side scripting language in entry- and 
mid-range apps. Perhaps Ruby will take some of that market though.

2. PHP is rapidly gaining acceptance for enterprise level apps, with 
extensive backing from Yahoo, IBM, et al.

3. PHP is the most used scripting language for the well known "Web 2.0" 
sites, even ahead of Ruby, and way, way ahead of every other language. 
Even in Sweden, which is a country more faithful to all things MS than 
most, PHP is gaining ground.

4. I happen to like PHP immensely. Every time I do JS or some other 
language, I feel soooo crippled. I want my shortcuts, my flexibility, 
and my "x-thousands, impossible to remember them all - functions". :-)
