[whatwg] Providing Better Tools
Michel Fortin
michel.fortin at michelf.com
Sun Dec 3 17:10:34 PST 2006
On Dec 3, 2006, at 17:04, J. King wrote:
> I am. It's not anywhere near finished yet, but the parser so far
> goes through the whole document and spits out the appropriate
> tokens; I just haven't done anything with said tokens yet, mainly
> because I was discouraged by PHP's DOM implementation.
> My parser is also slow as molasses, unfortunately.
My experience optimizing PHP Markdown, and building the custom mixed
Markdown/HTML-block pseudo-tokenizer of PHP Markdown Extra, tells me
that it'll probably stay very slow as long as the implementation is
made of PHP code.
Assuming you've implemented the algorithm in the spec as PHP code,
you could probably make it faster by using regular expressions in the
tokenization steps instead of iterating character by character. For
instance, you could implement many of the tokenizer states by
matching from the start of a string with a regex. And maybe then
it'll also be possible to combine a couple of states within the same
regex too.
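To give an idea of what I mean, here's a rough sketch of one state
handled with a single anchored regex. The function name and the token
array layout are mine, not from the spec or any existing
implementation, and it skips all the error-handling cases the spec
requires:

// Hypothetical sketch: match a whole start tag in one anchored regex
// instead of walking it character by character.
function tokenize_start_tag($input, &$pos) {
    // Tag name, then optional attributes, then optional "/", then ">".
    // The /A modifier anchors the match at the current offset.
    $re = '/<([a-zA-Z][a-zA-Z0-9]*)((?:\s+[^\s=\/>]+(?:\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+))?)*)\s*(\/?)>/A';
    if (preg_match($re, $input, $m, 0, $pos)) {
        $pos += strlen($m[0]);
        return array(
            'type'         => 'start-tag',
            'name'         => strtolower($m[1]),
            'attr-string'  => $m[2],   // could be split by a second regex
            'self-closing' => $m[3] === '/',
        );
    }
    return false; // fall back to the per-character algorithm
}

Each call consumes one token and leaves $pos at the next character,
so the surrounding loop can keep roughly the same structure as the
spec's state machine.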
The more we replace PHP code with regular expressions, the faster it'll
go, but the further we deviate from the processing algorithm described
in the spec. I wonder how far we could go while keeping the exact same
behaviour.
The really good solution would be to have a parser implemented in C
and available in every standard installation of PHP. It could be
used by other languages too.
Michel Fortin
michel.fortin at michelf.com
http://www.michelf.com/