[whatwg] Microdata Feedback: A Server Side implementation of a Microdata Consumer library.
Emiliano Martinez Luque
martinezluque at gmail.com
Fri Feb 11 13:07:28 PST 2011
Hi everybody, I originally intended to send this message to the
implementors list, but seeing in the archives that there hasn't been
much activity there for the last couple of months, I'm sending it to
the general list. I just wanted to announce that I've released
( http://github.com/emluque/MD_Extract ) a library for consuming
Microdata on the server side. There are some known issues (in
particular with character encodings that do not extend ASCII, and the
mechanism for extracting text from a tree of nodes is very basic),
but I still felt it was sensible to release it to showcase the
possibilities of the Microdata specification.
I based the implementation on the algorithm provided by the WHATWG, but
there are some variations. The most notable one is that I'm
constructing an intermediate results data structure while traversing
the HTML tree, rather than storing the results in a list and sorting
them later in tree order as the spec says. I did take Tab's suggestion
of doing a first pass through the HTML tree and storing a list of
references to elements with ids (which was a great suggestion: it
makes the code much clearer and it completely changed the way I was
thinking about the problem).
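
To illustrate the two-pass idea described above, here is a minimal
sketch in Python (not the PHP code of MD_Extract). Every name in it
(Node, TreeBuilder, index_ids, build_item, extract) is hypothetical and
chosen only for this example. It assumes reasonably well-formed input,
does not sort property values into tree order, and does not guard
against itemref loops, all of which a real consumer would need to do.

```python
from html.parser import HTMLParser

# Elements that never have children (so we must not descend into them).
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

class Node:
    def __init__(self, tag, attrs, parent=None):
        self.tag, self.attrs, self.parent = tag, dict(attrs), parent
        self.children = []  # child Nodes and text strings, in document order

class TreeBuilder(HTMLParser):
    """Builds a very small element tree from HTML (illustration only)."""
    def __init__(self):
        super().__init__()
        self.root = Node("#root", {})
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)

def index_ids(node, index=None):
    """First pass: map each id to its element (first occurrence wins)."""
    index = {} if index is None else index
    if "id" in node.attrs:
        index.setdefault(node.attrs["id"], node)
    for child in node.children:
        if isinstance(child, Node):
            index_ids(child, index)
    return index

def text_of(node):
    return "".join(c if isinstance(c, str) else text_of(c)
                   for c in node.children)

def value_of(node, ids):
    """Simplified property value rules; only a few element types covered."""
    if "itemscope" in node.attrs:
        return build_item(node, ids)
    if node.tag in ("a", "area", "link"):
        return node.attrs.get("href") or ""
    if node.tag in ("img", "audio", "video", "embed", "iframe", "source"):
        return node.attrs.get("src") or ""
    if node.tag == "meta":
        return node.attrs.get("content") or ""
    return text_of(node).strip()

def build_item(root, ids):
    """Second pass: collect properties from children and itemref targets."""
    item = {"type": root.attrs.get("itemtype"), "properties": {}}
    pending = [c for c in root.children if isinstance(c, Node)]
    for ref in (root.attrs.get("itemref") or "").split():
        if ref in ids:  # resolved through the id index, no tree search
            pending.append(ids[ref])
    seen = set()

    def crawl(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        if "itemprop" in node.attrs:
            for name in (node.attrs["itemprop"] or "").split():
                item["properties"].setdefault(name, []).append(
                    value_of(node, ids))
        if "itemscope" not in node.attrs:  # nested items own their subtree
            for child in node.children:
                if isinstance(child, Node):
                    crawl(child)

    for node in pending:
        crawl(node)
    return item

def extract(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    ids = index_ids(builder.root)
    items = []

    def walk(node):
        # Top-level items: itemscope present, itemprop absent.
        if "itemscope" in node.attrs and "itemprop" not in node.attrs:
            items.append(build_item(node, ids))
        for child in node.children:
            if isinstance(child, Node):
                walk(child)

    walk(builder.root)
    return items
```

Because the id index is built up front, resolving an itemref becomes a
dictionary lookup instead of a second search through the tree.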
To test this:
1. Make sure you have PHP 5 with Tidy (
http://www.php.net/manual/en/tidy.installation.php ) and MB_String (
http://ar.php.net/manual/en/mbstring.installation.php ) support.
2. Download the folder, uncompress it, and move it to an Apache
directory (
or clone it from github: git clone
3. Access the /examples folder with your browser.
The library also reports the most common errors (such as an element
marked up with itemscope that has no child nodes, or an img element
marked up with itemprop that has no src attribute). I believe that,
apart from the known issues and considering only microdata syntax,
it's 100% compliant with the latest microdata spec (though there may
be some edge cases I haven't considered).
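
As a rough illustration of the two checks mentioned above, here is a
small Python sketch (again, not the library's PHP code). The class name
MicrodataLint, the function lint, and the wording of the messages are
all hypothetical. It also makes one assumption of its own: text that is
only whitespace is not counted as a child node.

```python
from html.parser import HTMLParser

# Elements that never have children.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

class MicrodataLint(HTMLParser):
    """Collects two kinds of microdata markup errors while parsing."""
    def __init__(self):
        super().__init__()
        self.errors = []
        self.stack = []  # [tag, attrs, has_children] for each open element

    def _mark_child(self):
        if self.stack:
            self.stack[-1][2] = True

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        self._mark_child()  # this element is a child of the open one
        if tag == "img" and "itemprop" in attrs and not attrs.get("src"):
            self.errors.append("<img> with itemprop has no src attribute")
        if tag in VOID:
            if "itemscope" in attrs:
                self.errors.append(
                    f"<{tag}> with itemscope has no child nodes")
        else:
            self.stack.append([tag, attrs, False])

    def handle_data(self, data):
        if data.strip():  # assumption: whitespace-only text is ignored
            self._mark_child()

    def handle_endtag(self, tag):
        if tag not in [entry[0] for entry in self.stack]:
            return  # stray end tag, nothing to close
        while self.stack:
            open_tag, attrs, has_children = self.stack.pop()
            if "itemscope" in attrs and not has_children:
                self.errors.append(
                    f"<{open_tag}> with itemscope has no child nodes")
            if open_tag == tag:
                break

def lint(html):
    checker = MicrodataLint()
    checker.feed(html)
    checker.close()
    return checker.errors
```

Elements that are still open at the end of the input are not checked
in this sketch.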
I'm hoping that it gets tested. This time I made it so that all it
takes (other than having PHP configured appropriately) is downloading
and uncompressing the folder. Please do try it, you will like it.
And please file any bug reports through the GitHub interface or
through the contact form at my personal page at
Again thank you for a great spec,
Emiliano Martínez Luque