[whatwg] Microdata DOM API issues
Philip Jägenstedt
philipj at opera.com
Wed Nov 11 18:23:54 PST 2009
I've been playing with the microdata DOM APIs again, continuing the
JavaScript experimental implementation <http://gitorious.org/microdatajs>.
It's not small or elegant, but at least some spec issues have come up in
the process.
What is the http://www.w3.org/1999/xhtml/microdata# URI? Just leftovers
from earlier revisions to the spec?
Why are the algorithms for extracting RDF gone? All that's left is the
book example with the equivalent Turtle, but it would be nice if how to
extract RDF were actually defined. The same goes for the JSON extraction:
was that no good?
http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#associating-names-with-items
"Otherwise, if one of the other elements in pending is an ancestor element
of candidate, and that element is scope, then remove candidate from
pending."
"Otherwise, if one of the other elements in pending is an ancestor element
of candidate, and that element also has scope as its nearest ancestor
element with an itemscope attribute specified, then remove candidate from
pending."
The intention of these requirements seems to be to eliminate redundant
elements in pending, but a comment on the intention of each in the spec
would be helpful as it's quite cryptic right now.
http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#microdata-dom-api
itemtype and itemid are both URL attributes, so when getting itemType and
itemId, relative URLs should be resolved (even if only absolute URLs are
valid). Correct?
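
To make the question concrete, here is a minimal sketch of the resolution
behaviour being asked about, using the standard URL constructor. The helper
name resolveUrlAttribute, the example base URL and the fallback for
unresolvable values are all assumptions for illustration, not something the
spec is confirmed to require.

```javascript
// Hypothetical helper: resolve a URL attribute value against a base URL,
// roughly what a getter for a URL attribute might do.
function resolveUrlAttribute(value, baseUrl) {
  try {
    return new URL(value, baseUrl).href;
  } catch (e) {
    // Assumption: if resolution fails, hand back the raw attribute value.
    return value;
  }
}

// Hypothetical document address, used as the base URL.
const base = 'http://example.org/vocab/index.html';

// A relative itemtype value is resolved against the base.
console.log(resolveUrlAttribute('vevent', base));

// An absolute itemtype value comes back unchanged.
console.log(resolveUrlAttribute(
  'http://microformats.org/profile/hcalendar#vevent', base));
```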
itemprop and itemref are both "unordered set of unique space-separated
tokens", but in HTMLElement only itemProp is a DOMSettableTokenList while
itemRef is a DOMString. This doesn't really make sense, so make itemRef a
DOMSettableTokenList too? From reading the spec it's not obvious (without
following cross-references) that itemProp isn't just a plain string. An
example using .itemProp.contains(name) or similar would make this more
difficult to miss.
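
As a rough illustration of the kind of example meant here, the sketch below
models itemProp as a token list rather than a string. The makeTokenList
helper is a hypothetical stand-in for the real DOM object (so that it runs
outside a browser), and the sample attribute value is made up.

```javascript
// Split an attribute value into a set of unique space-separated tokens.
// The separators are the HTML space characters.
function parseTokens(value) {
  return new Set(value.split(/[ \t\n\f\r]+/).filter(t => t.length > 0));
}

// Hypothetical stand-in for element.itemProp; not the real DOM interface.
function makeTokenList(value) {
  const tokens = parseTokens(value);
  return {
    contains: name => tokens.has(name),
    get length() { return tokens.size; },
    toString: () => Array.from(tokens).join(' ')
  };
}

// Made-up attribute value with a duplicate token and extra whitespace.
const itemProp = makeTokenList('fn  name fn');

console.log(itemProp.contains('name')); // true
console.log(itemProp.length);           // 2, duplicates collapse
console.log(itemProp === 'name');       // false, it is not a plain string
```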
http://www.whatwg.org/specs/vocabs/current-work/#vcard
Having clickable cross-references in this spec would help a lot when
reviewing!
Grammar: Let value *be* the result of collecting the first vCard
subproperty named value in subitem.
"Let n1 be the value of the first property named family-name in subitem,
or the empty string if there is no such property or the property's value
is itself an item." Why not use "collecting the first vCard subproperty"
here? Not doing so had me trying to find how the two were different, but I
couldn't find any differences given that the values are later escaped.
There's also the issue of how newlines from textContent values are
escaped. Applying the vCard extraction algorithm to the spec example gives:
BEGIN:VCARD
PROFILE:VCARD
VERSION:3.0
SOURCE:http://foolip.org/microdatajs/demo/vcard.html
NAME:vCard demo
FN:Jack Bauer
PHOTO;VALUE=URI:http://foolip.org/microdatajs/demo/jack-bauer.jpg
ORG:Counter-Terrorist Unit;Los Angeles Division
ADR:;;10201 W. Pico Blvd.;Los Angeles;CA;90064;United States
GEO:34.052339;-118.410623
TEL;TYPE=work:+1 (310)\n 597 3781
URL;VALUE=URI:http://en.wikipedia.org/wiki/Jack_Bauer
URL;VALUE=URI:http://www.jackbauerfacts.com/
EMAIL:j.bauer@la.ctu.gov.invalid
TEL;TYPE=cell:+1 (310) 555\n 3781
NOTE:If I'm out in the field\, you may be better off\n contacting Chloe O'B
rian if it's about\n work\, or ask Tony Almeida if\n you're interested in
the CTU five-a-side football team we're trying\n to get going.
AGENT;VALUE=VCARD:BEGIN:VCARD\nPROFILE:VCARD\nVERSION:3.0\nSOURCE:http://fo
olip.org/microdatajs/demo/vcard.html\nNAME:vCard demo\nEMAIL\;VALUE=URI:ma
ilto:c.obrian@la.ctu.gov.invalid\nFN:Chloe O'Brian\nN:O'Brian\;Chloe\;\;\;
\nEND:VCARD\n
AGENT:Tony Almeida
REV:2008-07-20T21:00:00+0100
TEL;TYPE=home:01632 960 123
N:Bauer;Jack;;;
END:VCARD
TEL and NOTE have line breaks that exist only because of how the HTML
source is formatted. Importing this into Gmail preserves these line breaks,
which looks quite broken. Unless we expect text fields to contain
meaningful formatting, perhaps simply collapsing each run of whitespace
into a single space is OK? In the best of worlds <br> would be converted to
\n, but I'm not sure if it's worth the trouble.
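
A minimal sketch of the whitespace collapsing suggested above, applied to a
value before it is escaped. The function name collapseWhitespace is
hypothetical, and trimming the ends is an extra assumption of mine rather
than something the extraction algorithm says.

```javascript
// Collapse every run of whitespace (including newlines) into one space.
// Assumption: leading and trailing whitespace is dropped as well.
function collapseWhitespace(value) {
  return value.replace(/\s+/g, ' ').trim();
}

// textContent of the work phone number, with the line break that comes
// from the HTML source formatting.
const tel = '+1 (310)\n 597 3781';

console.log(collapseWhitespace(tel)); // +1 (310) 597 3781
```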
Finally on vCard, the final part of the extraction algorithm goes to great
trouble to guess what is the family name and what is the given name. This
guess will be broken for transliterated East Asian names (CJKV that I know
of, maybe others too). Just saying. Also, why is it important to
explicitly add N:;;;; for organizations?
http://www.whatwg.org/specs/vocabs/current-work/#vevent
"Add an iCalendar line with the type name and the value value to output."
At this point value is undefined.
Given the algorithm for extracting iCal, it seems that dtstart and dtend
must be specified using <time datetime="">, as it's only for time elements
that the time stamps will be properly formatted (stripping - and :).
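
For illustration, the stripping step amounts to something like the sketch
below. The function name toICalendarStamp is hypothetical, and this naive
version is my own simplification, not the spec's wording.

```javascript
// Naive sketch: remove every "-" and ":" from a datetime value.
// Limitation: this would also strip the sign of a negative time zone
// offset such as "-08:00", so it is only safe for UTC ("Z") values
// or plain dates.
function toICalendarStamp(datetime) {
  return datetime.replace(/[-:]/g, '');
}

console.log(toICalendarStamp('2009-11-11T18:23:54Z')); // 20091111T182354Z
console.log(toICalendarStamp('2009-11-11'));           // 20091111
```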
There are some errors in the example. I got it working by applying this
diff:
--- vevent.js.orig 2009-11-11 10:52:37.000000000 +0100
+++ vevent.js 2009-11-11 23:54:15.000000000 +0100
@@ -1,3 +1,3 @@
 function getCalendar(node) {
- while (node && (!node.nodeScope || !node.itemType == 'http://microformats.org/profile/hcalendar#vevent'))
+ while (node && (!node.itemScope || !node.itemType == 'http://microformats.org/profile/hcalendar#vevent'))
  node = node.parentNode;
@@ -26,3 +26,3 @@
  value = value.replace(/;/g, '\\;');
- value = value.replace(/,/g, \\,');
+ value = value.replace(/,/g, '\\,');
  value = value.replace(/\n/g, '\\n');
@@ -31,3 +31,3 @@
  var name = prop.itemProp[nameIndex];
- if (!name.match(':') && !name.match('.'))
+ if (!name.match(':') && !name.match('\\.'))
  calendar += name.toUpperCase() + parameters + ':' + value + '\r\n';
Perhaps /\./ would be better to make it clear that it's a regexp.
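
To show why the last hunk matters: String.prototype.match converts a string
argument to a regular expression, so '.' matches any character. The sample
property name below is just an illustration.

```javascript
// Sample property name without a dot.
const propName = 'dtstart';

// '.' becomes the regexp /./, which matches any character.
console.log(propName.match('.') !== null);   // true

// '\\.' becomes /\./, which only matches a literal dot.
console.log(propName.match('\\.') !== null); // false

// The regexp literal says the same thing more clearly.
console.log(/\./.test(propName));            // false

// A plain substring test avoids regexps altogether.
console.log(propName.indexOf('.') !== -1);   // false
```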
Also: if (prop.date && prop.time)
date and time aren't properties on HTMLTimeElement, I don't know what this
is. Is there or should there be a DOM API for determining if a string is a
valid date string other than implementing those algorithms in script?
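
For comparison, implementing the check in script might look roughly like
the sketch below. It follows my reading of what a valid date string is
(a year of four or more digits, a two-digit month and a two-digit day that
exists in that month); the function name isValidDateString is hypothetical
and this is not a substitute for the spec's parsing algorithm.

```javascript
// Rough script-side check for a "YYYY-MM-DD" valid date string.
function isValidDateString(s) {
  const m = /^(\d{4,})-(\d{2})-(\d{2})$/.exec(s);
  if (!m) return false;
  const year = Number(m[1]);
  const month = Number(m[2]);
  const day = Number(m[3]);
  if (year < 1 || month < 1 || month > 12 || day < 1) return false;
  // Gregorian leap year rule.
  const leap = (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
  const daysInMonth = [31, leap ? 29 : 28, 31, 30, 31, 30,
                       31, 31, 30, 31, 30, 31];
  return day <= daysInMonth[month - 1];
}

console.log(isValidDateString('2009-11-11')); // true
console.log(isValidDateString('2009-02-30')); // false
```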
http://www.whatwg.org/specs/vocabs/current-work/#licensing-works
What's the n in http://n.whatwg.org/work? If this URL is going to stick,
it would be nice if there were also something to be seen at that page.
Also, the conversion to RDF section isn't really useful and seems to hide
some assumptions about how the properties vocabulary should be prefixed
with http://n.whatwg.org/work and how the
http://www.w3.org/1999/xhtml/microdata# prefix is supposed to be used.
http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#domtokenlist
The DOM intro box doesn't explain the return value for .toggle(), you have
to consult the algorithm to figure it out.
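
As I read the algorithm, toggle() returns true when the token ends up
present and false when it was removed. A minimal sketch of that behaviour
on a plain Set; the toggleToken helper and the sample tokens are
hypothetical, and this is not the DOMTokenList interface itself.

```javascript
// Remove the token if present (return false), otherwise add it
// (return true), mirroring the return value described above.
function toggleToken(tokens, token) {
  if (tokens.has(token)) {
    tokens.delete(token);
    return false;
  }
  tokens.add(token);
  return true;
}

const tokens = new Set(['a', 'b']);
console.log(toggleToken(tokens, 'c')); // true, 'c' was added
console.log(toggleToken(tokens, 'a')); // false, 'a' was removed
```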
I'm sure there will be more issues, but that's it for now.
--
Philip Jägenstedt