[whatwg] Microdata DOM API issues

Wed Nov 11 18:23:54 PST 2009

I've been playing with the microdata DOM APIs again, continuing the  
JavaScript experimental implementation <http://gitorious.org/microdatajs>.  
It's not small or elegant, but at least some spec issues have come up in  
the process.

What is the http://www.w3.org/1999/xhtml/microdata# URI for? Is it just a
leftover from earlier revisions of the spec?

Why are the algorithms for extracting RDF gone? All that's left is the
book example with the equivalent Turtle, but it would be nice if it were
actually defined how to extract RDF. The same goes for the JSON
extraction: was that no good?


http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#associating-names-with-items

"Otherwise, if one of the other elements in pending is an ancestor element  
of candidate, and that element is scope, then remove candidate from  
pending."

"Otherwise, if one of the other elements in pending is an ancestor element  
of candidate, and that element also has scope as its nearest ancestor  
element with an itemscope attribute specified, then remove candidate from  
pending."

The intention of these requirements seems to be to eliminate redundant  
elements in pending, but a comment on the intention of each in the spec  
would be helpful as it's quite cryptic right now.


http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#microdata-dom-api

itemtype and itemid are both URL attributes, so when getting itemType and
itemId, relative URLs should be resolved (even if only absolute URLs are
valid). Correct?

itemprop and itemref are both "unordered set of unique space-separated
tokens", but in HTMLElement only itemProp is a DOMSettableTokenList while
itemRef is a DOMString. This doesn't really make sense, so could itemRef
be made a DOMSettableTokenList too? From reading the spec it's not obvious
(without following cross-references) that itemProp isn't just a plain
string. An example using .itemProp.contains(name) or similar would make
this harder to miss.
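
A spec example along those lines could look roughly like the sketch below.
TokenListSketch is a hypothetical stand-in for DOMSettableTokenList so that
the snippet runs outside a browser, and it approximates HTML space
characters with \s. The hasProperty helper and the sample element are also
made up for illustration; with a real implementation one would call
element.itemProp.contains(name) directly.

```javascript
// Hypothetical stand-in for DOMSettableTokenList: an unordered set of
// unique space-separated tokens backed by a string value.
class TokenListSketch {
  constructor(value) {
    this.value = value;
  }
  // Split on runs of whitespace, dropping empty strings and duplicates.
  get tokens() {
    return [...new Set(this.value.split(/\s+/).filter(Boolean))];
  }
  get length() {
    return this.tokens.length;
  }
  contains(token) {
    return this.tokens.includes(token);
  }
}

// Plain object standing in for an element with itemprop="fn  nickname".
const element = { itemProp: new TokenListSketch('fn  nickname') };

// Comparing itemProp to a plain string would miss multi-token values;
// contains() checks set membership instead.
function hasProperty(el, name) {
  return el.itemProp.contains(name);
}
```

Here hasProperty(element, 'fn') is true, while a naive comparison such as
element.itemProp.value === 'fn' is false.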


http://www.whatwg.org/specs/vocabs/current-work/#vcard

Having clickable cross-references in this spec would help a lot when  
reviewing!

Grammar: Let value *be* the result of collecting the first vCard  
subproperty named value in subitem.

"Let n1 be the value of the first property named family-name in subitem,  
or the empty string if there is no such property or the property's value  
is itself an item." Why not use "collecting the first vCard subproperty"  
here? Not doing so had me trying to find how the two were different, but I  
couldn't find any differences given that the values are later escaped.

There's also the issue of how newlines from textContent values are  
escaped. Applying the vCard extraction algorithm to the spec example gives:

BEGIN:VCARD
PROFILE:VCARD
VERSION:3.0
SOURCE:http://foolip.org/microdatajs/demo/vcard.html
NAME:vCard demo
FN:Jack Bauer
PHOTO;VALUE=URI:http://foolip.org/microdatajs/demo/jack-bauer.jpg
ORG:Counter-Terrorist Unit;Los Angeles Division
ADR:;;10201 W. Pico Blvd.;Los Angeles;CA;90064;United States
GEO:34.052339;-118.410623
TEL;TYPE=work:+1 (310)\n  597 3781
URL;VALUE=URI:http://en.wikipedia.org/wiki/Jack_Bauer
URL;VALUE=URI:http://www.jackbauerfacts.com/
EMAIL:j.bauer@la.ctu.gov.invalid
TEL;TYPE=cell:+1 (310) 555\n  3781
NOTE:If I'm out in the field\, you may be better off\n contacting Chloe O'B
  rian if it's about\n work\, or ask Tony Almeida if\n you're interested in
  the CTU five-a-side football team we're trying\n to get going.
AGENT;VALUE=VCARD:BEGIN:VCARD\nPROFILE:VCARD\nVERSION:3.0\nSOURCE:http://fo
  olip.org/microdatajs/demo/vcard.html\nNAME:vCard demo\nEMAIL\;VALUE=URI:ma
  ilto:c.obrian@la.ctu.gov.invalid\nFN:Chloe O'Brian\nN:O'Brian\;Chloe\;\;\;
  \nEND:VCARD\n
AGENT:Tony Almeida
REV:2008-07-20T21:00:00+0100
TEL;TYPE=home:01632 960 123
N:Bauer;Jack;;;
END:VCARD

TEL and NOTE have line breaks that are there only because of how the HTML
source is formatted. Importing this into Gmail preserves these line breaks,
which looks quite broken. Unless we expect text fields to contain
meaningful formatting, perhaps simply collapsing each run of whitespace
into a single space is OK? In the best of worlds <br> would be converted
to \n, but I'm not sure if it's worth the trouble.
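
The collapsing suggested above could be sketched as follows. The function
names are made up for this example, and both the trimming of leading and
trailing whitespace and the escaping order (backslash first, then comma and
semicolon) are assumptions rather than quotes from the spec's algorithm.

```javascript
// Collapse every run of whitespace (including newlines from the HTML
// source formatting) into a single space. Trimming the ends is an extra
// assumption, not something the extraction algorithm is known to do.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, ' ').trim();
}

// Collapse first, then escape the characters that are special in vCard
// text values. Backslash goes first so that the backslashes added for
// "," and ";" are not escaped again.
function escapeVCardText(text) {
  return collapseWhitespace(text)
    .replace(/\\/g, '\\\\')
    .replace(/,/g, '\\,')
    .replace(/;/g, '\\;');
}
```

With this, the TEL value '+1 (310)\n  597 3781' comes out as
'+1 (310) 597 3781', with no \n left in the output.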

Finally on vCard, the last part of the extraction algorithm goes to great
trouble to guess which is the family name and which is the given name. This
guess will be wrong for transliterated East Asian names (CJKV that I know
of, maybe others too). Just saying. Also, why is it important to
explicitly add N:;;;; for organizations?


http://www.whatwg.org/specs/vocabs/current-work/#vevent

"Add an iCalendar line with the type name and the value value to output."

At this point value is undefined.

Given the algorithm for extracting iCal, it seems that dtstart and dtend
must be specified using <time datetime="">, as it's only for time elements
that the time stamps will be properly formatted (stripping - and :).
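
For illustration, the stripping step amounts to something like the sketch
below. The function name is hypothetical, and the sketch only covers
date-only values and UTC ("Z") values. A value with a negative time zone
offset would lose its sign, so it is not handled here.

```javascript
// Remove the "-" and ":" separators from a datetime attribute value so it
// matches the compact form used on DTSTART/DTEND lines.
// Caveat: this also strips the "-" of a negative time zone offset.
function toICalendarDateTime(datetime) {
  return datetime.replace(/[-:]/g, '');
}
```

For example, toICalendarDateTime('2009-11-11T18:23:54Z') gives
'20091111T182354Z'. Plain text content such as "November 11" would pass
through unchanged, which is why a time element seems to be required.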

There are some errors in the example. I got it working by applying this  
diff:

--- vevent.js.orig	2009-11-11 10:52:37.000000000 +0100
+++ vevent.js	2009-11-11 23:54:15.000000000 +0100
@@ -1,3 +1,3 @@
  function getCalendar(node) {
-  while (node && (!node.nodeScope || !node.itemType == 'http://microformats.org/profile/hcalendar#vevent'))
+  while (node && (!node.itemScope || !node.itemType == 'http://microformats.org/profile/hcalendar#vevent'))
      node = node.parentNode;
@@ -26,3 +26,3 @@
        value = value.replace(/;/g, '\\;');
-      value = value.replace(/,/g, \\,');
+      value = value.replace(/,/g, '\\,');
        value = value.replace(/\n/g, '\\n');
@@ -31,3 +31,3 @@
        var name = prop.itemProp[nameIndex];
-      if (!name.match(':') && !name.match('.'))
+      if (!name.match(':') && !name.match('\\.'))
          calendar += name.toUpperCase() + parameters + ':' + value + '\r\n';

Perhaps /\./ would be better to make it clear that it's a regexp.
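
A small sketch of why the '.' case matters: match() compiles a string
argument as a regular expression, so '.' matches any character. The helper
name isPlainPropertyName is made up for this example. The last few lines
show a separate point that the diff above does not touch: "!" binds tighter
than "==", so the itemType comparison in the loop condition may not be
doing what it appears to, and != is presumably what was intended.

```javascript
// String.prototype.match() converts a string argument to a RegExp, so
// '.' means "any character" while '\\.' means a literal dot.
const anyChar = 'summary'.match('.');      // non-null: matches "s"
const literalDot = 'summary'.match('\\.'); // null: no dot in the name

// Hypothetical helper: true when a property name contains neither ":"
// nor ".", written as a regexp literal so the intent is visible.
function isPlainPropertyName(name) {
  return !/[:.]/.test(name);
}

// Precedence: (!type == VEVENT) compares a boolean with a string, which
// is false whatever the type is. (type != VEVENT) tests the mismatch.
const VEVENT = 'http://microformats.org/profile/hcalendar#vevent';
const otherType = 'http://example.org/other'; // made-up type for the demo
const negatedEquality = (!otherType == VEVENT); // false
const inequality = (otherType != VEVENT);       // true
```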

Also: if (prop.date && prop.time)

date and time aren't properties on HTMLTimeElement, so I don't know what
this is. Is there, or should there be, a DOM API for determining if a
string is a valid date string, other than implementing those algorithms in
script?
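
For reference, implementing the check in script looks roughly like the
sketch below. It reflects my reading of "valid date string" (a year of four
or more digits greater than zero, a two-digit month, and a two-digit day
that exists in that month). The function name is hypothetical and the
details should be checked against the spec before relying on them.

```javascript
// Sketch of a validity check for strings of the form YYYY-MM-DD.
function isValidDateString(value) {
  const match = /^(\d{4,})-(\d{2})-(\d{2})$/.exec(value);
  if (!match) return false;

  const year = Number(match[1]);
  const month = Number(match[2]);
  const day = Number(match[3]);
  if (year < 1 || month < 1 || month > 12) return false;

  // Gregorian leap year rule decides the length of February.
  const isLeap = (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
  const daysInMonth = [31, isLeap ? 29 : 28, 31, 30, 31, 30,
                       31, 31, 30, 31, 30, 31];
  return day >= 1 && day <= daysInMonth[month - 1];
}
```

The time and time-zone parts would need similar treatment, which is a fair
amount of script for something the browser already has to implement.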


http://www.whatwg.org/specs/vocabs/current-work/#licensing-works

What's the n in http://n.whatwg.org/work? If this URL is going to stick,  
it would be nice if there were also something to be seen at that page.

Also, the conversion to RDF section isn't really useful and seems to hide  
some assumptions about how the properties vocabulary should be prefixed  
with http://n.whatwg.org/work and how the  
http://www.w3.org/1999/xhtml/microdata# prefix is supposed to be used.


http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#domtokenlist

The DOM intro box doesn't explain the return value of .toggle(); you have
to consult the algorithm to figure it out.
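
As I read the algorithm, toggle() returns false when the token was present
and has been removed, and true when it was absent and has been added. A
sketch of that behaviour, using a hypothetical toggleToken function on a
plain array instead of a real DOMTokenList:

```javascript
// Hypothetical stand-in operating on an array of tokens; with a real
// DOMTokenList this would be list.toggle(token).
function toggleToken(tokens, token) {
  const index = tokens.indexOf(token);
  if (index !== -1) {
    tokens.splice(index, 1);
    return false; // token was present and has been removed
  }
  tokens.push(token);
  return true; // token was absent and has been added
}
```

A one-line note in the intro box saying that the return value reflects
whether the token is present afterwards would probably be enough.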


I'm sure there will be more issues, but that's it for now.

-- 
Philip Jägenstedt