[whatwg] Annotating structured data that HTML has no semantics for
Ian Hickson
ian at hixie.ch
Tue Jun 9 01:40:54 PDT 2009
On Mon, 11 May 2009, Simon Pieters wrote:
> On Sun, 10 May 2009 12:32:34 +0200, Ian Hickson <ian at hixie.ch> wrote:
>
> > Page 3:
> > <h2>My Cats</h2>
> > <dl>
> > <dt>Schrödinger
> > <dd item="com.damowmow.cat">
> > <meta property="com.damowmow.name" content="Schrödinger">
> > <meta property="com.damowmow.age" content="9">
> > <p property="com.damowmow.desc">Orange male.
> > <dt>Erwin
> > <dd item="com.damowmow.cat">
> > <meta property="com.damowmow.name" content="Lord Erwin">
> > <meta property="com.damowmow.age" content="3">
> > <p property="com.damowmow.desc">Siamese color-point.
> > <img property="com.damowmow.img" alt="" src="/images/erwin.jpeg">
> > </dl>
>
> Given the microdata solution and this example, there is now a reason other
> than styling to introduce <di>, since here you duplicate the <dt> information
> in <meta>.
>
> <dl>
> <di item="com.damowmow.cat">
> <dt property="com.damowmow.name">Schrödinger
> <dd>
> <meta property="com.damowmow.age" content="9">
> <p property="com.damowmow.desc">Orange male.
> </di>
> ...
>
> The styling problem is discussed at
> http://forums.whatwg.org/viewtopic.php?t=47
Yeah, I noticed that. I agree that if it turns out that this is a common
authoring pattern (and assuming we can work around the difficulties in
adjusting the parser to handle this), we should probably introduce <di>
after all. I intend to wait and see what happens first though.
On Mon, 11 May 2009, Giovanni Gentili wrote:
> Ian Hickson:
> > USE CASE: Annotate structured data that HTML has no semantics for, and
> > which nobody has annotated before, and may never again, for private use or
> > use in a small self-contained community.
> > (..)
> > SCENARIOS:
>
> Among the scenarios, this case should also be considered:
>
> * a user (or group of users) wants to annotate
> items present on a generic web page with
> additional properties in a certain vocabulary.
> For example, Joe wants to gather in a blog
> a series of personal annotations of movies
> (or other types of items) present on imdb.com.
This isn't really a use case, it's a solution. What is the end-user
scenario that the author is trying to address? For example, what kind of
software will collect this information? What problem are we solving?
> a) For properties specified on an element that has no ancestor with an
> item attribute, should the corresponding item be the document (i.e. the
> body element with an implicit item attribute)?
We already have mechanisms for providing name-value pairs for a document;
namely, <meta name> and <link rel>.
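For illustration, here is a minimal sketch (not taken from the spec) of how a consumer might read those existing document-level pairs, using Python's standard html.parser. The class and function names are hypothetical, and rel="" is treated as a space-separated list of link types.

```python
from html.parser import HTMLParser


class DocumentMetadata(HTMLParser):
    """Collects document-level name/value pairs from <meta name> and <link rel>."""

    def __init__(self):
        super().__init__()
        self.pairs = []  # (name, value) tuples in document order

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # attribute names arrive lowercased; valueless ones map to None
        if tag == "meta" and a.get("name"):
            self.pairs.append((a["name"], a.get("content") or ""))
        elif tag == "link" and a.get("rel") and a.get("href") is not None:
            # rel="" holds space-separated link types; each yields its own pair
            for rel in a["rel"].split():
                self.pairs.append((rel, a["href"]))


def document_metadata(html):
    parser = DocumentMetadata()
    parser.feed(html)
    parser.close()
    return parser.pairs
```

For example, <meta name="author" content="Joe"> yields ("author", "Joe"), with no item="" involved.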
> b) Do we need to require UAs to offer a standard way to visualize (at
> least as an option left to the user) the structured information carried
> in microdata?
Not as far as I can tell; what use case would this be for?
> And copy&paste?
The spec already requires user agents to include microdata in copy and
paste.
On Tue, 12 May 2009, Tim Tepaße wrote:
> >
> > (Note the <meta>s in the last example -- since sometimes the
> > information isn't visible, rather than requiring that people put it in
> > and hide it with display:none, which has a rather poor accessibility
> > story, I figured we could just allow <meta> anywhere, if it has a
> > property="" attribute.)
>
> That seems to be a solution optimised for entirely invisible metadata,
> but not for metadata that differs from the human-visible data.
It handles both -- instead of:
<span itemprop="x">y</span>
...you can do:
<span><meta itemprop="x" content="y">z</span>
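The two forms differ only in where a consumer takes the value from. A minimal sketch using Python's html.parser, assuming a deliberately simplified value rule: a <meta> contributes its content="" attribute, and any other element contributes its text content (the rules for elements such as <a>, <img> and <time> are left out). It also assumes non-void elements are explicitly closed; the class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have an end tag (so they are never pushed on the stack).
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class PropertyReader(HTMLParser):
    """Collects (itemprop, value) pairs using a simplified value rule."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self.stack = []  # open non-void elements: (tag, itemprop or None, text parts)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        name = a.get("itemprop")
        if tag == "meta":
            if name:
                # A <meta>'s value is its content="" attribute, not any visible text
                self.pairs.append((name, a.get("content") or ""))
        elif tag not in VOID:
            self.stack.append((tag, name, []))

    def handle_data(self, data):
        # Text belongs to the text content of every currently open element
        for _, _, parts in self.stack:
            parts.append(data)

    def handle_endtag(self, tag):
        if not any(open_tag == tag for open_tag, _, _ in self.stack):
            return  # stray end tag: ignore it
        while self.stack:
            open_tag, name, parts = self.stack.pop()
            if name:
                # Any other element's value is its text content
                self.pairs.append((name, "".join(parts)))
            if open_tag == tag:
                break


def read_properties(html):
    reader = PropertyReader()
    reader.feed(html)
    reader.close()
    return reader.pairs
```

Under these assumptions both snippets above produce the single pair ("x", "y"); in the second one the visible text "z" is ignored by the consumer.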
> Imagine as an example the simple act of marking up a number (and
> ignoring what the number denotes). For human consumption a thousands
> separator is often used; the type of separator differs by language,
> locale and context. Just in my little world I regularly see the
> point, the comma, the space, the thin space and sometimes the
> apostrophe. Parsing different representations of numbers would be a
> chore. The value of textContent of the element <span
> itemprop="com.example.price"> 1&thinsp;000&thinsp;000,—</span>
> is clearly unusable, demanding an additional invisible <meta
> property="com.example.price" content="1000000">.
Right.
> My irritation lies in the element proliferation, requiring one element/
> attribute combination for machines and one element/text content
> combination for humans. Of course, any sane author would arrange both
> elements in a close relation, as parent/child or siblings, but there
> would still be two different elements to maintain, leading to a higher
> cognitive load. Not just for authors but also for programmers: a
> fluctuating price would have to be updated on two different elements;
> tree-walking DOM scripts would have to take meta elements into account.
> Furthermore, it clashes with the familiar pattern of other elements in
> HTML. A hyperlink is one element with a machine-readable attribute and
> human-readable text content. A citation is one element with a
> machine-readable reference and human-readable text content. The same
> model is used in <meter>, <progress>, <time>, <abbr> ... but not in
> user-defined objects. I'd prefer an additional @content-like attribute
> which supersedes the text content and maybe even the default values of
> the other value-bearing elements, reducing the two different elements
> to maintain or change to just one.
I don't really understand what you are proposing. How would you reduce the
number of places where the value is represented?
I think that in the long run, if we find particular data types (such as numbers)
are commonly used in these scenarios, we'll just introduce a new element
like we did for dates and times (<time>), which can then just render the
number in a locale-specific manner automatically.
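Going in that direction, from one machine-readable value to a locale-specific rendering, is straightforward, which is part of why a dedicated element can work where parsing the rendered text is a chore. A minimal sketch, assuming a small hypothetical table of grouping separators covering some of the styles Tim lists; the language assignments are illustrative, not real locale data, and only integers are handled.

```python
# Hypothetical grouping separators per language tag (illustrative, not CLDR data).
GROUP_SEPARATORS = {
    "en": ",",         # comma: 1,000,000
    "de": ".",         # point: 1.000.000
    "fr": "\u2009",    # thin space, as in Tim's example (hypothetical assignment)
    "de-CH": "'",      # apostrophe: 1'000'000
}


def render_number(value: int, lang: str) -> str:
    """Render an integer machine value with the language's thousands separator."""
    group = GROUP_SEPARATORS.get(lang, ",")
    # Python's "," format spec inserts commas every three digits; swap in the separator.
    return f"{value:,}".replace(",", group)
```

The author would only ever write the single value 1000000; the separator is a rendering decision, so there is nothing for a consumer to parse.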
> > Instead, let us try using the regular "IDREF" functionality that HTML
> > uses in a variety of other places, like <label for="">. For this we'll
> > need a new attribute, but unfortunately we can't use about="" (which
> > would be the obvious name to use), because that would conflict with
> > RDFa, so instead we'll use subject="":
>
> I'm slightly irritated by the implied change from active, possessive
> phrasing (The cat has the name Hedral.) to something more passive
> (Hedral is a name owned by that cat.). My mental model for property
> relationships orients itself more on the former wording; link
> relationships are similar in that regard. @about/@subject are like @rev;
> a @resource alias @rel would feel more natural. The missing @resource
> also has practical consequences, I think. Imagine a document
> describing a household, and a household vocabulary which allows triples
> of <human>s which are in an <owner> relationship to a <cat>. Given a
> household of two humans and one cat, how does one mark up the fact
> that the cat has two owners?
I agree that there are use cases for both subject=""-style "reverse" links
back to an item, and "inclusion"-style links that embed other data into an
item. I have a rough proposal in the form of a <ref> element that could be
used to do inclusions (based somewhat on the Microformats include
pattern), but I think we should wait to see how well microdata works in
the wild before we start adding more features. If it turns out that
microdata doesn't work, then there's no point worrying about inclusions!
On Tue, 12 May 2009, Eduard Pascual wrote:
>
> First issue: it solves a (major) subset of what RDFa would solve.
> However, it has been taken as a requirement to avoid
> clashes/incompatibilities with RDFa. In other words, as things stand,
> authors will face two options: either use RDFa in HTML5, which would
> forsake validation but actually work; or take a less powerful, less
> supported solution (at least for now: many RDFa-aware agents vs. zero
> HTML5 microdata-aware agents) that validates but provides no pragmatic
> advantages.
>
> IMO, an approach that forces authors to choose between
> validity/conformance that doesn't *yet* work vs. invalid solutions
> that actually work is a horrible idea: it encourages authors to forsake
> validity if they want things to work.
>
> Wouldn't the RDFa + @prefix solution suggested many times work better
> and require less effort (for spec writers, for implementors, and for
> content authors)? Keep in mind that I don't think RDFa + @prefix is the
> solution we need; I'm just trying to point out that the current approach
> is even worse than that.
I covered the problems with RDFa in the e-mail introducing microdata:
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-May/019681.html
I agree that introducing yet another proposal for this is unfortunate, but
I couldn't find a way to address RDFa's issues (or Microformats' issues,
or the issues with a number of other technologies I looked at) without
changing RDFa in an incompatible manner. I believe the problems with
introducing yet another technology to do this are minor compared to the
problems with the existing technologies.
> Second issue: as the "decaffeinated RDFa" it is, the HTML5 Microdata
> approach tends to fail where RDFa itself fails. It's nice that, thanks
> to the <time> element, the problem with trying to reuse human-readable
> dates as machine-readable is dodged; but there are other cases where
> separate values might be needed: for example using a street address for
> the human-readable representation of a location and the exact geographic
> coordinates as the machine-readable (since not all micro-data parsers
> can rely on Google Maps's database to resolve street addresses, you
> know); or using a colored name (such as "lime green" displayed on lime
> green color) as the human-readable representation of a color, and the
> hexcode (like #00FF00) as the machine-readable representation. These are
> just the cases off the top of my head, and this can't be considered in
> any way a complete list. While *favoring* the reuse of human-readable
> values for the machine-readable ones is appropriate, because it's by
> far the most common case, *forcing* that reuse is quite a bad idea,
> because it is *not* the *only* case.
Microdata doesn't force that reuse; you can use <meta> instead.
> Third issue: also a flaw inherited from RDFa, it can be summarized as
> completely ignoring the requirement I submitted to this list on April
> 28th, in reply to Ian asking us to review the use cases [1]. I'll try to
> illustrate it with an example, inspired by the original use case: Let's
> say someone's marking up a collection of iguanas (or cats, or even CDs,
> doesn't really make a difference when illustrating this issue), making a
> page for each iguana (or whatever) with all the details for it; and then
> making an "index" page listing the maybe 20 iguanas with their name,
> picture, and link to the corresponding page. Adding micro-data to that
> "index", either with RDFa or with Ian's microdata proposal, would
> involve stating 20 times in the markup something like "this is the
> iguana's picture; this is the iguana's name; and this is the iguana's
> URL". It would be preferable to be able to state something like "each
> (row) <tr> in the <table> describes an iguana: the <img>s are each
> iguana's picture, the contents of the <a>'s are the names, and the @href
> of the <a>'s are the URLs to their main pages" just once. If I only need
> to state the table headings once for the users to understand this
> concept, why should a micro-data consumer require me to state it 20
> times, once for each row? Please note how such a page would be quite
> painful to maintain: any mistake in the micro-data mark-up would
> generate invalid data and require a manual harvest of the data on the
> page, thus killing the whole purpose of micro-data. And repeating
> something 20 (or more) times brings a lot of chances to put a typo in,
> or to miss an attribute, or any minor but devastating mistake like
> these.
I agree entirely. I actually tried to find a workable solution to address
this but unfortunately the only general solutions I could come up with
that would allow this were selector-based, and in practice authors are
still having trouble understanding how to use Selectors even with CSS.
There's also the problem with separating the data from the rules that say
how to interpret the data, which would likely lead to more problems than
the typos one would get from repeating the itemprop=""s.
I'll probably look at this again in more detail when responding to the
thread specifically on this topic.
> Last, but not least, I'm not sure if it was wise to start defining a
> solution while some of the requirements seem to be still under
> discussion.
Well, they were under discussion for about a year; at some point we have
to just do something, or else this feature would have missed the HTML5
train altogether.
On Wed, 13 May 2009, Eduard Pascual wrote:
> On Sun, May 10, 2009 at 12:32 PM, Ian Hickson <ian at hixie.ch> wrote:
> > [...]
> > * Any additional markup or data used to allow the machine to understand
> > the actual information shouldn't be redundantly repeated (e.g. on each
> > cell of a table, when setting it on the column is possible).
> >
> > This isn't met at all with the current proposal. Unfortunately the only
> > general solutions I could come up with that would allow this were
> > selector-based, and in practice authors are still having trouble
> > understanding how to use Selectors even with CSS.
>
> First, I'd like to ask for a clarification from Ian: what do you mean by
> "authors are still having trouble understanding how to use Selectors"?
> If you mean that they have trouble when trying to select something like
> "the second cell of the first row that has a 'foo' attribute different
> from 'bar' within tables that have four or more rows" or even more
> obscure stuff, then I would agree: most authors will definitely have
> trouble dealing with such complex cases, and I bet many will always have
> such trouble. However, if you mean that authors can't deal with simple
> class, id, and/or child/descendant selectors, then I think you are
> seriously underestimating authors.
I was referring to the kind of selectors one would need to make good use
of a semantic-extraction mechanism using selectors.
> Actually, I was thinking about the cost of deploying implementations,
> rather than writing them, since RDFa consumers are already out there
> and working. This, however, strays a bit from the original idea: it's
> not really a matter of how big the cost is on its own, but of what you
> get for that cost. This is probably my own fault, but I still fail
> to see what Ian's suggestion offers that RDFa doesn't; so my impression
> is that these costs, even if they are small, are buying nothing, so they
> are not worth it. If someone is willing to highlight what makes this
> proposal worth the costs (ie: what makes it better than RDFa), I'm
> willing to listen.
The main benefit as I see it is simplicity. I've seen even RDFa
advocates stumble over exactly what triples get generated from some
particularly gnarly RDFa snippets. With microdata, there's really no way
to form a gnarly snippet. Each group is introduced by an item="", each
name/value pair is introduced by an itemprop="", and there's subject="" to
handle non-nested cases. No prefixes, no CURIEs, no rel vs rev, no data
types, no hanging triples, no resource="" vs src="", etc.
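To make the three constructs concrete, here is a minimal sketch of an extractor in Python (standard library only). It is not the spec's algorithm, and all names in it are hypothetical. It assumes explicitly closed markup and a simplified value rule: <meta> gives content="", <link> gives href="", <img> gives src="", other elements give their trimmed text, and an element that also has item="" gives the nested item. subject="" is resolved against id="" only after the whole document has been read, so it may point forwards.

```python
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


def void_value(tag, a):
    # Simplified value rule for empty elements (an assumption of this sketch)
    key = {"meta": "content", "link": "href", "img": "src"}.get(tag)
    return (a.get(key) or "") if key else ""


class MicrodataExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.items = []    # every item="" found, in document order
        self.by_id = {}    # id="" -> item, so subject="" can refer to it
        self.pending = []  # (subject id, name, value), attached after parsing
        self.stack = []    # frames for open non-void elements

    def owner(self):
        # Plain itemprop="" belongs to the nearest open ancestor with item=""
        for frame in reversed(self.stack):
            if frame["item"] is not None:
                return frame["item"]
        return None

    def add(self, a, value, owner):
        name = a.get("itemprop")
        if not name:
            return
        if a.get("subject"):
            self.pending.append((a["subject"], name, value))  # non-nested case
        elif owner is not None:
            owner["properties"].setdefault(name, []).append(value)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        owner = self.owner()  # looked up before this element opens its own item
        item = None
        if "item" in a:
            item = {"type": a["item"] or None, "properties": {}}
            self.items.append(item)
            if a.get("id"):
                self.by_id[a["id"]] = item
        if tag in VOID:
            self.add(a, item if item is not None else void_value(tag, a), owner)
        else:
            self.stack.append({"tag": tag, "attrs": a, "item": item,
                               "owner": owner, "text": []})

    def handle_data(self, data):
        for frame in self.stack:
            frame["text"].append(data)

    def finish(self, frame):
        # A nested item is the value; otherwise the element's trimmed text is.
        if frame["item"] is not None:
            value = frame["item"]
        else:
            value = "".join(frame["text"]).strip()
        self.add(frame["attrs"], value, frame["owner"])

    def handle_endtag(self, tag):
        if not any(frame["tag"] == tag for frame in self.stack):
            return  # stray end tag: ignore it
        while self.stack:
            frame = self.stack.pop()
            self.finish(frame)
            if frame["tag"] == tag:
                break

    def close(self):
        super().close()
        while self.stack:  # elements still open at end of input
            self.finish(self.stack.pop())
        for subject, name, value in self.pending:
            target = self.by_id.get(subject)
            if target is not None:
                target["properties"].setdefault(name, []).append(value)


def extract_items(html):
    parser = MicrodataExtractor()
    parser.feed(html)
    parser.close()
    return parser.items
```

For example, a <div item="com.damowmow.cat" id="c"> holding a name and a description, followed later in the page by <meta subject="c" itemprop="com.damowmow.age" content="9">, yields one item carrying all three properties.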
On Tue, 12 May 2009, Shelley Powers wrote:
> >
> > If we can come up with a way of using the string "foaf:name" without
> > having to declare "foaf" in each document, I'm totally in agreement.
> > I've considered maybe registering the "foaf" URL scheme, or using some
> > other punctuation character and having people register prefixes, but I
> > don't know what punctuation character to use (':' and '.' are both
> > taken).
>
> But then we would lose the extensibility, which is the power behind all
> of this.
I don't see why registering particular prefixes involves losing
extensibility... aren't URIs extensible?
> But regardless, the majority of people will include metadata markup by
> installing a plug-in or module, and making a couple of choices. And if
> you put together a good ten-minute tutorial for the average developer,
> they'll have no problem with "foaf:name". Training and clarity of
> communication are much more important than form; it has always been so
> with technology.
I think you significantly underestimate the difficulty of getting Web
authors interested in doing the right thing.
On Wed, 13 May 2009, Leif Halvard Silli wrote:
>
> CSS and selectors appear to be among the best-understood technologies
> of the web.
That certainly hasn't been my experience, at least not with anything
beyond the simplest of selectors.
On Wed, 13 May 2009, Giovanni Gentili wrote:
> >
> > If we can come up with a way of using the string "foaf:name" without
> > having to declare "foaf" in each document, I'm totally in agreement.
> > I've considered maybe registering the "foaf" URL scheme, or using some
> > other punctuation character and having people register prefixes, but I
> > don't know what punctuation character to use (':' and '.' are both
> > taken).
>
> Put some predefined prefixes for @itemprop into HTML5:
>
> dc = http://purl.org/dc/terms/
> foaf = http://xmlns.com/foaf/0.1/
> vcard = http://www.w3.org/2001/vcard-rdf/3.0#
> owl = http://www.w3.org/2002/07/owl#
> rdf = http://www.w3.org/1999/02/22-rdf-syntax-ns#
> rdfs = http://www.w3.org/2000/01/rdf-schema#
> sioc = http://rdfs.org/sioc/ns#
> skos = http://www.w3.org/2004/02/skos/core#
> xsd = http://www.w3.org/2001/XMLSchema#
If we're going to predefine things, we're better off predefining all the
terms so that we don't need prefixes at all.
> Also, instead of @item, @itemprop, @subject,
> it would be better to use @item, @prop, @subj
Why is that better?
> or @rdf-typeof @rdf-property @rdf-about (and @rdf-rel)
This isn't really about RDF, so I think that would be confusing.
On Thu, 14 May 2009, Shelley Powers wrote:
>
> Actually, I believe there are other differences, as others have pointed
> out.
>
> http://www.jenitennison.com/blog/node/103
>
> http://realtech.burningbird.net/semantic-web/semantic-web-issues-and-practices/holding-on-html5
>
> Some of the differences have resulted in more modifications to the
> underlying HTML5 spec, which is curious, because Ian has stated in
> comments that support for RDF is only a side interest and not the main
> purpose behind the microdata section.
Even side interests need some love.
On Thu, 14 May 2009, Philip Taylor wrote:
>
> If I understand RDF correctly, the idea is that everything can be
> URIs, subjects and objects can instead be blank nodes, and objects can
> instead be literals. If we restrict literals to strings (optionally
> with languages), then I think all triples must follow one of these
> eight patterns:
>
> <urn:subject> <urn:predicate> <urn:object> .
> <urn:subject> <urn:predicate> "object" .
> <urn:subject> <urn:predicate> "object"@lang .
> <urn:subject> <urn:predicate> _:X .
> _:X <urn:predicate> <urn:object> .
> _:X <urn:predicate> "object" .
> _:X <urn:predicate> "object"@lang .
> _:X <urn:predicate> _:Y .
>
> These cases can be trivially mapped into HTML5 microdata as:
>
> <div item>
> <link itemprop="about" href="urn:subject">
> <link itemprop="urn:predicate" href="urn:object">
> </div>
>
> <div item>
> <link itemprop="about" href="urn:subject">
> <meta itemprop="urn:predicate" content="object">
> </div>
>
> <div item>
> <link itemprop="about" href="urn:subject">
> <meta itemprop="urn:predicate" content="object" lang="lang">
> </div>
>
> <div item>
> <link itemprop="about" href="urn:subject">
> <meta itemprop="urn:predicate" item id="X">
> </div>
>
> <link subject="X" itemprop="urn:predicate" href="urn:object">
>
> <meta subject="X" itemprop="urn:predicate" content="object">
>
> <meta subject="X" itemprop="urn:predicate" content="object" lang="lang">
>
> <meta subject="X" itemprop="urn:predicate" item id="Y">
On Fri, 15 May 2009, Philip Taylor wrote:
>
> Hmm, I think I'm wrong here. 'id' has to be unique, which means this
> pattern won't work if _:X is the object for triples with two different
> subjects.
Right; this would require the "include pattern" idea from Microformats,
which I have some ideas on (using a <ref itemprop="" href=""> element),
but I think we should wait to see how microdata fares as is first (none of
the use cases for microdata actually needed this).
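Philip's eight patterns can be generated mechanically. A minimal sketch in Python, assuming a hypothetical term encoding (("uri", "urn:x"), ("bnode", "X"), ("literal", "text", lang-or-None)) and the attribute spellings used in his examples. It raises an error when a blank node would be given the same id="" twice, which is exactly the limitation Philip points out, and it assumes every blank node used as a subject has been introduced as an object somewhere.

```python
from html import escape


def q(value):
    """Escape a string for use inside a double-quoted HTML attribute."""
    return escape(value, quote=True)


class TripleWriter:
    """Emits microdata markup for one triple at a time, per the eight patterns."""

    def __init__(self):
        self.declared = set()  # blank nodes that already have an id=""

    def _value_markup(self, predicate, obj, subject_attr=""):
        kind = obj[0]
        if kind == "uri":
            return f'<link{subject_attr} itemprop="{q(predicate)}" href="{q(obj[1])}">'
        if kind == "literal":
            lang = f' lang="{q(obj[2])}"' if len(obj) > 2 and obj[2] else ""
            return f'<meta{subject_attr} itemprop="{q(predicate)}" content="{q(obj[1])}"{lang}>'
        if kind == "bnode":
            if obj[1] in self.declared:
                # id="" must be unique, so _:X can be the object of only one triple
                raise ValueError(f"_:{obj[1]} is already the object of another triple")
            self.declared.add(obj[1])
            return f'<meta{subject_attr} itemprop="{q(predicate)}" item id="{q(obj[1])}">'
        raise ValueError(f"unknown term kind: {kind!r}")

    def write(self, subject, predicate, obj):
        if subject[0] == "uri":
            # <urn:subject> ... : wrap in an item whose "about" is the subject URI
            inner = self._value_markup(predicate, obj)
            return ("<div item>\n"
                    f'  <link itemprop="about" href="{q(subject[1])}">\n'
                    f"  {inner}\n"
                    "</div>")
        if subject[0] == "bnode":
            # _:X ... : attach to the item with id="X" via subject=""
            return self._value_markup(predicate, obj, f' subject="{q(subject[1])}"')
        raise ValueError("subjects must be URIs or blank nodes")
```

Writing the triple _:X <urn:predicate> <urn:object> this way produces <link subject="X" itemprop="urn:predicate" href="urn:object">, matching the fifth pattern; a second triple with _:X as its object is rejected rather than silently producing a duplicate id.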
On Mon, 18 May 2009, Eduard Pascual wrote:
>
> Ian's initial message goes step by step through the creation of this new
> syntax, but does *not* mention at all *why* it was being created in the
> first place. The insight into the choices taken is indeed a good thing,
> and I thank Ian for it; but he omitted to provide insight into the first
> choice taken: discarding the multiple options already available (not
> only Microformats and RDFa, but also other less discussed ones such as
> eRDF, EASE, etc). Sure, there has been a lot of discussion on this
> topic; and it's possible that the choice was taken as part of such
> discussions. In any case, I think Ian should have clearly stated the
> reasons to build a brand new solution when many others have been out for
> a while and users have been able to try and test them.
I didn't list every solution I considered (such as eRDF and EASE) because
those solutions were already so widely criticised by the supporters of
technologies like RDFa and Microformats that I didn't really think there
was any point going into more detail about those.
eRDF's main problem is the use of class=""; like Microformats, this causes
confusion with authors.
RDF EASE separates the semantics from the data, which in general I believe
would lead to a very brittle system.
Both also suffer from many of the indirection problems inherent in any
prefix-based system.
However, like Microformats, nothing prevents either of these systems from
being used with HTML5 today, as far as I can tell. Unlike RDFa, they do
not require any changes to the underlying markup language. If they are
better than microdata, then they will see adoption, and we can drop
microdata from the HTML5 draft.
> Ok, the syntax is simpler for a subset of the use cases; but it leaves
> entirely out the rest of use cases.
As far as I'm aware, microdata handles all the use cases that were
mentioned that are also solved by RDFa and Microformats (as well as
solving a number of other use cases that the other two don't solve, such
as the drag-and-drop cases). If there are use cases that should be
addressed that I did not address, please raise them.
> *extensibility*. And, with over two decades between versions of the
> specs, this is a strong requirement: if a problem is noticed after HTML5
> becomes "the standard", it's essential to be able to solve it without
> waiting 10 or 20 years for HTML6 to come out. In addition, your alleged
> "simplified" data model is actually an over-complication, as it is
> defined in the form of restrictions and/or limitations over RDF's model.
> Try to explain what can be represented in RDF, and what can be
> represented with microdata, and you'll see what's simpler.
The only reason there was a long wait between HTML4 and HTML5 is that HTML
was abandoned by the W3C. As soon as someone wanted to maintain HTML
again, development resumed. There's no reason to believe it would take 20
years to add a new feature -- it could be done overnight if there was the
will and the need.
> Henri:
> > I consider it an advantage that reverse domains don't suggest that you
> > should try dereferencing identifiers as if they were addresses.
>
> But they still look address-like enough. Lots of users will try
> to "reorder" the domain and recover the address behind it (without
> knowing that there isn't a real address behind it), with the obvious
> confusion when all their attempts fail (since something that doesn't
> exist can't be recovered).
I don't see any reason to believe this is the case. Do people do this with
Java identifiers? I've never heard of anyone trying to do that.
On Fri, 15 May 2009, Eduard Pascual wrote:
> On Thu, May 14, 2009 at 10:17 PM, Maciej Stachowiak <mjs at apple.com> wrote:
> >
> > From my cursory study, I think microdata could subsume many of the use
> > cases of both microformats and RDFa.
>
> Maybe. But microformats and RDFa can handle *all* of these cases.
Actually, this isn't true. For example, neither Microformats nor RDFa has
a DOM API or integration with the drag-and-drop API in HTML5.
> Allright, an API may be a benefit. Most probably it is. However, a
> similar API could have been built from RDFa, or eRDF, or EASE, or any
> other already existing or new solution
As has been discussed, I did consider RDFa, but IMHO it has some fatal
problems that make it inappropriate for text/html. eRDF's use of "class"
is suboptimal, and it hasn't been found especially popular amongst the RDF
community. EASE would be significantly more complicated for user agents to
implement as a DOM API than the other three.
> Now microdata comes out, some drawbacks are highlighted in comparison
> with RDFa (lack of typing,
As far as I can tell, per-instance typing (as opposed to per-class typing,
where all properties of a particular name are implicitly typed) is not a
requirement for any of the use cases that were listed and that I tried to
address. If there are use cases that require per-instance typing, please
do bring them up so that they can be considered.
> inability to depict the full RDF model,
Depicting the RDF model isn't necessary to resolve the use cases that were
brought up; that there is any RDF support at all is opportunistic rather
than a design goal. (That is, adding RDF support was simple enough to do
that I figured I might as well, rather than something that's key to the
use cases listed.)
> Reversed domains are as ugly as CURIEs (but at least CURIEs resolve to
> something useful, while reversed domains often don't resolve at all)
As Henri has pointed out, not resolving is actually a benefit; there has
been ample evidence that making identifiers resolve causes more problems
than it solves. For example:
http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic
> what does microdata provide to make up for its drawbacks?
A simpler, more usable syntax; a DOM API; and integration with the HTML5
drag-and-drop model, amongst other things that others have listed.
> Of course. I'm willing to experiment. I experimented with RDFa some
> time ago and found it unsuitable for my needs. I have taken a look at
> microdata and it isn't any more suitable.
I will look carefully at your proposal shortly; hopefully that thread will
clarify what your needs are.
> Technical problems have already been cited. Here's a summary, just
> review the thread for further details on each:
>
> - Microdata can't handle non-string objects: even if I use the <time>
> element to mark up a date with microdata, it will be taken as a string
> rather than as a date. Since tools may be relying on explicit typing for
> some tasks, this limitation renders microdata less usable than RDFa.
What are the use cases for which this matters? As far as I can tell,
typing was not necessary for any of the use cases that were brought up,
which are listed in the e-mails cited here:
http://lists.w3.org/Archives/Public/public-html/2009May/0207.html
For example, consider the calendar use cases. The output format is vEvent,
and the typing information doesn't affect the conversion.
> - Microdata is an entirely new syntax. It requires implementation and
> deployment of consumers before it can be used at all; while RDFa has
> already gone through this step. (This is not a very serious problem, but
> is a cost to be considered.)
I think that relative to the long term goals here, no format has really
gotten significant traction. RDFa and Microformats have the most usage,
but even they see virtually no usage when examined at the scale of the Web
as a whole.
This isn't a criticism of RDFa or Microformats; but I do think it is naive
to claim that RDFa has "gone through the implementation and deployment
stage". Note that implementation of microdata features has been shown to
be basically trivial -- two people implemented basically the entire
microdata model within 24 hours of hearing of the proposal (in fact one of
them did it between when I checked in the spec text and went to bed, and
when I woke up the next morning).
> - Microdata can't represent the full RDF data model (while RDFa can):
> some complex structures are just not expressable with microdata.
As far as I can tell none of the use cases actually need RDF at all, let
alone the more complex features of RDF, so this doesn't seem to be a
problem in practice.
(The only exception might be the validation use case, for which one
possible solution is reusing RDFS/OWL, but personally my stance on
validation is that schemas aren't a good solution and hard-coded tools are
better, so I'm not really convinced of even this exception.)
> - Microdata inherits all of the flaws from RDFa (except for the use of
> CURIEs, which some people here consider an inconvenience). For example,
> marking up a list of items requires lots of redundant code.
Yes, it would be nice to find a better solution for this that doesn't
involve external resources or Selectors.
> - Microdata relies on reversed domains. While some people argue these to
> be better than CURIEs, they are equally horrendous for the average user,
> and have the additional disadvantage that they don't map to anything
> useful (if they map to something at all), while CURIEs map to the
> descriptions and/or definitions of what they represent.
Not mapping to anything is an intentional design decision; I do not
believe that having identifiers be resolvable as URIs is good design. I do
not think that reversed-DNS identifiers (which are basically just opaque
strings with dots in them) are anywhere near as horrendous as prefix-based
solutions or even straight URIs, though straight URIs are supported in
microdata if that is preferred.
On Fri, 15 May 2009, Shelley Powers wrote:
>
> You don't have to take my word for it -- check out Philip's testing demo
> for microdata. You get triples with the following:
>
> http://www.w3.org/1999/xhtml/custom#com.damowmow.cat
>
> http://philip.html5.org/demos/microdata/demo.html#output_ntriples
>
> Not only do you face problems with link rot, you also face a significant
> amount of confusion, as people look at that and go, "What the hell is
> that?"
If anyone has a better suggestion for how to map reversed DNS names to RDF
identifiers, I'd be glad to update the spec. (The only requirement is that
it not be possible for two different conforming microdata identifiers to
result in the same RDF identifier -- to this end, for instance, the
"http://www.w3.org/1999/xhtml/custom#" prefix is not conforming in any
itemprop="" names in HTML5.)
The current mapping is just there because I couldn't find a better
solution. RDF doesn't seem to support non-URI identifiers.
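The constraint described above can be sketched in a few lines. This is a minimal illustration, assuming a simplified notion of conforming names (absolute URLs containing ":", and reversed-DNS names containing "." but no ":") and ignoring predefined names such as the "about" used in Philip's examples; the function names are hypothetical. Because names that already start with the custom prefix are rejected, no reversed-DNS name can collide with a URL name after mapping.

```python
CUSTOM_PREFIX = "http://www.w3.org/1999/xhtml/custom#"


def is_conforming_name(name: str) -> bool:
    # Simplified rule (assumption): an absolute URL (contains ":") or a
    # reversed-DNS name (contains "." and no ":"); the custom prefix is reserved.
    if not name or name.startswith(CUSTOM_PREFIX):
        return False
    return ":" in name or "." in name


def to_rdf_identifier(name: str) -> str:
    """Map an itemprop name to an RDF identifier without collisions."""
    if not is_conforming_name(name):
        raise ValueError(f"non-conforming itemprop name: {name!r}")
    if ":" in name:
        return name  # already a URI: used unchanged
    return CUSTOM_PREFIX + name  # reversed-DNS name: prefixed
```

Under these assumptions the mapping is one-to-one: a URL name maps to itself, a reversed-DNS name maps to a string starting with the reserved prefix, and no conforming URL name can start with that prefix.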
> But hey, you've given me another idea. I think I'll create my own
> vocabulary items, with the reversed DNS
> http://www.w3.org/1999/xhtml/custom#com.sun.*. No, maybe
> http://www.w3.org/1999/xhtml/custom#com.opera.*. Nah, how about
> http://www.w3.org/1999/xhtml/custom#com.microsoft.*. Yeah, that's cool.
> And there is no mechanism in place to prevent this, because unlike
> "regular" URIs, where the domain is actually controlled by a specific
> entity, you've created the world famous W3C fudge pot. Anything goes.
What stops you from creating http://sun.com/* identifiers?
Reversed DNS names use the exact same registration mechanism as URIs.
> But Foobar takes a dive in the dot com pool, and foobar.com gets taken
> over by a porn establishment. Yeah, I can't wait for people to explain
> that one to the boss. Just because it doesn't link, won't mean it won't
> end up on Twitter as a big, huge joke.
This seems to be a problem for URIs far more than reversed DNS names...
On Fri, 15 May 2009, Manu Sporny wrote:
> Kristof Zelechovski wrote:
> >
> > (WHATWG wants HTML documents to be readable 1000 years from now.)
>
> Is that really a requirement?
I believe Kristof is referring to my personal desire to write documentation
that describes the Web infrastructure in enough detail that a software
archeologist 1000 years from now, armed only with the spec and an archive
of today's Web content, would be able to write a user agent to render the
Web content as today's browsers do, without having to reverse engineer the
expected behaviour from the content.
> Also, why 1000 years, that seems a bit arbitrary? =P
The number is arbitrary; I just mean a sufficiently long time that it is
plausible that there would be no way to actually run today's software.
On Thu, 14 May 2009, Eduard Pascual wrote:
>
> But *why* restrict literals to strings?? Being unable to state that
> "2009-05-14" is a date makes that value completely useless
This is demonstrably false -- vEvent blobs in HTML5 microdata have a
defined mapping to raw iCalendar data, including roundtripping of dates,
despite their being treated as strings.
> it would only be useful in contexts where a date is expected (basically,
> because it is a date), but it can't be used in such contexts because the
> tool retrieving the value has no hint about it being a date. The same is
> true for integers, prices (a.k.a. decimals plus a currency symbol),
> geographic coordinates, iguana descriptions, and so on.
You can distinguish all of those by inspection.
In any case, why would you put a price in the same field as a date? If you
use strongly typed _fields_, rather than _values_, you can sidestep the
entire issue cleanly.
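A minimal sketch of the typed-fields approach, in the spirit of the vEvent case: the values stay plain strings, and the consumer decides how to convert each one from the field it appears in. The FIELD_TYPES table, the lowercase field names, and the helper names are hypothetical; time zones, line folding, the VCALENDAR wrapper, and required properties such as UID and DTSTAMP are left out.

```python
from datetime import date, datetime

# Hypothetical per-field types for a vEvent-like vocabulary (an assumption).
FIELD_TYPES = {"dtstart": "datetime", "dtend": "datetime",
               "summary": "text", "location": "text"}


def ical_property(field: str, value: str) -> str:
    """Render one iCalendar content line, converting the string by its field's type."""
    name = field.upper()
    if FIELD_TYPES.get(field) == "datetime":
        if "T" in value:
            # "2009-05-14T10:30:00" -> floating local time "20090514T103000"
            stamp = datetime.fromisoformat(value).strftime("%Y%m%dT%H%M%S")
            return f"{name}:{stamp}"
        # "2009-05-14" -> all-day value "20090514", flagged with VALUE=DATE
        stamp = date.fromisoformat(value).strftime("%Y%m%d")
        return f"{name};VALUE=DATE:{stamp}"
    # Text fields: escape characters that are special in iCalendar TEXT values
    text = (value.replace("\\", "\\\\").replace(";", "\\;")
                 .replace(",", "\\,").replace("\n", "\\n"))
    return f"{name}:{text}"


def to_vevent(properties: dict) -> str:
    """properties maps field names to lists of string values, in order."""
    lines = ["BEGIN:VEVENT"]
    for field, values in properties.items():
        lines.extend(ical_property(field, v) for v in values)
    lines.append("END:VEVENT")
    return "\r\n".join(lines)
```

The string "2009-05-14" carries no type annotation of its own; it becomes an iCalendar date only because it appears in the dtstart field.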
On Thu, 14 May 2009, Jonas Sicking wrote:
>
> Some of the improvement suggestions that I have heard sound
> interesting, though possibly for the next version of microdata:
>
> * Support for specifying a machine-readable value, such as for dates,
> colors, numbers, etc.
I expect we will add support for these based on demand, the same way we
added <time> in the first place.
> I even wonder whether it would allow replacing the <time> element with a
> standardized microformat, such as:
>
> Christmas is going down on <span item="w3c.time"
> itemvalue="12-25-2009">The 25th day of December</span>!
I don't really understand how that would be better than dedicated
elements.
> * Support for tabular data.
This would be nice if we can find a way to do it that doesn't put undue
burdens on simple implementations. (e.g. I would imagine that while a
microdata implementation today can be a few hundred lines total, adding
support for the table model could easily double that.)
--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'