[whatwg] Please review use cases relating to embedding micro-data in text/html
timeless
timeless at gmail.com
Fri Apr 24 04:35:05 PDT 2009
The contacts section uses "event" where it meant "contact": its last requirement reads "different parts of an event".
On 4/23/09, Ian Hickson <ian at hixie.ch> wrote:
>
> [bcc'ed previous participants in this discussion]
>
> Earlier this year I asked for use cases that HTML5 did not yet cover, with
> an emphasis on use cases relating to semantic microdata. I list below the
> use cases and requirements that I derived from the response to that
> request, and from related discussions.
>
> I would appreciate it if people could review this list for errors or
> important omissions, before I go through the list to work out whether
> these use cases already have solutions, or whether we should have
> solutions for these use cases in HTML, or whether we should address these
> use cases with other technologies, or whatnot.
>
> I encourage people to focus on the use cases themselves, rather than on
> potential solutions; various solutions to all these use cases have already
> been argued in great detail and I have already read all those e-mails,
> blog comments, wiki faqs, etc, carefully.
>
> My primary concern right now is in making sure that these are indeed the
> use cases people care about, so that whatever we add to the spec can be
> carefully evaluated to make sure it is in fact solving the problems that
> we want solving.
>
> ==============================================================================
>
> Exposing known data types in a reusable way
>
> USE CASE: Exposing calendar events so that users can add those events to
> their calendaring systems.
>
> SCENARIOS:
>
> * A user visits the Avenue Q site and wants to make a note of when
> tickets go on sale for the tour's stop in his home town. The site says
> "October 3rd", so the user clicks this and selects "add to calendar",
> which causes an entry to be added to his calendar.
> * A student is making a timeline of important events in Apple's history.
> As he reads Wikipedia entries on the topic, he clicks on dates and
> selects "add to timeline", which causes an entry to be added to his
> timeline.
> * TV guide listings - browsers should be able to expose to the user's
> tools (e.g. calendar, DVR, TV tuner) the times that a TV show is on.
> * Paul sometimes gives talks on various topics, and announces them on
> his blog. He would like to mark up these announcements with proper
> scheduling information, so that his readers' software can
> automatically obtain the scheduling information and add it to their
> calendar. Importantly, some of the rendered data might be more
> informal than the machine-readable data required to produce a calendar
> event. Also of importance: Paul may want to annotate his event with a
> combination of existing vocabularies and a new vocabulary of his own
> design. (why?)
> * David can use the data in a web page to generate a custom browser UI
> for adding an event to our calendaring software without using brittle
> screen-scraping.
>
> REQUIREMENTS:
>
> * Should be discoverable.
> * Should be compatible with existing calendar systems.
> * Should be unlikely to get out of sync with prose on the page.
> * Shouldn't require the consumer to write XSLT or server-side code to
> read the calendar information.
> * Machine-readable event data shouldn't be on a separate page than
> human-readable dates.
> * The information should be convertible into a dedicated form (RDF,
> JSON, XML, iCalendar) in a consistent manner, so that tools that use
> this information separate from the pages on which it is found have a
> standard way of conveying the information.
> * Should be possible for different parts of an event to be given in
> different parts of the page. For example, a page with calendar events
> in columns (with each row giving the time, date, place, etc) should
> still have unambiguous calendar events parseable from it.
>
>
> ---------------------------------------------------------------------------
>
> USE CASE: Exposing contact details so that users can add people to their
> address books or social networking sites.
>
> SCENARIOS:
>
> * Instead of giving a colleague a business card, someone gives their
> colleague a URL, and that colleague's user agent extracts basic
> profile information such as the person's name along with references to
> other people that person knows and adds the information into an
> address book.
> * A scholar and teacher wants other scholars (and potentially students)
> to be able to easily extract information about who he is to add it to
> their contact databases.
> * Fred copies the name of one of his Facebook friends and pastes it
> into his OS address book; the contact information is imported
> automatically.
> * Fred copies the name of one of his Facebook friends and pastes it
> into his Webmail's address book feature; the contact information is
> imported automatically.
> * David can use the data in a web page to generate a custom browser UI
> for including a person in our address book without using brittle
> screen-scraping.
>
> REQUIREMENTS:
>
> * A user joining a new social network should be able to identify himself
> to the new social network in a way that enables the new social network
> to bootstrap his account from existing published data (e.g. from
> another social network) rather than having to re-enter it, without the
> new site having to coordinate (or know about) the pre-existing site,
> without the user having to give either site's credentials to the other,
> and without the new site finding out about relationships that the user
> has intentionally kept secret.
> (http://w2spconf.com/2008/papers/s3p2.pdf)
> * Data should not need to be duplicated between machine-readable and
> human-readable forms (i.e. the human-readable form should be
> machine-readable).
> * Shouldn't require the consumer to write XSLT or server-side code to
> read the contact information.
> * Machine-readable contact information shouldn't be on a separate page
> than human-readable contact information.
> * The information should be convertible into a dedicated form (RDF,
> JSON, XML, vCard) in a consistent manner, so that tools that use this
> information separate from the pages on which it is found have a
> standard way of conveying the information.
> * Should be possible for different parts of an event to be given in
> different parts of the page. For example, a page with contact details
> for people in columns (with each row giving the name, telephone
> number, etc) should still have unambiguous grouped contact details
> parseable from it.
>
>
> ---------------------------------------------------------------------------
>
> USE CASE: Allow users to maintain bibliographies or otherwise keep track
> of sources of quotes or references.
>
> SCENARIOS:
>
> * Frank copies a sentence from Wikipedia and pastes it in some word
> processor: it would be great if the word processor offered to
> automatically create a bibliographic entry.
> * Patrick keeps a list of his scientific publications on his web site.
> He would like to provide structure within this publications page so
> that Frank can automatically extract this information and use it to
> cite Patrick's papers without having to transcribe the bibliographic
> information.
> * A scholar and teacher wants other scholars (and potentially students)
> to be able to easily extract information about what he has published
> to add it to their bibliographic applications.
> * A scholar and teacher wants to publish scholarly documents or content
> that includes extensive citations that readers can then automatically
> extract so that they can find them in their local university library.
> These citations may be for a wide range of different sources: an
> interview posted on YouTube, a legal opinion posted on the Supreme
> Court web site, a press release from the White House.
> * A blog, say htmlfive.net, copies content wholesale from another, say
> blog.whatwg.org (as permitted and encouraged by the license). The
> author of the original content would like the reader of the reproduced
> content to know the provenance of the content. The reader would like
> to find the original blog post so he can leave comments for the
> original author.
> * Chaals could improve the Opera intranet if he had a mechanism for
> identifying the original source of various parts of a page. (why?)
>
> REQUIREMENTS:
>
> * Machine-readable bibliographic information shouldn't be on a separate
> page than human-readable bibliographic information.
> * The information should be convertible into a dedicated form (RDF,
> JSON, XML, BibTeX) in a consistent manner, so that tools that use this
> information separate from the pages on which it is found have a
> standard way of conveying the information.
>
>
> ---------------------------------------------------------------------------
>
> USE CASE: Help people searching for content to find content covered by
> licenses that suit their needs.
>
> SCENARIOS:
>
> * If a user is looking for recipes of pies to reproduce on his blog, he
> might want to exclude from his results any recipes that are not
> available under a license allowing non-commercial reproduction.
> * Lucy wants to publish her papers online. She includes an abstract of
> each one in a page, but because they are under different copyright
> rules, she needs to clarify what the rules are. A harvester such as
> the Open Access project can actually collect and index some of them
> with no problem, but may not be allowed to index others. Meanwhile, a
> human finds it more useful to see the abstracts on a page than have to
> guess from a bunch of titles whether to look at each abstract.
> * There are mapping organisations and data producers and people who take
> photos, and each may set different policies. Being able to keep that
> policy information helps people making further mashups avoid
> violating a policy. For example, if GreatMaps.com has a public domain
> policy on their maps, CoolFotos.org has a policy that you can use data
> other than images for non-commercial purposes, and Johan Ichikawa has
> a photo there of my brother's cafe, which he has licensed as "must pay
> money", then it would be reasonable for me to copy the map and put it
> in a brochure for the cafe, but not to copy the data and photo from
> CoolFotos. On the other hand, if I am producing a non-commercial guide
> to cafes in Melbourne, I can add the map and the location of the cafe
> photo, but not the photo itself.
> * At University of Mary Washington, many faculty encourage students to
> blog about their studies to encourage more discussion using an
> instance of WordPress MultiUser. A student with a blog might be
> writing posts relevant to more than one class. Professors would like
> to then aggregate relevant posts into one blog.
> * Tara runs a video sharing web site for people who want licensing
> information to be included with their videos. When Paul wants to blog
> about a video, he can paste a fragment of HTML provided by Tara
> directly into his blog. The video is then available inline in his
> blog, along with any licensing information about the video.
> * Fred's browser can tell him what license a particular video on a site
> he is reading has been released under, and advise him on what the
> associated permissions and restrictions are (can he redistribute this
> work for commercial purposes, can he distribute a modified version of
> this work, how should he assign credit to the original author, what
> jurisdiction the license assumes, whether the license allows the work
> to be embedded into a work that uses content under various other
> licenses, etc).
>
> REQUIREMENTS:
>
> * Content on a page might be covered by a different license than other
> content on the same page.
> * When licensing a subpart of the page, existing implementations must
> not just assume that the license applies to the whole page rather than
> just part of it.
> * License proliferation should be discouraged.
> * License information should be able to survive from one site to another
> as the data is transferred.
> * Expressing copyright licensing terms should be easy for content
> creators, publishers, and redistributors to provide.
> * It should be more convenient for the users (and tools) to find and
> evaluate copyright statements and licenses than it is today.
> * Shouldn't require the consumer to write XSLT or server-side code to
> process the license information.
> * Machine-readable licensing information shouldn't be on a separate page
> than human-readable licensing information.
> * There should not be ambiguous legal implications.
>
> ==============================================================================
>
> Annotations
>
> USE CASE: Annotate structured data that HTML has no semantics for, and
> which nobody has annotated before, and may never again, for private use or
> use in a small self-contained community.
>
> SCENARIOS:
>
> * A group of users want to mark up their iguana collections so that they
> can write a script that collates all their collections and presents
> them in a uniform fashion.
> * A scholar and teacher wants other scholars (and potentially students)
> to be able to easily extract information about what he teaches to add
> it to their custom applications.
> * The list of specifications produced by W3C, for example, and various
> lists of translations, are produced by scraping source pages and
> outputting the result. This is brittle. It would be easier if the data
> was unambiguously obtainable from the source pages. This is a custom
> set of properties, specific to this community.
> * Chaals wants to make a list of the people who have translated W3C
> specifications or other documents, and then use this to search for
> people who are familiar with a given technology at least at some
> level, and happen to speak one or more languages of interest.
> * Chaals wants to have a reputation manager that can determine which of
> the many emails sent to the WHATWG list might be "more than usually
> valuable", and would like to seed this reputation manager from
> information gathered from the same source as the scraper that
> generates the W3C's TR/ page.
> * A user wants to write a script that finds the price of a book from an
> Amazon page.
> * Todd sells an HTML-based content management system, where all
> documents are processed and edited as HTML, sent from one editor to
> another, and eventually published and indexed. He would like to build
> up the editorial metadata used by the system within the HTML documents
> themselves, so that it is easier to manage and less likely to be lost.
> * Tim wants to make a knowledge base seeded from statements made in
> Spanish and English, e.g. from people writing down their thoughts
> about George W. Bush and George H.W. Bush, and has either convinced
> the people making the statements that they should use a common
> language-neutral machine-readable vocabulary to describe their
> thoughts, or has convinced some other people to come in after them and
> process the thoughts manually to get them into a computer-readable form.
>
> REQUIREMENTS:
>
> * Vocabularies can be developed in a manner that won't clash with future
> more widely-used vocabularies, so that those future vocabularies can
> later be used in a page making use of private vocabularies without
> making the earlier annotations ambiguous.
> * Using the data should not involve learning a plethora of new APIs,
> formats, or vocabularies (today it is possible, e.g., to get the price
> of an Amazon product, but it requires learning a new API; similarly
> it's possible to get information from sites consistently using 'class'
> values in a documented way, but doing so requires learning a new
> vocabulary).
> * Shouldn't require the consumer to write XSLT or server-side code to
> process the annotated data.
> * Machine-readable annotations shouldn't be on a separate page than
> human-readable annotations.
> * The information should be convertible into a dedicated form (RDF,
> JSON, XML) in a consistent manner, so that tools that use this
> information separate from the pages on which it is found have a
> standard way of conveying the information.
> * Should be possible for different parts of an item's data to be given
> in different parts of the page, for example two items described in the
> same paragraph. ("The two lamps are A and B. The first is $20, the
> second $30. The first is 5W, the second 7W.")
> * It should be possible to define globally-unique names, but the syntax
> should be optimised for a set of predefined vocabularies.
> * Adding this data to a page should be easy.
> * The syntax for adding this data should encourage the data to remain
> accurate when the page is changed.
> * The syntax should be resilient to intentional copy-and-paste
> authoring: people copying data into the page from a page that already
> has data should not have to know about any declarations far from the
> data.
> * The syntax should be resilient to unintentional copy-and-paste
> authoring: people copying markup from the page who do not know about
> these features should not inadvertently mark up their page with
> inapplicable data.
>
>
> ---------------------------------------------------------------------------
>
> USE CASE: Allow authors to annotate their documents to highlight the key
> parts, e.g. as when a student highlights parts of a printed page, but in a
> hypertext-aware fashion.
>
> SCENARIOS:
>
> * Fred writes a page about Napoleon. He can highlight the word Napoleon
> in a way that indicates to the reader that that is a person. Fred can
> also annotate the page to indicate that Napoleon and France are
> related concepts.
>
> ==============================================================================
>
> Search
>
> USE CASE: Site owners want a way to provide enhanced search results to the
> engines, so that an entry in the search results page is more than just a
> bare link and snippet of text, and provides additional resources for users
> straight on the search page without them having to click into the page and
> discover those resources themselves.
>
> SCENARIOS:
>
> * For example, in response to a query for a restaurant, a search engine
> might want to have the result from yelp.com provide additional
> information, e.g. info on price, rating, and phone number, along with
> links to reviews or photos of the restaurant.
>
> REQUIREMENTS:
>
> * Information for the search engine should be on the same page as
> information that would be shown to the user if the user visited the
> page.
>
>
> ---------------------------------------------------------------------------
>
> USE CASE: Search engines and other site categorisation and aggregation
> engines should be able to determine the contents of pages with more
> accuracy than today.
>
> SCENARIOS:
>
> * Students and teachers should be able to discover each other -- both
> within an institution and across institutions -- via their blogging.
> * A blogger wishes to categorise his posts such that he can see them in
> the context of other posts on the same topic, including posts by
> unrelated authors (i.e. not via a pre-agreed tag or identifier, not
> via a single dedicated and preconfigured aggregator).
> * A user whose grandfather is called "Napoleon" wishes to ask Google the
> question "Who is Napoleon", and get as his answer a page describing
> his grandfather.
> * A user wants to ask about "Napoleon" but, instead of getting an
> answer, wants the search engine to ask him which Napoleon he wants to
> know about.
>
> REQUIREMENTS:
>
> * Should not disadvantage pages that are more useful to the user but
> that have not made any effort to help the search engine.
> * Should not be more susceptible to spamming than today's markup.
>
>
> ---------------------------------------------------------------------------
>
> USE CASE: Web browsers should be able to help users find information
> related to the items discussed by the page that they are looking at.
>
> SCENARIOS:
>
> * Finding more information about a movie when looking at a page about
> the movie, when the page contains detailed data about the movie.
> * For example, where the movie is playing locally.
> * For example, what your friends thought of it.
> * Exposing music samples on a page so that a user can listen to all the
> samples.
> * Students and teachers should be able to discover each other -- both
> within an institution and across institutions -- via their blogging.
> * David can use the data in a web page to generate a custom browser UI
> for calling a phone number using our cellphone without using brittle
> screen-scraping.
>
> REQUIREMENTS:
>
> * Should be discoverable, because otherwise users will not use it, and
> thus users won't be helped.
> * Should be consistently available, because if it only works on some
> pages, users will not use it (see, for instance, the rel=next story).
> * Should be bootstrapable (rel=next failed because UAs didn't expose it
> because authors didn't use it because UAs didn't expose it).
>
>
> ---------------------------------------------------------------------------
>
> USE CASE: Finding distributed comments on audio and video media.
>
> SCENARIOS:
>
> * Sam has posted a video tutorial on how to grow tomatoes on his video
> blog. Jane uses the tutorial and would like to leave feedback to
> others that view the video regarding certain parts of the video she
> found most helpful. Since Sam has comments disabled on his blog, his
> users cannot comment on the particular sections of the video other
> than linking to it from their blog and entering the information there.
> Jane uses a video player that aggregates all the comments about the
> video found on the Web, and displays them as subtitles while she
> watches the video.
>
> REQUIREMENTS:
>
> * It shouldn't be possible for Jane to be exposed to spam comments.
> * The comment-aggregating video player shouldn't need to crawl the
> entire Web for each user independently.
>
>
> ---------------------------------------------------------------------------
>
> USE CASE: Allow users to price-check digital media (music, TV shows, etc)
> and purchase such content without having to go through a special website
> or application to acquire it, and without particular retailers being
> selected by the content's producer or publisher.
>
> SCENARIOS:
>
> * Joe wants to sell his music, but he doesn't want to sell it through a
> specific retailer, he wants to allow the user to pick a retailer. So
> he forgoes the chance of an affiliate fee, negotiates to have his
> music available in all retail stores that his users might prefer, and
> then puts a generic link on his page that identifies the product but
> doesn't identify a retailer. Kyle, a fan, visits his page, clicks
> the link, and Amazon charges his credit card and puts the music into
> his Amazon album downloader. Leo instead clicks on the link and is
> automatically charged by Apple, and finds later that the music is in
> his iTunes library.
> * Manu wants to go to Joe's website but check the price of the offered
> music against the various retailers that sell it, without going to
> those retailers' sites, so that he can pick the cheapest retailer.
> * David can use the data in a web page to generate a custom browser UI
> for buying a song from our favorite online music store without using
> brittle screen-scraping.
>
> REQUIREMENTS:
>
> * Should not be easily prone to clickjacking (sites shouldn't be able to
> charge the user without the user's consent).
> * Should not make transactions harder when the user hasn't yet picked a
> favourite retailer.
>
> ==============================================================================
>
> Cross-site communication
>
> USE CASE: Copy-and-paste should work between Web apps and native apps and
> between Web apps and other Web apps.
>
> SCENARIOS:
>
> * Fred copies an e-mail from Apple Mail into GMail, and the e-mail
> survives intact, including headers, attachments, and multipart/related
> parts.
> * Fred copies an e-mail from GMail into Hotmail, and the e-mail survives
> intact, including headers, attachments, and multipart/related parts.
>
>
> ---------------------------------------------------------------------------
>
> USE CASE: Allow users to share data between sites (e.g. between an online
> store and a price comparison site).
>
> SCENARIOS:
>
> * Lucy is looking for a new apartment and some items with which to
> furnish it. She browses various web pages, including apartment
> listings, furniture stores, kitchen appliances, etc. Every time she
> finds an item she likes, she points to it and transfers its details to
> her apartment-hunting page, where her picks can be organized, sorted,
> and categorized.
> * Lucy uses a website called TheBigMove.com to organize all aspects of
> her move, including items that she is tracking for the move. She goes
> to her "To Do" list and adds some of the items she collected during
> her visits to various Web sites, so that TheBigMove.com can handle the
> purchasing and delivery for her.
>
> REQUIREMENTS:
>
> * Should be discoverable, because otherwise users will not use it, and
> thus users won't be helped.
> * Should be consistently available, because if it only works on some
> pages, users will not use it (see, for instance, the rel=next story).
> * Should be bootstrapable (rel=next failed because UAs didn't expose it
> because authors didn't use it because UAs didn't expose it).
> * The information should be convertible into a dedicated form (RDF,
> JSON, XML) in a consistent manner, so that tools that use this
> information separate from the pages on which it is found have a
> standard way of conveying the information.
>
> ==============================================================================
>
> Blogging
>
> USE CASE: Remove the need for feeds to restate the content of HTML pages
> (i.e. replace Atom with HTML).
>
> SCENARIOS:
>
> * Paul maintains a blog and wishes to write his blog in such a way that
> tools can pick up his blog post tags, authors, titles, and his
> blogroll directly from his blog, so that he does not need to maintain
> a parallel version of his data in a "structured format." In other
> words, his HTML blog should be usable as its own structured feed.
>
>
> ---------------------------------------------------------------------------
>
> USE CASE: Allow users to compare subjects of blog entries when the
> subjects are hard to tersely identify relative to other subjects in the
> same general area.
>
> SCENARIOS:
>
> * Paul blogs about proteins and genes. His colleagues also blog about
> proteins and genes. Proteins and genes are identified by long
> hard-to-compare strings, but Paul and his colleagues can determine if
> they are talking about the same things by having their user agent
> compare some sort of flags embedded in the blogs.
> * Rob wants to publish a large vocabulary in RDFS and/or OWL. Rob also
> wants to provide a clear, human readable description of the same
> vocabulary, that mixes the terms with descriptive text in HTML.
>
> ==============================================================================
>
> Data extraction from uncooperative sources
>
> USE CASE: Getting data out of poorly written Web pages, so that the user
> can find more information about the page's contents.
>
> SCENARIOS:
>
> * Alfred merges data from various sources in a static manner, generating
> a new set of data. Bob later uses this static data in conjunction with
> other data sets to generate yet another set of static data. Julie then
> visits Bob's page later, and wants to know where and when the various
> sources of data Bob used come from, so that she can evaluate its
> quality. (In this instance, Alfred and Bob are assumed to be
> uncooperative, since creating a static mashup would be an example of a
> poorly-written page.)
> * TV guide listings - If the TV guide provider does not render a link to
> IMDB, the browser should recognise TV shows and give implicit links.
> (In this instance, it is assumed that the TV guide provider is
> uncooperative, since it isn't providing the links the user wants.)
> * Students and teachers should be able to discover each other -- both
> within an institution and across institutions -- via their blogging.
> (In this instance, it is assumed that the teachers and students aren't
> cooperative, since they would otherwise be able to find each other by
> listing their blogs in a common directory.)
> * Tim wants to make a knowledge base seeded from statements made in
> Spanish and English, e.g. from people writing down their thoughts
> about George W. Bush and George H.W. Bush. (In this instance, it is
> assumed that the people writing the statements aren't cooperative,
> since if they were they could just add the data straight into the
> knowledge base.)
>
> REQUIREMENTS:
>
> * Does not need cooperation of the author (if the page author was
> cooperative, the page would be well-written).
> * Shouldn't require the consumer to write XSLT or server-side code to
> derive this information from the page.
>
>
> ---------------------------------------------------------------------------
>
> USE CASE: Remove the need for RDF users to restate information in online
> encyclopedias (i.e. replace DBpedia).
>
> SCENARIOS:
>
> * A user wants to have information in RDF form. The user visits
> Wikipedia, and his user agent can obtain the information without
> relying on DBpedia's interpretation of the page.
>
> REQUIREMENTS:
>
> * All the data exposed by DBpedia should be derivable from Wikipedia
> without using DBpedia.
>
> ==============================================================================
>
> --
> Ian Hickson U+1047E )\._.,--....,'``. fL
> http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
> Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
>
--
Sent from my mobile device