[whatwg] NoDatabase databases
brettz9 at yahoo.com
Sun Aug 18 21:43:44 PDT 2013
On 8/17/2013 5:16 AM, Brendan Long wrote:
> On 05/01/2013 10:57 PM, Brett Zamir wrote:
>> I wanted to propose (if work has not already been done in this area)
>> creating an HTTP extension to allow querying for retrieval and
>> updating of portions of HTML (or XML) documents where the server is so
>> capable and enabled, obviating the need for a separate database (or
>> more accurately, bringing the database to the web server layer).
> parts it needs and insert them into the DOM as needed.
Yes, one can, but:
1. It won't allow users to have their browser (or privileged add-on
code) make universal, cross-domain requests for partial documents to
any webpage they wish (or at least to any webpage on a server where a
drop-in server module or script aware of this standard protocol has
been deployed).
Imagine, for example, if all a government had to do to release their
data online was to save a Word doc, Excel file, Access database, etc. as
HTML and FTP it to a publicly accessible directory on their server (and
add a server module aware of the HTML Query API, which intercepts
queries sent to files in the public directory, processes the XPath or
CSS Selector query, and sends back the filtered response with CORS
headers). Bam: there is now a genuine, queryable database on the Web,
available to the whole world.
One could obtain subsets of such data stores without the document owner
(in this case, the government) needing to jump through hoops to convert
their documents/data into JSON/XML/etc., provide custom REST APIs,
create a search interface, etc. (The protocol would still let people
store their data in a JSON database, etc., if they wished, but they
could also just upload static HTML files.)
Consumers of this data (whether web developers or users of the
browser/add-on concept mentioned above) would no longer need
inefficient screen scraping, which must first fetch entire documents
before extracting the useful data. Nor would server-side-only solutions
be necessary (at least when working from a privileged environment such
as a browser/add-on, when the document owner has enabled CORS on their
server, or when the site is one's own).
developers to learn different client-side (and server-side) libraries
and learn different server APIs. With a standard HTML Query API, one
would need to know nothing more than the URL of the data store (and the
structure of the contents one was seeking) to get by with bare
XMLHttpRequest (or $.ajax) calls that do what one wants against the data
store, with no need to know what specific query strings to add to meet
the requirements of a custom server-side API. (In some cases, it may
admittedly be more convenient to have a succinct query syntax optimized
for the specific document format, but it is nice to always have the
generic query option.)
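To illustrate that a generic client needs only the URL plus a query, here is a sketch using Python's `urllib.request` in place of XMLHttpRequest. The `X-HTML-Query` header is purely hypothetical: no such HTTP extension exists, and a real design might carry the query in the URL or in a new method instead.

```python
import urllib.request

# Hypothetical request header carrying the query; not an existing standard.
QUERY_HEADER = "X-HTML-Query"


def build_html_query(url: str, xpath: str) -> urllib.request.Request:
    """Build (without sending) a GET request asking the server for only
    the parts of the document at `url` that match `xpath`."""
    return urllib.request.Request(
        url,
        headers={QUERY_HEADER: xpath, "Accept": "text/html"},
        method="GET",
    )
```

Sending it is then an ordinary `urllib.request.urlopen(req)` call; a server implementing the scheme would reply with just the matching fragment, while any other server would simply return the whole document.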
and to write scripts. Of course, SOME data necessitates customized
access control such as a website's user database (though even here, one
could use http://en.wikipedia.org/wiki/Basic_access_authentication to
But even where scripts determine access control, many sites could still
benefit by being able to, say, create, upload, and manage a Word document
saved as HTML with a table whose (WYSIWYG) columns were "user" and
"password" and then, as per #2 above, use a single reusable server-side
library implementing the standard to query this document. The site
could, if it wished, later switch to importing the document into a
database while still keeping the HTML Query API library calls. And if a
server-side script wanted to, say, let authenticated administrators query
by the server-side code which conducted queries against the user table
in the same familiar manner.
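A sketch of how such a reusable library might look up one row of a table like the one described, assuming the document is well-formed XHTML and that the first row holds `<th>` header cells naming the columns. `lookup_row` is a hypothetical name; swapping its body for a database query later would leave callers unchanged, which is the point made above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def lookup_row(document: str, column: str, value: str) -> Optional[Dict[str, str]]:
    """Return the first table row whose `column` cell equals `value`,
    as a {header: cell text} dict, or None if nothing matches."""
    root = ET.fromstring(document)
    for table in root.iter("table"):
        rows = table.findall(".//tr")
        if not rows:
            continue
        # Assumption: the first row names the columns with <th> cells.
        headers = ["".join(th.itertext()).strip() for th in rows[0].findall("th")]
        if column not in headers:
            continue
        idx = headers.index(column)
        for row in rows[1:]:
            cells = ["".join(td.itertext()).strip() for td in row.findall("td")]
            if len(cells) > idx and cells[idx] == value:
                return dict(zip(headers, cells))
    return None
```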
4. If markup were added to HTML that coordinated intelligently with
this query scheme, say, to allow querying of documents with known
paragraph numbering (there are more interesting and frequently needed
use cases than this with tables and lists, as I'm planning to
explain in my response to Ian, but I'll use a simpler example in this
a. The document creator could create:
<p>This is par. 1</p>
<p>This is par. 2</p>
<p>This is par. 500</p>
b. An intermediary server plugin would detect the "paragraphRange"
attribute and then auto-strip all of the inner paragraphs before
delivering the document to the user (unless, say, other markup were
present on <article>, such as `showRange="1-20"`, in which case it would
only strip out paragraphs 21-500; or, if `paragraphsPerPage="5"` were
set without showRange, it would strip out pars. 6-500).
c. When the browser received the document, it could recognize the
"paragraphRange" attribute and know to add its own search interface
widget at this point in the document, which might contain:
1. A generic browser-localized label, e.g. "Choose a range of
2. Two numeric text boxes letting the user request a paragraph
range (e.g., 23-45) from the server
3. A "get all paragraphs" button or link (as an alternative to the
range) to obtain all paragraphs beneath the widget (or "get the
remaining paragraphs" had the "showRange" attribute been used).
4. If the "paragraphsPerPage" attribute were present, the browser
could also add a link to "get the next 5 paragraphs"
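The stripping rule in (b) can be sketched as follows. Because the example markup's `paragraphRange` value did not survive in this message, the sketch treats only the attribute's presence as the trigger and numbers paragraphs by their position among the `<article>`'s `<p>` children; both are assumptions, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def paragraphs_to_keep(total: int, show_range: Optional[str],
                       per_page: Optional[str]) -> range:
    """1-based paragraph numbers the server should deliver (rule b)."""
    if show_range:                      # e.g. showRange="1-20"
        start, end = (int(n) for n in show_range.split("-"))
        return range(start, min(end, total) + 1)
    if per_page:                        # e.g. paragraphsPerPage="5"
        return range(1, min(int(per_page), total) + 1)
    return range(0)                     # neither attribute: strip them all


def strip_article(article: ET.Element) -> None:
    """Remove, in place, the <p> children the server should not send."""
    if article.get("paragraphRange") is None:
        return                          # not a paragraph-range article
    paragraphs = article.findall("p")   # a separate list, safe to mutate parent
    keep = paragraphs_to_keep(len(paragraphs),
                              article.get("showRange"),
                              article.get("paragraphsPerPage"))
    for number, p in enumerate(paragraphs, start=1):
        if number not in keep:
            article.remove(p)
```

Note that `showRange` is checked first, so `paragraphsPerPage` only applies when no explicit range is given, matching the "(without showRange)" condition in (b).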
Although custom scripts could do this, they would require markup
creators, including those on public content-creation sites such as
wikis, blogs, and discussion forums, to include such a custom script as
well.
Even if the WHATWG did not wish to adopt such specific markup
conventions until experience had been gained and demand for these
widgets assessed, having an official HTML Query Language by which widget
creators could pass information back and forth between client and server
in a uniform manner would still facilitate development, as per #2
above.
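As a small illustration of the client half of such an exchange, the range arithmetic behind the widget's links could look like this. It assumes, hypothetically, that the widget sends a requested range back to the server in the same "start-end" syntax that `showRange` uses; the function names are illustrative only.

```python
from typing import Optional, Tuple


def next_page(last_shown: int, per_page: int, total: int) -> Optional[Tuple[int, int]]:
    """Range behind a "get the next N paragraphs" link, or None when
    every paragraph has already been shown."""
    if last_shown >= total:
        return None
    start = last_shown + 1
    return start, min(start + per_page - 1, total)


def remaining(last_shown: int, total: int) -> Optional[Tuple[int, int]]:
    """Range behind a "get the remaining paragraphs" link."""
    return (last_shown + 1, total) if last_shown < total else None


def format_range(start: int, end: int) -> str:
    """Encode a range in the same "start-end" form as showRange."""
    if not 1 <= start <= end:
        raise ValueError(f"invalid paragraph range {start}-{end}")
    return f"{start}-{end}"
```

With `paragraphsPerPage="5"` and 500 paragraphs, the first "next" link would request `format_range(*next_page(5, 5, 500))`, i.e. paragraphs 6 through 10.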