[whatwg] NoDatabase databases

Sun Aug 18 21:43:44 PDT 2013

On 8/17/2013 5:16 AM, Brendan Long wrote:
> On 05/01/2013 10:57 PM, Brett Zamir wrote:
>> I wanted to propose (if work has not already been done in this area)
>> creating an HTTP extension to allow querying for retrieval and
>> updating of portions of HTML (or XML) documents where the server is so
>> capable and enabled, obviating the need for a separate database (or
>> more accurately, bringing the database to the web server layer).
> Can't you use JavaScript to do this already? Just put each part of the
> page in a separate HTML or XML files, then have JavaScript request the
> parts it needs and put insert them into the DOM as needed.

Yes, one can, but:

1. It won't allow users to have their browser (or privileged add-on 
code) make such universal, cross-domain partial-document-obtaining 
requests to any webpage they wish (at least to any webpage which is on a 
server where a drop-in server module or script aware of this standard 
protocol had been employed).

Imagine, for example, if all a government had to do to release their 
data online was to save a Word doc, Excel file, Access database, etc. as 
HTML and FTP it to a publicly-accessible directory on their server (and 
add a server module aware of the HTML Query API which intercepts such 
queries sent to files in their public directory to handle XPath/CSS 
Selector query processing and send back CORS headers with the modified 
response). Bam, there is now a genuine, queryable database on the Web 
which is available to the world for querying.

One could obtain subsets of such data stores without the document owner 
(in this case, the government) needing to go through hoops to ensure 
their documents/data are converted into JSON/XML/etc., have custom REST 
APIs provided, has a search interface created, etc. (though this 
protocol would let people store their data in a JSON database, etc. if 
they wished, but they could also just upload static HTML files).

Consumers of this data (whether web developers or users of the 
browser/add-on concept mentioned above) would have no need to do 
inefficient screen scraping which first had to grab entire documents to 
be able to extract useful data. There would be no need for 
server-side-only solutions (at least if one is coming from a privileged 
environment such as a browser/add-on, if the document owner enabled CORS 
on their server, or if the site is one's own).

2. Such JavaScript solutions as you mention are custom and require 
developers to learn different client-side (and server-side) libraries 
and learn different server APIs. With a standard HTML Query API, one 
would need know nothing more than the URL of the data store (and the 
structure of the contents one was seeking) to get away with bare 
XMLHttpRequest (or $.ajax) calls that do what one wants against the data 
store--no need to know what specific query strings to add to meet the 
requirements of a custom server-side API. (In some cases, that may 
admittedly be more convenient to have a succinct query syntax optimized 
for the specific document format, but it is nice to always have the 
generic query option.)

3. Custom JavaScript requires sites to include such code in every file 
and to write scripts. Of course, SOME data necessitates customized 
access control such as a website's user database (though even here, one 
could use http://en.wikipedia.org/wiki/Basic_access_authentication to 
avoid scripting).

But even with scripts determining access control, many sites could still 
benefit, by being able to say create, upload, and manage a Word document 
saved as HTML with a table whose (WYSIWYG) columns were "user" and 
"password" and then, as per #2 above, use a single reusable server-side 
library implementing the standard to query this document. The site 
could, if they wished, later switch to importing their document into a 
database while still keeping the HTML Query API library calls. And if a 
server-side script wanted to say let authenticated administrators query 
or alter the user table, client-side JavaScript could be output to them 
by the server-side code which conducted queries against the user table 
in the same familiar manner.

4. If markup would be added to HTML which coordinated intelligently with 
this query scheme, say for example to allow querying of documents with 
known paragraph numbering (there are more interesting and frequently 
needed use cases than this with tables and lists as I'm planning to 
explain in my response to Ian, but I'll use a simpler example in this 
response)...

a. The document creator could create:

b. an intermediary server plugin would detect the "paragraphRange" 
attribute and then auto-strip out all of the inner paragraphs before 
delivering the document to the user (unless say other markup were 
present on <article> such as `showRange="1-20"` in which case it would 
only strip out paragraphs 21-500, or if `paragraphsPerPage="5"` were set 
(without showRange), it would strip out pars. 6-500).

c. the browser when it received the document could then recognize the 
"paragraphRange" attribute to know that it should add its own search 
interface widget at this point in the document which might contain:
     1. A generic browser-localized label, e.g. "Choose a range of 
paragraphs"
     2. Two numeric text boxes to allow the user to request a paragraph 
range, e.g., 23-45 from the server
     3. A "get all paragraphs" button or link (as an alternative to the 
range) to obtain all paragraphs beneath the widget (or "get the 
remaining paragraphs" had the "showRange" attribute been used).
     4. If the "paragraphsPerPage" attribute were present, the browser 
could also add a link to "get the next 5 paragraphs"

Although custom scripts could do this, it requires the markup creator, 
including those on any public content creation sites such as wikis, 
blogs, and discussion forums, to include such a custom script as well.

Even if the WhatWG did not wish to engage in adopting such specific 
markup conventions until seeing experience gained and demand for these 
widgets assessed, having an official HTML Query Language by which widget 
creators could pass information back-and-forth between client and server 
in a uniform manner would still facilitate the development process as 
per #2 above.