[whatwg] NoDatabase databases

Wed May 1 21:57:21 PDT 2013

Hi,

I'm not sure where to post this idea, but as it does pertain to HTML I 
thought I would post it here.

I wanted to propose (if work has not already been done in this area) 
creating an HTTP extension to allow querying for retrieval and updating 
of portions of HTML (or XML) documents where the server is so capable 
and enabled, obviating the need for a separate database (or more 
accurately, bringing the database to the web server layer).

There are three main use cases I see for this:

1) Allowing one-off queries to be made by (privileged) user agents. This 
spares websites willing to share their data the overhead of creating 
their own database APIs, while giving both the client and server the 
opportunity to avoid delivering content which is not of interest to 
the user. Possible query languages might include CSS selectors, XPath, 
XQuery, or JavaScript.

2) Allowing third-party websites to make such queries of other sites, 
as in #1, but requiring user permission. I seem to recall seeing some 
discussions reviving the possibility of JavaScript APIs making 
cross-domain requests with user permission, regardless of whether the 
target site gives permission.

3) The ability for user agents to allow the user to provide intelligent 
defaults for navigating a subset of potentially large data documents, 
potentially with the assistance of website mark-up, but without the need 
for website scripting. This could reduce development time and costs, 
while ensuring that powerful capabilities were enabled for users by 
default on all websites (at least those that opted in by a simple 
server-side configuration option). It could also avoid unnecessary 
demands on the server and wait-time for the client (benefiting energy 
usage, access in developing countries, wait-times anywhere for large 
documents, etc.), while conversely making it easier for sites to expose 
large data sets to users who do wish to download them in full. 
Web-based IDEs, moreover, could similarly allow querying and editing of 
these documents without needing to load and display the full data set 
during editing. Some concrete examples include:

     a) Allowing ordered or unordered lists or definition/dialogue lists 
or any hierarchical markup to be navigated upon user demand. The client 
and server might, for example, negotiate the number of list items to be 
initially loaded and shown, so that instead of the entire list, only 
(say) the first and last 5 items would be loaded, with the user given a 
chance to manually load the rest if they were interested in viewing all 
of that data. Hierarchical 
lists, moreover, could allow Ajax-like drill-down capabilities (or if 
the user so configured their user agent, to automatically expand to a 
certain depth), all without the author needing to provide any scripting, 
letting them focus on content. Even non-list markup, like paragraphs, 
could be drilled into, as well as providing ellipses when the child 
content was determined to be above a given memory size or if the element 
was conventionally used to provide larger amounts of data (e.g., a 
textarea). (Form submission would probably need to be disabled though 
until all child content was loaded, and again, in order to avoid usage 
against the site's intended design, such navigation might require opt-in.)

     b) Tables would merit special treatment as a hierarchical type as 
one may typically wish to ensure that all cells in a given row were 
shown by default (though even here, ellipses could be added when the 
data size was determined to be large), with pagination being the 
well-used norm of table-based widgets. Having markup specified on column 
headers (if not full-blown schemas) to indicate data types would be 
useful in this regard (markup on the top level of a list might similarly 
be useful); if the user agent were, for example, made aware of the fact 
that a table column consisted exclusively of dates, it would provide a 
search option to allow the user to display records between a given date 
range (as well as better handling sorting).

Rows could, moreover, be auto-numbered by the agent with an option to 
choose a range of numbers (similarly ranges could be provided for other 
elements, like paragraph or list item numbering, etc.). The shift to the 
user agent might also encourage the ability to reorder or remove columns.

     c) Such a protocol would be enhanced by the existence of modular 
markup, akin to XInclude or XLink's xlink:actuate="onLoad", whereby the 
user agent/user could determine whether or not to resolve child 
documents or allow the user to browse in a lite mode, selectively 
expanding only the items of interest, and content creators could easily 
and directly manipulate content files on their server desktop.

     d) Queries made with a PUT method request could allow selective 
updating of a given table row or cell, or range of rows, a set of list 
items, etc. The user agent might automatically expose this, e.g., with 
inline editing of table cells or list items, paragraphs, etc.
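
To make (a) and (d) a little more concrete, here is a minimal sketch in 
Python of what a capable server might do internally. The function names 
and the 20-item list are purely hypothetical, made up for illustration, 
and this only shows the document handling, not any HTTP negotiation:

```python
import xml.etree.ElementTree as ET


def truncate_list(list_elem, edge=5):
    """Return (head, tail, omitted): the first and last `edge` children
    of a list element, plus the number of items left unloaded."""
    items = list(list_elem)
    if len(items) <= 2 * edge:
        # Short lists are delivered whole.
        return items, [], 0
    return items[:edge], items[-edge:], len(items) - 2 * edge


def update_item(list_elem, index, new_text):
    """Replace the text of one list item in place, as a selective
    update of a single item (example d) might do."""
    list_elem[index].text = new_text


# Hypothetical document: an ordered list with 20 items.
doc = ET.fromstring(
    "<ol>" + "".join(f"<li>item {i}</li>" for i in range(1, 21)) + "</ol>"
)

head, tail, omitted = truncate_list(doc, edge=5)
update_item(doc, 0, "item 1 (edited)")
```

The user agent would show the head and tail items with an ellipsis 
standing in for the omitted ones, and fetch those only on request.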

The web already is a database of sorts, but every attempt to query it 
requires one-off solutions.

While REST APIs may re-use HTTP methods, error codes, etc., the 
re-usability seems to fall short when it comes to request parameters. 
While a convention could be built on request parameters, such 
parameters raise namespacing concerns (they could collide with names 
sites already use), so I thought tackling this at the level of HTTP 
headers might be more appropriate. Again, I am well aware that this 
could all be handled as a kind of application, but I am looking to see 
database functionality and selective loading provided on a more 
wholesale basis, free of need for scripting.
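
As a rough sketch of the header-based approach (in Python, and only as 
an illustration): the header names "Query" and "Query-Language" below 
are my own placeholders, not anything standardized, and the sample page 
is made up. The server reads the query from the request headers and 
returns only the matching fragments:

```python
import xml.etree.ElementTree as ET

# Hypothetical header names; nothing like them is standardized.
QUERY_HEADER = "Query"
QUERY_LANG_HEADER = "Query-Language"


def handle_query(document_text, headers):
    """Return the serialized fragments matching the query carried in the
    request headers, or the whole document when no query is present."""
    query = headers.get(QUERY_HEADER)
    if query is None:
        return [document_text]
    if headers.get(QUERY_LANG_HEADER, "xpath") != "xpath":
        raise ValueError("unsupported query language")
    root = ET.fromstring(document_text)
    # Note: ElementTree implements only a limited subset of XPath.
    return [ET.tostring(el, encoding="unicode") for el in root.findall(query)]


# Hypothetical well-formed (X)HTML page with a two-row table.
page = (
    "<html><body>"
    "<table><tr><td>2013-05-01</td><td>a</td></tr>"
    "<tr><td>2013-05-02</td><td>b</td></tr></table>"
    "</body></html>"
)

# Ask for the second table row only.
rows = handle_query(page, {"Query": ".//tr[2]", "Query-Language": "xpath"})
```

Because the query travels in headers, the URL and any existing request 
parameters of the site are left untouched.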

Thanks,
Brett