[whatwg] [rest-discuss] HTML5 and RESTful HTTP in browsers

Thu Dec 25 04:11:39 PST 2008

On Mon, 17 Nov 2008 mike at mykanjo.co.uk wrote:
> 
> I've read that HTML5 will be providing markup for the PUT and DELETE 
> methods. This is definitely good news - but I considered something else 
> recently that, from what I can gather, is not in the current spec for 
> HTML5; markup for specifying appropriate Accept headers for requests.

What problem would such a feature solve?

> I brought this up recently in #whatwg on freenode, and I was informed 
> that this is not currently being considered since the equivalent can be 
> achieved by a URL parameter such as '?type=application/xml'. Many would 
> not Accept (pun intended - sorry) that this method was significantly 
> different, some even went as far as to suggest (disturbingly) that 
> serving multiple content-types from the same URI is undesirable!

Indeed, content negotiation on the Web has not been a particularly roaring 
success, and it would probably have been better if we had avoided 
introducing it, but that's an issue for another working group (and another 
era, probably -- we're likely stuck with it now).

On Mon, 17 Nov 2008, Adrian Sutton wrote:
> 
> I don't see why the Accept header when following links or requesting 
> images should be controlled by anything other than the browser.  It's 
> the browser that has to actually render the returned content, so 
> it's in the best position to decide what it can accept, not the page 
> author.

That does seem like a valid point.

On Mon, 17 Nov 2008 mike at mykanjo.co.uk wrote:
> 
> as an example:
> 
> <a href="http://example.com/report">html report</a>
> <a href="http://example.com/report" Accept="application/pdf">pdf report</a>
> <a href="http://example.com/report" Accept="application/rss+xml">xml report</a>
> 
> So I can send a colleague a message; 'you can get the report at 
> http://example.com/report', and they can use that URL in any user agent 
> that is appropriate. A browser is a special case in which many different 
> content-types are dealt with. The same benefit is not achieved if the 
> content is negotiated via the URL, since the user would have to know the 
> type their user agent required and modify the URL accordingly:
> 
> example.com/report?type=application/rss+xml
> 
> To me, this is a much cleaner and more appropriate use of a URI. Not to 
> mention more user-friendly. Something, I believe should be encouraged - 
> this is why I feel it would be an important addition to HTML5.

People do this today:

   <a href="http://example.com/report.html">html report</a>
   <a href="http://example.com/report.pdf">pdf report</a>
   <a href="http://example.com/report.xml">xml report</a>

...with the e-mail just saying:   

   http://example.com/report

...and Apache's content-negotiation module working out the best file to 
return.

This works today, what's the problem with it? (Other than theoretical 
purity concerns, which have been argued both ways here and are thus not a 
useful criterion to evaluate solutions by.)
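
For concreteness, here is a rough sketch of the kind of selection the 
server performs in that setup. It is a simplification that ranks variants 
only by q-value and is not Apache's actual algorithm; the VARIANTS table 
and the function names are hypothetical, chosen just for illustration.

```python
# Simplified sketch of server-side content negotiation for /report.
# Assumption: this ranks by q-value only; real servers weigh more factors.

# Hypothetical table of available representations.
VARIANTS = {
    "text/html": "report.html",
    "application/pdf": "report.pdf",
    "application/xml": "report.xml",
}


def parse_accept(header):
    """Split an Accept header into (media_range, q) pairs."""
    ranges = []
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # treat an unparsable q as "not acceptable"
        ranges.append((fields[0].lower(), q))
    return ranges


def specificity(media_range, media_type):
    """Return 2 for an exact match, 1 for type/*, 0 for */*, -1 for no match."""
    if media_range == media_type:
        return 2
    if media_range.endswith("/*") and media_range != "*/*":
        return 1 if media_type.startswith(media_range[:-1]) else -1
    return 0 if media_range == "*/*" else -1


def negotiate(accept_header, variants=VARIANTS):
    """Return the file name with the highest q-value, or None."""
    ranges = parse_accept(accept_header)
    best_name, best_q = None, 0.0
    for media_type, filename in variants.items():
        # The most specific matching range decides this variant's q-value.
        matching = [(specificity(r, media_type), q) for r, q in ranges]
        matching = [m for m in matching if m[0] >= 0]
        if not matching:
            continue
        q = max(matching)[1]
        if q > best_q:  # strict ">" keeps the first variant on a tie
            best_name, best_q = filename, q
    return best_name


print(negotiate("application/pdf"))
print(negotiate("text/html,application/xml;q=0.9,*/*;q=0.8"))
```

A PDF-only client asking for http://example.com/report would get 
report.pdf, while a typical browser Accept header would get report.html, 
and the explicit .html/.pdf/.xml links remain available either way.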

On Mon, 17 Nov 2008, Adrian Sutton wrote:
> 
> The reason this is basically never used today is twofold:
> 1. It requires correctly configuring the server, beyond just putting files
> on the file system.  Very few people actually do this.
> 2. It requires the user to see a URL and decide that they want to paste it
> into Acrobat instead of their browser, without any indication that it would
> actually work.

Indeed. Content negotiation is not really compatible with the mental model 
people have of URLs (which is more similar to their model of files than to 
the model that URIs really represent).

On Mon, 17 Nov 2008, Smylers wrote:
> mike at mykanjo.co.uk writes:
> > 
> > So I can send a colleague a message; 'you can get the report at 
> > http://example.com/report', and they can use that URL in any user 
> > agent that is appropriate.
> 
> Except that in practice on receiving a URL like the above, nearly all 
> users will try it in a web browser; they are unlikely to put it into 
> their PDF viewer, in the hope that a PDF version of the report will 
> happen to be available.

Indeed.

> > A browser is a special case in which many different content-types are 
> > dealt with.
> 
> It's also the most common case.  Supposing I opened the above URL in a 
> browser, and it gave me the HTML version; how would I even know that the 
> PDF version exists?
> 
> Suppose my browser has a PDF plug-in so can render either the HTML or 
> PDF versions, it's harder to bookmark a particular version because the 
> URL is no longer sufficient to identify precisely what I was viewing. 
> Browsers could update the way bookmarks work to deal with this, but any 
> external (such as web-based) bookmarking tools would also need to 
> change.
> 
> Or suppose the HTML version links to the PDF version.  I wish to 
> download the PDF on a remote server, and happen to have an SSH session 
> open to it.  So I right-click on the link in the HTML version I'm 
> looking at, choose 'Copy Link Location' from the menu, and in the remote 
> shell type wget then paste in the copied link.  If the link explicitly 
> has ?type=PDF in the URL, I get what I want; if the format is specified 
> out of the URL then I've just downloaded the wrong thing.

Indeed, it does seem that the usability of content-negotiated resources is 
less good than the usability of distinct resources.

On Mon, 17 Nov 2008, Hallvord R M Steen wrote:
> 
> On the other hand, I'd sort of like
> 
> <a href="http://example.com/report" AcceptLanguage="no">Norwegian</a>
> <a href="http://example.com/report" AcceptLanguage="en">English</a>
> 
> As the main problem with using content-negotiation for language right 
> now is that you need to hard-link to actual files (i.e. file.en.html) to 
> give users a way to "override" the negotiation on the fly. (No, nobody 
> will reconfigure their browser to use your site and everyone must be 
> given a choice of language even if they can't control the settings of 
> the browser they happen to use.) It's not good enough though, since one 
> would like the language choice to "stick" automatically - you still need 
> to fall back to cookies and a custom script for handling language choice 
> or "no suitable version" errors.
> 
> Content negotiation is a lot nicer in theory than in practise..

Indeed.

On Mon, 17 Nov 2008, Mike Kelly wrote:
> 
> I disagree; it's no more clear to end users. There is no reason the 
> status bar at the bottom couldn't say
> 
> http://example.com/report (PDF Document)

Personally, it seems to me that distinct URLs pointing to distinct 
representations are more usable, but if we disagree on this then we'll need 
to do studies to determine it one way or the other.

> Trivial addition for browsers to take this information from the Accept 
> attribute. If you put .pdf at the end of a URL the server won't necessarily 
> respond with a PDF content type, so any extra certainty you feel from 
> that is artificial. It should be up for interpretation whether you chose 
> to do it this way or not. At the moment HTML is not providing a way to 
> take advantage of HTTP conneg, that's not very fair - particularly given 
> the criticism that 'it's not possible at the moment'. Surely a primary 
> objective here must be allowing browser developers to make full use of 
> every aspect of the HTTP protocol?

Oh, no, not at all. I don't think that's ever even remotely been a goal.

> Well it's not nice in practice because HTML is currently flawed and 
> insufficient as a way of telling browsers how to do it properly. This is 
> entirely my point; let's make HTTP conneg possible in browsers by 
> getting HTML right - and let the developers decide the best practices. 
> By not supporting this part of the HTTP protocol (content negotiation) 
> you are taking something fundamental out of the hands of application 
> developers (client and server side) because you don't think it's 
> necessary.
> 
> The apparent resistance to this confuses me; since the solution is not 
> complicated to implement, completely backwards compatible, and 
> ignorable.

The resistance is because there is a desire to keep HTML as small as 
possible, and so there is a high bar to entry for new features. We have to 
demonstrate that they are important, that they solve a major problem, and 
that browser vendors are interested in implementing them.

On Tue, 18 Nov 2008, Mike wrote:
> 
> If your system is designed to provide several different content types 
> for users to be able to read/write/update/delete a single resource from 
> various different User Agents - it's totally misleading to provide a 
> separate URL for each of them. Because:
> 
> If I update the information in /report.pdf - does that update the 
> information in /report.html and /report.xml ? They're separate resources 
> (indicated by separate URLs) so from a 'usability' point of view, the 
> expected result should be to *only* update the information in 
> /report.pdf.. but that's not actually the case here since the system 
> will update the information in the other two URLs as well.

I don't see why this would be confusing to users. They don't associate 
their data with URIs mentally, they associate them with whatever the site 
is exposing to them.

> This kind of behavior *breaks caching*, for obvious reasons.

How so? Could you elaborate? How is this different from other people 
editing one file (e.g. forums) or one person editing one file and causing 
others to change (e.g. blog posts syndicated to multiple pages)?

> It's interesting you mention security actually. Right now - as it stands 
> - your web browser is sending all of its requests with an Accept header 
> that contains a catch-all "*/*". That is significantly less secure - the 
> fact that you see .pdf at the end of the URL doesn't mean my server 
> isn't about to send you an executable. This is what I was referring to 
> as "artificial certainty".

Sending "Accept: application/pdf" doesn't prevent the server from sending 
you an executable either.
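
To illustrate: whatever Accept header was sent, the client only learns 
what it actually got by inspecting the response. A minimal sketch of that 
check, assuming the response's Content-Type header value is already in 
hand (the function names here are hypothetical):

```python
from email.message import Message


def media_type(content_type_header):
    """Extract the bare, lowercased media type from a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    return msg.get_content_type()


def check_response(requested_type, content_type_header):
    """Return (matches, actual_type) for a response to a typed request."""
    actual = media_type(content_type_header)
    return actual == requested_type.lower(), actual


# The request asked for a PDF; the server is free to answer with anything.
print(check_response("application/pdf", "application/pdf"))
print(check_response("application/pdf", "application/x-msdownload; name=report.exe"))
```

Even a matching Content-Type is only the server's claim about the body, so 
this check says nothing about what the bytes really are.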

On Sat, 22 Nov 2008, Martin Atkins wrote:
> 
> Agreed. I think the assumptions underlying content negotiation are 
> flawed, and thus the mechanism itself is flawed and causes confusion and 
> inconvenience when used in practice. The sentiment underlying this 
> proposal seems to be that HTTP content negotiation would work fine if 
> only the pesky browsers would support it, but I think there are 
> deeper-rooted problems than simply a lack of browser support.

Agreed.

> I think a better solution is to publish the HTML version with attributed 
> hyperlinks, like this:
> 
> <link rel="alternate" type="application/pdf" href="document.pdf">
>
> or, if you prefer:
> 
> <a href="document.pdf" rel="alternate" type="application/pdf">
>     PDF Version
> </a>
> 
> In future, once IETF has finished specifying this, it may also be possible to
> do this in the HTTP response headers for non-HTML resources:
> 
> Link: <document.pdf>; rel="alternate"; type="application/pdf"

Indeed.

> This way clients can discover the alternative representations, but the 
> alternative representations are all directly addressable so you can link 
> to a specific representation. This approach is used in practice 
> successfully today to make available Atom representations of HTML pages 
> across the web. Atom feeds are arguably the best example of a successful 
> completely-RESTful API that we have today, so this approach is proven to 
> work.

And all that without content negotiation, indeed.
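
As a sketch of how a client could discover those alternatives from the 
HTML version, using only Python's standard library (the class and function 
names, and the example base URL, are hypothetical):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AlternateFinder(HTMLParser):
    """Collect (type, absolute URL) pairs from rel="alternate" links."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("link", "a"):
            return
        attr = dict(attrs)
        # rel is a space-separated list of keywords, compared case-insensitively.
        rels = (attr.get("rel") or "").lower().split()
        if "alternate" in rels and attr.get("href"):
            url = urljoin(self.base_url, attr["href"])
            self.alternates.append((attr.get("type"), url))


def find_alternates(html, base_url):
    """Return every alternate representation advertised in the markup."""
    finder = AlternateFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.alternates


page = '<link rel="alternate" type="application/pdf" href="document.pdf">'
print(find_alternates(page, "http://example.com/docs/report"))
```

Each alternative ends up with its own URL, so it can be bookmarked, 
pasted into wget, or e-mailed without any Accept header involved.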

[snip e-mails just going around in circles]

On Tue, 18 Nov 2008, Mike wrote:
> 
> The benefits? Oh I don't know.. a markup language that supports the 
> transfer protocol it runs on?!

Why is this a benefit?

Should we also expose TCP urgent mode data?

There are many features in our underlying protocols that we don't expose. 
Exposing the underlying protocols is not a goal (indeed if anything it's 
an anti-goal, as it can be considered an abstraction violation).

On Tue, 18 Nov 2008, Mike wrote:
> 
> So the Transfer Protocol (HTTP) and the Markup Language (HTML) for Hyper 
> Text are not closely linked?

They're developed by different working groups in different standards 
organisations with almost no overlap in membership (and as far as I know, no 
overlap in active membership at all, where "active membership" means 
people doing more than just e-mailing the lists, e.g. doing research, 
experimental implementations, editing, testing, etc).

They are in fact not at all closely linked. (They should probably be 
linked more closely than they are.)

I haven't changed the spec in response to this feedback, due to the lack 
of clear use cases. If there are specific user- or author- facing 
problems, please do raise those issues. However, exposing HTTP features is 
not a goal and thus HTTP itself having a feature does not provide a strong 
enough reason to expose that feature to HTML.

Cheers,
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'