[whatwg] [rest-discuss] HTML5 and RESTful HTTP in browsers

Tue Nov 18 04:52:04 PST 2008

Hallvord R M Steen wrote:
>>> Sorry, both as an author and as a user I'd prefer this:
>>> <a href="http://example.com/report">html report</a>
>>> <a href="http://example.com/report.pdf">pdf report</a>
>>> <a href="http://example.com/report.xhtml">xml report</a>
>>>
>>> - Keep It Simple. For me as an author it's less typing, and for me as
>>> a computer-literate end user it's clear whether a link is going to
>>> make me wait for Acrobat loading or open directly - even if the link
>>> is taken out of the HTML context.
>>>       
>> "It's less typing" - Is that serious or are you joking?!
>>     
>
> Isn't it? :)
>   

Well sure, but I still don't know if that was a joke or whether it was a 
serious point!

>   
>> I disagree; it's no more clear to end users. There is no reason the status
>> bar at the bottom couldn't say
>>
>> http://example.com/report (PDF Document)
>>
>> Trivial addition for browsers to take this information from the Accept
>> attribute.
>>     
>
> Not quite "trivial", since browsers to do what you ask would need to
> maintain a table of "pretty" names for all MIME types - including
> translating that table to all languages the UI is translated to...
>   

They could support the most common types; the rest could be displayed as
"http://example.com/report (application/foo)". A URL addresses this
problem no better.
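
As a rough sketch of that fallback (the table entries and the name
status_label are made up for illustration; this is not how any
particular browser does it):

```python
# Hypothetical table of "pretty" names, covering only the most common types.
PRETTY_NAMES = {
    "text/html": "HTML Document",
    "application/pdf": "PDF Document",
    "application/xhtml+xml": "XHTML Document",
}

def status_label(url, accept):
    """Build the status bar text for a link that carries an Accept value."""
    # Fall back to the raw MIME type when no pretty name is known.
    return "%s (%s)" % (url, PRETTY_NAMES.get(accept, accept))

print(status_label("http://example.com/report", "application/pdf"))
print(status_label("http://example.com/report", "application/foo"))
```

Only the entries in the table would ever need translating; everything
else degrades to the bare MIME type.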

> On a more serious note: content negotiation is meant to automatically
> choose a variant of a resource (format, language). However, in many
> cases the variant is significant in a way that I as a user want
> control over. The language and even format of a resource is actually
> often essential to that resource's identity. (The whole
> content-negotiation idea is based on that statement being false. I
> believe it's true.).
>   

Language is a separate issue from content type. I would consider a 
translated document as a separate resource which should be indicated in 
the URL. The same document provided in different formats is one resource
with multiple representations. Representations are distinct from
resources and therefore don't fit into the definition of Uniform
Resource Locator.
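
To illustrate what I mean by one resource with multiple representations,
here is a minimal sketch of a server picking a representation of /report
from the Accept header. The names AVAILABLE, parse_accept and negotiate,
and the text/html default for "*/*", are my own assumptions; it also
ignores "type/*" ranges and the specificity rules a real implementation
would need.

```python
# Media types in which the single resource /report is assumed to exist.
AVAILABLE = ["text/html", "application/pdf", "application/xhtml+xml"]

def parse_accept(header):
    """Return (media_range, q) pairs from an Accept value, highest q first."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        q = 1.0  # q defaults to 1 when not given
        for param in fields[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        ranges.append((fields[0], q))
    # sorted() is stable, so equal q values keep their header order.
    return sorted(ranges, key=lambda r: r[1], reverse=True)

def negotiate(accept_header):
    """Pick the media type to serve for /report, or None (i.e. 406)."""
    for media_range, q in parse_accept(accept_header):
        if q == 0:
            continue  # q=0 means "not acceptable"
        if media_range in AVAILABLE:
            return media_range
        if media_range == "*/*":
            return "text/html"  # assumed server default
    return None

print(negotiate("application/pdf"))
print(negotiate("text/html,application/xhtml+xml;q=0.9,*/*;q=0.8"))
```

The URL stays /report throughout; only the Accept value changes which
representation comes back.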

> I've built two-three websites that use content/language negotiation
> and I now consider it an architectural mistake to rely on negotiation
> because the URLs no longer uniquely identify the variants I in many
> scenarios need to identify. It's OK-ish to do it as a pure format
> choice where the server and UA just agree on using the PNG or GIF
> version for an <IMG> tag. For links *users* (and FWIW search engines,
> validators and other agents) may interact with it's however a big
> mistake to move away from one URL per variant of a resource. In light
> of my content negotiation experiments and experience I'd say an Access
> attribute in HTML would be harmful to the usability of URLs.
>   

If your system is designed to provide several different content types
so that users can read/write/update/delete a single resource from
various different User Agents, it's totally misleading to provide a
separate URL for each of them, because:

If I update the information in /report.pdf, does that update the
information in /report.html and /report.xml? They're separate resources
(indicated by separate URLs), so from a 'usability' point of view the
expected result should be to update *only* the information in
/report.pdf... but that's not actually the case here, since the system
will update the information at the other two URLs as well.

This kind of behavior *breaks caching*: a cache holding /report.html has
no way to know that an update sent to /report.pdf made its copy stale.
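
To make that concrete, here is a toy model of a cache keyed by URL. The
report data, the render function and the put helper are all invented for
illustration; the only behaviour assumed from HTTP is that a cache
invalidates the URL it saw the update for, not its siblings.

```python
# The one underlying resource, exposed under three URLs.
report = {"title": "Q3 figures"}
URLS = ["/report.html", "/report.pdf", "/report.xml"]

def render(url):
    """Stand-in for the server rendering the report in some format."""
    return "%s rendering of %r" % (url.rsplit(".", 1)[1], report["title"])

# An intermediary has cached all three responses.
cache = {url: render(url) for url in URLS}

def put(url, new_title):
    """Update the report through one URL; the cache drops only that URL."""
    report["title"] = new_title
    cache.pop(url, None)

put("/report.pdf", "Q4 figures")

# Entries the cache still serves although the server's copy has changed.
stale = [u for u in cache if cache[u] != render(u)]
print(stale)  # ['/report.html', '/report.xml']
```

With a single /report URL, the same update would have invalidated the
one cache key that matters.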

> As a URL user (web browsing human, HTML author, linker, bookmarker,
> E-mail-with-links author) I often want to be sure about what variant
> of a resource I link to. To be explicit about this across scenarios
> requires explicit URLs with language and type information.
>
>   
>> If you put .pdf at the end a URL the server wont necessarily
>> respond with a PDF content type, so any extra certainty you feel from that
>> is artificial.
>>     
>
> File types are all about convention. It's useful when sites follow the
> convention, and it's a surprise in the rare event when they don't.
> Since most of the time they do it's more useful than harmful.
>
>   

What is the value of that 'convention'?

It only exists because HTML currently gives browsers no way to leverage
protocol-level conneg (content negotiation).

>>> Content negotiation is a lot nicer in theory than in practise..
>>>
>>>       
>> Well it's not nice in practice because HTML is currently flawed
>>     
>
> After thinking about it I've concluded that it's not nice in practice
> because the basic premise of content negotiation is fundamentally
> flawed, namely that what variant of a resource users get from a URL is
> insignificant and what's best for them can be determined
> automatically.
>
>   

Automatic selection is the case for user agents that interpret specific
content types; e.g. PDF readers would only Accept application/pdf documents.

Browsers are a specific type of user agent that are primarily used for 
reading hypermedia; but also, importantly, Accept many other different 
content types which they can pass to the operating system if they don't 
support the file type returned by the server.

>> The apparent resistance to this confuses me; since the solution is not
>> complicated to implement, completely backwards compatible, and ignorable.
>>     
>
> My scepticism has nothing to do with whether it's easy to implement
> (though I think you underestimate the required efforts - for example
> the UA would need to verify that provided Accept: values are
> correct/don't cause security problems etc.). My scepticism has nothing
> to do with whether it is backwards compatible either. As a URL user I
> just want to defend the usability of URLs against a theoretically more
> pure but for practical purposes deeply flawed solution to a
> non-problem.
>
>   

It's interesting you mention security, actually. Right now, as it stands,
your web browser is sending all of its requests with an Accept header
that contains a catch-all "*/*". That is significantly less secure: the
fact that you see .pdf at the end of the URL doesn't mean my server
isn't about to send you an executable. This is what I was referring to
as "artificial certainty".
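
A small sketch of the difference, assuming a user agent that compares
the response's Content-Type with the Accept value it sent. The function
name accepts is hypothetical, and the matching is simplified: q values,
including q=0, are ignored.

```python
def accepts(accept_header, content_type):
    """True if content_type matches any media range in accept_header."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    ctype = content_type.split(";")[0].strip().lower()
    for part in accept_header.split(","):
        media_range = part.split(";")[0].strip().lower()
        if media_range == "*/*" or media_range == ctype:
            return True
        # "image/*" matches any subtype of "image/".
        if media_range.endswith("/*") and ctype.startswith(media_range[:-1]):
            return True
    return False

# With the catch-all, an unexpected binary is "acceptable" by definition.
print(accepts("text/html, */*;q=0.8", "application/octet-stream"))  # True
# With a specific Accept, the mismatch is detectable.
print(accepts("application/pdf", "application/octet-stream"))       # False
```

With "*/*" there is nothing for the user agent to check the response
against; with a specific Accept value there is.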

Regards,
Mike