[whatwg] PaceEntryMediatype

Thomas Broyer t.broyer at gmail.com
Mon Dec 4 01:29:21 PST 2006


2006/12/4, Ian Hickson:
> On Sun, 3 Dec 2006, Thomas Broyer wrote:
> >
> > What I mean is that "being syndication feed" is not a property of a
> > relationship, it's a property of one end of the relationship (the
> > resource the link "starts from" or "points to"); so it has nothing to do
> > with the rel="" attribute.
>
> I agree, in principle. Unfortunately, for autodiscovery we have to have a
> mechanism that can advertise what the syndication feeds are without
> requiring the UA to fetch every link, because fetching every link would
> be much slower (and on some networks, fiscally more expensive).

There's no need to fetch every link if you base your assumptions on
the type="" attribute (and *only* the type="" attribute, not the
combination with any special rel="" attribute value).
If you don't use the type="" attribute on <link>s, you'll have many
more requests than if you did, because of the "fetch to discover the
content-type" algorithm described for <link> elements, but that's the
author's problem, and it's not limited to feed autodiscovery, so…

> > > So you're proposing making the hundreds of millions of existing
> > > instances of syndication feed links non-conforming?
> >
> > No more than they already are.
> > rel="alternate" is for linking to alternate representations, and
> > hundreds of millions of syndication feed links are not using it that
> > way; they already are non-conforming.
>
> Fair enough. They still exist, though. Browser vendors aren't going to
> stop supporting this. We would be just sticking our heads in the sand if
> we ignored this.

Many things are marked as "deprecated" in earlier HTML versions, and
are still supported by browsers.
Also, as the misuse of rel="alternate" is not machine testable, and
given that I don't propose "banning" the use of rel="alternate" for
feed autodiscovery, I can't see how a browser vendor could "stop
supporting this".

> > And note that this is something that is not machine-testable, that's why
> > those hundreds of millions of syndication feed links are not caught as
> > "invalid" by validators, as they won't be whatever HTML5 finally says.
>
> When people link to an Atom document, they are giving a syndication feed.
> I'm sure theoretically there could be other uses of Atom, but from my
> studies of Web content, I haven't seen any evidence that this is
> widespread enough to deserve special treatment.

Seems like you really didn't understand my point…
« rel="alternate" + type="RSS or Atom" means rel="feed alternate" »
*is* special treatment.
« type="RSS or Atom" means it's a syndication feed, whatever the
rel="" value » is *not* special treatment.
« rel="feed" means it's a feed –with *my* definition of the "feed"
relationship–, whatever the type="" value » is *not* special
treatment.


> > In 4.4.3.1 (Link type "alternate"), remove this paragraph:
> > """If the alternate keyword is used with the type attribute set to the
> > value application/rss+xml or the value application/atom+xml, then the
> > user agent must treat the link as it would if it had the feed keyword
> > specified as well."""
>
> Removing this paragraph breaks existing practices.

No, it doesn't.
<link rel="alternate" type="application/rss+xml" href="A"> links to a
syndication feed, not because of the rel="alternate" or its
combination with the type="application/rss+xml", but just because of
the type="application/rss+xml".
We have a problem with application/atom+xml because it can represent
either a feed or a standalone entry, but the Atom WG is working on
this issue (either we'll have a new 'type' parameter:
application/atom+xml;type=entry, or a new media type:
application/atom.entry+xml), so Atom won't be any different from RSS.
And given that I redefine rel="feed" and feed autodiscovery (see
below), the above quoted paragraph is no longer appropriate.

> > Remove rel="feed" or, if you really think it's different from
> > rel="index", define it that way:
> > """The feed keyword indicates that the referenced document is a
> > syndication feed which is or has been linking to the current page as a
> > feed item.
> > For example, in a Web log, a page representing a single entry can link
> > to the Web log homepage and/or the Web log's Atom or RSS feed using
> > the link type feed."""
>
> There are syndication feeds that don't fit this definition.

Yes, of course, and they will be discovered based on the content-type,
and rel="" will serve its real role: describing the relationship
between the two resources (and not describing the other end of the
link).

Definition of feed: a bag of items; the representation of a feed
generally exposes only the 10, or so, latest created or updated items.
You'll note that this has nothing to do with the feed "format" (Atom,
RSS, a Web log's homepage in HTML, etc.)
If a document was once linked from a feed's representation as an item,
it is an item of this feed, even if the feed's current representation
doesn't link to it anymore. The relationship still exists. This
relationship is "I am an item of this feed" or "this is a feed within
which I once appeared". I propose representing it as rel="feed".

> For example, a home page could link to various feeds for things like
> planned outages, news, press releases, etc, not all of which might be
> on the page itself.

What do you mean by "might be on the page itself"?

Anyway, if you link to something, there's a reason. This reason is
that there is a relationship between the current document and the
thing the link points to. This relationship is described in the rel=""
attribute.
"It is *a* feed" is not a valid reason, it doesn't describe a relationship.
"This is an alternate representation of this page in a format you can
subscribe to" is a valid reason: it's an alternate representation.
"I am an item of this feed" is a valid reason: I was once linked from
it, so you'll find other similar things you might be interested in
(because they are from the same author, or about the same subject,
etc.; this is to be "explained" to the user using the title=""
attribute, and is not something a "machine" has to know about).

> > If you really want to deal with feed autodiscovery (which I believe
> > shouldn't be part of HTML5), add something like this to section 3.5.4
> > (The link element; feed autodiscovery should be limited to <link>
> > elements, and given that it's how it's done today, it causes no
> > backwards compatibility problem):
> > """For example, external resource links with the type attribute set to
> > the value application/rss+xml or the value application/atom+xml or
> > with the link type feed may be recognized as links to subscribable
> > resources for the purpose of feed autodiscovery.
> > """
>
> This is not well-defined enough. We need something far more specific than
> an example in order to foster interoperability. Also, I don't really see
> how the end result here is anything different from what the spec says,
> other than being more vague.

Well, rel="feed" as it is defined now in HTML5 *is* vague.

I propose using the diversity of rel="" values whatever the content
type (for example, from a Web log's "archive" page –e.g. "entries from
September 2006"–, linking to the Web log's homepage could be done with
rel="first", or rel="last" depending on how you consider the set of
documents... There's no reason it couldn't be allowed, and detected as
a "feed link" depending on other aspects of the link, such as its
type="" value).

Is what you want an algorithm for feed autodiscovery?

for each <link> in the document:
    if @rel="feed":
        if canSubscribeTo(@type)*:
            add to list of "links to feeds"
    if isSubscribable(@type)*:
        add to list of "links to feeds"

* canSubscribeTo: <link rel="feed"> could point to an HTML page.
There's no reason some aggregators couldn't subscribe to it if it uses
HTML5's <article> or the hAtom microformat, for example. If this is
the case, canSubscribeTo would return true for "text/html". @type is
the type="" attribute if present, or the Content-Type returned by a
fetch.
* isSubscribable: returns true if the type is any recognized
"subscribable content type": RSS or Atom, but also
text/vnd.IPTC.NewsML for example. @type is the type="" attribute if
present, or the Content-Type returned by a fetch.

It all depends on the capabilities of the UA…
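
A minimal Python sketch of the loop above, under a few assumptions that
are not part of the proposal: the two type sets are only example
capabilities for one hypothetical UA, rel="" is read as a
space-separated token list, media type parameters are ignored, and
fetch_content_type is a hypothetical callback standing in for "the
Content-Type returned by a fetch". It also uses "elif" so that a link
matching both tests is only listed once.

```python
from html.parser import HTMLParser

# Assumption: media types this particular UA recognizes as feeds.
SUBSCRIBABLE_TYPES = {
    "application/rss+xml",
    "application/atom+xml",
    "text/vnd.iptc.newsml",
}
# Assumption: this UA can also subscribe to HTML pages (e.g. via hAtom).
CAN_SUBSCRIBE_TYPES = SUBSCRIBABLE_TYPES | {"text/html"}


def normalize_type(value):
    """Lower-case a media type and drop its parameters (a simplification)."""
    return value.split(";", 1)[0].strip().lower()


class LinkCollector(HTMLParser):
    """Collects the attributes of every <link> element, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            self.links.append({name: (value or "") for name, value in attrs})


def discover_feeds(html, fetch_content_type=None):
    """Return the <link>s that this UA would list as "links to feeds"."""
    collector = LinkCollector()
    collector.feed(html)
    feeds = []
    for link in collector.links:
        rels = link.get("rel", "").lower().split()
        media_type = link.get("type", "")
        # Fall back to a fetch only when the author gave no type="".
        if not media_type and fetch_content_type is not None:
            media_type = fetch_content_type(link.get("href", "")) or ""
        media_type = normalize_type(media_type)
        if "feed" in rels and media_type in CAN_SUBSCRIBE_TYPES:
            feeds.append(link)
        elif media_type in SUBSCRIBABLE_TYPES:
            feeds.append(link)
    return feeds
```

Calling discover_feeds(page_source) without a callback never touches
the network, which is the point made earlier about relying on type="".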

Now that you have a list of "links to feeds", you can, for example,
present them to the user. For that, you can group links based on their
rel="" value, for example in the way I did in
http://listserver.dreamhost.com/pipermail/whatwg-whatwg.org/2006-December/008238.html
That's only an example; it shouldn't appear in the spec, I understand
that.

I don't think there is a need to select only *one* such link, but if
you really want to, pick the first one (in document order) with
rel="alternate". If there isn't any, pick the first one with
rel="feed". If there isn't any either, pick the first one, whatever
its rel="" value.
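
That fallback order could be sketched as follows, assuming the "links
to feeds" are already available as a list of attribute dictionaries in
document order (the function name and the dictionary shape are
illustrative only, and rel="" is again read as space-separated tokens):

```python
def pick_default_feed(feed_links):
    """Pick one link: first rel="alternate", else first rel="feed", else first."""
    for preferred in ("alternate", "feed"):
        for link in feed_links:
            if preferred in link.get("rel", "").lower().split():
                return link
    # No preferred rel value found: take the first link, if there is one.
    return feed_links[0] if feed_links else None
```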

-- 
Thomas Broyer

