[whatwg] Issues concerning the <base> element and xml:base
ian at hixie.ch
Tue Aug 7 02:32:28 PDT 2007
On Wed, 30 May 2007, Jonas Sicking wrote:
> > > >
> > > > The latter is the option I'm following for now. Note that browsers
> > > > all do _different_ things for target="" than for href="". The spec
> > > > has made them act the same for now. I'm not sure this is workable,
> > > > we'll have to see when the browser vendors try to get this
> > > > interoperable. I can't imagine that it's a huge issue given that
> > > > the browsers are so far from each other in terms of what they do
> > > > here. I'm going to do a study of some subset of the Web to see how
> > > > common this is (at least the static case; I can't really do much
> > > > about the scripted case).
> > >
> > > I don't think this is a good solution actually. In general, I think
> > > it's good to always make the DOM reflect the behavior of the
> > > document. I.e. it shouldn't matter how you arrived at a specific
> > > DOM, be it through parsing of an incoming HTML stream, or by using
> > > DOM-Core calls. Whenever we make an exception for that rule I think
> > > we need to have a good reason for it.
> > I think you misread what I wrote. Right now, there's no magic involved
> > here.
> When you said "the latter is the option I'm following for now" I thought
> you referred to "and Firefox and IE7/Win don't change any links". Is
> that not the case?
> Looking at the spec it doesn't mention anything special regarding DOM
> mutations at all, so that would indeed make me think that links are
> changed if a <base> element is inserted at the top of the <head> using
> the DOM.
I'm not sure what I meant by "latter" anymore. Indeed, in the dynamic
case, all relative links get reresolved when the first <base href="">
> > > What I suggest is that we make the first or last <base> element in
> > > the <head> be the one that sets both the base target and the base
> > > href for the document (modulo all special handling needed when
> > > <base>s appear in the body, described below). While this is not what
> > > IE or Firefox does today, I doubt that it'll break enough pages to
> > > stray from the act-like-the-DOM-looks principle.
> > Right now the href="" is from the first and the target="" is from the
> > last, but other than that that's what the spec says.
> Why is the fact that the last target is the one used only defined in a
> Note? Or am I missing it somewhere else?
It's defined in the "Following hyperlinks" section.
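
To make the "first href, last target" rule concrete, here is a minimal sketch of the static case. It is illustrative only, not text from the spec and not what any browser ships. The names BaseCollector and effective_base are hypothetical, and it assumes that "first" means the first <base> carrying an href attribute and "last" means the last <base> carrying a target attribute. It ignores the head/body distinction and the scripted case entirely.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class BaseCollector(HTMLParser):
    """Collects href and target values from <base> elements in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag != "base":
            return
        attr_map = dict(attrs)
        if attr_map.get("href") is not None:
            self.hrefs.append(attr_map["href"])
        if attr_map.get("target") is not None:
            self.targets.append(attr_map["target"])


def effective_base(html, document_url):
    """Return (base URL, base target) under the assumed rule:
    href from the first <base href>, target from the last <base target>."""
    collector = BaseCollector()
    collector.feed(html)
    collector.close()
    # Assumption: with no <base href>, the document's own URL is the base.
    href = urljoin(document_url, collector.hrefs[0]) if collector.hrefs else document_url
    # Assumption: with no <base target>, there is no default target (None).
    target = collector.targets[-1] if collector.targets else None
    return href, target
```

For a page containing <base href="/a/" target="one"> followed by <base href="/b/" target="two">, this sketch resolves relative links against /a/ but opens them in "two", which is the mismatch being discussed.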
> Also, if we're going to be inconsistent in how current browsers and web
> pages handle multiple <base>s, why not simply use the first <base> for
> both href="" and target=""?
> > HOWEVER, having said that, this is a tiny minority of pages. According
> > to a study I did of over 100,000,000 pages, 0.036% of pages have more
> > than one <base href=""> element (ignoring those that specify the same
> > href="" value more than once).
> > With <base href="">, you can get 404s, but in practice IE7 is already
> > doing that, and it doesn't seem to have affected adoption. Anecdotally,
> > most of these pages use absolute URIs, which might explain it.
> It's much easier for IE to get away with breaking pages, mostly because
> many people use IE as the yard-stick.
Sure. But once IE has broken the pages, it's easier for everyone else to
> > 0.06% of pages have more than one <base target=""> element (again
> > ignoring duplicates). With <base target="">, the worst that can happen
> > from the user's point of view is that links will open in a new page
> > instead of on the same page, and in practice even that's not likely,
> > since (anecdotally) most pages with <base target=""> simply alternate
> > between different names.
> > What do you think?
> I would be hesitant to drop support for multiple <base>s in Firefox
> actually. Implementation-wise it was very easy to do, and it is
> known that many pages out there break: though the percentage is small,
> there are a lot of pages on the internet.
> It might be something we could restrict to quirks mode pages though,
> that's not a bad idea at all.
Right now the spec requires no multiple-base support in all modes.
As a datapoint -- several of the pages I had found as being users of
multiple <base> elements back in May are now using only one <base>.
This is how it stood back in May (using a sample of several hundred
thousand pages taken mostly from the more popular sites); number of unique
URIs in <base href> attributes as a percentage of all pages parsed:
This is how it stands as of today (using the same sampling method):
(All numbers rounded to three significant figures.) So it seems the trend
is in the "right" direction. Also, note that most of the pages I examined
personally back in May (sampled from the pages that contribute to the
above statistics, biased towards pages that use <base href> the most and
that have the most relative URIs in links and images) were either pages
that seemed unlikely to be widely used (e.g. spam pages, or aggregates of
thousands of links with no context) or were pages where the important
links were all absolute URIs and the relative URIs were site-local. Not
that I'm saying this means much, as I didn't look at that many pages, but
I didn't find any important pages that would be broken by this.
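
For anyone wanting to reproduce this kind of measurement on their own sample, the per-page metric (number of unique URIs in <base href> attributes) could be sketched roughly as below. This is not the tooling used for the numbers in this message; the names BaseHrefCounter, distinct_base_hrefs and percent_with_multiple_bases are hypothetical, and the sketch compares href values as raw strings without resolving or normalizing them.

```python
from html.parser import HTMLParser


class BaseHrefCounter(HTMLParser):
    """Gathers the distinct href values found on <base> elements."""

    def __init__(self):
        super().__init__()
        self.hrefs = set()

    def handle_starttag(self, tag, attrs):
        if tag == "base":
            href = dict(attrs).get("href")
            if href is not None:
                # A set ignores pages that repeat the same href value.
                self.hrefs.add(href)


def distinct_base_hrefs(html):
    """Number of unique <base href> values on one page (raw string comparison)."""
    parser = BaseHrefCounter()
    parser.feed(html)
    parser.close()
    return len(parser.hrefs)


def percent_with_multiple_bases(pages):
    """Percentage of pages in the sample with more than one distinct <base href>."""
    if not pages:
        return 0.0
    multiple = sum(1 for html in pages if distinct_base_hrefs(html) > 1)
    return 100.0 * multiple / len(pages)
```

Counting distinct values rather than elements matches the "ignoring duplicates" convention used in the earlier figures.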
I'd rather not have to add this as a quirks-mode-dependent feature in the
spec -- in fact I'd rather not have to add multiple <base> support at all,
as it requires a lot of magic and many things have to change (e.g.
following hyperlinks has to be aware of the magic, <img>, <object>,
<embed>, <video>, etc, have to be aware of the magic, etc). I'd be
interested in what other vendors think about this. Microsoft presumably
don't like having to support multiple <base href>s, but what about
multiple <base target>? What about other vendors?
This might be something we'll have to revisit near the end of the HTML5
development cycle, to see where the Web content has settled and to see
what browser vendors feel is required to support Web content.
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'