[whatwg] Issues concerning the <base> element and xml:base

Ian Hickson ian at hixie.ch
Thu May 17 17:55:34 PDT 2007


On Tue, 1 May 2007, Jonas Sicking wrote:
> > 
> > The latter is the option I'm following for now. Note that browsers all 
> > do _different_ things for target="" than for href="". The spec has 
> > made them act the same for now. I'm not sure this is workable, we'll 
> > have to see when the browser vendors try to get this interoperable. I 
> > can't imagine that it's a huge issue given that the browsers are so 
> > far from each other in terms of what they do here. I'm going to do a 
> > study of some subset of the Web to see how common this is (at least 
> > the static case; I can't really do much about the scripted case).
> 
> I don't think this is a good solution actually. In general, I think it's 
> good to always make the DOM reflect the behavior of the document. I.e. 
> it shouldn't matter how you arrived at a specific DOM, be it through 
> parsing of an incoming HTML stream, or by using DOM-Core calls. Whenever 
> we make an exception for that rule I think we need to have a good reason 
> for it.

I think you misread what I wrote. Right now, there's no magic involved 
here.


> What I suggest is that we make the first or last <base> element in the 
> <head> be the one that sets both the base target and the base href for 
> the document (modulo all special handling needed when <base>s appear in 
> the body, described below). While this is not what IE or Firefox does 
> today, I doubt that it'll break enough pages to stray from the 
> act-like-the-DOM-looks principle.

Right now the href="" is taken from the first <base> and the target="" 
from the last, but other than that, that's what the spec says.
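
As a rough illustration of the rule just described (href="" from the first 
<base>, target="" from the last), here is a minimal Python sketch. The 
function name effective_base and the dict representation of <base> elements 
are hypothetical, and it assumes "first" and "last" mean the first and last 
element that actually carries the attribute, which the message does not 
spell out.

```python
from typing import Dict, List, Optional, Tuple


def effective_base(bases: List[Dict[str, str]]) -> Tuple[Optional[str], Optional[str]]:
    """Return (base_href, base_target) for <base> elements in document order.

    Each element is modelled as a dict of its attributes (an assumption made
    for this sketch, not a real DOM API).
    """
    # href: taken from the first <base> that has an href attribute.
    href = next((b["href"] for b in bases if "href" in b), None)
    # target: taken from the last <base> that has a target attribute.
    target = next((b["target"] for b in reversed(bases) if "target" in b), None)
    return href, target


bases = [
    {"href": "http://a.example/"},
    {"target": "_blank"},
    {"href": "http://b.example/", "target": "main"},
]
result = effective_base(bases)
```

With these three hypothetical elements, the href comes from the first one 
and the target from the third.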


> Currently mozilla uses the last <base> that appears in <head>. There 
> doesn't appear to be a reason for using the last rather than the first, 
> it's just what we've always done. However it would be interesting to 
> know what IE uses here since it might matter. Did safari or opera run 
> into any issues here?

IE7 uses the first <base> only.


> One thing we unfortunately will have to deal with is <base> elements 
> appearing in the middle of the body of the document. What mozilla had to 
> do was once we find a <base> element in the body of the document, we 
> tell the parser to remember the resolved href and/or target of that 
> <base> element. We then for any element that uses base uris (full list 
> at [1]) set an internal member in the element that hardcodes the 
> element's base uri and/or base target.
>
> For elements that don't get this property set on them base href and 
> target resolution works as normal. For elements that have this set, base 
> href and target resolution only uses the set properties.
> 
> Note that you only set the saved href and target in the parser if the 
> attribute is set in the <base> element. So if a document contains <base 
> target="foo"> in the middle of the body that does not set a saved href 
> in the parser.

This is deep magic, as far as the DOM goes. It also makes it hard to debug 
-- e.g. dynamically modifying <base> elements, moving them, etc, has no 
effect anymore.
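
To make the quoted parser behaviour concrete, here is a small Python sketch 
of the stamping idea, under stated assumptions: the Element class, its 
pinned_href/pinned_target fields, and the stamp_body function are all 
hypothetical names, URL resolution of the saved href is left out, and every 
element is stamped rather than only those that use base URIs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Element:
    tag: str
    attrs: Dict[str, str]
    pinned_href: Optional[str] = None    # hard-coded base URI, if any
    pinned_target: Optional[str] = None  # hard-coded base target, if any


def stamp_body(elements: List[Element]) -> None:
    """Walk body elements in parse order and pin the saved base values."""
    saved_href: Optional[str] = None
    saved_target: Optional[str] = None
    for el in elements:
        if el.tag == "base":
            # Only attributes actually present on the <base> update the
            # saved state; <base target="foo"> leaves the saved href alone.
            if "href" in el.attrs:
                saved_href = el.attrs["href"]
            if "target" in el.attrs:
                saved_target = el.attrs["target"]
            continue
        el.pinned_href = saved_href
        el.pinned_target = saved_target


body = [
    Element("a", {}),
    Element("base", {"target": "foo"}),
    Element("a", {}),
    Element("base", {"href": "http://x.example/"}),
    Element("img", {}),
]
stamp_body(body)
```

Because the values are pinned at parse time, later changes to the <base> 
elements would not reach the already-stamped elements, which is the 
debugging problem raised above.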


> This algorithm is something we had to add to firefox in order to support 
> many pages out there. I think IE7 changed how they dealt with this, 
> though I don't know the specifics of how it changed. Would be 
> interesting to get their feedback on this.

IE7 has given up supporting multiple <base href=""> elements. They still 
support multiple <base target=""> elements though.

This breaks some sites, e.g. IE7 and the current HTML5 spec would handle 
this page differently than Firefox:

   http://www.samidoon.com/index.php?page=forums

The sidebar on the left of that page gives 404s in IE7. Or:

   http://n2ch.lazy8.info/headline/headline.cgi?mode=category&group=test

The link next to "489" "(2007.5.14) New" gives a 404 in IE7.

An even worse example is:

   http://pandorakids.net/headline/index.php?page=all

...where the images at the top don't display in IE7.

HOWEVER, having said that, this is a tiny minority of pages. According to 
a study I did of over 100,000,000 pages, 0.036% of pages have more than 
one <base href=""> element (ignoring those that specify the same href="" 
value more than once).

With <base href="">, you can get 404s, but in practice IE7 is already 
doing that, and it doesn't seem to have affected adoption. Anecdotally, 
most of these pages use absolute URIs, which might explain it.

0.06% of pages have more than one <base target=""> element (again ignoring 
duplicates). With <base target="">, the worst that can happen from the 
user's point of view is that links will open in a new page instead of on 
the same page, and in practice even that's not likely, since (anecdotally) 
most pages with <base target=""> simply alternate between different names.
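
For a sense of scale, the percentages above can be turned into page counts, 
and the "ignoring duplicates" criterion can be written down. This is only a 
sketch: has_multiple_distinct is a hypothetical helper, not the code used in 
the study, and since the sample was "over 100,000,000" pages the counts are 
lower bounds.

```python
from typing import List


def has_multiple_distinct(values: List[str]) -> bool:
    """True if a page's <base> attribute values differ from one another.

    Repeating the same value does not count, matching the "ignoring
    duplicates" wording above.
    """
    return len(set(values)) > 1


SAMPLE = 100_000_000  # lower bound on the sample size

# Integer arithmetic avoids floating-point rounding noise.
pages_multi_href = SAMPLE * 36 // 100_000   # 0.036% of the sample
pages_multi_target = SAMPLE * 6 // 10_000   # 0.06% of the sample
```

That works out to at least 36,000 pages with conflicting href="" values and 
at least 60,000 with conflicting target="" values.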


What do you think?


> > No, the <base> element sets the "document entity's base URI", and is 
> > resolved relative to the "base URI from the encapsulating entity" or 
> > the "URI used to retrieve the entity". See RFC2396.
> 
> Ah, the "base" part of "base URI from the encapsulating entity" confused 
> me. Any chance we can remove that or is that the language RFC2396 uses?

It's RFC2396 language, sadly.
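
The two-step resolution described in the quoted text can be sketched with 
Python's urllib.parse.urljoin. The URLs are made up for illustration, and 
urljoin follows the later RFC 3986 rules rather than RFC2396 itself, though 
the two agree for simple cases like this one.

```python
from urllib.parse import urljoin

# URI used to retrieve the document (hypothetical).
document_uri = "http://example.com/a/b.html"

# Step 1: <base href="../img/"> is itself resolved against the URI used
# to retrieve the document, giving the document's base URI.
base_uri = urljoin(document_uri, "../img/")

# Step 2: relative references in the document resolve against that base.
image_uri = urljoin(base_uri, "logo.png")
```

Here base_uri is "http://example.com/img/" and image_uri is 
"http://example.com/img/logo.png".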

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


