[whatwg] Issues concerning the <base> element and xml:base
jonas at sicking.cc
Tue May 1 17:08:45 PDT 2007
Ian Hickson wrote:
> On Sun, 11 Feb 2007, Geoffrey Sneddon wrote:
>> Safari 2.0.4/419.3: (1) Inserted in DOM (in the innerHTML location).
>> Firefox 22.214.171.124: (3) Inserted in DOM (in the innerHTML location).
>> IE/Mac 5.2.3: (2) (any way to view the DOM tree?)
>> Opera 9.10: (1) DOM Snapshot for some reason isn't working.
>> IE6/Win: (2) The new <base> never appears in DOM, but the full absolute URLs
>> are in the DOM.
>> IE7/Win: (3) The new <base> never appears in DOM, but the full absolute URLs
>> are in the DOM.
>> In conclusion, Safari and Opera change all the links, IE5/Mac and
>> IE6/Win both change links within the fragment, and Firefox and IE7/Win
>> don't change any links.
> The latter is the option I'm following for now. Note that browsers all do
> _different_ things for target="" than for href="". The spec has made them
> act the same for now. I'm not sure this is workable, we'll have to see
> when the browser vendors try to get this interoperable. I can't imagine
> that it's a huge issue given that the browsers are so far from each other
> in terms of what they do here. I'm going to do a study of some subset of
> the Web to see how common this is (at least the static case; I can't
> really do much about the scripted case).
I don't think this is a good solution actually. In general, I think it's
good to always make the DOM reflect the behavior of the document. I.e.
it shouldn't matter how you arrived at a specific DOM, be it through
parsing of an incoming HTML stream or by using DOM Core calls. Whenever
we make an exception to that rule, I think we need to have a good reason.
For quirky <base> behavior, my experience is that what matters most is
what URI things in a static page are resolved against. Most modern pages
that use scripting and the DOM usually have zero or one <base>
element, and it lives in the head.
What I suggest is that we make the first or last <base> element in the
<head> be the one that sets both the base target and the base href for
the document (modulo all special handling needed when <base>s appear in
the body, described below). While this is not what IE or Firefox does
today, I doubt that it'll break enough pages to stray from the
Currently Mozilla uses the last <base> that appears in <head>. There
doesn't appear to be a reason for using the last rather than the first;
it's just what we've always done. However, it would be interesting to
know what IE uses here, since it might matter. Did Safari or Opera run
into any issues here?
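A rough Python sketch of the proposal above, assuming a single chosen <base> in <head> sets both the base href and the base target. BaseElement and document_base are hypothetical names, not any browser's internals; a use_last flag covers the open first-vs-last question, and an absent href falls back to the retrieval URI.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple
from urllib.parse import urljoin


@dataclass
class BaseElement:
    # Hypothetical stand-in for a <base> element; None means the attribute is absent.
    href: Optional[str] = None
    target: Optional[str] = None


def document_base(head_bases: List[BaseElement], retrieval_uri: str,
                  use_last: bool = True) -> Tuple[str, Optional[str]]:
    """Pick one <base> from <head> and derive the document's base href and target.

    Assumption: the chosen <base> sets both values; a missing href falls back
    to the retrieval URI and a missing target means "no base target".
    """
    if not head_bases:
        return retrieval_uri, None
    chosen = head_bases[-1] if use_last else head_bases[0]
    href = urljoin(retrieval_uri, chosen.href) if chosen.href is not None else retrieval_uri
    return href, chosen.target
```

With two <base> elements in <head>, "last wins" (current Mozilla behavior) and "first wins" give different answers, which is why it would be useful to know what IE does:

    bases = [BaseElement(href="/a/"), BaseElement(href="/b/", target="main")]
    document_base(bases, "http://example.com/x/page.html")                  # last
    document_base(bases, "http://example.com/x/page.html", use_last=False)  # first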
One thing we unfortunately will have to deal with is <base> elements
appearing in the middle of the body of the document. What Mozilla had to
do was this: once we find a <base> element in the body of the document,
we tell the parser to remember the resolved href and/or target of that
<base> element. Then, for any element that uses base URIs (full list
at ), we set an internal member on the element that hardcodes the
element's base URI and/or base target.
For elements that don't get this property set on them, base href and
target resolution works as normal. For elements that have it set, base
href and target resolution only uses the set properties.
Note that the parser only saves the href or target if the corresponding
attribute is set on the <base> element. So if a document contains <base
target="foo"> in the middle of the body, that does not set a saved href
in the parser.
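The body-<base> handling described above can be sketched in Python as follows. This is only an illustration under stated assumptions: Element, BodyBaseTracker, effective_base and the USES_BASE tag set are hypothetical (the real list of affected elements is the one referenced above, not reproduced here), and a body <base> href is assumed to be resolved against the document base URI.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple
from urllib.parse import urljoin

# Hypothetical subset of elements that use base URIs (not the full list).
USES_BASE = {"a", "area", "form", "img", "link", "script"}


@dataclass
class Element:
    tag: str
    attrs: dict = field(default_factory=dict)
    # Hardcoded base href/target; None means "resolve normally via the document".
    fixed_base_href: Optional[str] = None
    fixed_base_target: Optional[str] = None


class BodyBaseTracker:
    """Parser-side state remembering the last <base> href/target seen in <body>."""

    def __init__(self, document_base_href: str):
        self.document_base_href = document_base_href
        self.saved_href: Optional[str] = None
        self.saved_target: Optional[str] = None

    def on_element(self, el: Element, in_body: bool) -> None:
        if el.tag == "base" and in_body:
            # Only save an attribute that is actually present on this <base>.
            if "href" in el.attrs:
                # Assumption: resolve against the document base URI.
                self.saved_href = urljoin(self.document_base_href, el.attrs["href"])
            if "target" in el.attrs:
                self.saved_target = el.attrs["target"]
        elif el.tag in USES_BASE:
            # Hardcode whatever has been saved so far onto the element.
            el.fixed_base_href = self.saved_href
            el.fixed_base_target = self.saved_target


def effective_base(el: Element, doc_href: str,
                   doc_target: Optional[str]) -> Tuple[str, Optional[str]]:
    # Each property independently: use the hardcoded value if set, else the document's.
    href = el.fixed_base_href if el.fixed_base_href is not None else doc_href
    target = el.fixed_base_target if el.fixed_base_target is not None else doc_target
    return href, target
```

Feeding a link, then <base target="foo">, then another link, then <base href="/other/">, then a third link through on_element shows the three cases: the first link resolves normally, the second only gets a hardcoded target (its href still resolves against the document), and the third gets both.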
This algorithm is something we had to add to Firefox in order to support
many pages out there. I think IE7 changed how it deals with this,
though I don't know the specifics of how it changed. It would be
interesting to get their feedback on this.
> On Tue, 10 Apr 2007, Jonas Sicking wrote:
>> Note that the current text isn't implementable since it says that
>> relative uris in <base> should be resolved against the base uri of the
>> document, but the <base> element modifies that base uri so there is a
>> circular dependency.
> No, the <base> element sets the "document entity's base URI", and is
> resolved relative to the "base URI from the encapsulating entity" or the
> "URI used to retrieve the entity". See RFC2396.
Ah, the "base" part of "base URI from the encapsulating entity" confused
me. Any chance we can remove that, or is that the language RFC2396 uses?
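To make the non-circular resolution order Ian describes concrete, here is a small Python sketch. document_base_uri is a hypothetical helper; it assumes the <base> href is resolved against the encapsulating entity's base URI if one exists and otherwise against the retrieval URI, never against the document base URI it is defining. Note that urllib.parse.urljoin follows the newer RFC 3986 resolution rules, which differ from RFC2396 only in edge cases.

```python
from typing import Optional
from urllib.parse import urljoin


def document_base_uri(retrieval_uri: str, base_href: Optional[str],
                      encapsulating_base: Optional[str] = None) -> str:
    """Compute the document entity's base URI without circularity.

    The context for resolving <base href> comes from outside the document:
    the encapsulating entity's base URI if there is one, else the retrieval URI.
    """
    context = encapsulating_base if encapsulating_base is not None else retrieval_uri
    if base_href is None:
        # No <base href>: the document simply uses the outer context.
        return context
    return urljoin(context, base_href)
```

For example, a relative <base href="../c/"> in a document retrieved from http://example.com/a/b.html yields http://example.com/c/, and nothing in that computation depends on the result.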