[whatwg] Inconsistent behavior for empty-string URLs

Mon Dec 7 12:05:08 PST 2009

Thanks for the references, this helps my understanding a lot.

I think this is important because the "just fetch the resource again"
behavior is inherently destructive and unexpected. When one of these
empty-string URLs appears on a page, page views double. This isn't a problem
if it's your personal blog, but for high-volume web sites such as
Yahoo!, Google, and Facebook, a 100% increase in traffic causes a lot of
problems. From conversations with engineers at other companies, it seems
that we've all fallen victim to this behavior at one time or another.

One could argue that <img src=""> is unlikely markup as well,
yet the spec currently provides guidance around this case. Wouldn't it
make sense to be consistent across tags that act in a similar fashion?

-Nicholas

______________________________________________
Commander Lock: "Damnit Morpheus, not everyone believes what you
believe!"
Morpheus: "My beliefs do not require them to."

-----Original Message-----
From: simetrical at gmail.com [mailto:simetrical at gmail.com] On Behalf Of
Aryeh Gregor
Sent: Monday, December 07, 2009 11:44 AM
To: Nicholas Zakas
Cc: whatwg at lists.whatwg.org
Subject: Re: [whatwg] Inconsistent behavior for empty-string URLs

On Mon, Dec 7, 2009 at 1:51 PM, Nicholas Zakas <nzakas at yahoo-inc.com>
wrote:
> Presently, HTML5 does provide guidance on the correct behavior for
> <img src=""> in section 4.8.2, indicating that Firefox 3.5's and
> Opera 10's behavior in this regard is correct:
>
> "If the base URI of the element is the same as the document's address,
> then the src attribute's value must not be the empty string."

That says that if it's the empty string, the document is invalid.  It
doesn't say what the UA has to do.  The relevant part is:

[[
Unless . . . the element's src attribute has a value that is an
ignored self-reference, then, when an img is created with a src
attribute, and whenever the src attribute is set subsequently, the
user agent must resolve the value of that attribute, relative to the
element, and if that is successful must then fetch that resource. . . .

The src attribute's value is an ignored self-reference if its value is
the empty string, and the base URI of the element is the same as the
document's address.
]]

This implies user agents don't need to resolve the src or fetch the
resource if the src is empty (unless the base URI is non-default).  I
don't think they're prohibited from doing so, since there's no
detectable difference to their user-visible output -- likewise they
might fetch resources speculatively even if not explicitly required
to.  It's kind of pointless, though.
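
To make the quoted rule concrete, here is a minimal sketch in Python. The
function names and the example.com URLs are made up for illustration, and
urllib.parse.urljoin is only standing in for the spec's "resolve" step, so
treat this as an approximation rather than what any browser actually does:

```python
from urllib.parse import urljoin


def is_ignored_self_reference(src, base_uri, document_address):
    # Quoted rule: empty src AND base URI equal to the document's address.
    return src == "" and base_uri == document_address


def resource_to_fetch(src, base_uri, document_address):
    # Hypothetical helper: returns the URL a UA would be required to fetch,
    # or None when the spec text above does not require a fetch.
    if is_ignored_self_reference(src, base_uri, document_address):
        return None
    # An empty reference resolves to the base URI itself.
    return urljoin(base_uri, src)


doc = "http://example.com/page.html"  # assumed document address
print(resource_to_fetch("", doc, doc))
print(resource_to_fetch("", "http://example.com/other/", doc))
print(resource_to_fetch("logo.png", doc, doc))
```

The second call is the "non-default base URI" case: the empty src is not an
ignored self-reference there, so it resolves to the base URI and would be
fetched.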

The other cases seem to make no specific exception for an empty URL,
so as far as I can tell, the UA must fetch them as usual -- although
of course it might have a valid copy in the cache.

This is clearly not a good idea for <iframe>, since otherwise <iframe
src=""> is an instant infinite loop on a typical page.  The same goes
for a URL that consists only of a fragment.  In fact, a quick test in
the browsers I had handy (Firefox 3.5 and Opera 9.22) suggests that
there are more elaborate protections against recursion here.  Try
saving these two files in the same directory with the names
"test1.html" and "test2.html", and viewing test1.html in a web
browser:

test1.html:

<!doctype html>
<p>1</p>
<iframe src=test2.html></iframe>

test2.html:

<!doctype html>
<p>2</p>
<iframe src=test1.html></iframe>

Neither browser I tested with has an infinite loop here, although they
terminate at different steps: Firefox displays each page only once
(visible text is 1 2), while Opera displays test1.html twice (1 2 1).
Is this covered by the spec anywhere?
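
One guard that would produce the "1 2" result is to refuse to load a frame
whose URL already appears among its ancestors. The sketch below is only a
guess at that kind of check, written as a toy Python model; the PAGES table
and render function are invented for illustration, and it is not taken from
either browser's source (it does not reproduce Opera's "1 2 1"):

```python
# Toy model of the two test files: visible text plus the iframe's src.
PAGES = {
    "test1.html": ("1", "test2.html"),
    "test2.html": ("2", "test1.html"),
}


def render(url, ancestors=()):
    # Assumed guard: skip a frame whose URL matches any ancestor document.
    if url in ancestors:
        return []
    text, iframe_src = PAGES[url]
    shown = [text]
    if iframe_src is not None:
        # The current page becomes an ancestor of its nested frame.
        shown += render(iframe_src, ancestors + (url,))
    return shown


print(" ".join(render("test1.html")))  # prints "1 2"
```

Under the same check, a page whose iframe src resolves back to the page
itself (as an empty src would) stops after a single load.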

I'm not sure it makes a difference whether <script src=""></script> or
<link rel=stylesheet href=""> does anything special.  It seems simpler
to just leave them as-is, so they fetch the resource again (or
retrieve it from cache if possible) and then probably throw it out as
invalid (since it's HTML and not CSS/JS/etc.).

> I'm interested in what others' opinions on this may be, as this seems
> like an important area in which to gain consistency.

Why?  It seems like fairly unlikely markup.  Consistency is good, but
I wouldn't call this point "important".