[whatwg] <video> "await a stable state" in resource selection (Was: Race condition in media load algorithm)

Wed Aug 11 02:13:55 PDT 2010

CC Hixie, question below.

On Tue, 10 Aug 2010 18:39:04 +0200, Boris Zbarsky <bzbarsky at mit.edu> wrote:

> On 8/10/10 4:40 AM, Philip Jägenstedt wrote:
>> Because the parser can't create a state which the algorithm doesn't
>> handle. It always first inserts the video element, then the source
>> elements in the order they should be evaluated. The algorithm is written
>> in such a way that the overall result is the same regardless of whether
>> it is invoked/continued on each inserted source element or after the
>> video element is closed.
>
> Ah, the waiting state, etc?

Yes, in the case of the parser inserting source elements that fail one of  
the tests (no src, wrong type, wrong media) the algorithm will end up at  
step 6.21 waiting. It doesn't matter if all sources are available when the  
algorithm is first invoked or if they "trickle in", be that from the  
parser or from scripts.
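
A rough sketch of why the order of arrival does not matter as long as sources are only appended. This is a toy model, not the spec algorithm: ToyVideo, canPlay and the pointer-as-index representation are assumptions made for illustration, and canPlay stands in for the src/type/media checks.

```javascript
// Toy model of the pointer-based source walk; not the spec algorithm itself.
// canPlay is a hypothetical stand-in for the src/type/media checks.
function canPlay(source) {
  return Boolean(source.src) && source.type === "video/webm";
}

class ToyVideo {
  constructor() {
    this.sources = [];   // child source elements, in document order
    this.pointer = 0;    // index of the next source to examine
    this.chosen = null;  // first source that passed the checks
  }

  // Append a source and let the algorithm continue from where it waited.
  append(source) {
    this.sources.push(source);
    this.resume();
  }

  resume() {
    while (this.chosen === null && this.pointer < this.sources.length) {
      const candidate = this.sources[this.pointer++];
      if (canPlay(candidate)) this.chosen = candidate;
    }
    // If nothing was chosen, the algorithm waits here for another source.
  }
}

const list = [
  { src: "", type: "" },
  { src: "video.mp4", type: "video/mp4" },
  { src: "video.webm", type: "video/webm" },
];

// Case 1: sources trickle in, the algorithm resumes after each insertion.
const trickle = new ToyVideo();
for (const s of list) trickle.append(s);

// Case 2: all sources are present before the algorithm first runs.
const batch = new ToyVideo();
batch.sources.push(...list);
batch.resume();

console.log(trickle.chosen.src, batch.chosen.src);
```

Both cases end up with the same chosen source and the same pointer position in this model.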

> Why does the algorithm not just reevaluate any sources after the  
> newly-inserted source instead?

Because if a source failed after network access (404, wrong MIME, etc)  
then we'd have to perform that network access again and again for each  
modification. More on that below.
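
To put a number on it, here is a small sketch comparing the two strategies. It assumes a hypothetical fetchOk that counts one network access per call and only succeeds for .webm, and it assumes each insertion happens after the previous attempts have already failed.

```javascript
// Hypothetical fetch: every call counts as one network access,
// and only .webm resources succeed.
let fetches = 0;
function fetchOk(src) {
  fetches++;
  return src.endsWith(".webm");
}

// Strategy A: re-evaluate every source from the top on each insertion.
function restartOnInsert(srcs) {
  fetches = 0;
  const present = [];
  for (const src of srcs) {
    present.push(src);
    if (present.some(fetchOk)) break; // stops at the first success
  }
  return fetches;
}

// Strategy B: keep a pointer, so only newly inserted sources are tried.
function resumeOnInsert(srcs) {
  fetches = 0;
  for (const src of srcs) {
    if (fetchOk(src)) break;
  }
  return fetches;
}

const srcs = ["gone-1.mp4", "gone-2.mp4", "gone-3.mp4", "video.webm"];
console.log(restartOnInsert(srcs), resumeOnInsert(srcs)); // 10 4
```

The gap grows quadratically with the number of failing sources in this model.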

>> However, scripts can see the state at any point, which is why it needs  
>> to be the same in all browsers.
>
> I'm not sure which "the state" you mean here.

For example networkState can be NETWORK_NO_SOURCE, NETWORK_EMPTY or  
NETWORK_LOADING depending on which steps you've run. Silvia Pfeiffer found  
inconsistencies between browsers because of this, see  
<http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-July/027284.html>

It's quite serious because NETWORK_EMPTY is used as a condition in many  
places of the spec, so this absolutely must be consistent between browsers.

>>> Because changes to the set of <source> elements do not restart the
>>> resource selection algorithm, right? Why don't they, exactly? That
>>> seems broken to me, from the POV of how the rest of the DOM generally
>>> works (except as required by backward compatibility considerations)...
>>
>> The resource selection is only started once, typically when the src
>> attribute is set (by parser or script) or when the first source element
>> is inserted. If it ends up in step 21 waiting, inserting another source
>> element may cause it to continue at step 22.
>
> Right, ok.
>
>> Restarting the algorithm on any modification of source elements would
>> mean retrying sources that have previously failed due to network errors
>> or incorrect MIME type again and again, wasting network resources.
>> Instead, the algorithm just keeps its state and waits for more source
>> elements to try.
>
> Well, the problem is that it introduces hysteresis into the DOM.  Why is  
> this a smaller consideration than the other, in the edge case when  
> someone inserts sources in reverse order and "slowly" (off the event  
> loop)?

The algorithm has been very stateful since I first implemented it and I  
always considered the sync/async split to be precisely for that reason, to  
be more tolerant of the order of DOM modification. I'll have to let Hixie  
answer why this specific trade-off was made.

> That is, why do we only consider sources inserted after the |pointer|  
> instead of all newly inserted sources?

Otherwise the pointer could potentially reach the same source element  
twice, with the aforementioned problems with failing after network access.

>> I'm not sure what you mean by hysteresis
>
> http://en.wikipedia.org/wiki/Hysteresis
>
> Specifically, that the state of the page depends not only on the current  
> state of the DOM but also on the path in state space that the page took  
> to get there.
>
> Or in other words, that inserting two <source> elements does different  
> things depending on whether you do "appendChild(a); appendChild(b)" or  
> "appendChild(b); insertBefore(a, b)", even though the resulting DOM is  
> exactly the same.
>
> Or in your case, the fact that the ordering of the setAttribute and  
> insertChild calls matters, say.
>
> Such situations, which introduce order-dependency on DOM operations, are  
> wonderful sources of frustration for web developers, especially if  
> libraries that abstract away the DOM manipulation are involved (so the  
> web developer can't even change the operation order).

OK, perhaps I should take this more seriously. Making the whole algorithm  
synchronous probably isn't a brilliant idea unless we can also do away  
with all of the state it keeps (i.e. hysteresis).
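
The order-dependency described above can be reproduced in a toy model. PointerVideo and the playable flag are hypothetical; the only thing taken from the discussion is that the pointer sits after the last examined source and that sources inserted before it are never looked at.

```javascript
// Toy model: the pointer sits just after the last examined child.
class PointerVideo {
  constructor(canPlay) {
    this.canPlay = canPlay;
    this.children = [];
    this.lastExamined = null; // null means the pointer is at the start
    this.chosen = null;
  }

  appendChild(node) {
    this.children.push(node);
    this.resume();
  }

  // ref is assumed to already be a child of this element.
  insertBefore(node, ref) {
    this.children.splice(this.children.indexOf(ref), 0, node);
    this.resume();
  }

  resume() {
    let i = this.lastExamined === null
      ? 0
      : this.children.indexOf(this.lastExamined) + 1;
    while (this.chosen === null && i < this.children.length) {
      const candidate = this.children[i++];
      this.lastExamined = candidate;
      if (this.canPlay(candidate)) this.chosen = candidate;
    }
  }
}

const a = { src: "a.webm", playable: true };
const b = { src: "b.mp4", playable: false };
const isPlayable = (node) => node.playable;

// appendChild(a); appendChild(b)
const first = new PointerVideo(isPlayable);
first.appendChild(a);
first.appendChild(b);

// appendChild(b); insertBefore(a, b)
const second = new PointerVideo(isPlayable);
second.appendChild(b);
second.insertBefore(a, b);

const order = (v) => v.children.map((n) => n.src).join(",");
console.log(order(first) === order(second)); // same child order in both
console.log(first.chosen, second.chosen);
```

In this model both elements end up with the children a, b, but only the first one ever selects a; the second is left waiting with nothing chosen.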

One way would be to introduce a magic flag on all source elements to  
indicate that they have already failed. This would be cleared whenever  
src, type or media is modified. Another is to cache 404 responses and the  
MIME types of rejected resources, but I think that's a bit of overkill. Do  
you have any specific ideas?
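
A minimal sketch of the flag idea, under stated assumptions: FlaggedSource, select and tryLoad are hypothetical names, and tryLoad stands in for the network access that can fail. With the flag in place, selection can start from the top after every modification without repeating failed requests.

```javascript
// Hypothetical source element carrying an "already failed" flag.
class FlaggedSource {
  constructor(src) {
    this.attrs = { src, type: "", media: "" };
    this.failed = false;
  }

  setAttribute(name, value) {
    this.attrs[name] = value;
    // Changing src, type or media makes the source worth retrying.
    if (name === "src" || name === "type" || name === "media") {
      this.failed = false;
    }
  }
}

// Stateless walk from the top; sources flagged as failed are skipped.
function select(sources, tryLoad) {
  for (const source of sources) {
    if (source.failed) continue;
    if (tryLoad(source)) return source;
    source.failed = true;
  }
  return null;
}

// Hypothetical loader: counts network accesses, only .webm succeeds.
let requests = 0;
const tryLoad = (source) => {
  requests++;
  return source.attrs.src.endsWith(".webm");
};

const s1 = new FlaggedSource("gone.mp4");
const s2 = new FlaggedSource("also-gone.mp4");
const sources = [s1, s2];

const firstPass = select(sources, tryLoad);  // both fail, 2 requests
const secondPass = select(sources, tryLoad); // both skipped, still 2
s1.setAttribute("src", "video.webm");        // clears the flag on s1
const thirdPass = select(sources, tryLoad);  // s1 retried, 3 requests

console.log(firstPass, secondPass, thirdPass === s1, requests);
```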

>>> I have a really hard time believing that you trigger resource
>>> selection when the <video> is inserted into the document and don't
>>> retrigger it afterward, given that... do you?

To the best of my knowledge we do exactly what the spec says, apart from  
the uncertainty regarding "await a stable state".

Resource selection is triggered by setting/modifying the src attribute or  
inserting a source element when networkState is NETWORK_EMPTY. Here's an  
annotated guide of exactly what happens in two cases:

<video src="video.webm">
<!-- resource selection triggered as src attribute was set by parser -->
</video>

<video>
<!-- resource selection not triggered yet -->
<source>
<!-- resource selection triggered, ends up waiting in step 6.21 due to  
missing src -->
<source src="video.mp4" type="video/mp4">
<!-- resource selection continues at step 6.22, but ends up waiting again  
in 6.21 as we don't support video/mp4 -->
<source src="video.webm" type="video/webm">
<!-- resource selection continues at step 6.22, calling resource fetch in  
step 6.9, potentially never returning -->
</video>

>>>> 2. Instead of calling the resource fetch algorithm in step 5/9
>>>
>>> There doesn't seem to be such a step...
>>>
>>>> 3. In step 21, instead of waiting forever, just return and let
>>>> inserting a source element cause it to continue at step 22.
>>>
>>> Again, the numbering seems to be off.
>>
>> These are steps in the resource selection algorithm, not in the resource
>> fetch algorithm.
>
> Yes.  Step 5 in the resource selection algorithm I see is:
>
>    5. Queue a task to fire a simple event named loadstart at the media
>       element.
>
> It has no substeps.

Oops, steps 5/9/21 are substeps of step 6.

>> Mozilla is implementing this now. How are you interpreting "await a
>> stable state" when the resource selection algorithm is triggered by the
>> parser?
>
> At the moment, given that we don't differentiate between "pause" and  
> "spin the event loop" internally, it sounds like we plan to treat this as  
> "wait until the next event runs from the event loop".  This means we  
> will treat an alert being up as being in a stable state; same for sync  
> XHR, showModalDialog, etc.  From the parser we will basically treat it  
> as "run asynchronously".
>
>> Will the result be 100% predictable or depend on "random" things
>> like how much data the parser already has available from the network?
>
> I don't know about "result".  When the algorithm runs, exactly, will  
> depend on the amount of data the parser parses before returning to the  
> event loop.  Does that affect "result"?

Yes, it sounds like it very much does, and would result in disasters like  
this:

<!doctype html>
<video src="video.webm"></video>

<script>alert(document.querySelector('video').networkState)</script>

The result will be 0 (NETWORK_EMPTY) or 2 (NETWORK_LOADING) depending on  
whether or not the parser happened to return to the event loop before the  
script. The only way this would not be the case is if the event loop is  
spun before executing scripts, but I haven't found anything to that effect  
in the spec. I hope I'm wrong, of course.
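
The race can be modelled without a browser. This is only a sketch: it assumes the whole of resource selection is deferred to a task, as in the "run asynchronously" interpretation quoted above, and parse, tasks and yieldBeforeScript are made-up names.

```javascript
const NETWORK_EMPTY = 0;
const NETWORK_LOADING = 2;

// Returns what an inline script would read from networkState.
function parse(yieldBeforeScript) {
  const video = { networkState: NETWORK_EMPTY };
  const tasks = []; // stand-in for the event loop's task queue

  // Parser sees <video src="video.webm">: resource selection is queued.
  tasks.push(() => {
    video.networkState = NETWORK_LOADING;
  });

  // The parser may or may not return to the event loop at this point,
  // for example depending on how much data has arrived from the network.
  if (yieldBeforeScript) {
    while (tasks.length > 0) tasks.shift()();
  }

  // Parser reaches the <script> and runs it.
  return video.networkState;
}

console.log(parse(false), parse(true)); // 0 2
```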

-- 
Philip Jägenstedt
Core Developer
Opera Software