[whatwg] Web-sockets + Web-workers to produce a P2P website or application

Thu Jan 21 18:33:10 PST 2010

comments inline

On Thu, Jan 21, 2010 at 8:24 PM, Mike Hearn <mike at plan99.net> wrote:
> WebSockets doesn't let you open arbitrary ports and listen on them,
> so, I don't think it can be used for what you want.

That's my understanding. My question is whether it would be possible
to open arbitrary ports and listen on them if the spec were changed.
If so, what would be the chance of making a change like this so late
in the game? What would be the advantages and disadvantages of
including this at the HTML5 spec level instead of at the browser
level?

At some level I feel this should be addressed at the spec level for
security reasons. You wouldn't want the browser to simply share all
the static content from every site it has visited. Instead, you'd
want only the static content a site explicitly designates to be "on
the DMZ", so to speak.

Also, how would you go about making this work, given that ultimately
you still need some sort of implementation at the originating web
server / web application level? It appears to me that implementing
this at the browser level alone would not be enough. You'd need
appropriate functionality on the originating web server side, if only
to provide instructions to the browsers.

i.e. "Here's the torrent file for this web application. In it you (the
browser agent) will find a list of files available in distributed form
from your peers, and a table of hashes to verify their authenticity."
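To make the "torrent file" idea a bit more concrete, here is a minimal Python sketch. It assumes the manifest is simply a mapping from file path to a SHA-256 digest. That format is hypothetical: it is not the BitTorrent metainfo format and not part of any spec. The names `build_manifest` and `verify_from_peer` are also made up for illustration.

```python
import hashlib
from typing import Dict


def build_manifest(files: Dict[str, bytes]) -> Dict[str, str]:
    """Origin side (hypothetical): publish a digest for every shareable file."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def verify_from_peer(manifest: Dict[str, str], path: str, data: bytes) -> bool:
    """Browser side (hypothetical): accept peer bytes only if they match the manifest."""
    expected = manifest.get(path)
    if expected is None:
        # Files not listed by the origin must never be accepted from a peer.
        return False
    return hashlib.sha256(data).hexdigest() == expected
```

Because the digests come from the originating server, a peer can relay the bytes but cannot alter them undetected.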

It seems to me that this idea would also complement the HTML5 manifest
for offline browsing. The torrent file would simply define a manifest
of files to be fetched from peers instead of the originating server.
The browser would also need a central location to register itself, so
that peers can discover each other.
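The "central location" for peer discovery is essentially a BitTorrent-style tracker. Below is a minimal in-memory sketch. It assumes peers are identified by opaque "host:port" strings and announce the manifest paths they currently hold. `Tracker`, `register`, `unregister` and `peers_for` are hypothetical names. A real tracker would also need entry expiry, authentication and an actual network protocol.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class Tracker:
    """Hypothetical registry of which peers currently hold which files."""

    def __init__(self) -> None:
        # file path -> set of peer identifiers
        self._holders: DefaultDict[str, Set[str]] = defaultdict(set)

    def register(self, peer: str, paths: List[str]) -> None:
        """A browser announces the manifest files it can serve."""
        for path in paths:
            self._holders[path].add(peer)

    def unregister(self, peer: str) -> None:
        """A browser leaves the swarm (e.g. the user closes the tab)."""
        for holders in self._holders.values():
            holders.discard(peer)

    def peers_for(self, path: str) -> List[str]:
        """Known peers for a file, sorted for stable output."""
        return sorted(self._holders.get(path, set()))
```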

>
> P2P in general is a lot more complicated than it sounds. It sort of
> works for things like large movies and programs because they aren't
> latency sensitive and chunk ordering doesn't matter (you can wait till
> the end then reassemble).

On the other hand, when the swarm is large enough this isn't that big
a problem, and this is an application where everyone is a seeder
while they are using it. Ideally you wouldn't be able to opt out of
seeding, there would be no such thing as a ratio, and the browser
would throttle throughput automatically.
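Automatic throttling could be done in many ways. A token bucket is a standard rate-limiting technique, and it is used here purely as an illustration, not as something any browser is known to do for this. In this minimal sketch the `TokenBucket` class, its `rate` and `capacity` parameters and the injectable `clock` are all hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Caps average upload at `rate` bytes/s, allowing bursts up to `capacity` bytes."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def try_send(self, nbytes: int) -> bool:
        """Return True (and spend tokens) if `nbytes` may be uploaded now."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False
```

A seeding browser would call `try_send` before uploading each chunk to a peer. When it returns False, the browser would defer the chunk, so seeding never saturates the user's own connection.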

In addition, the application could specify a fallback location for
the static content in case the client times out trying to get a file
from a peer, i.e. "Hey browser, download these files from one of these
peers. However, if you don't succeed within 1500 ms, feel free to
contact me again and I'll send them to you."
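The 1500 ms fallback amounts to racing the swarm against a deadline. This minimal asyncio sketch assumes two hypothetical coroutine factories, `from_peers` and `from_origin`, which stand in for the swarm download and a plain HTTP request. It also reports which source served the file.

```python
import asyncio
from typing import Awaitable, Callable, Tuple

Fetcher = Callable[[], Awaitable[bytes]]  # hypothetical download function


async def fetch_with_fallback(from_peers: Fetcher, from_origin: Fetcher,
                              timeout_s: float = 1.5) -> Tuple[str, bytes]:
    """Try the swarm first; if it doesn't deliver within `timeout_s`, ask the origin."""
    try:
        # wait_for cancels the peer download if the deadline passes.
        data = await asyncio.wait_for(from_peers(), timeout=timeout_s)
        return "peer", data
    except (asyncio.TimeoutError, ConnectionError):
        # Peers too slow or unreachable: fall back to the originating server.
        return "origin", await from_origin()
```

Usage: `asyncio.run(fetch_with_fallback(get_from_swarm, get_from_server))`, where both arguments are whatever download functions the application provides. A real version would also check peer data against the manifest hash before using it.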

>
> But it has problems:
>
> - A naive P2P implementation won't provide good throughput or latency
> because you might end up downloading files from a mobile phone on the
> other side of the world rather than a high performance CDN node inside
> your local ISP. That sucks for users and also sucks for your ISP who
> will probably find their transit links suddenly saturated and their
> nice cheap peering links with content providers sitting idle.

Any ideas on how this could be resolved?

I figure that if the application is popular enough, peers could be
geographically tagged using the "GPS" functionality of HTML5 (the
Geolocation API). Clients that choose to make their location available
to the "torrent" server would automatically get preference, along with
better connections and throughput, because peers could then look up
other peers nearby.
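Suppose the "torrent" server knows each consenting peer's latitude and longitude, which is what the Geolocation API reports. It could then rank candidates by great-circle distance. Here is a minimal sketch using the haversine formula. `haversine_km`, `nearest_peers` and the peer dictionary layout are hypothetical. Geographic distance is only a rough proxy for the network-topology concerns quoted above.

```python
import math
from typing import Dict, List, Tuple


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_peers(me: Tuple[float, float],
                  peers: Dict[str, Tuple[float, float]], k: int = 3) -> List[str]:
    """Return up to k peer ids, closest first, relative to `me` = (lat, lon)."""
    return sorted(peers, key=lambda pid: haversine_km(*me, *peers[pid]))[:k]
```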

>
> - That means unless you want to have your system throttled (or in
> companies/universities, possibly banned) you need to respect network
> topology and have an intimate understanding of how it works. For
> example the YouTube/Akamai serving systems have this intelligence but
> whatever implementation you come up with won't.

The truth is that network connectivity keeps getting better. As
aggregate and average network speeds increase around the world, this
problem will be overcome. As Eric Schmidt said back in '93, when he
was at Sun Microsystems:

"When the network becomes as fast as the processor, the computer
hollows out and spreads across the network."

I wouldn't be surprised if principles from Erlang could be borrowed
and applied to this idea, because from what I understand, Erlang has
an extremely robust actor-style message-passing model with lots of
error-checking and recovery features.

>
> - P2P is far more complicated than an HTTP download. I never use
> BitTorrent because it basically never worked well for me, compared to
> a regular file download. You don't see it used much outside the pirate
> scene and distributing linux ISOs these days for that reason, I think.

Speaking as a product manager, I think the reason you don't see it
outside those two scenes is that it doesn't hide enough of the
complexity for mainstream use. Compare the following two:

A)
Step 1: Point your browser at www.myapplication.com

B)
Step 1: Point your browser at www.torrentclient.com
Step 2: Download and install torrent client
Step 3: Configure torrent client to work well with your network if you
are behind a NAT
Step 4: Find a torrent file repository
Step 5: Find and download the torrent file you want, and add it to
your torrent client if it doesn't pick it up automatically.

Other than step 3, all of those steps can be hidden from the user with
the idea I suggested. Using a P2P distributed application suddenly
becomes as easy as typing in a URL and pressing Enter.

And as far as I can tell, step 3 can be resolved by IPv6.

>
> Your friend's problem has other possible solutions:
>
> 1) Harvesting low hanging fruit:
> 1a) Making sure every static asset is indefinitely cacheable (use
> those ISP proxy caches!)
> 1b) Ensuring content is being compressed as effectively as possible
> 1c) Consider serving off a CDN like Akamai or Limelight. There is
> apparently a price war going on right now.

I'm going to check if we've done 1a.

AFAIK, we've performed 1b and we already use a CDN to distribute the
content (1c).

We've also made sure to set the ETag and Last-Modified HTTP headers.
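For reference, here is a framework-free sketch of how a static-file handler might use an ETag for conditional requests, together with a long-lived `Cache-Control` for point 1a. Several parts are assumptions for illustration: the function names `make_etag` and `respond`, the truncated SHA-256 digest used as the tag, and the one-year `max-age`. Last-Modified works analogously with If-Modified-Since and is omitted here.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Strong validator derived from the content (HTTP requires the quotes)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Answer a GET for a static asset, honouring If-None-Match."""
    etag = make_etag(body)
    headers = {
        "ETag": etag,
        # Assumed policy for assets whose URL changes when their content does (1a).
        "Cache-Control": "public, max-age=31536000",
    }
    if if_none_match is not None:
        # If-None-Match may list several tags; a W/ (weak) prefix is ignored here.
        tags = [t.strip() for t in if_none_match.split(",")]
        tags = [t[2:] if t.startswith("W/") else t for t in tags]
        if "*" in tags or etag in tags:
            return 304, headers, b""  # the client's cached copy is still valid
    return 200, headers, body
```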

>
> and of course the ultimate long term solution
>
> 2) Scaling revenues in line with traffic
>

The reason I suggested the idea was to get away from cost entirely.

The more popular something is, the cheaper it is to distribute.
Bandwidth costs per user effectively drop with popularity: an inverse
price relationship, and effectively a negative marginal cost. Imagine
the impact on the economics of information dissemination. This would
turn things on their head.

One dangerous side to this is that you would promote natural
monopolies (as in a stock market), in which liquidity attracts more
liquidity. It would suddenly be costly (relatively speaking) to get
an aggregation site off the ground, because the massive players
would have hyper-economies of scale.

e.g. if an incumbent's video-sharing site is 25x as popular as yours,
its bandwidth cost would not be 25x yours but perhaps only 4x,
because all the peers on its network would be providing the rest of
the necessary bandwidth in swarm form.
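Here is a back-of-the-envelope reading of the 25x/4x figures. It rests on two simplifying assumptions. First, origin bandwidth B is total demand D minus the fraction s served by peers. Second, the smaller site gets no peer help (s = 0, so its cost is just D).

```latex
B = D\,(1 - s)
\qquad\text{so}\qquad
25D\,(1 - s) = 4D \;\Longrightarrow\; s = 1 - \frac{4}{25} = \frac{21}{25} = 0.84
```

Under those assumptions, the 4x figure corresponds to peers serving roughly 84% of the larger site's traffic. Its cost per user would then be about 4/25 = 16% of the smaller site's.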

On the other hand, there would suddenly be an incredible amount of
value in producing quality content (you'd have a reason for the cult
of the amateur to become the cult of the individual pro).

okay, enough rambling.