
Re: Importance of HTTP connection keep-alive

[CC-ing polipo-users again]

>> this is an HTTP/1.0 site.  There are fortunately very few of these
>> left nowadays.

> What exactly is the problem with the site?  Watching the circuits in
> Vidalia I had the impression that Polipo used keep-alive.

HTTP/1.0 keepalives and HTTP/1.1 persistent connections are not quite
the same thing.  From memory, the limitations of HTTP/1.0 are

 - HTTP/1.0 kept-alive connections must be broken after every dynamic
   page;
 - pipelining is not allowed in HTTP/1.0;
 - HTTP/1.0 keepalives are not allowed when speaking to a proxy.

Polipo respects the first two limitations.  It doesn't respect the
third limitation, but instead plays a number of tricks that ensure
that it works with common HTTP/1.0 proxies (Squid, WWWOFFLE, Privoxy).
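
To make the first two limitations concrete, here is a rough sketch of the reuse decision a client has to make for each response.  It is not Polipo's code: the function names are made up for illustration, and it ignores chunked encoding and the proxy tricks mentioned above.

```python
def connection_reusable(version, headers):
    """Return True if the connection may be kept open after this response."""
    # Header names are case-insensitive; normalise names and values once.
    h = {k.lower(): v.strip().lower() for k, v in headers.items()}
    tokens = {t.strip() for t in h.get("connection", "").split(",") if t.strip()}
    if version == "HTTP/1.1":
        # Persistent by default; the server opts out with "Connection: close".
        return "close" not in tokens
    if version == "HTTP/1.0":
        # Opt-in only, and the body must be delimited by Content-Length,
        # which dynamically generated pages often do not provide.
        return "keep-alive" in tokens and "content-length" in h
    return False


def pipelining_allowed(version):
    # Only HTTP/1.1 permits sending a request before the previous reply arrives.
    return version == "HTTP/1.1"
```

With this sketch, an HTTP/1.0 reply carrying "Connection: Keep-Alive" but no Content-Length forces the connection to be closed, which is the "dynamic" case in the list above.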

In order to be nice to the network, Polipo limits itself to
2 connections when speaking to a server that can do persistent
connections or keepalives.  This works fine when there are
opportunities for pipelining, but results in poor performance when
there are none, as with an HTTP/1.0 server.

You can customise the magic value 2 with the variable serverSlots.
I'd actually be very curious to see the results for your previous test
with serverSlots set to 5.  (I guess I should be more aggressive with
HTTP/1.0 servers by default; ideally, I'd like to work out a scheme
to tune serverSlots automatically depending on our traffic patterns.)
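
In the Polipo configuration file the setting would be a line such as "serverSlots = 5".  As a back-of-the-envelope illustration of why the value matters without pipelining, the following sketch counts round trips in a deliberately idealised model: every request costs one round trip, bandwidth and server time are ignored, and a pipelined batch is assumed to complete in a single round trip.  The function name and the model are assumptions of mine, not measurements.

```python
import math


def round_trips(requests, slots, pipelining):
    """Idealised number of round trips needed to fetch `requests` objects."""
    if requests == 0:
        return 0
    if pipelining:
        # Requests are sent back to back, so the replies stream in together.
        return 1
    # Without pipelining each connection carries one request per round trip.
    return math.ceil(requests / slots)
```

For a page with 20 objects, round_trips(20, 2, False) gives 10 while round_trips(20, 5, False) gives 4; with pipelining the slot count hardly matters in this model.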

There's a paper about the tradeoffs involved on


> Can you name some other sites that you consider valid targets then?

There's no good answer to that, unfortunately, as there are so many
variables involved.  I don't think there's a single typical web site,
but there are a few classes of web sites that I believe are typical,
and that Polipo should deal with pretty well.

The easiest case is an HTTP/1.1 web server with purely static content,
or dynamic content generated by people who knew what they were doing.
Unfortunately, such servers have become rare as most sites have
moved to dynamic content generation.

The KDE site is, I believe, quite typical of a modern web site:
on the one hand the content is dynamically generated by crufty PHP
scripts (no useful validators are provided), but the HTTP is generated
by a fully HTTP/1.1 web server (Apache 2).  Polipo is slightly
suboptimal in such a case, but it should be reasonably good.

Another fairly common case is that of a mis-configured server that
doesn't do persistent connections at all -- for example
http://www.gnome.org/.  Polipo will notice that after a few requests,
and switch to using up to 8 connections to that server.  Unless
there's something really wrong in either Polipo or Privoxy,
performance should be roughly identical in the two implementations
(except for the effects of caching and range requests, of course).
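
The switch described above could be sketched roughly as follows.  This is only an illustration of the idea, not Polipo's actual heuristic: the class and method names are hypothetical, and the threshold of 3 stands in for "a few requests".  The limits 2 and 8 are the ones quoted in this message.

```python
PERSISTENT_SLOTS = 2      # limit for servers that keep connections open
NON_PERSISTENT_SLOTS = 8  # limit once the server is known not to
PROBE_REQUESTS = 3        # assumption: how many replies count as "a few"


class ServerState:
    """Per-server record of whether connections survive a response."""

    def __init__(self):
        self.responses_seen = 0
        self.reused_once = False

    def record_response(self, connection_survived):
        self.responses_seen += 1
        if connection_survived:
            self.reused_once = True

    def max_connections(self):
        # Stay conservative until enough replies have been observed, or
        # whenever the server has shown it can keep a connection open.
        if self.reused_once or self.responses_seen < PROBE_REQUESTS:
            return PERSISTENT_SLOTS
        return NON_PERSISTENT_SLOTS
```

A server that closes the connection after each of its first three replies would be moved to the 8-connection limit; one that keeps a connection open even once stays at 2.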

The Spiegel.de web site that you tested against is actually an
interesting case.  It appears to be a bunch of typical PHP scripts (no
ETags) running on an HTTP/1.0 web server hidden behind no fewer than
two HTTP/1.0 front-end proxies (somebody is probably trying to do
load-balancing with a total budget of 12 pf. and an old button).
While such interesting configurations are uncommon, single HTTP/1.0
front-end proxies do happen sometimes, so I'll increase serverSlots
when speaking to such a site in the next version of Polipo.