
[Libevent-users] Introducing RProxy (and a re-introduction to libevhtp)



Today, my employer is making available a highly efficient reverse HTTP(S) proxy
called simply 'RProxy'. This project is being released open-source to encourage
the general community to participate in its evolution.

My employer always avoids trying to re-invent the wheel when it comes to software,
so why create another reverse-proxy?

Many of the wonderful open-source proxies that exist today are tailored to the
average GET <-> RESPONSE traffic types. For each request, they may spawn a new
thread, create a new connection to the back-end, or both. Many of the projects
we analyzed could not handle large streams of data efficiently since they would
block until the full client request had been received (hey, where did my memory
go?). Resource exhaustion was a common element under high load: memory, file
descriptors, CPU, etc. These existing projects are designed perfectly for common
traffic flows, but can quickly capsize under pressure.

My employer had a requirement for a proxy that could scale to thousands of
simultaneous SSL connections, with certificate verification, and various caching
methods, all while maintaining a low system resource footprint.

After testing all the popular and well-maintained open-source proxy projects, we
could not find one that met our specific needs. It was on this basis that we
decided to roll our own.

Architecture
-------------------------------------------------------------------------------

The RProxy architecture uses a mix of threading and event-driven methods of
handling requests. At startup, a configured number of threads are spawned, each
with its own event loop. Each of these threads makes a configured number
of persistent connections to the configured back-end servers.

We leverage HTTP 1.1 to keep these connections open so that each incoming
request from a client does not force RProxy to establish a new connection to the
back-end. This results in each request being assigned a pre-existing connection
to a back-end (even if the client is using HTTP 1.0, or HTTP 1.1 with keep-alive
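disabled). We refer to this technique as pipelining, although strictly speaking
it is persistent back-end connection reuse rather than HTTP/1.1 request
pipelining; it is a feature which most proxies avoid due to the complexity of
maintaining request states.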

We solved this by creating three states a back-end connection can have:

- IDLE:    The connection is up and is able to be used to service a new request.
- ACTIVE:  The connection is being used to service another request.
- DOWN:    The connection is down, pending a reconnect.

When a new request is made, it is placed into a pending queue. This pending
queue is processed whenever a back-end's state transitions to IDLE. The request
is then associated with that IDLE connection and its state is changed to ACTIVE.
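
The state handling described above can be sketched in a few lines of C. This is
only an illustration under assumed names: backend_state, pending_queue,
backend_on_idle and the rest are hypothetical, not RProxy's actual identifiers,
and FIFO ordering is an assumption. The event loop, timeouts and reconnect
logic are left out.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical; they are not taken from RProxy. */
enum backend_state { BACKEND_IDLE, BACKEND_ACTIVE, BACKEND_DOWN };

struct request {
    int             id;
    struct request *next;       /* next request in the pending queue */
};

struct pending_queue {
    struct request *head;       /* oldest pending request */
    struct request *tail;       /* newest pending request */
};

struct backend {
    enum backend_state state;
    struct request    *current; /* request in flight, NULL unless ACTIVE */
};

/* A new request always enters the pending queue first. */
void pending_push(struct pending_queue *q, struct request *r) {
    assert(q != NULL && r != NULL);
    r->next = NULL;
    if (q->tail != NULL)
        q->tail->next = r;
    else
        q->head = r;
    q->tail = r;
}

/* Remove and return the oldest pending request, or NULL if there is none. */
struct request *pending_pop(struct pending_queue *q) {
    struct request *r = q->head;
    if (r == NULL)
        return NULL;
    q->head = r->next;
    if (q->head == NULL)
        q->tail = NULL;
    r->next = NULL;
    return r;
}

/* Transition to IDLE, then process the pending queue: if a request is
 * waiting, bind it to this connection and mark the connection ACTIVE. */
void backend_on_idle(struct backend *b, struct pending_queue *q) {
    b->state   = BACKEND_IDLE;
    b->current = pending_pop(q);
    if (b->current != NULL)
        b->state = BACKEND_ACTIVE;
}

/* Transition to DOWN (pending a reconnect). The in-flight request, if any,
 * is handed back to the caller; what happens to it is a policy decision. */
struct request *backend_on_down(struct backend *b) {
    struct request *r = b->current;
    b->state   = BACKEND_DOWN;
    b->current = NULL;
    return r;
}
```

In this sketch a connection calls backend_on_idle() when it finishes a response
or completes a reconnect. A caller that pushes a request while a connection is
already IDLE would call backend_on_idle() on it again to drain the queue.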

There are many configuration options that affect how requests in a pending state
are handled so that resource consumption does not become an issue under high
load.

Features
-------------------------------------------------------------------------------

The RProxy source code has a detailed and up-to-date configuration guide, but
some of the main features that stand out are:

- Various methods of load-balancing requests to a back-end.
- Transparent URI rewriting.
- The ability to append X-Header fields to the request being made to the
  back-end, including dynamic additions of extended TLS fields.
- Configurable thresholding and backlogging for both front-end and back-end IO.
- A flexible logging system.
- Full SSL support (via OpenSSL)
  * TLS False Start
  * x509 verification
  * Certificate caching
  * Session caching
  * All other commonly used SSL options.
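
The configuration guide documents the load-balancing methods RProxy actually
offers. Purely as an illustration of the general idea, round-robin selection
restricted to IDLE connections could look like the following. The names are
hypothetical and the code is not lifted from the RProxy source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; not RProxy's actual identifiers. */
enum backend_state { BACKEND_IDLE, BACKEND_ACTIVE, BACKEND_DOWN };

/* Starting at *cursor, scan the connections once (wrapping around) and
 * return the index of the first IDLE one, or -1 if none is IDLE.
 * On success the cursor moves past the chosen slot, so the next call
 * starts with the following connection. */
int rr_pick(const enum backend_state *states, size_t n, size_t *cursor) {
    for (size_t i = 0; i < n; i++) {
        size_t idx = (*cursor + i) % n;
        if (states[idx] == BACKEND_IDLE) {
            *cursor = (idx + 1) % n;
            return (int)idx;
        }
    }
    return -1;
}
```

When rr_pick() returns -1 every connection is ACTIVE or DOWN, which is the
case where a request would simply stay in the pending queue.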

As mentioned above, it is best to read the documentation to get a detailed
understanding of the many aspects of the system.

Components
-------------------------------------------------------------------------------

RProxy was built on top of several well-maintained open-source libraries such as
Libevent, Libconfuse, Libevhtp, and OpenSSL. While we were writing RProxy, many
of the above libraries needed fixes and patches. We would like to thank the
maintainers of these projects for their willingness to help and accept our
changes (a special thanks to Nick Mathewson, maintainer of Libevent, whom we
harassed the most).

We suggest using the most recent versions of the above libraries for optimal
performance.

Performance
-------------------------------------------------------------------------------

RProxy was tested primarily on various *NIX platforms; however, most of the
performance tweaks targeted Linux.

Our test machine had an Intel i7 quad-core processor and a generic 1 Gb
Ethernet adapter, and ran the latest version of CentOS. Our SSL keys were 2048
bits, with client certificate validation enabled.

With neither host-based nor client-based (RFC 5077) caching, RProxy was able to
handle on the order of 2000 full SSL transactions per second.

With either of the above cache methods enabled, our testing demonstrated that
RProxy was able to handle over 6600 SSL transactions per second.

Large data flow tests showed that RProxy was able to run at 1 gigabit
line-rate (or as close as you can expect once the data has reached user-land).

Future
-------------------------------------------------------------------------------
We continue to add functionality to the software; virtual server support is
currently in development, as well as support for internal redirection (see the
develop branch to see where we're going).

I can haz source?
-------------------------------------------------------------------------------
The source can be found on GitHub: https://github.com/mandiant/RProxy
The current stable release is v1.0.25.

It is suggested that RProxy be built with all external dependencies
downloaded and installed for you, creating a nice static binary with all
of the latest stable releases. This can be done with an optional cmake
flag:

(cd build && cmake -DRPROXY_BUILD_DEPS:STRING=ON .. && make)

Otherwise, the following dependency versions are recommended for optimal
performance:

libconfuse: http://savannah.nongnu.org/download/confuse/confuse-2.7.tar.gz
openssl:    http://openssl.org/source/openssl-1.0.0i.tar.gz (we've had issues with newer versions)
libevent:   https://github.com/downloads/libevent/libevent/libevent-2.0.19-stable.tar.gz
libevhtp:   https://github.com/ellzey/libevhtp/tree/0.4.14


Libevhtp
-------------------------------------------------------------------------------
I noticed an announcement on this list for a new project which attempts
to create a new evhttp-style API. It was then that I realized I had never
really announced libevhtp on this mailing list, and that we seem to be
duplicating efforts.

Libevhtp has been in development for over a year and is being used in many 
projects. It attempts to create a very flexible API for HTTP processing. 

Though it lacks *real* documentation (relying on example code to show off
all the features), I am working on that side of things.

Some of the primary features I can rattle off the top of my head:

- per-connection or per-request hooks for all stages of request
  processing
   * pre_accept         [called before a connection is accepted]
   * post_accept        [called after a connection is accepted]
   * on_path            [called when the request path has been parsed]
   * on_headers_start   [called after request line, prior to header parsing]
   * on_header          [called after one key/val header has been parsed] 
   * on_headers         [called after all headers have been parsed]
   * on_new_chunk       [called when a single chunk octet is parsed] 
   * on_chunk_complete  [called when a single chunk is finished]
   * on_chunks_complete [called when all chunks have been processed] 
   * on_read            [called whenever body data has been read (this includes a body of a chunk)]
   * on_error           [called when an error occurs]
   * on_request_fini    [called when a request has been fully processed]
   * on_connection_fini [called when a connection has been terminated]

- Different methods of setting callbacks
   * evhtp_set_cb       [set a callback for a specific uri]
   * evhtp_set_regex_cb [set a callback for a uri with a regex]
   * evhtp_set_glob_cb  [set a callback for a uri using a simple wildcard]

- Built-in threadpool support (you don't have to enable libevent locking
  support!)

- Built-in SSL support.
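
To illustrate the difference between the exact and wildcard flavors of callback
matching listed above, here is a small standalone sketch that uses POSIX
fnmatch() for the wildcard case. It does not call libevhtp at all: route,
route_kind and route_find are made-up names, the first-match-wins order is an
assumption, and libevhtp's own matching rules may differ in detail.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical routing table; this is not libevhtp's internal layout. */
enum route_kind { ROUTE_EXACT, ROUTE_GLOB };

struct route {
    enum route_kind kind;
    const char     *pattern;
    const char     *name;    /* stand-in for a callback pointer */
};

/* Return the first route whose pattern matches uri, or NULL if none does.
 * Exact routes compare the whole string; glob routes use fnmatch() with
 * no flags, so '*' also matches '/' characters. */
const struct route *route_find(const struct route *routes, size_t n,
                               const char *uri) {
    for (size_t i = 0; i < n; i++) {
        const struct route *r = &routes[i];
        int hit;
        if (r->kind == ROUTE_EXACT)
            hit = (strcmp(r->pattern, uri) == 0);
        else
            hit = (fnmatch(r->pattern, uri, 0) == 0);
        if (hit)
            return r;
    }
    return NULL;
}
```

With a table holding an exact route for "/status" and a glob route for
"/static/*", a request for "/static/css/site.css" resolves to the glob route,
while "/status/x" matches neither.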

A simple example: https://github.com/ellzey/libevhtp/blob/master/test_basic.c
A more complex example: https://github.com/ellzey/libevhtp/blob/master/test.c
A real-life application: Well... See above (RProxy)

An application design that uses libevhtp's threading feature to do cool
things in parallel without locking: https://gist.github.com/2579114 (I
figured a Redis example would work out well).
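
The general idea behind doing work in parallel without locking is that each
thread owns its state outright and never shares it. A minimal pthreads
illustration of that idea follows; it uses hypothetical names (worker_ctx,
run_workers) and makes no libevhtp or libevent calls, so treat it as a sketch
of the pattern rather than of the library's thread pool.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 16
#define ITERATIONS  1000

/* Hypothetical per-thread context. Exactly one worker touches each one,
 * so no mutex is needed around its fields. */
struct worker_ctx {
    int  id;
    long handled;   /* stand-in for private state such as a connection */
};

static void *worker_main(void *arg) {
    struct worker_ctx *ctx = arg;
    for (int i = 0; i < ITERATIONS; i++)
        ctx->handled++;          /* unlocked: ctx is private to this thread */
    return NULL;
}

/* Start n workers, each with its own context, and wait for all of them.
 * Returns 0 on success, -1 if n is too large or a thread failed to start. */
int run_workers(struct worker_ctx *ctxs, size_t n) {
    pthread_t tids[MAX_WORKERS];
    size_t    started = 0;
    int       rc = 0;

    if (n > MAX_WORKERS)
        return -1;

    for (; started < n; started++) {
        ctxs[started].id      = (int)started;
        ctxs[started].handled = 0;
        if (pthread_create(&tids[started], NULL, worker_main,
                           &ctxs[started]) != 0) {
            rc = -1;
            break;
        }
    }
    /* Join only the threads that were actually created. */
    for (size_t i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return rc;
}
```

The caller reads the contexts only after pthread_join() has returned, which is
what makes the unlocked access safe in this sketch.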

Libevhtp can be found here: https://github.com/ellzey/libevhtp