Hi, all. Roger asked me to brain-dump here to explain the refactoring I just did on how Tor handles buffering on OR connections. (This went in as svn revisions 9904 through 9907.) Previously, we did all of our buffering on the OR connection structures: We'd pull a cell from on or_connection_t (or package one from the edge), {en|de}crypt it, use the relevant circuit to figure out what or_connection_t it should go onto, and append it to that or_connection_t's outbuf. The patches I just checked in adds another step: after cells are encrypted and the relevant circuit are found, the cells are queued in a linked list at the circuit_t. When the or_connection_t's outbuf falls below a certain size, we pull cells from the appropriate cell queues until the outbuf is more full again. There are a few more details here that I'll explain below. First, I'll talk about why it's a good idea: - The ring buffer implementation in buffer.c amortizes resizing costs for buffers when doubling them when they're full and halving them when they're 3/4-empty. Thus, many buffers spend most of their time largely empty: this means that we wind up malloc'ing a lot more memory than we need. - There are a lot of features we want to add that require us to assign different priority to different kinds of traffic. (For example, some relay-incentive schemes need us to give priority to traffic from other ORs.) - We need to kill connections when their buffers use up too much RAM. But when that happens, we kill every circuit on the connection. It might be better to kill only the circuits that have a lot of data queued. - Tracking circuit queues separately from one another lets us fill circuits in a fair fashion: We can start reading from exit connections only at a rate that the circuits are able to flush themselves, rather than at the maximum rate possible. Implementation issues: 1) I've also done a patch to stop reading on edge connections when the circuit gets too full, and start reading on the edge connections again when the circuit gets emptier. [Maybe we should aim to read from edge connections at constant rates rather than stopping and starting like this. That will be harder, but will probably give better TCP performance.] 2) The way the code is currently implemented, we check whether to add more cells to an outbuf whenever we flush from that outbuf. This doesn't work when adding the first few cells to circuit queues, since there's nothing to flush. Therefore, whenever we add a cell to a cell queue and the or_conn's outbuf is empty, we write the cell straight to the outbuf. [I think this rule should maybe become "add cells straight to the outbuf whenever it is small", but I don't know whether that's really an improvement.] 3) To avoid having to walk all the circuit on an or_connection (there can be quite a lot of them), we keep the _active_ circuits (those with pending cells) in a doubly-linked-ring on each or_connection. An or_circuit is potentially connected to two or_connections, so each or_circuit is up to two of these rings, and so has two next and prev pointers. This makes the link and unlink logic a little trickier than it needs to be, but I can't immediately see how to simplify it. Have a look and drop me a line if you have a good idea. 4) The stop-adding-cells and start-adding-cells thresholds for moving cells from cell queues to or_connections are currently 32K and 16K. The stop-adding-cells and start-adding-cells thresholds for packaging cells from the edge are currently 256 cells and 64 cells. [I have no good justification for any of these numbers besides the 16K one; for efficient bandwidth usage, we want as many full TLS frames going out as possible.] 5) The code currently does all the crypto before adding cells to queues, but does the cell-to-bytes encoding when adding cells to outbufs. We could maybe get a little more ram-efficient by packaging in advance too. 6) Cells are about to become the most heavily allocated and freed structures in Tor. This is probably going to incur some malloc overhead, especially on platforms where malloc() sucks. Since cells are fixed-size, this arguably calls for arena allocation. 7) I stuck most of the cell queue logic in relay.c. Roger: let me know if it belongs somewhere else instead. 8) There is no number eight. peace, -- Nick Mathewson
Attachment:
pgpCNjk98O7CQ.pgp
Description: PGP signature