
Re: The gritty details of sendmes and topics



Right now there are two layers w/in our OR-specific communication
stack.

1. Thick Pipes
2. Anonymous Connections

Plus the underlying network. Also, there are application connections,
currently set at one per anonymous connection and thus irrelevant
between the proxies. Roger is proposing a third layer so that we have

1. Thick Pipes
2. Anonymous Connections
3. Topics

But it's really four since application connections are no longer
necessarily one per anything, but for this message I'm going to assume
a single application connection per topic.  (Aside: I'm going to
switch from `anonymous connection' to `OR circuit', since I think it's
clearer and more accurate. When clear from context, I'll just call
these circuits.)


We should have some kind of flow-control/rate-limiting at every layer
below application or an explicit decision to punt on a layer as long
as it won't mess up the layers below it. A desired feature of these
controls should be that they work independently but compatibly.  That
is, I think we should respect the layering much like ordinary network
protocols, so thick pipes know about OR circuits, and OR circuits know
about topics (and topics know about applications).

How could this work vis-a-vis the problems Roger raised?

Thick pipes I believe are already covered---much as Marc talked about
per tunnel in AN I think. (But, if I'm wrong let's not get
distracted.) There is a rate limit on the thick pipe that is indeed
hop-to-hop. 

The current design for circuits has control messages (sendmes) for a
given circuit that propagate at the thick pipe level.  I think that
should stay as it is.

Topic control should be at the circuit level. All of the problems
Roger described come from mixing the layers in some way.  Just as OR
circuit control is at the thick pipe layer, topic control should be at
the circuit layer. It should not pass out to the thick pipe.

Let's consider this idea wrt the problems Roger raised.

> 
> Problem #1: Since begin, end, and connected topic commands are sent
> inside data cells, then they use up a bit of the window. Imagine the
> circuit has a receive window of 0 at the exit side, and it's stopped
> reading from any of the webservers. Then one of the webservers hangs
> up. The exit node sends an 'end' data cell. One of the nodes in the
> middle of the circuit sees a data cell when the receive window is 0,
> freaks out, and kills the whole circuit.
> 
> Solution #1: No problem, I'll queue data cells right before they enter
> the circuit. If the receive window is >0, it sends them immediately and
> decrements the window. If the window hits 0, it tells all topics to quit
> reading from the webserver until further notice. When a sendme arrives
> I'll dump the queue of pending cells onto the circuit, and if there's
> any window left I'll notify all the topics to start reading again.
> 

I agree with Marc. So far so good and consistent with what I proposed.
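
To make sure we mean the same thing by Solution #1, here is a minimal
Python sketch of the queue at the circuit edge. It is only an
illustration: the class name CircuitEdge, its methods, and the
dictionary of per-topic "reading" flags are all made up for this
message and are not taken from the code.

```python
from collections import deque


class CircuitEdge:
    """Hypothetical exit-side edge of one OR circuit (illustration only)."""

    def __init__(self, window, topic_ids):
        self.window = window                # cells we may still send
        self.pending = deque()              # cells waiting for window
        self.sent = []                      # cells actually put on the circuit
        # topic_id -> True if we are still reading from that webserver
        self.reading = {t: True for t in topic_ids}

    def submit(self, cell):
        """A topic hands us a cell (data, or a begin/end/connected command)."""
        self.pending.append(cell)
        self._flush()

    def on_sendme(self, increment):
        """A sendme arrived for this circuit: open the window and drain."""
        self.window += increment
        self._flush()

    def _flush(self):
        # Send queued cells while there is window, one window unit per cell.
        while self.pending and self.window > 0:
            self.sent.append(self.pending.popleft())
            self.window -= 1
        # Topics read only while some window is left after draining the queue.
        can_read = self.window > 0
        for t in self.reading:
            self.reading[t] = can_read
```

With a window of 2, two data cells use up the window and every topic
is told to stop reading. An 'end' cell that shows up afterwards just
waits in the queue instead of entering the circuit at window 0, and it
goes out when the next sendme arrives.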

> Problem #2: But we're still doing flow-control on a per-circuit level,
> and maybe we need a per-topic level. Imagine the user has two topics
> open: an ssh connection and a wget of a kernel tarball. Let's further
> imagine wget is broken, such that it reads half of the file and then for
> some reason forgets to keep reading. So the wget proceeds as normal,
> and sendmes work great, until the wget wedges. Then data continues to
> stream from the webserver. If the only topic were the wget, then the
> windows would run out and the exit node would stop reading from the
> webserver. But whenever a data cell arrives for the ssh topic, it finds
> the outbuf empty, sends back a sendme, and immediately the wget topic
> gets another 100 cells dumped on it. This repeats and the wget outbuf
> at the AP grows larger and larger. Or perhaps worse, the wget topic eats
> the whole window, so that when the ssh server wants to send a cell five
> minutes later, it finds the window at the exit to be 0, and there's no
> hope of ever getting a sendme.
> 

So, what if the topics are controlled within the circuit? I am not yet
saying exactly how that should happen, and I bet it can get subtle
real fast.  But, as a first naive suggestion to show the idea, each
topic can reserve a percentage of the circuit's available
capacity. Reservation might be determined by guesses from topic
application or other factors. Bandwidth might be preallocated or
perhaps measured on the outbound side by buffers and on the inbound
side by receive window. Unused portions can be given to other topics
for efficiency as long as they can be returned to the designated topic
with minimal delay. There is still a question of making sure that
the reservations at each end of a circuit are consistent, but this
should be done by communication between inbound and outbound proxies,
i.e., at the circuit level.
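
A rough Python sketch of the naive reservation idea, just to show the
shape of it. Everything here is an assumption for illustration: the
function name allocate, the use of whole-number percentages, and the
rule for handing out spare capacity are mine, not a worked-out design.

```python
def allocate(capacity, reserved_pct, demand):
    """Split `capacity` cells among the topics of one circuit.

    reserved_pct: topic_id -> reserved percentage (should sum to <= 100)
    demand:       topic_id -> cells the topic wants to send this round
    Returns topic_id -> cells granted this round.
    """
    # Each topic first gets up to its own reservation.
    share = {t: capacity * pct // 100 for t, pct in reserved_pct.items()}
    grant = {t: min(share[t], demand.get(t, 0)) for t in reserved_pct}

    # Whatever nobody claimed is lent out to topics that want more.
    spare = capacity - sum(grant.values())
    for t in sorted(reserved_pct):
        extra = min(spare, demand.get(t, 0) - grant[t])
        grant[t] += extra
        spare -= extra
    return grant
```

Because the split is recomputed every round, a loan only lasts one
round. With 100 cells, ssh reserving 20 percent and wget 80, an idle
ssh lets wget take all 100, but as soon as ssh wants 5 cells it gets
them and wget drops back to 95.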

> Solution #2: No problem, I'll take the separate sendme cell type out,
> and I'll make a topic command called sendme. Then the flow control is
> done on a per-topic level, and we're all set. Indeed, now I don't have
> to do any inefficient "tell them all to stop reading, tell them all to
> start reading" things. (Problem #2a: what if each side empties its window
> all at once, and the cells work their way down the circuit, cross, and
> end up on the other side. Then neither side has any window left to send
> a sendme! Solution #2b: No problem, you always leave yourself a window
> of 1, not 0, in case you need to send a sendme later.)
> 

No, bad. Topics shouldn't talk to thick pipes.

> Problem #3: But wait, now the nodes in the middle of the circuit can't
> tell that they're supposed to increment their window. This is no good
> at all.
> 

In my proposal, this problem doesn't arise because the same
circuit-level flow controls are taking place at each node as before.

> Solution #3: No problem, I'll go back to the
> sendme-is-a-separate-cell-type idea, but this time I'll stick the
> topic_id in the payload. The ACI's at each hop of the circuit will make
> sure it gets to the other side of the circuit. (For added complication,
> I could crypt the payload just like with data cells, and then peel
> off a layer from the topic_id at each step. Or I could accept that
> correlating topic_id along the circuit is no easier than simply counting
> packets/timing, and leave topic_id uncrypted.)
> 

No, bad. Topics shouldn't talk to thick pipes.

> Problem #4: But now we're relying on each topic to refill a communal
> circuit-wide window. Imagine you have 50 topics, and each of them
> delivers 20 cells. None of the individual topics has gotten enough cells
> to trigger a sendme, yet the window of 1000 is all gone. Deadlock.
> 

Again, no problem. The sendme cells should be sent by the proxy
(circuit control proxy?), which should have negotiated with the topics
so that they buffer and wait their turn, aren't allocated capacity in
the first place, or whatever.
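
Here is a small Python sketch of why counting at the circuit level
avoids the deadlock in Problem #4. The class CircuitReceiver and the
threshold of one sendme per 100 delivered cells are assumptions for
the example (the 100 is borrowed from Roger's wget scenario); the real
trigger could be something else entirely.

```python
class CircuitReceiver:
    """Hypothetical receiving proxy for one circuit (illustration only)."""

    def __init__(self, sendme_every=100):
        self.sendme_every = sendme_every   # assumed sendme granularity
        self.delivered = 0                 # cells delivered on the circuit
        self.per_topic = {}                # topic_id -> cells delivered
        self.sendmes_sent = 0

    def deliver(self, topic_id):
        """Account for one cell handed to a topic."""
        self.per_topic[topic_id] = self.per_topic.get(topic_id, 0) + 1
        self.delivered += 1
        # The proxy, not the topic, decides when a sendme goes out,
        # based on the circuit-wide count.
        if self.delivered % self.sendme_every == 0:
            self.sendmes_sent += 1


receiver = CircuitReceiver()
for topic_id in range(50):        # Roger's 50 topics...
    for _ in range(20):           # ...each delivering 20 cells
        receiver.deliver(topic_id)
```

No single topic ever gets past 20 cells, yet the proxy has seen 1000
cells on the circuit and has sent 10 sendmes, so the window keeps
getting refilled.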

Please throw darts.

aloha,
Paul