[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]

Re: Low-Cost Traffic Analysis of Tor [QoS Solution?]

Thus spake Mike Perry (mikepery@xxxxxxxxxx):

> Basically, the idea was to seperate traffic into 4 classes based upon
> exit port, and then use some form of Weighted Fair Queuing algorithm
> to balance between them
> (http://archives.seul.org/or/dev/Feb-2005/msg00008.html).
> The classes were:
>   0 -- Unbuffered interactive traffic (ssh, telnet, rdesktop, tcp dns)
>   1 -- Application buffered interactive traffic (web, irc, aim)
>   2 -- Data transfer (ftp)
>   3 -- Bulk Traffic (bit torrent and the rest)
> My current idea is to do the WFQ with something like serving class 0
> four times, class 1 three times, class 2 twice, and class 3 once
> (staggered in a round-robin fashion). If a class queue is empty, it
> loses its turn.  I also intend to bump class 0 traffic with full 500
> byte payloads up to class 1 or 2, to limit abuse.

Ok, so in an attempt to solidify my point, I decided to think about it
more specifically in terms of the interaction with the queuing itself
instead of the relative priorities. Upon doing this, I realized that
queuing by itself is not sufficient, but that it is still possible to
defeat the attack. I basically broke down the scenarios into the
following outline, which I believe covers all cases:

A). Probe and Server in the same queue: ATTACK SUCCEEDS
B). Probe and Server in different queues
   1). Server traffic fills an empty or emptying queue: ATTACK SUCCEEDS
       a). Server traffic is in a previously empty queue
       b). Server traffic prevents a queue from draining
   2). Server traffic is in a queue that has longer length* than
       probe queue: ATTACK FAILS


A. If the probe and the server are in the same queue, and queuing
conditions are otherwise the same for both baseline and the attack,
then this is essentially the same queuing condition as in the original
attack but with added noise. Theoretically, the attack should continue
as normal.

The success of the attack may be more sensitive to traffic arriving
from higher priority classes, but at the end of the day, this is just
noise and can be averaged out through many successive trials.  It may
be possible that this averaging takes too many trials to be practical,
however, but this probably should not be relied upon.

B. If the probe and the server are in different queues: 

1. The burst traffic must use a previously empty queue class (point a)
or prevent an existing queue class with weighted service length*
shorter than the probe class from draining before the probes are
serviced (point b). If either of these conditions hold (a or b), the
attack succeeds.

Point a:

If the burst traffic uses a previously empty queue class, then this
means that there is increased delay felt by the probe traffic, since
at least SOME time will be spent servicing this new queue that wasn't
spent before. If the difference between the priorities of probe and
queue is great enough, this time difference may be negligible and
impossible to detect through traffic noise. But theoretically, the
attack is still possible.

Point b:

If the burst traffic prevents a queue from draining that would
normally have drained, then once again there is delay that has been
introduced in the probe traffic that normally would not have been

2. If the burst traffic falls into a queue class that already had
equal or greater length* than the probe traffic, then the attack fails. 

In this case, the probe traffic would experience the same delay as it
would have if there was no burst from the server, since it was already
in full competion with that queue. It should not be delayed any more
than usual. ATTACK FAILS.


So, it would appear that my original QoS solution isn't as good as I
thought it was. However, I'm not quite ready to give up yet. There are
3 potential solutions that come to mind which essentially aim to drive
the queuing system into the safe Scenario B2. These solutions can be
combined as desired. Note that none of the solutions depend on WFQ. FQ
can be substituted if desired (and in fact FQ may actually be more
robust against variant attacks, as discussed below).

I. Change the WFQ algorithm to send cover traffic (Fix Scenario B1) 

Cover traffic by itself does not defend against the attack, as is
pointed out in the paper. However, in the context of queuing, it would
at least prevent Scenario B1 from developing, since there would be
no such thing as adding delay if all queues were always busy. 

Basically this solution removes the "empty queues lose their turn"

Unfortunately, this does add some unnecessary delay, since queues with
real traffic in them will have to wait on the cover traffic queues. It
also limits each class to only achieving a certain percentage of
network traffic (ie web+irc+aim could never exceed 30% of total
network load).  The weights can be altered based on statistical info
about the tor network exit node traffic, gathered over the coming
months via a patch that I can submit shortly, if desired.

II. Attempt to detect attack and queue appropriately (Fix Scenario A+B1)

I implement my WFQ as I described, but if I see numerous successive
packets of 500 length from a given circuit (like 2-3 seconds worth), I 
alter its class.

Note that simply lowering the priority will not work, since this may
still induce probe delay as proven in Scenario B1.

What will work is moving circuits with numerous successive 500byte
payloads to the longest** queue. This can also be done with regular
non-weighted Fair Queuing, if QoS is still opposed. This should
effectively create Scenario B2 from either Scenario A or B1.

However, in the event that the burst traffic IS the longest** queue,
this approach will fail. In this case, cover traffic should be
temporarily inserted as per Solution I until the burst circuit slows
down or closes. Or, if Solution I is deemed acceptable on its own,
cover traffic can always be inserted.

If it is possible to produce a variant of the attack where the probe
server sends high bandwidth traffic, then the queuing algorithm needs
to attempt to balance all high bandwidth circuits evenly among all
queues. This probably will prevent any attempts at WFQ, or at least
make WFQ considerably more difficult. Is such an attack possible?

III. Prevent tor nodes from operating near capacity (General Solution)

It seems to me that the attack hinges on the ability to drive tor
nodes to their full capacity. It is also suspected that the attack
operates optimally when tor nodes are also already loaded.

If tor was modified to automatically close its highest traffic circuit
as soon as the bandwidth threshold was reached (or as soon as the
window for transmitting data to the next hop was empty), this may
prevent the optimal attack scenario from developing. Moreover, it may
help limit the appeal of P2P/bulk data transfer.

Also, perhaps a "high watermark" bit can be added to the directory
entry for each node, and clients can try to avoid picking nodes with
this bit set, since they will be operating close to capacity and may
drop circuits. 

We could combine solutions II and III and instead of dropping these
circuits, move them to the longest queue and also set the "high
watermark" directory flag so clients can avoid that server. 

Or, if everyone hates fair queuing as a general principle, this
solution may be suitable by itself.

But note that if very few tor nodes are operating near capacity, the
high watermark bit can be used to reveal paths without a probe.
However, the time frequency of update for the directory server (on the
order of 20 minutes) is much greater than your typical connection, so
this is unlikely to be practical.


I have attempted to make the case for these modifications as
rigorously as I could. I would really like to see this attack fixed,
and I don't mind doing the all of the work to do so (in fact, I would
enjoy it. I find this problem very interesting). But someone needs to
make sure I'm not crazy ;)

* The weighted service length is defined as the number of full
dequeuing cycles through all 4 queues. It is measured in multiples of
10 packet service times (4+3+2+1=10). If WFQ is abandoned for plain
FQ, the length is then measured in round-robin cycles through all four

** When defending against the attacks, the circuit that is being
moved from queue to queue should NOT be counted in the length

Mike Perry
Mad Computer Scientist
fscked.org evil labs