[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]
[tor-bugs] #6808 [Tor Relay]: We shouldn't have to rescale the EWMA for circuit selection
#6808: We shouldn't have to rescale the EWMA for circuit selection
-------------------------+--------------------------------------------------
Reporter: andrea | Owner:
Type: enhancement | Status: new
Priority: normal | Milestone: Tor: unspecified
Component: Tor Relay | Version: Tor: 0.2.4.2-alpha
Keywords: | Parent:
Points: | Actualpoints:
-------------------------+--------------------------------------------------
From relay.c comments on the EWMA algorithm:
If 'double' had infinite precision, we could do this simply by counting a
cell sent at startup as having weight 1.0, and a cell sent N seconds later
as having weight F^-N. This way, we would never need to re-scale any
already-sent cells.
To prevent double from overflowing, we could count a cell sent now as
having weight 1.0 and a cell sent N seconds ago as having weight F^N.
This, however, would mean we'd need to re-scale *ALL* old circuits every
time we wanted to send a cell.
So as a compromise, we divide time into 'ticks' (currently, 10-second
increments) and say that a cell sent at the start of a current tick is
worth 1.0, a cell sent N seconds before the start of the current tick is
worth F^N, and a cell sent N seconds after the start of the current tick
is worth F^-N. This way we don't overflow, and we don't need to
constantly rescale.
(end quote from relay.c comments)
I don't think it's true that we would need to re-scale on every send to do
this continuously. Conceptually, every circuit gets a weight which is the
sum over all cells C_i they've transmitted at time t_i in the past of
exp(-(T-t_i)*lambda), where lambda is the decay constant and T is the
present time. Equivalently, we can factor out an exp(-T*lambda)
everywhere and get time-independent weights, which would be the double-
overflowing representation those comments mentioned.
However, if we have a priority queue sorted on the summed weight of each
circuit, the decay over time can't change the sort order, as long as the
time constants are the same for every circuit. The selection of which
cell to transmit happens in channel_flush_from_first_active_circuit() (or
connection_or_flush_from_first_active_circuit() in the pre-channels
world); the algorithm to find a cell to send should go like:
1.) Pick the lowest-weighted active circuit (this is O(1) if we have the
sorted circuit list ready, as with the current priority queue
implementation), and get the first cell from its queue.
2.) Send the cell and update the EWMA for that circuit (this calculation
is O(1), again)
3.) If the circuit has no more cells to send, remove it from the active
circuit list (for a reasonable sorted circuit list implementation, this is
no worse than O(log(n))).
4.) Otherwise, if it still has cells, move it to its new position in the
active circuit list (again, this shouldn't be worse than O(log(n)).
5.) If we want more cells, go back to step 1, possibly picking a new
circuit. Otherwise, we're done.
Now, there's a slight hitch there: the comparison function for the weights
depends the current time, T - except, as we saw, it can be factored out,
and the decay never changes the order. Thus, instead of rescaling stored
weights, we should store the time each circuit EWMA was last updated
(i.e,, the last time we sent a cell on behalf of that circuit) and the
value of sum(exp(-(T-t_i)*lambda) *at that time T*, and then redefine the
comparison function to compensate.
Specifically, given two circuits C_a and C_b, with last updated times T_a
and T_b and stored weights W_a = sum(exp(-(T_a-t_i)*lambda),{i for all
cells sent on C_a}) and W_b = sum(exp(-(T_b-t_i)*lambda),{i for all cells
sent on C_b}), then we can just rescale W_b to T_a by multiplying by exp
(T_b-T_a), so we cancel out the exp(-T_b) in W_b. That is, circuit C_a
should take precedence over C_b iff W_a < exp(T_b-T_a)*W_b.
Notice that W_a, W_b, T_a, and T_b are all static values which only ever
need to be updated when a circuit actually sends a cell; we can define a
comparison function on them rather than actually storing up-to-date
weights and needing to periodically rescale, and thus obtain the precision
of a continuous calculation without the compromise of rescaling only on
ticks, without the computational cost of having to rescale at all.
(I believe the current implementation in relay.c is actually incorrect
anyway in the case that it wants to send more than one cell and the
lowest-weighted circuit changes from one cell to the next in the loop; see
the assert that came up in testing for channel_t mentioned in ticket 6465:
https://trac.torproject.org/projects/tor/ticket/6465#comment:6 )
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/6808>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs