Great, lots of good comments. I'll respond in depth once the fever has left my brain! =D On Thu, Mar 03, 2016 at 03:15:26PM +0100, George Kadianakis wrote: > Ola Bini <obini@xxxxxxxxxxxxxxxx> writes: > > > Hi, > > Here are some more comments to the latest version of the proposal, as seen here: > https://github.com/twstrike/torspec > > > Filename: 259-guard-selection.txt > > Title: New Guard Selection Behaviour > > Author: Isis Lovecruft, George Kadianakis, [Ola Bini] > > Created: 2015-10-28 > > Status: Draft > > > > §1. Overview > > > > Tor uses entry guards to prevent an attacker who controls some > > fraction of the network from observing a fraction of every user's > > traffic. If users chose their entries and exits uniformly at > > random from the list of servers every time they build a circuit, > > then an adversary who had (k/N) of the network would deanonymize > > F=(k/N)^2 of all circuits... and after a given user had built C > > circuits, the attacker would see them at least once with > > probability 1-(1-F)^C. With large C, the attacker would get a > > sample of every user's traffic with probability 1. > > > > To prevent this from happening, Tor clients choose a small number of > > guard nodes (currently 3). These guard nodes are the only nodes > > that the client will connect to directly. If they are not > > compromised, the user's paths are not compromised. > > > > But attacks remain. Consider an attacker who can run a firewall > > between a target user and the Tor network, and make > > many of the guards they don't control appear to be unreachable. > > Or consider an attacker who can identify a user's guards, and mount > > denial-of-service attacks on them until the user picks a guard > > that the attacker controls. > > > > In the presence of these attacks, we can't continue to connect to > > the Tor network unconditionally. Doing so would eventually result > > in the user choosing a hostile node as their guard, and losing > > anonymity. > > > > This proposal outlines a new entry guard selection algorithm, which > > addresses the following concerns: > > > > - Heuristics and algorithms for determining how and which guard(s) > > is(/are) chosen should be kept as simple and easy to understand > > as possible. > > > > - Clients in censored regions or who are behind a fascist firewall > > who connect to the Tor network should not experience any > > significant disadvantage in terms of reachability or usability. > > > > - Tor should make a best attempt at discovering the most > > appropriate behaviour, with as little user input and > > configuration as possible. > > > > > > §2. Design > > > > Alice, an OP attempting to connect to the Tor network, should > > undertake the following steps to determine information about the > > local network and to select (some) appropriate entry guards. In the > > following scenario, it is assumed that Alice has already obtained a > > recent, valid, and verifiable consensus document. > > > > The algorithm is divided into four components such that the full > > algorithm is implemented by first invoking START, then repeatedly > > calling NEXT while adviced it SHOULD_CONTINUE and finally calling > > END. For an example usage see §A. Appendix. > > > > Several components of NEXT can be invoked asynchronously. SHOULD_CONTINUE > > is used for the algorithm to be able to tell the caller whether we > > consider the work done or not - this can be used to retry primary > > guards when we finally are able to connect to a guard after a long > > network outage, for example. > > > > This algorithm keeps track of the unreachability status for guards > > in state private to the algorithm - this is re-initialized every time > > START is called. > > > > Hmm, didn't we decide to persist the unreachability status over runs, right? > Or not? > > The argument for doing so is "What happens if I'm a hidden service that needs to > make 5 circuits per second, and my first 2 guards happen to be offline? Do I > have to spend time probing them everytime I make a new circuit?". > > > The algorithm expects several arguments to guide its behavior. These > > will be defined in §2.1. > > > > The goal of this algorithm is to strongly prefer connecting to the > > same guards we have connected to before, while also trying to detect > > conditions such as a network outage or a network environment that > > blocks most ports. The way it does this is by keeping track of how > > many guards we have exposed ourselves to, and if we have connected > > to too many we will fall back to only retrying the ones we have > > already tried. The algorithm also decides on sample set that should > > be persisted - in order to minimize the risk of an attacker forcing > > enumeration of the whole network by triggering rebuilding of > > circuits. > > > > > > §2.1. The START algorithm > > > > In order to start choosing an entry guard, use the START > > algorithm. This takes four arguments that can be used to fine tune > > the workings: > > > > USED_GUARDS > > This is a list that contains all the guards that have been used > > before by this client. We will prioritize using guards from this > > list in order to minimize our exposure. The list is expected to > > be sorted based on priority, where the first entry will have the > > highest priority. > > > > SAMPLED_UTOPIC_GUARDS > > This is a set that contains all guards that should be considered > > for connection under utopic conditions. This set should be > > persisted between runs. It will be filled in by the algorithm if > > it's empty, or if it contains less than SAMPLE_SET_THRESHOLD > > guards after winnowing out older guards. It should be filled by > > using NEXT_BY_BANDWIDTH with UTOPIC_GUARDS as an argument. > > > > Should we use UTOPIC_GUARDS or REMAINING_UTOPIC_GUARDS as the argument? > > > SAMPLED_DYSTOPIC_GUARDS > > This is a set that contains all guards that should be considered > > for connection under dystopic conditions. This set should be > > persisted between runs. It will be filled in by the algorithm if > > it's empty, or if it contains less than SAMPLE_SET_THRESHOLD > > guards after winnowing out older guards. It should be filled by > > using NEXT_BY_BANDWIDTH with DYSTOPIC_GUARDS as an argument. > > > > EXCLUDE_NODES > > A set of nodes that we should not consider using as a guard. > > > > N_PRIMARY_GUARDS > > The number of guards we should consider our primary > > guards. These guards will be retried more frequently and will > > take precedence in most situations. By default the primary > > guards will be the first N_PRIMARY_GUARDS guards from USED_GUARDS. > > > > DIR > > If this argument is set, we should only consider guards that can > > be directory guards. If not set, we will consider all guards. > > > > The primary work of START is to initialize the state machine depicted > > in §2.2 The NEXT algorithm. > > The initial state of the machine is defined by: > > > > GUARDS > > This is a set of all guards from the consensus, without > > EXCLUDE_NODES and potentially filtered if DIR is set. > > > > UTOPIC_GUARDS > > This is a set of all guards to use under utopic conditions. It > > will primarily be used to fill in SAMPLED_UTOPIC_GUARDS. This > > set will be initialized to be the same as GUARDS. > > > > DYSTOPIC_GUARDS > > This is a set of all guards to use under dystopic conditions > > (usually when we are subject to a firewall that restricts the > > ports we can connect to). It will primarily be used to fill in > > SAMPLED_UTOPIC_GUARDS. This set will be initialized to be the > > I guess you mean SAMPLED_DYSTOPIC_GUARDS. > > > subset of GUARDS that listen to ports that are allowed by > > dystopic conditions. > > > > REMAINING_UTOPIC_GUARDS > > This is a running set of the utopic guards we have not yet tried > > to connect to. It should be initialized to be SAMPLED_UTOPIC_GUARDS > > without USED_GUARDS. > > > > Maybe here we should also mention that we will reinsert guards that we have not > tried in a long time (GUARDS_RETRY_TIME) as specified by 2.2.2? > > > REMAINING_DYSTOPIC_GUARDS > > This is a running set of the dystopic guards we have not yet tried > > to connect to. It should be initialized to be SAMPLED_DYSTOPIC_GUARDS > > without USED_GUARDS. > > > > TRIED_GUARDS > > This set keeps track of all utopic guards we have tried connecting > > to. This should be initialized to the empty set. > > > > TRIED_DYSTOPIC_GUARDS > > This set keeps track of all dystopic guards we have tried connecting > > to. This should be initialized to the empty set. > > > > STATE > > A variable that keeps track of which state in the state > > machine we are currently in. It should be initialized to > > STATE_PRIMARY_GUARDS. > > > > PRIMARY_GUARDS > > This list keeps track of our primary guards. These are guards > > that we will prioritize when trying to connect, and will also > > retry more often in case of failure with other guards. > > It should be initialized by calling algorithm > > NEXT_PRIMARY_GUARD repeatedly until PRIMARY_GUARDS contains > > N_PRIMARY_GUARDS elements. > > > > > > §2.2. The NEXT algorithm > > > > The NEXT algorithm is composed of several different possibly flows. The > > first one is a simple state machine that can transfer between four > > different states. Every time NEXT is invoked, it will resume at the > > state where it left off previously. In the course of selecting an > > entry guard, a new consensus can arrive. When that happens we need > > to update the data structures used, but nothing else should change. > > > > Before jumping in to the state machine, we should first check if it > > was at least PRIMARY_GUARDS_RETRY_INTERVAL minutes since we tried > > any of the PRIMARY_GUARDS. If this is the case, and we are not in > > STATE_PRIMARY_GUARDS, we should save the previous state and set the > > state to STATE_PRIMARY_GUARDS. > > > > > > §2.2.1. The STATE_PRIMARY_GUARDS state > > > > Return each entry in PRIMARY_GUARDS in turn. For each entry, if it > > was not possible to connect to it, mark the entry as unreachable, > > and add it to TRIED_GUARDS. > > [XXX defining "was not possible to connect" as "entry is not live" according > > to current definition of "live entry guard" in tor source code, seems > > to improve success rate on the flaky network scenario. > > See: https://github.com/twstrike/tor_guardsim/issues/1#issuecomment-187374942] > > > > Hmm, I'm not sure what this XXX means exactly. I believe we should actually try > to _connect_ to those primary guards and not just check if we think they are live. > > > If all entries have been tried, restore the previous state and go > > there. If there is no previous state, transition to STATE_TRY_UTOPIC. > > > > > > §2.2.2. The STATE_TRY_UTOPIC state > > > > In order to give guards that have been marked as unreachable a > > chance to come back, add all entries in TRIED_GUARDS that were > > marked as unreachable more than GUARDS_RETRY_TIME minutes ago back > > to REMAINING_UTOPIC_GUARDS. > > > > I'm a bit puzzled by this mechanism. Maybe it's benefits can be explained a bit > more clearly? > > When we add guards back to REMAINING_UTOPIC_GUARDS, do we also remove them from > TRIED_GUARDS? > > Also, the value here is currently 20 minutes. So this will trigger only when > this algorithm takes over 20 minutes to terminate. I guess this should only > happen when the network is down. > > Now that we have persistent SAMPLED_UTOPIC_GUARDS is this still useful? Won't > we have fully populated our SAMPLED_*_GUARDS structures by the point this rule > triggers? > > Sorry for the confusion :) > > > Return each entry in USED_GUARDS that is not in PRIMARY_GUARDS in > > turn. For each entry, if it was not possible to connect to it, mark > > the entry as unreachable and add it to TRIED_GUARDS. > > > > Return each entry from REMAINING_UTOPIC_GUARDS using > > NEXT_BY_BANDWIDTH. For each entry, if it was not possible to connect > > to it, remove the entry from REMAINING_UTOPIC_GUARDS, mark it as > > unreachable and add it to TRIED_GUARDS. > > > > If no entries remain in REMAINING_UTOPIC_GUARDS, transition to > > STATE_TRY_DYSTOPIC. > > > > > > §2.2.3. The STATE_TRY_DYSTOPIC state > > > > In order to give guards that have been marked as unreachable a > > chance to come back, add all entries in TRIED_DYSTOPIC_GUARDS that were > > marked as unreachable more than GUARDS_RETRY_TIME minutes ago back > > to REMAINING_DYSTOPIC_GUARDS. > > > > Return each dystopic entry in USED_GUARDS that is not in PRIMARY_GUARDS in > > turn. For each entry, if it was not possible to connect to it, mark > > the entry as unreachable and add it to TRIED_DYSTOPIC_GUARDS. > > > > Return each entry from REMAINING_DYSTOPIC_GUARDS using > > NEXT_BY_BANDWIDTH. For each entry, if it was not possible to connect > > to it, remove the entry from REMAINING_DYSTOPIC_GUARDS, mark it as > > unreachable and add it to TRIED_DYSTOPIC_GUARDS. > > > > If no entries remain in REMAINING_DYSTOPIC_GUARDS, transition to > > STATE_PRIMARY_GUARDS. > > > > > > §2.2.5. ON_NEW_CONSENSUS > > > > First, ensure that all guard profiles are updated with information > > about whether they were in the newest consensus or not. If not, the > > guard is considered bad. > > > > Maybe instead of "If not" we could say "If a guard is not included in the newest > consensus" to make it a bit clearer. > > > If any PRIMARY_GUARDS have become bad, remove the guard from > > PRIMARY_GUARDS. Then ensure that PRIMARY_GUARDS contain > > N_PRIMARY_GUARDS entries by repeatedly calling NEXT_PRIMARY_GUARD. > > > > If any guards in USED_GUARDS have switched from being bad to being > > non-bad, add it back in the place it should have been in > > PRIMARY_GUARDS if it had been non-bad when populating > > PRIMARY_GUARDS. If this results in PRIMARY_GUARDS being larger than > > N_PRIMARY_GUARDS, truncate the list to be N_PRIMARY_GUARDS entries > > long. > > [XXX Does "add it back in the place it should have been in PRIMARY_GUARDS > > if it had been non-bad" implies keeping original order?] > > > > If I understand correctly, I think the answer to this XXX is "Ideally, yes.". > > I'm curious to see how this mechanism will be implemented because it's important > and it would be nice if it's done cleanly. > > Also, we should be careful about when we count 'bad' guards. After a few weeks > of operation, the USED_GUARDS list can accumulate multiple bad guards, and we > should make sure we don't count them when we do our threshold checks. > > > > > §2.3. The SHOULD_CONTINUE algorithm > > > > This algorithm takes as an argument a boolean indicating whether the > > circuit was successfully built or not. > > > > After the caller have tried to build a circuit with a returned > > guard, they should invoke SHOULD_CONTINUE to understand if the > > algorithm is finished or not. SHOULD_CONTINUE will always return > > true if the circuit failed. If the circuit succeeded, > > SHOULD_CONTINUE will always return false, unless the guard that > > succeeded was the first guard to succeed after > > INTERNET_LIKELY_DOWN_INTERVAL minutes - in that case it will set the > > state to STATE_PRIMARY_GUARDS and return true. > > > > ACK. > > Just a reminder that we also discussed adding the "Retry primary guards if we > have looped over the whole guardlist" heuristic somewhere here. Because in many > cases the network can go down and then back up in less than a minute. > > > > > §2.4. The END algorithm > > > > The goal of this algorithm is simply to make sure that we keep track > > of successful connections made. This algorithm should be invoked > > with the guard that was used to correctly set up a circuit. > > > > Once invoked, this algorithm will mark the guard as used, and make > > sure it is in USED_GUARDS. > > [XXX if the guard is not in USED_GUARDS should be added *first*? > > Important because it will affect on the building of PRIMARY_GUARDS.] > > > > IIUC, if the guard is not in USED_GUARDS it should be added *last* (that is, > with lowest priority). > > > > > §2.5. Helper algorithms > > > > These algorithms are used in the above algorithms, but have been > > separated out here in order to make the flow clearer. > > > > NEXT_PRIMARY_GUARD > > - Return the first entry from USED_GUARDS that is not in > > PRIMARY_GUARDS and that is in the most recent consensus. > > - If USED_GUARDS is empty, use NEXT_BY_BANDWIDTH with > > REMAINING_UTOPIC_GUARDS as the argument. > > > > NEXT_BY_BANDWIDTH > > - Takes G as an argument, which should be a set of guards to > > choose from. > > - Return a randomly select element from G, weighted by bandwidth. > > > > > > §3. Consensus Parameters, & Configurable Variables > > > > This proposal introduces several new parameters that ideally should > > be set in the consensus but that should also be possible to > > set or override in the client configuration file. Some of these have > > proposed values, but for others more simulation and trial needs to > > happen. > > > > PRIMARY_GUARDS_RETRY_INTERVAL > > In order to make it more likely we connect to a primary guard, > > we would like to retry the primary guards more often than other > > types of guards. This parameter controls how many minutes should > > pass before we consider retrying primary guards again. The > > proposed value is 3. > > > > GUARDS_RETRY_TIME > > In the normal course of selecting guards, we want to continue > > retrying unreachable guards in case they have become > > reachable. This parameter controls the minimum amount of minutes > > before we should start to consider guards for connecting > > again. Proposed value is 20. > > > > SAMPLE_SET_THRESHOLD > > In order to allow us to recognize dystopic situations or a > > completely unreachable network, we would like to avoid > > connecting to too many guards before switching modes. We also > > want to avoid exposing ourselves to too many nodes in a > > potentially hostile situation. This parameter, expressed as a > > fraction, determines the number of guards we should keep as the > > sampled set of the only guards we will consider connecting > > to. It will be used as a fraction for both the utopic and the > > dystopic sampled set. If we assume there are 1900 utopic guards > > and of them there are 500 dystopic guards, a setting of 0.02 > > means we will have a sample set of 38 utopic guards and 10 > > dystopic guards. This limits our total exposure. Proposed value > > is 0.02. > > > > We should decide if we want to actually use a dynamic percentage here, or just > set the threshold to a constant value. > > A dynamic percentage might give us better security and reachability as the > network evolves, but might also cause unpredictable behaviors if we suddently > get too many guards or too many of them disappear. > > I don't have a strong opinion here. > > > INTERNET_LIKELY_DOWN_INTERVAL > > The number of minutes since we started trying to find an entry > > guard before we should consider the network down and consider > > retrying primary guards before using a functioning guard > > found. Proposed value 20. > > > > It seems to me that the value 20 here could get reduced to something like 5 or > even less. Of course 5 is also an arbitrary value and to actually find out the > "best" number here we should test the algorithm ourselves in various network > types. > > Let's say we are restricting ourselves to SAMPLE_SET_THRESHOLD guards. If the > network is down, we will start cycling through guards and we will have walked > through all of them in a matter of minutes. After we have exhausted our guard > list once and we did not manage to connect, our network is effectively > down. This might give extra reasons to do the "Retry guards if we have exhausted > our guard list once" heuristic. > > > §4. Security properties and behavior under various conditions > > > > Under normal conditions, this algorithm will allow us to quickly > > connect and use guards we have used before with high likelihood of > > working. Assuming the first primary guard is reachable and in the > > consensus, this algorithm will deterministically always return that > > guard. > > > > Under dystopic conditions (when a firewall is in place that blocks > > all ports except for potentially port 80 and 443), this algorithm > > will try to connect to 2% of all guards before switching modes to try > > dystopic guards. Currently, that means trying to connect to circa 40 > > guards before getting a successful connection. If we assume a > > connection try will take maximum 10 seconds, that means it will take > > up to 6 minutes to get a working connection. > > > > When the network is completely down, we will try to connect to 2% of > > all guards plus 2% of all dystopic guards before realizing we are > > down. This means circa 50 guards tried assuming there are 1900 guards > > in the network. > > > > In terms of exposure, we will connect to a maximum of 2% of all > > guards plus 2% of all dystopic guards, or 3% of all guards, > > whichever is lower. If N is the number of guards, and k is the > > number of guards an attacker controls, that means an attacker would > > have a probability of 1-(1-(k/N)^2)^(N * 0.03) to have one of their > > guards selected before we fall back. In real terms, this means an > > attacker would need to control over 10% of all guards in order to > > have a larger than 50% chance of controlling a guard for any given client. > > > > In addition, since the sampled set changes slowly (the suggestion > > here is that guards in it expire every month) it is not possible for > > an attacker to force a connection to an entry guard that isn't > > already in the users sampled set. > > > > > > §A. Appendix: An example usage > > > > In order to clarify how this algorithm is supposed to be used, this > > pseudo code illustrates the building of a circuit: > > > > OPEN_CIRCUIT: > > context = ALGO_CHOOSE_ENTRY_GUARD_START(used_guards, [], 3, false) > > > > while True: > > entryGuard = ALGO_CHOOSE_ENTRY_GUARD_NEXT(context) > > circuit = composeCircuitAndConnect(entryGuard) > > > > if not SHOULD_CONTINUE(isSuccessful(circuit)): > > ALGO_CHOOSE_ENTRY_GUARD_END(context, entryGuard) > > return circuit > > > > -*- coding: utf-8 -*- > -- Ola Bini (https://olabini.se) "Yields falsehood when quined" yields falsehood when quined.
Attachment:
signature.asc
Description: PGP signature
_______________________________________________ tor-dev mailing list tor-dev@xxxxxxxxxxxxxxxxxxxx https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev