On Tue, Aug 05, 2014 at 11:24:32AM -0400, Philipp Winter wrote: > On Tue, Aug 05, 2014 at 11:37:45AM +0200, Karsten Loesing wrote: > > Started looking into better algorithms to detect Sybil attacks on the > > Tor network. Current thinking is that we should define relay similarity > > metrics like common IP address prefix length or time between first seen > > in a consensus, go throw the consensus archives, and see which relays > > look similar but are not defined to be in the same family. > > Do you already have some code or more details on that? I'm quite > interested in this topic and I'm wondering if it wouldn't be best to > start with something simple like cosine similarity [0]. We would have > to transform a relay descriptor into a numerical vector and then > calculate the cosine similarity to other relay vectors. However, this > might scale poorly as we would need (n^2)/2 comparisons. > > We might also want to weigh the vector's elements differently as some of > them are easy to control for an attacker (advertised bandwidth, uptime, > flags, exit policy) and others require more effort (IP address, ASN, > observed bandwidth). Like you mentioned, the key thing to look at might > be time, i.e., uptime and derived features such as "total online time in > last month" etc. > > [0] https://en.wikipedia.org/wiki/Cosine_similarity You only need O(n*log(n)) if you can define any similarity metric that respects the triangle inequality. There's a lot of research on data structures for this: https://en.wikipedia.org/wiki/Metric_tree -- Andrea Shepard <andrea@xxxxxxxxxxxxxx> PGP fingerprint (ECC): BDF5 F867 8A52 4E4A BECF DE79 A4FF BC34 F01D D536 PGP fingerprint (RSA): 3611 95A4 0740 ED1B 7EA5 DF7E 4191 13D9 D0CF BDA5
Attachment:
pgpeFxVteusph.pgp
Description: PGP signature
_______________________________________________ tor-dev mailing list tor-dev@xxxxxxxxxxxxxxxxxxxx https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev