[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]

Re: [tor-dev] Sybil attack detection

To: tor-dev@xxxxxxxxxxxxxxxxxxxx
Subject: Re: [tor-dev] Sybil attack detection
From: Mansour Moufid <mansourmoufid@xxxxxxxxx>
Date: Wed, 6 Aug 2014 14:21:48 -0400
Delivered-to: archiver@xxxxxxxx
Delivery-date: Wed, 06 Aug 2014 14:22:25 -0400
Dkim-signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:subject:message-id:in-reply-to:references:mime-version :content-type:content-transfer-encoding; bh=/nv0RlMDdnOxcUVWUf7IYsVu9TA5c419yljbKR6DTro=; b=K4QvGumNnMq1Ho+OWcpXfQaqX7J0gEQbiLHdqKKl1d6ol1R1zaZ6jLLxn58/mkm7A4 InCvCnsqVkTnKoiyka2luRu718ML9GOQOGIH3rF9uTsvw/q9Q8H5XBZ+vHFf/BTckbGZ Y5LotBr19mqbawGDlQuvNreU/LMfuZgBRBEF/lFw1l7DvhTkwILarfq+shOGHknSZ3kP qviKeUYh5Wio1sPMbnpmggo6zKCe4OJiP/PReihTDF55rjVjxVMPWgkzEeoO7qE44sHp T3rEtOKpLKQuY8VfXWAiZS8vvVtmu3rWmQpyAl9IqxvUJly1RUjuRc6YyTBm8919Y+qO GjOQ==
In-reply-to: <53E0FFA0.4000903@xxxxxxxxxxxxxx>
List-archive: <http://lists.torproject.org/pipermail/tor-dev/>
List-help: <mailto:tor-dev-request@lists.torproject.org?subject=help>
List-id: discussion regarding Tor development <tor-dev.lists.torproject.org>
List-post: <mailto:tor-dev@lists.torproject.org>
List-subscribe: <https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev>, <mailto:tor-dev-request@lists.torproject.org?subject=subscribe>
List-unsubscribe: <https://lists.torproject.org/cgi-bin/mailman/options/tor-dev>, <mailto:tor-dev-request@lists.torproject.org?subject=unsubscribe>
References: <53E0A5E9.5020409@xxxxxxxxxxxxxx> <20140805152432.GJ17757@xxxxxxxxx> <53E0FFA0.4000903@xxxxxxxxxxxxxx>
Reply-to: tor-dev@xxxxxxxxxxxxxxxxxxxx
Sender: "tor-dev" <tor-dev-bounces@xxxxxxxxxxxxxxxxxxxx>

On Tue, 05 Aug 2014 18:00:32 +0200
Karsten Loesing <karsten@xxxxxxxxxxxxxx> wrote:

> On 05/08/14 17:24, Philipp Winter wrote:
> > On Tue, Aug 05, 2014 at 11:37:45AM +0200, Karsten Loesing wrote:
> >> Started looking into better algorithms to detect Sybil attacks on the
> >> Tor network.  Current thinking is that we should define relay similarity
> >> metrics like common IP address prefix length or time between first seen
> >> in a consensus, go throw the consensus archives, and see which relays
> >> look similar but are not defined to be in the same family.
> > 
> > Do you already have some code or more details on that?
> 
> Details, yes, see below.  Code, not really, but give me a few hours
> tomorrow to clean it up and put it online.
> 
> > I'm quite
> > interested in this topic and I'm wondering if it wouldn't be best to
> > start with something simple like cosine similarity [0].  We would have
> > to transform a relay descriptor into a numerical vector and then
> > calculate the cosine similarity to other relay vectors.  However, this
> > might scale poorly as we would need (n^2)/2 comparisons.

It can be O(n^2 log n).

If you concatenate the many vectors into a single one, the many cosine
dot products are a single convolution operation, which is O(n log n)
with the FFT. Correlate the big vector (haystack) with the search one
(needle) -- indices of peaks give matches. This is cross-correlation.

(As Andrea mentioned, there are conditions. I think they're met.)

So if you want to find matches for some vector X = x1, x2, ... in
the set of vectors A, B, ...

A = a1, a2, a3, ...
B = b1, b2, ...
etc.

then let Z = a1, a2, a3, b1, b2, ... and compute xcorr(Z, X).

Hopefully the result is almost entirely noise, with a few peaks.
If there is a positive peak at some index i, then X matches the
vector corresponding to the one with index i in Z.

The most useful vectors for correlation are ones which assign zero
for "no measurement" (like inactivity) and which have a large
variance, because that will bring down false positives significantly.
Bandwidth per hour/day, for example, might be useful.

This is basically a kind of passive traffic confirmation attack. It
would be funny to use it to detect traffic confirmation attacks.

> Sounds great.  I'm good at transforming relay data into numerical
> vectors (well, .csv files), but I have little to no experience with
> the subsequent analysis.  Help much appreciated!

I could try what I described above, with the data "bandwidth over time"
for some small set of relays, two of which were known to be colluding.
(I apologize if that's already available, it's been a while since I
used the Tor metrics.)

Mansour
_______________________________________________
tor-dev mailing list
tor-dev@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev

References:
- [tor-dev] Sybil attack detection (was: Karsten's July 2014)
  - From: Philipp Winter
- Re: [tor-dev] Sybil attack detection
  - From: Karsten Loesing

Prev by Author: Re: [tor-dev] DNSSEC
Next by Author: [tor-dev] Using a Tap for Tor's Homebrew recipe to increase confidence in source code authenticity
Previous by thread: Re: [tor-dev] Sybil attack detection
Next by thread: Re: [tor-dev] Sybil attack detection
Index(es):
- Author
- Thread