[tor-bugs] #2435 [Metrics]: Preserving hashed IP addresses in sanitized bridge descriptors
#2435: Preserving hashed IP addresses in sanitized bridge descriptors
-------------------------+--------------------------------------------------
Reporter: karsten | Owner: karsten
Type: enhancement | Status: new
Priority: normal | Milestone:
Component: Metrics | Version:
Keywords: | Parent:
-------------------------+--------------------------------------------------
Roger mentioned in a comment on #2372:
> One issue that comes to mind that we might want to research is how often
> a given bridge moves IP address. The method you describe above would lose
> that info, yes? Whereas if we do a keyed hash of the IP address (and never
> disclose the key), we could distinguish "same" from "different". I
> remember we had the keyed hash design in some other sanitization context,
> but I don't remember which one -- how is the idea working out in that
> other context?
>
> (It's possible that we already do the keyed hash for the regular bridge
> descriptors, so we would just need to match up the sha1(fingerprint) in
> this file with the sha1(fingerprint) in that file and we could look up the
> IP address. In which case maybe there's merit in doing the same keyed hash
> in both places, to ease the job of future researchers.)
When we discussed this topic the last time, I suggested replacing bridge
IP addresses with something very similar to this:
{{{
H(IP address + bridge identity + secret)[:3]
}}}
The input IP address is the 4-byte binary representation of the bridge's
current IP address. The bridge identity is the 20-byte binary
representation of the bridge's long-term identity fingerprint. The secret
is an arbitrary, sufficiently long (say, 20 bytes), securely generated
random string that does not change over time and that is known only to the
machine running the bridge descriptor sanitizer, plus its backups. H is
SHA-1. The [:x] operator means that we pick the x most significant (that
is, first) bytes of the result.
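A minimal Python sketch of the transformation above, for illustration only. It assumes that "+" in the formula means byte concatenation in the order shown, and that the fingerprint is available as a 40-character hex string. The function name `hashed_ip_prefix` and its parameters are hypothetical and not taken from the sanitizer's code.

```python
import hashlib
import ipaddress


def hashed_ip_prefix(ip: str, fingerprint_hex: str, secret: bytes,
                     length: int = 3) -> bytes:
    """Return the first `length` bytes of SHA-1(IP || identity || secret)."""
    # 4-byte binary representation of the bridge's current IPv4 address.
    ip_bytes = ipaddress.IPv4Address(ip).packed
    # 20-byte binary representation of the long-term identity fingerprint.
    identity = bytes.fromhex(fingerprint_hex)
    if len(identity) != 20:
        raise ValueError("fingerprint must decode to exactly 20 bytes")
    # Assumption: '+' in the ticket's formula denotes concatenation.
    digest = hashlib.sha1(ip_bytes + identity + secret).digest()
    # [:x] keeps the x most significant (first) bytes of the digest.
    return digest[:length]
```

The secret would be generated once (for example with `os.urandom(20)`) and stored only on the sanitizing machine and its backups. It must not be regenerated per run, because then outputs from different runs could no longer be compared.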
The original transformation used 4 bytes of the output, but I changed this
to use only 3 bytes here. The idea is to write the resulting "IP
addresses" as 10.x.x.x in the sanitized descriptors to make it clear that
these are not public IP addresses. I want to avoid confusion with the
non-sanitized IP addresses in exit policies. I'm aware that 3 bytes give a
higher collision probability than 4, but the probability and impact of
missing an IP address change are still sufficiently low.
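As a rough illustration of the 10.x.x.x encoding and of the collision estimate, here is a short Python sketch. The helper names are hypothetical. The probability figure assumes that the truncated hash behaves like a uniformly random 3-byte value, which is an idealization and not a measured result.

```python
def as_private_address(prefix: bytes) -> str:
    """Write a 3-byte hash prefix as a 10.x.x.x address string."""
    if len(prefix) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Each byte becomes one decimal octet after the fixed leading '10.'.
    return "10." + ".".join(str(b) for b in prefix)


def missed_change_probability(prefix_bytes: int = 3) -> float:
    """Chance that a real IP change maps to the same truncated value.

    Assumes the truncated hash is uniformly distributed, so two different
    inputs collide with probability 1 / 2^(8 * prefix_bytes).
    """
    return 1 / 2 ** (8 * prefix_bytes)
```

Under that assumption a single address change goes unnoticed with probability 1/2^24, about 6e-8. With 4 bytes it would be 1/2^32.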
The resulting "IP address" helps us detect whether a specific bridge has
changed its IP address. Because the bridge identity is part of the hash
input, it does not tell us whether two bridges run on the same IP address.
For the same reason, it does not tell us when a bridge changes its
fingerprint but keeps its IP address.
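To make the intended use concrete, the following Python sketch counts address changes for one bridge. It assumes a researcher has already collected the sanitized addresses of a single fingerprint in chronological order. The function name and input format are hypothetical.

```python
def count_address_changes(history):
    """Count how often consecutive sanitized addresses differ.

    `history` is a chronological list of sanitized "10.x.x.x" strings
    that all belong to the same bridge fingerprint. Comparing values
    across different fingerprints is meaningless, because the identity
    is part of the hash input.
    """
    changes = 0
    for previous, current in zip(history, history[1:]):
        if previous != current:
            changes += 1
    return changes
```

For example, the history `["10.1.2.3", "10.1.2.3", "10.9.8.7", "10.1.2.3"]` contains two changes, including the move back to the earlier address.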
The two important properties of this transformation are that a) someone
who learns a bridge's identity cannot guess the bridge's previous IP
addresses (which would be possible if the secret were not part of the
input); and b) someone who guesses the secret cannot guess the IP
addresses of all bridges (which would be possible if the bridge identity
were not part of the input).
There are more details about preserving hashed IP addresses in
[http://archives.seul.org/or/dev/Apr-2010/msg00000.html this thread].
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/2435>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online