
Re: [freehaven-dev] rfc: defining anonymity





Hi, 
I've trimmed Anna and Adam from the reply because this is going to be
somewhat fragmented.

On Fri, 28 Apr 2000, Roger R Dingledine wrote:


> can anybody point me to other literature on these issues? i don't want
> to believe that i'm the first person to write this down, but i haven't
> seen it elsewhere...

The TAZ/Rewebber paper by Goldberg & Wagner has a discussion of these
kinds of anonymity, but only far enough to clarify that they care only
about "author anonymity." 

Private Information Retrieval focuses on what we're calling "reader
anonymity."



> \subsection{Defining Anonymity}
> 
> So far, we've been throwing around the term `anonymous' without ever
> actually specifying exactly what we mean by it.  Indeed, many of the
> projects in our related works section claim an anonymous network or
> some other catchphrase involving anonymity, but they generally fail
> to actually specify what protections users and operators receive from
> their system, as well as what protections users and operators do not
> receive from their system.
> 
> In general, there are three agents in an anonymous publication or
> storage system.

It might be nice to explicitly specify these terms here. Here's a
fragmentary stab at it. 

Storage System : A storage system stores documents for later
retrieval. There are (at least) two primitives in a storage system :

	Insert(document)
	Retrieve(document_id)

where document_id is some way of identifying a document. 

Publication System :  Do we want to make a distinction? I would expect a
"publication system" to have some means of searching, but it's not quite
clear to me what the distinction is. 

Author : Every document enters the publication system by means of
the Insert() primitive. The entity that originates (calls?) Insert()
is referred to as the "Author" of that document.

Reader :
	Readers wish to read documents stored in the publication system.
Readers originate Retrieve(document_id) requests. 

Server : 
	Documents in the storage system physically reside on one or more
Servers. Servers receive Retrieve(document_id) requests and attempt to
return the referenced document to the originating Reader.
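
To make those primitives concrete, here is a rough sketch in Python.
All of the class and method names are mine, just illustrating the
interface above, not any actual system:

# Illustrative sketch of the two primitives and three roles above.
# Names and structure are made up, not taken from any real system.
import hashlib

class Server:
    """Physically stores documents and answers Retrieve() requests."""
    def __init__(self):
        self._store = {}

    def insert(self, document):
        # document_id is just some way of identifying a document;
        # a hash of the contents is one obvious choice.
        document_id = hashlib.sha256(document).hexdigest()
        self._store[document_id] = document
        return document_id

    def retrieve(self, document_id):
        return self._store.get(document_id)

class Author:
    """The entity that originates Insert(document)."""
    def publish(self, server, document):
        return server.insert(document)

class Reader:
    """Originates Retrieve(document_id) requests."""
    def fetch(self, server, document_id):
        return server.retrieve(document_id)

The anonymity questions below then become questions about what an
adversary can learn from watching these calls: who called insert(),
who called retrieve(), and which Server holds a given document_id.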


> 
> \subsubsection{Reader-anonymity}
> 
> Read-anonymity means that readers requesting a document
> should not have to identify themselves to anyone. In particular, this means
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It's actually stronger than this, because it seems that you could have
a protocol where the reader doesn't have to identify himself, but his
identity is easy to figure out anyway. For instance, maybe the reader uses
a direct link to download something from a web server. 


> \subsubsection{Server-anonymity}
> 
> Server-anonymity means that the location of the document should not
> be known or knowable. Specifically, given a document's name or other
> identifier, an adversary is no closer to knowing which server or servers
> on the network currently possess this document. This implies that the
> retrieved documents do not provably pass through any given server that

I'd like to suggest yet another distinction :

	* an adversary's ability to _detect_ a document passing through
	or having passed through a given server
vs.
	* an adversary's ability to _prove to other people_ that a 
	document did indeed pass through a given server

In the first instance, the adversary can convince himself which server
holds / held the data, but cannot convince anyone else. In the second
instance, the adversary can convince other people who do not trust him or
her that the data actually passed through or is sitting on a server.

This distinction is important: preventing the second protects node
operators from lawsuits, while preventing the first seems necessary to
protect against "dirty tricks" by adversaries. That is, if the adversary
believes you're responsible for the bad data, he or she may try to blow
you away without the niceties of telling anyone else first. 

For example, suppose I am watching Roger's outgoing e-mail link, and Roger
uses an anonymous remailer and mail2news gateway with no latency. I'm also
watching soc.culture.singapore. Every time Roger uses the remailer, an
anti-state screed appears on the newsgroup almost immediately. Now, while
I can't prove Roger wrote it, and can't even prove the correlation to
someone else (I could have faked transcripts with the timing, the
encrypted messages aren't signed by Roger, etc.), I am convinced
that Roger is a political dissident.
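
(To make the correlation concrete, here is a toy calculation; purely
illustrative, the function and the timestamp lists are made up:)

# Toy sketch: how well do send times on Roger's outgoing link line up
# with post times on soc.culture.singapore?  With a no-latency remailer
# the answer is "almost perfectly."

def correlation_fraction(send_times, post_times, window=60):
    """Fraction of observed sends followed by a post within `window` seconds."""
    if not send_times:
        return 0.0
    hits = sum(
        1 for s in send_times
        if any(0 <= p - s <= window for p in post_times)
    )
    return hits / len(send_times)

# A fraction near 1.0 convinces *me*, but it is nothing I can prove to
# anyone else -- I could have faked the transcripts.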

Alternatively, the adversary may use its conviction that a server is
responsible in order to focus its resources on obtaining proof. In the
above example, I go raid Roger's house and find lots of half-completed
dissident tracts on his box. 

(Suddenly I have this notion of a nondeterministic
adversary which "guesses" a server and then tries to verify its guess by
producing a "succinct certificate of guilt.")



> receives a request. This protection is crucial for materials where mere
> possession presents a danger to the possessor, such as documents speaking
> out against Scientology practices.
> 
> Many services rely on sheer volume of servers, each containing the data,
> to dissuade organizations from attacking any given server for possessing
> the data. However, this approach does prevent large corporations from
> participating in these questionable networks, due to liability and
> reputation concerns. Also, there may be some circumstances, such as the

I don't understand the transition from the first to the second sentence.
How does a large volume of servers imply anything about the reputation of
the network, or about the willingness of large corporations to
participate or not?


> \subsubsection{Computational vs. Information-Theoretic Anonymity}

> One of these issues is the notion of how protected a given address is:
> does it rely on computational complexity to protect its anonymity (e.g.,
> a reply block address on a conventional mixnet), or does it use
> some other technique to make the address unknowable even in the face of
> a computationally powerful adversary?

We probably should worry about computationally unbounded adversaries, but
we also need to specify what the adversary can see and what she can't. 
Like, a computationally unbounded adversary who is _only_ allowed to 
watch a single public bulletin board may not be able to break anonymity,
while a computationally very weak adversary who can corrupt lots of
servers may be able to break anonymity easily. 
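
(One way to picture the bulletin-board case: if every reader simply
downloads the whole board and selects locally -- the trivial form of the
Private Information Retrieval mentioned above -- then even an unbounded
observer of the board alone sees every reader behave identically. A
sketch, illustrative only:)

# Illustrative only: trivial PIR against a public bulletin board.
# Every reader fetches everything, so an observer of the board alone
# learns nothing about which document any reader actually wanted.

def read_document(bulletin_board, wanted_id):
    everything = list(bulletin_board)     # same traffic for every reader
    for document_id, document in everything:
        if document_id == wanted_id:      # selection happens locally
            return document
    return None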

You bring this up a bit later :

> for instance, some adversaries might be able to watch the link between
> them, some might be able to frob with the link as well. some might be
> able to observe the private state of A and/or B (that is, data which is
> internal to the nodes but doesn't get sent over the network). maybe some
> could even modify private state.

How does this sound as a way of breaking it down :

	* adversary which can passively eavesdrop on the link

	* adversary which can actively tamper with the link
		- cause packets to drop entirely?

	crossed with
	
	* adversary with ability to passively inspect private state

	* adversary with ability to change private state and watch
	what the node does afterwards

	* adversary which has total physical control of the node
	  (this is distinct from the second one because it may
	  allow for such things as running data recovery tools on
	  the node's hard drive -- i.e. the adversary could undo
	  node erasures)	

I'm trying to think of terms for all these, but nothing feels right. 
The second set of adversaries looks like some of the things which pop
up in secure multiparty computation. 
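
If it helps, the "crossed with" above can be read literally as a cross
product of a link capability and a node capability. A throwaway sketch
(the names are mine, not proposed terminology):

# Sketch only: the adversary model as a pair
# (link capability, node capability).

from enum import Enum
from itertools import product

class LinkCapability(Enum):
    PASSIVE_EAVESDROP = "can passively watch the link"
    ACTIVE_TAMPER = "can tamper with or drop packets on the link"

class NodeCapability(Enum):
    NONE = "no access to private state"
    INSPECT_STATE = "can passively inspect private state"
    MODIFY_STATE = "can change private state and watch the node afterwards"
    PHYSICAL_CONTROL = "total physical control, e.g. can undo erasures"

# Each combination is a distinct adversary we might have to consider:
for link, node in product(LinkCapability, NodeCapability):
    print(link.name, "x", node.name)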

Thanks, 
-David