
[freehaven-dev] rfc: defining anonymity



i wrote this up this evening. it's rough, but it has a lot of neat
ideas in it.
can anybody point me to other literature on these issues? i don't want
to believe that i'm the first person to write this down, but i haven't
seen it elsewhere...

sentences with upper-case letters are ones that i've written with intent
to keep. :) sentences with lower-case letters are ones that i filled in
so you'd have some notion of what i'm intending to say.

comments very much appreciated!
feel free to forward this around to other people who might be interested.
thanks,
--roger



\subsection{Defining Anonymity}

So far, we've been throwing around the term `anonymous' without ever
specifying exactly what we mean by it.  Indeed, many of the
projects in our related works section claim an anonymous network or
some other catchphrase involving anonymity, but they generally fail
to specify which protections users and operators receive from
their system and which they do not.

In general, there are three agents in an anonymous publication or
storage system: the author, the reader, and the server. The following
sections address anonymity for each of these agents separately.

\subsubsection{Author-anonymity}
Author-anonymity means that the original author of a given document
should not be known. This characteristic of anonymity is an
integral part of nearly every anonymous network or service.
Even so-called `anonymous remailers', which are simply anonymous
forwarders and don't support persistence or storage of the data, provide
author-anonymity. Indeed, anonymous remailers can be combined with public
storage and distribution systems such as Usenet to offer a rudimentary
service that is very easy to construct and deploy, keeps data persistently
available, and provides author-anonymity.

\subsubsection{Reader-anonymity}

Reader-anonymity means that readers requesting a document
should not have to identify themselves to anyone. In particular, this means
that when a reader performs a document request at a given server, this
server is unable to determine the identity or location of the reader.

This class of anonymity is crucial for protecting people from disclosing
that they are interested in or accessing certain types of material.
For instance, a user of the system might not want it known whether she is
downloading material from the Democrats' web page or from the Republicans'
web page. Reader-anonymity ensures the privacy of the vast majority of the
system's {\em users}, a concern which is often ignored.

\subsubsection{Server-anonymity}

Server-anonymity means that the location of the document should not
be known or knowable. Specifically, given a document's name or other
identifier, an adversary is no closer to knowing which server or servers
on the network currently possess this document. This implies that the
retrieved documents do not provably pass through any given server that
receives a request. This protection is crucial for materials where mere
possession presents a danger to the possessor, such as documents speaking
out against Scientology practices.

Many services rely on sheer volume of servers, each containing the data,
to dissuade organizations from attacking any given server for possessing
the data. However, this approach does prevent large corporations from
participating in these questionable networks, due to liability and
reputation concerns. Also, there may be some circumstances, such as the
opendvd suits, where adversaries are willing to expend the energy to
trace down all servers which offer a given document. Indeed, making an
example out of even a few high profile server operators can go a long
way towards reducing the availability of a document.

\subsubsection{Anonymity vs. Pseudonymity}

anonymity means that that side of the transaction really has no 'location'.
often this is characterized by a 'pull' type of operation, eg on a
bulletin-board system or some other publication medium like the web.

pseudonymity means there is a location associated with the person, even
if this location is a 'one-use' type of thing. one-time pseudonymity is
where a location is used for a single transaction and then discarded;
persistent pseudonymity is where you use the same location across
multiple transactions.

free haven uses pseudonymity for readers and for servers. we could
theoretically support anonymity for authors. that really isn't specified
yet. (we should fix that.)

\subsubsection{Computational vs. Information-Theoretic Anonymity}

The above three classes of anonymity describe the issues regarding each
of the three agents in the system. However, there are some other broader
characteristics of anonymity to consider.

One of these issues is the notion of how protected a given address is:
does it rely on computational complexity to protect its anonymity (e.g.,
a reply block address on a conventional mixnet), or does it use
some other technique to make the address unknowable even in the face of
a computationally powerful adversary?
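
One way to make this distinction precise, offered only as a sketch: the
symbols $X$, $O$, $\mathcal{S}$ and $\epsilon$ below are notation
introduced here for illustration and are not taken from any existing
system's definition. Let $X$ be the random variable naming the party
(author, reader, or server) behind an address, ranging over a set
$\mathcal{S}$ of candidates, and let $O$ be everything the adversary
observes. Information-theoretic anonymity would then require that
observation does not change the adversary's beliefs at all,
\[
  \Pr[X = s \mid O = o] = \Pr[X = s]
  \quad \mbox{for every } s \in \mathcal{S}
  \mbox{ and every } o \mbox{ with } \Pr[O = o] > 0,
\]
no matter how much computation the adversary performs. Computational
anonymity would only require that every adversary $E$ with bounded (say,
polynomial-time) resources guesses little better than the prior allows,
\[
  \Pr[E(O) = X] \le \max_{s \in \mathcal{S}} \Pr[X = s] + \epsilon ,
\]
for some negligible $\epsilon$.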

there's more to discuss and analyze here, clearly.

\subsubsection{Perfect Forward Anonymity}

perfect forward secrecy means that after a given transaction is done,
there is nothing new that the adversary can 'get' to help him decrypt
the transcript.
similarly, perfect forward anonymity is the notion that after a given
transaction is done, there is nothing new that the adversary can get
that can help him identify the location or identity of either of the
communicating parties.

this can be phrased as a game: if A is talking to some unknown party B,
can our adversary E distinguish with better than even probability between
whether A is talking to B or A is talking to C?
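
As a sketch of how this game might be written down (the bit $b$, the
guess $b'$ and the name $\mathrm{Adv}$ are notation introduced here,
not an established definition): a challenger flips a fair coin
$b \in \{0,1\}$; if $b = 0$ then A talks to B, and if $b = 1$ then A
talks to C. The adversary E observes whatever its class is allowed to
observe, both during and after the transaction, and outputs a guess
$b'$. Its advantage is
\[
  \mathrm{Adv}(E) = \left| \Pr[b' = b] - \frac{1}{2} \right| .
\]
Perfect forward anonymity against a class of adversaries would then mean
that $\mathrm{Adv}(E)$ is zero, or at least negligible, for every E in
that class.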

from here, we can define classes of adversaries such that some classes
can beat us in some circumstances, and some can't.

for instance, some adversaries might be able to watch the link between
A and B, and some might be able to tamper with the link as well. some might
be able to observe the private state of A and/or B (that is, data which is
internal to the nodes but doesn't get sent over the network). maybe some
could even modify private state.

subtle attack to watch out for: since we're talking about anonymity,
there are issues we have to address that we don't consider when we're
dealing with encryption. for instance, if B is running a server at his
company, what about the cached data in the NIDS down the hall? is that
part of B? because it can certainly help to distinguish...  what if one
of the packets in the transmission got sent off course, and 120 seconds
later an ICMP destination unreachable packet returns to B, with its
data segment an exact copy of one of the packets that was sent to A?
Is 120 seconds negligible? what isn't negligible?

if we want to address adversaries, it should probably be its own
subsubsection.

\subsubsection{Social Engineering}

bear in mind that no matter how cool your anonymity is, if you sign
your name to your document, you lose. also, more subtle attacks such
as word choice correlation or writing analysis might yield clues that
allow better than even chances of guessing. all of the above models
are based on the idea that a given document is a random distribution of
bits. [is that a strong enough requirement? is that too strong?]

\subsubsection{Analysis and Conclusion}

it sounds like there are 9 characteristics of anonymity we want from an
ideal anonymous service: basic anonymity for each of the three agents,
plus information-theoretic and perfect forward anonymity for each class
of transaction that the service supports [does that actually map well to
"for each of the three agents" also?]

most of the related works get 1 or 2 out of these 9. free haven gets 4 or
5 (basic plus some pretty good protection for authors depending on how
we implement submission), and if we switched to a scintillating mixnet
(they don't exist, but the notion is that addresses materialize and
dematerialize instantaneously, so you can have a transaction and a moment
later it's as if the tunnel you just used never existed) we could probably
get 8 (information-theoretic anonymity for servers is very tough?)

anyway, i haven't thought this far yet. but there's plenty of ideas
here.