[freehaven-dev] New views on modelling anonymity

We can model anonymous publication systems as a single large principal
Ted which coordinates communication between other principals in the
network.  In our model we have m senders Alice_1 through Alice_m, and n
recipients Bob_1 through Bob_n. When an Alice sends a message to a Bob,
Ted receives the message and delivers it to the appropriate Bob. The
privacy characteristics of Ted as a communication channel define the
level of anonymity that Ted provides. (These privacy characteristics
are described in chapter 2 of my thesis: linkability; ability to reply,
persistence of this ability, privacy of this reply; content leaks;
channel leaks; persistence of speech; and authorized readers. Are there
others? I think there are others, because the above list was just intended
to describe the privacy of the 'speaker'.) In addition, we will need to
complicate this notion with other characteristics, such as reliability
(of delivery and also of *accurate* delivery), cost of using a given path,
availability and fragility of the network.

A logical step here would be to enumerate the possible Teds that we
might encounter (or at least try to describe the spectrum of Teds and
what dimensions that spectrum can be defined by), and then map a given
anonymous publishing design (e.g. Freenet, Free Haven) to `the nearest Ted'.

Thus if we can convince ourselves that a given anonymous publishing
design is in some sense `equivalent' to a Ted with certain privacy
characteristics, then we can more easily reason about the level of
protection provided by that design (by reasoning instead about Ted).

More formally, each message M_i which Alice_i sends has some probability
distribution D_i which describes the chance of each Bob being the
recipient of the message. If we can replace Ted with a decentralized
system which provides an indistinguishable probability distribution for
every D_i, every M_i, and every i [Am I being too strict?  Perhaps there's
just a subset that we care about?], then we have defined an equivalent
Ted. This may give us an easier way to differentiate between the level
of anonymity provided by various projects, because comparing Teds is
easier and more intuitive than trying to reason about the effects of
trading or caching issues directly.
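
As a rough sketch only (none of this is part of the model above), one
way to make `indistinguishable' concrete is to represent each D_i as a
map from Bob to probability and compare Ted's distribution with the
decentralized system's using total variation distance. The function
names, the tolerance eps, and the example numbers are all assumptions
made for illustration, and the sketch assumes each message has a single
recipient so that each D_i sums to 1.

```python
def total_variation(d_ted, d_sys):
    """Total variation distance between two recipient distributions.

    Each argument maps a recipient name to the probability that the
    recipient receives the message; missing names count as probability 0.
    """
    bobs = set(d_ted) | set(d_sys)
    return 0.5 * sum(abs(d_ted.get(b, 0.0) - d_sys.get(b, 0.0)) for b in bobs)


def equivalent(ted, system, eps=1e-9):
    """True if the two systems agree (within eps) on every sender's D_i.

    ted and system each map a sender index i to that sender's D_i.
    eps is a hypothetical tolerance, not something the model specifies.
    """
    if set(ted) != set(system):
        return False
    return all(total_variation(ted[i], system[i]) <= eps for i in ted)


# Hypothetical numbers, for illustration only.
ted = {1: {"Bob_1": 0.5, "Bob_2": 0.5}, 2: {"Bob_1": 1.0}}
same = {1: {"Bob_2": 0.5, "Bob_1": 0.5}, 2: {"Bob_1": 1.0}}
skewed = {1: {"Bob_1": 0.75, "Bob_2": 0.25}, 2: {"Bob_1": 1.0}}
```

Here equivalent(ted, same) holds while equivalent(ted, skewed) does not.
Requiring distance 0 for every i is the strict reading; relaxing eps, or
checking only a subset of the i, would be one way to weaken it.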

Issues and questions:
* Note that the Alices and Bobs can overlap. I don't think this is a
* The notion of the distribution D_i can describe multiple recipients
  for a given message, yes? Is that a good enough model? Do there exist
  people who are actually capable of proving probability distributions
  equivalent? Because I'm not one of them...
* Do any other communications channel projects (e.g. Crowds, mixnets, etc.)
  use this probability distribution notion in their analysis? Can we
  coordinate with them to at least use their notation, and maybe also
  build off their work?

We might want to complicate the notions from protecting the privacy of
a single message transiting the system to protecting the privacy of a
group of messages coming from a group of Alices and going to a group of
Bobs. That is more realistic, since we can take advantage of quantity
to possibly foil some traffic analysis. How much does this complicate
the analysis, and how necessary is it to move from the single message
paradigm to the group paradigm? It seems intuitive to me that requiring
a Ted that can protect a single message is somehow overkill if in actual
operation there will always be groups.

Comments appreciated. This is still far from mature. I spoke with David
about a lot of this stuff on Thursday, and he has lots of comments.