[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]

[freehaven-cvs] Adding Claudia"s changes for the 4th draft.



Update of /home2/freehaven/cvsroot/doc/mixmaster-vs-reliable
In directory moria.mit.edu:/tmp/cvs-serv27960

Modified Files:
	mixvreliable.tex 
Log Message:
Adding Claudia's changes for the 4th draft.


Index: mixvreliable.tex
===================================================================
RCS file: /home2/freehaven/cvsroot/doc/mixmaster-vs-reliable/mixvreliable.tex,v
retrieving revision 1.6
retrieving revision 1.7
diff -u -d -r1.6 -r1.7
--- mixvreliable.tex	23 Mar 2004 22:31:14 -0000	1.6
+++ mixvreliable.tex	25 Mar 2004 22:53:58 -0000	1.7
@@ -6,6 +6,7 @@
 \usepackage{latexsym}
 \usepackage{psfrag}
 \usepackage{amsfonts}
+\usepackage{picins} %ADDED
 
 \title{Comparison between two practical mix designs}  
 
@@ -21,7 +22,7 @@
 
 \begin{abstract}
 
-
+Len
 \end{abstract}
 
 
@@ -52,9 +53,10 @@
 component.  We have used as input real-life data gathered from a popular
 remailer, and simulated the behaviour of the mix.
 
-% Summary of what we have done
 
-% Roadmap of the paper
+(Len) Summary of what we have done
+
+% (Len) Roadmap of the paper
 
 \section{Mixes}
 \label{mixes}
@@ -76,7 +78,7 @@
 certain number of messages and then flushes them. Since then, variants on
 this first design have been proposed in the literature. In this paper, we
 focus on two practical mix designs, which have been implemented and are
-part of the Mixmaster remailer network~\cite{mixmaster-announce}, that has been
+part of the Mixmaster remailer network\cite{mixmaster-announce}, that has been
 providing anonymous email services since 1995.
 
 The first design is called ``Mixmaster'' (as the remailer network) because
@@ -122,7 +124,7 @@
 messages in the pool. Note that $P(n)=s/n$.
 \begin{figure}
 \begin{center}
-\epsfig{file=mixmaster.eps, width=10cm}
+\epsfig{file=mixmaster.eps, width=7cm}
 \caption{Mixmaster in the GMM}
 \label{fig-mm}
 \end{center}
@@ -303,54 +305,233 @@
 found that it does not follow a Poisson distribution, nor any other
 known distribution. 
 
-In Figure~\ref{arr-round} we show the number of messages arrived to
-the mix in every period of $15$ minutes. Figure~\ref{arr-day} shows
-the evolution of the arrivals per day. We can observe that the traffic
-arrived to the mix during the first month is much heavier than in the
-following three months. This shows that the input traffic pattern that
-gets to a mix is highly impredictable.
+% ADDED (Evelyne)
+A Poisson process is modeled by a single parameter $\lambda$ representing the
+expected amount of arrivals per (fixed) time interval. If the arrivals to a 
+mix are assumed to follow a Poisson process with an average of $\lambda$ 
+arrivals per time interval $\Delta t$ and we denote the number of arrivals
+in such a time interval by $X$, then $X$ is Poisson distributed with parameter
+$\lambda$: $X \sim \mathrm{Poiss}(\lambda)$. Important to note is that $\lambda$
+is \emph{time-independent}.
 
-Evelyne -> some smarter comments on the inputs :-)
+In our statistical analysis we first \emph{assumed} that the process of arrivals
+\emph{is} a Poisson process and we estimated the parameter $\lambda$. The latter
+was done by taking the maximum likelihood estimate given the number of arrivals 
+per time interval $\Delta t = 15$ minutes ($N=11800$). We also
+constructed a $95$\% confidence interval for this estimate. Then we
+performed a goodness-of-fit test: can we reject the hypothesis
+\begin{equation*}
+H_0: \mathrm{the\ number\ of\ arrivals\ per\ time\ interval\ } \sim\ \mathrm{Poiss}(\bar\lambda)\ ,
+\end{equation*}
+where $\bar{\lambda}$ varies over the constructed confidence
+interval. The goodness-of-fit 
+test we used is the well-known Chisquare test. Using a significance
+level of $0.01$, the null hypothesis gets rejected!
+% END (Evelyne)
 
-Figs inputs (4 or 6, depending on space)
 
-%\begin{figure}
-%    \noindent
-%    \centerline{ \epsfig{file=NumArrivalsPerHour.eps, width=6cm}
-%                 \epsfig{file=NumArrivalsPerDay.eps, width=6cm} }
-%      \hspace*{3.2cm}(a) \hspace*{5.5cm}(b)
-%    \caption{Incoming Traffic Pattern}
-%    \label{arr-day}
-% \end{figure}
+In Figure~\ref{arr-} we show the number of messages arrived to
+the mix per hour. Figure~\ref{arr-day} shows
+the evolution of the arrivals per day. We can observe that the traffic
+arrived to the mix during the first month is much heavier than in the
+following three months. This shows that the input traffic pattern that
+gets to a mix node is highly impredictable.
 
+Figure~\ref{frec-hour} shows the frecuency of hours that receive a
+certain number of arrivals. We can see that in most of the hours 
+the mix receives less than $20$ messages. Figure~\ref{frec-day} shows
+the messages arrived per day.  
 
+\begin{figure}
+    \noindent
+ \epsfig{file=hist-arr-hour.eps, width=6cm}
+ \epsfig{file=hist-arr-day.eps, width=6cm} 
+\caption{Incoming Traffic Pattern}
+ \label{frec-day}
+ \end{figure}
+\begin{figure}
+    \noindent
+ \epsfig{file=arr-hour.eps, width=5.5cm}
+ \epsfig{file=arr-days.eps, width=5.5cm} 
+\caption{Frecuency Analysis of Inputs}
+ \label{frec-day}
+ \end{figure}
 
 \subsection{Analysis of Mixmaster}
 
 We have simulated a Mixmaster node as explained in
 Section~\ref{sims}. Mixmaster is a pool mix and processes messages in
 batches. The recipient anonymity of all inputs of a round is the
-same. Equivanlently, all outputs of a round have the same sender
+same. Equivalently, all outputs of a round have the same sender
 anonymity value. In this section we show the results obtained in our
 simulation. 
 
-Figure~\ref{3d-rec} displays a point per message, located in the space
-defined by the delay (amount of time the message spends on the mix),
-arrivals (number of inputs received by the mix together with the
-message) and recipient anonymity of the message.
+In Figure~\ref{3d-rec} we draw a point per round\footnote{In pool mixes all
+messages of the same incoming round have the same recipient anonymity,
+and all the messages of the same outgoing round have the same sender
+anonymity}, located in the space defined by the delay (amount of time
+the message spends on the mix), arrivals (number of inputs received by
+the mix together with the message) and recipient anonymity of the
+message. Figure~\ref{3d-sen} shows the same for sender anonymity.
 
-... FINISH Claudia
+\begin{figure}
+\begin{center}
+\epsfig{file=3Dplot_Recip.eps, width=8cm}
+\caption{Recipient anonymity for Mixmaster}
+\label{3d-rec}
+\end{center}
+\end{figure}
 
-%shows the recipient anonymity provided by the mix as a  
+\begin{figure}
+\begin{center}
+\epsfig{file=3Dplot_Sender.eps, width=8cm}
+\caption{Sender anonymity for Mixmaster}
+\label{3d-sen}
+\end{center}
+\end{figure}
+
+The first conclusion we come to when observing the figures is that
+there is a lower bound to the anonymity of Mixmaster. It is worth
+noting that, so far, we do not know any theoretical analysis of pool
+mixes able to predict the anonymity it provides, and prior to this
+analysis there were no figures on the anonymity that Mixmaster was
+actually providing. With this simulation,
+we can clearly see that Mixmaster guarantees a minimum sender and
+recipient anonymity of about $7$. This means that the sender
+(recipient) of a message gets a minimum anonymity equivalent to perfect
+indistinguishability among $2^7=128$ senders (recipients). 
+
+We can see that the minimum anonymity is provided when the traffic
+(arrivals) is low. As the traffic increases, anonymity increases,
+getting maximum values of about $10$ (i.e., equivalent to perfect
+indistiguishability among $2^10=1024$) senders or recipients. 
+We also observe that the delay of the messages doesn't take hight 
+values, unless the traffic load getting to the mix is very low.
+
+In order to study the behaviour of the mix under different traffic loads,
+we have plotted values of delay and anonymity obtained in the simulation 
+for the rounds with  few arrivals (low traffic), intermediate number of 
+arrivals (medium traffic) and many arrivals (high traffic).
+
+We have selected the low, medium and high traffic taking into account 
+the data statistics of the arrival process:
+\begin{description}
+\item[Low traffic:] all rounds where the number of arrivals was 
+between the first and third quantile; hence $50$ percent of the rounds
+are denoted as normal traffic.
+\item[Medium traffic:] all rounds where the number of arrivals was greater
+than the third quantile but lower than the outlier bound.
+\item[High traffic:] all rounds with outlier values for the incoming
+messages.
+\end{description}
+
+In Figure~\ref{del-mm} we show the minutes of delay of every message. 
+We can see that the delay only takes high values when the traffic 
+is low. The fact that some messages appear as having a delay close 
+to zero in the low traffic figure is due to the fact that we have more 
+samples, so there are messages that arrive just before the flushing and 
+are forwarded immediately. In Figure~\ref{an-mm} we shows the recipient 
+anonymity of every message (the sender anonymity presents very similar
+characteristics). We can see that as the traffic increases, the anonymity 
+provided to the messages takes higher values. 
+
+ 
+\begin{figure}
+\begin{center}
+\epsfig{file=mm-del.eps, width=10cm}
+\caption{Delay values for Mixmaster}
+\label{del-mm}
+\end{center}
+\end{figure}
+
+\begin{figure}
+\begin{center}
+\epsfig{file=mm-sanon.eps, width=10cm}
+\caption{Anonymity values for Mixmaster}
+\label{an-mm}
+\end{center}
+\end{figure}
 
 
 \subsection{Analysis of Reliable}
 
-Claudia
+
+The theoretical method proposed in \cite{stop-and-go} that gives a 
+probabilistic prediction on the anonymity provided by Reliable is 
+based on the assumption of Poisson traffic. As we have seen, this assumption
+is definetly not correct for mix traffic. 
+
+We have simulated a Reliable mix as explained in
+Section~\ref{sims}. Reliable is a continuous (stop-and-go) mix.
+Reliable treats every message independently: when it gets a message
+it delays it a pre-determined amount of time (picked from an exponential
+distribution) and then forwards it. We represent a star, '*', per message.
+
+In Figure~\ref{rel-sen} we present the sender and reciipient anonymity
+provided by Reliable 
+for the real stream of inputs we have considered.
+We can see that the
+anonymity takes minimum values close to zero, which means that some of
+the messages can be trivially traced by a passive attacker. The
+maximum values of Reliable's anonymity for this input are lower  
+than Mixmaster's maximums.
+Figure~\ref{sen-rec} shows the correlation of sender and recipient
+anonymity. These values are highly correlated (as in the case of
+Mixmaster). We can clearly see that some of the messages get nearly
+no anonymity. 
+
+\begin{figure}
+    \noindent
+    \epsfig{file=rel-sen.eps, width=7cm}
+    \epsfig{file=rel-rec.eps, width=7cm}
+    %\newline
+    \label{rel-sen}
+\end{figure}
+
+\begin{figure}
+\epsfig{file=sen-rec.eps, width=7cm}
+%\caption{Correlation Sender-Recipient Anonymity for Reliable}
+\label{sen-rec}
+
+\end{figure}
+
 
 \subsection{Mixmaster vs. Reliable}
 
-Claudia
+As we have shown in the previous two sections, Mixmaster and Reliable
+have very different behaviours for the same traffic stream. Note that
+we have modified the default ($1$ hour) mean delay of Reliable in
+order to make a fair comparison between the two mixes (so the average
+delay of all messages is the same for the two mixes). 
+
+Mixmaster priorizes the anonymity over the delay, and it provides a
+minimum recipient (sender) 
+anonimity of around $7$, equivalent to perfect undistinguisability
+among $2^7=128$ input (output) messages. When the traffic load
+decreases, Mixmaster provides a larger latency in order to keep the
+anonymity in high levels. 
+
+Reliable delays messages according to an exponential distribution,
+regardless of the traffic load. This has an effect in the anonymity,
+that will only have high values when there is a high traffic
+load. When the traffic load decreases, the anonymity provided by
+Reliable goes down to very low values. In some cases of very low load,
+Reliable does not provide anonymity at all.
+
+Our conclusion is that a continuous mix like Reliable is not
+appropriate to provide anonymous services for applications that do not
+have real-time requirements (like email). A pool mix like Mixmaster
+should be used instead. 
+
+Continuous mixes like Reliable may be useful for real-time
+applications with tight delay constraints (like web
+browsing). Nevertheless, in order to provide acceptable levels of
+anonymity, the traffic load should be kept high.
+
+
+
+
+
 
 
 \section{Other factors which influence anonymity}
@@ -587,9 +768,33 @@
 countermeasures to this class of active attacks should be
 taken.~\footnote{rgb-mix}
 
+
 \section{Conclusions and future work}
 \label{conclusions}
 
+%(... Len, here's the input, can you phrase it nicely?)
+
+We have analyzed the traffic pattern of a real traffic stream going
+through a working mix node and found that the traffic is not Poisson,
+as it is commonly assumed in the literature. 
+
+The traffic pattern is highly impredictable. Therefore, no assumptions
+on the traffic should be made when designing a mix.
+
+Mixmaster provides a minimum anonymity; Reliable does not. Reliable's
+amnonymity drops to nearly zero if the traffic is very low. 
+
+Mixmaster provides a higher maximum of anonymity than Reliable for the
+same stream of input. $10.5$ of Mixmaster versus $10$ of Reliable.
+
+Mixmaster provides higher average anonymity than Reliable for the same
+input and same average delay.
+
+Mixmaster provides higher delays than Reliable in low traffic
+conditions. Reliable's delay is not dependent on the traffic.
+
+...also look at section ``Mixmaster vs Reliable'' to get more
+conclusions. 
 
 \subsection*{Acknowledgments}
 
@@ -600,6 +805,19 @@
 Concerted Research Action (GOA) Mefisto-2000/06 of the Flemish
 Government. 
 
+Evelyne Dewitte is a research assistant with the I.W.T. 
+(Flemish Institute for Scientific and Technological Research in Industry).
+Research supported by Research Council KUL: GOA-Mefisto 666, several PhD/postdoc \& fellow grants;
+Flemish Government:
+FWO: PhD/postdoc grants, projects, G.0240.99 (multilinear algebra), {G.0407.02 (support vector machines)}, G.0197.02 (power islands), 
+G.0141.03 (identification and cryptography), G.0491.03 (control for intensive care glycemia), G.0120.03 (QIT), research communities (ICCoS, ANMMM);
+AWI: Bil. Int. Collaboration Hungary/ Poland;
+IWT: PhD Grants, Soft4s (softsensors),
+Belgian Federal Government: DWTC (IUAP IV-02 (1996-2001) and IUAP V-22 (2002-2006)), PODO-II (CP/40: TMS and Sustainability);
+EU: CAGE; ERNSI; Eureka 2063-IMPACT; Eureka 2419-FliTE;
+Contract Research/agreements: Data4s, Electrabel, Elia, LMS, IPCOS, VIB.
+
+
 The authors also want to thank Jasper Scholten for looking at the
 feasibility of some simulation algorithms, Peter Palfrader for assisting
 with the gathering of input data for our simulations, and members of The
@@ -611,11 +829,115 @@
 \section{Method to compute the anonymity of Reliable} 
 \label{form-reliable}
 
+% BEGIN ADDITION EVELYNE
+To formalize the behaviour of the mixes, we define:
+\begin{itemize}
+\item
+$X_{s}$ : an incoming message arriving at time $s$;
+\item
+$Y_{t}$ : an outgoing message leaving at time $t$;
+\item
+$D$ : the amount of time a message has been delayed.
+\end{itemize}
+We know that the mixes delay the messages exponentially and we have set the 
+mean to $43$ minutes: $D \sim \exp(1/43)$:
+\begin{eqnarray*}
+\mathrm{pdf:\ } f(d) &=& \frac{1}{43} e^{- \frac{1}{43} d} \qquad \mathrm{\ for\ all\ }d \geq 0\ ; \\[3pt]
+                     &=& 0 \qquad \mathrm{elsewhere} \ ;\\[3pt]
+\mathrm{cdf:\ } F(d) &=& P(D \leq d) = 1-e^{- \frac{1}{43} d} \qquad \mathrm{\ for\ all\ }d \geq 0\ ; \\[3pt]
+                     &=& 0 \qquad \mathrm{elsewhere} \ .
+\end{eqnarray*}
+
+All delay times are independent.\\
 
-Evelyne
+Crucial to note in this setup is that the sequence of outgoing messages is not a Poisson process. 
+This would only be true if all inputs would arrive at the same time, hence belong
+to the mix when the delaying starts. But in our case, messages arrive at distinct moments in 
+time, each being delayed upon their arrival times.\\
 
-+ Claudia on uniform
-%METRICS RELIABLE, UNIFORM DELAY NOT POSSIBLE TO COMPUTE ANON
+Mixes flush at fixed time moments which are observed by the attacker: 
+\[ t \in \{ \mathrm{out}_{1},\ \mathrm{out}_{2},\ \ldots,\ \mathrm{out}_{M} \}.  \] He also observes
+the arrival times: \[  s \in \{ \mathrm{in}_{1},\ \mathrm{in}_{2},\ \ldots,\ \mathrm{in}_{N} \}.  \]
+
+If a message leaves the mix at time $t$, what are then the probabilities for the arrival times? Suppose the
+departure time $t=$\emph{out} is fixed. We then look for the probability that the message that left at time 
+\emph{out} is the same message as the one that entered the mix at time $s$:
+
+\begin{equation*}
+P( Y_{\mathrm{out}} = X_s ) = P( D = \mathrm{out} - s )\ .
+\end{equation*}
+
+We can hence rephrase the problem in terms of the delay: which values for the delay times are the most probable? 
+Clearly, negative delay is impossible so only arrival times prior to \emph{out} are probable. These arrival times
+form a set $\{ \mathrm{in}_{1},\ \mathrm{in}_{2},\ \ldots,\ \mathrm{in}_{k}  \}$ with $\mathrm{in}_{k} <$ \emph{out}. The 
+matching delay times are then \{ \emph{out}-$\mathrm{in}_{1}$, \emph{out}-$\mathrm{in}_{2}$,\ldots, 
+\emph{out}-$\mathrm{in}_{k}$  \} to which we will refer to as $ \{ d_1, d_2, \ldots, d_k \}$. Note that $d_1 > 
+d_2 > \ldots > d_k$. We are almost at
+the solution as the density function of the delay times is known! Caution has to be taken however as the exponential
+function is a continuous function which means that the probability of the delay taking a single value is zero:
+$P( D=d_1 ) = \ldots = P( D=d_k ) = 0$!\\
+
+\begin{center}
+\includegraphics[height=5cm]{delaytimes23.eps}
+\caption{\small An example of an exponential probability density function }   
+\end{center}
+\begin{center}
+\includegraphics[height=5cm]{expCDFsubplots.eps}
+\caption{\small The matching exponential cumulative density function }   
+\end{center}
+
+How can we then calculate the probabilities of the delay times? To make this clear, let us look at figure $1$ and suppose 
+that we only have three arrival times prior to \emph{out}. We have thus three possible delays $d_1 > d_2 > d_3$. Let us now 
+assume for simplicity reasons that $d_1=3$ hours, $d_2=2$ hours and $d_3=1$ hour. 
+The variable delay is continuous and can theoretically take every value in the interval $[0,3]$. However, we know that 
+we only flush at three particular times and that hence only three particular delays can occur. We can exploit this knowledge 
+in the following way:
+\begin{eqnarray*}
+P(D = d_1) &\approx& P(d_2 < D \leq d_1) = \mathrm{\ yellow\ surface}\ ;  \\
+P(D = d_2) &\approx& P(d_3 < D \leq d_2) = \mathrm{\ green\ surface}\ ;  \\
+P(D = d_3) &\approx& P(0 < D \leq d_3) = \mathrm{\ blue\ surface}\ .  
+\end{eqnarray*}  
+In this way one can clearly see that the biggest surface corresponds to the most probable delay! This is straightforward for
+more than three delays. For computation we make use of the cumulative distribution function (cdf) which is graphed in figure $2$. 
+Cumulative probabilities are listed in tables and known in statistical software. For reasons of simplicity we put the mean
+of the exponential to be $1$ hour (easy parametrization):
+\begin{eqnarray*}
+P(D = d_1) &\approx& F(d_1)-F(d_2) = 0.9502 - 0.8647 = 0.0855\ ;  \\
+P(D = d_2) &\approx& F(d_2)-F(d_1) = 0.8647 - 0.6321 = 0.2326\ ;  \\
+P(D = d_3) &\approx& F(d_3) = 0.6321\ .  
+\end{eqnarray*}  
+In our little example, the message corresponds most likely with the one that entered the mix $1$ hour before \emph{out}.
+You can also clearly see this on figure $1$. In practical applications however, many possible delays will occur so that
+visual inspections will not be efficient and calculations have to made and compared.
+% END ADDITION EVELYNE
+
+\subsection{Uniform Delays}
+
+Reliable allows for mix-chosen uniform delays if the users do not
+specify any delay for their messages. 
+
+We have found a method to compute the anonymity provided by a mix that
+delays inputs uniformly from a distribution $U[a,b]$. The method
+consists in creating a tables with all inputs and outputs. Then we
+search for all possible combinations input-output that are possible
+from an external observer's point of view (i.e., those that assign to
+every input that arrives at time $T$ an output that leaves between
+$T+a$ and $T+b$). Let us call the total number of combinations $C$. 
+
+Then, in order to compute the recipient (sender) anonymity of message
+$m_i$, we need to find the distribution of probabilities that link
+this input (output) to all outputs (inputs).
+
+If input $m_i$ appears matching output $s_j$ in $P$ cases, then the
+probability assigned to $s_j$ is $P/C$.
+
+The probability of an input of matching an
+output is computed as possible cases divided by total cases. 
+From this distribution, the sender and recipient anonymity can be
+computed for every message. 
+
+Unfortunately, due to the large amount of messages considered, the
+implementation of this algorithm in our case is not feasible. 
 
 
 \bibliography{Anon}

***********************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe freehaven-cvs       in the body. http://freehaven.net/