
[freehaven-cvs] r1782: Re-word large portions based on Bram's comments, and the abi (doc/trunk/pynchon-gate)



Author: rabbi
Date: 2007-05-07 23:02:57 -0400 (Mon, 07 May 2007)
New Revision: 1782

Modified:
   doc/trunk/pynchon-gate/byzantine-fix.tex
Log:
Re-word large portions based on Bram's comments, and the ability to submit 
non-anonymously. Re-order flow of the paper. Merge back in the actual copy 
submitted to PET -- oops.



Modified: doc/trunk/pynchon-gate/byzantine-fix.tex
===================================================================
--- doc/trunk/pynchon-gate/byzantine-fix.tex	2007-05-07 23:21:13 UTC (rev 1781)
+++ doc/trunk/pynchon-gate/byzantine-fix.tex	2007-05-08 03:02:57 UTC (rev 1782)
@@ -3,16 +3,41 @@
 
 \usepackage{url}
 %\usepackage{graphics}
+\usepackage{amssymb}
 \usepackage{amsmath}
 \usepackage{subfigure}
 \usepackage{epsfig}
 
+% Add \contrib{} from /usr/local/teTeX/share/texmf.tetex/amslatex/tex/latex/amscls/amsproc.cls
+
+\newcommand\V[1]{\overrightarrow{#1}}
+
+
+%\renewcommand\url{\begingroup \def\UrlLeft{<}\def\UrlRight{>}\urlstyle{tt}\Url}
+%\newcommand\emailaddr{\begingroup \def\UrlLeft{<}\def\UrlRight{>}\urlstyle{tt}\U
+%\newcommand\XXXX[1]{{\small\bf [XXXX #1]}}
+
+%\newcommand{\workingnote}[1]{}        % The version that hides the note.
+%%\newcommand{\workingnote}[1]{(**#1)}   % The version that makes the note visibl
+
+%\hyphenation{mix-net mix-nets}
+
+
+% Cut down on whitespace above and below figures displayed at head/foot of
+% page.
+\setlength{\textfloatsep}{3mm}
+% Cut down on whitespace above and below figures displayed in middle of page
+\setlength{\intextsep}{3mm}
+
+
 \setcounter{page}{1}
 
 \begin{document}
 
+
+
 \title{Solving the Byzantine Postman Problem}
-\subtitle{Improving the Robustness of Private Information Retrieval Without Loss of Security}
+%\subtitle{Improving the Robustness of Private Information Retrieval Without Loss of Security}
 
 \author{Len Sassaman\inst{1} \and Bart Preneel\inst{1}}
 \institute{K. U. Leuven ESAT-COSIC \\
@@ -21,18 +46,20 @@
 
 
 \maketitle
+%\pagestyle{empty}
+%\centerline{\LARGE\bf DRAFT --- not for publication}
+%======================================================================
+
 \begin{abstract}
 
 Over the last several decades, there have been numerous systems proposed which aim to preserve the anonymity of the recipient of some data. Some have involved trusted third-parties or trusted hardware; others have been constructed on top of link-layer anonymity systems or mix networks. 
 
-In this paper, we examine the Pynchon Gate~\cite{pynchon}, a pseudonymous message system which takes an alternate approach to this problem by using Private Information Retrieval (PIR) as the basis for its pseudonymity properties. We restrict our examination to a flaw in the Pynchon Gate system first described in our technical report~\cite{cosic-2007-001}; as it was originally presented, the Pynchon Gate detects the presence of (and protects against certain attacks by) Byzantine servers operating in the system, but it fails to identify \emph{which} server or set of servers is Byzantine, thus opening the door for denial of service attacks as well as other potential anonymity compromises by Byzantine servers. We evaluate a solution to this problem recently proposed by Goldberg~\cite{goldberg-2007}, and critique the security and performance trade-offs made in~\cite{goldberg-2007} as compared to the original (but flawed) Pynchon Gate Protocol (PynGP)~\cite{pynchon-spec} presented in the first Pynchon Gate paper, and find Goldberg's solution to be inadequate for several reasons. 
+In this paper, we examine the Pynchon Gate~\cite{pynchon}, a pseudonymous message system which takes an alternate approach to this problem by using Private Information Retrieval (PIR) as the basis for its pseudonymity properties. We restrict our examination to a flaw in the Pynchon Gate system first described in our technical report~\cite{cosic-2007-001}; as it was originally presented, the Pynchon Gate detects the presence of (and protects against certain attacks by) Byzantine servers operating in the system, but it fails to identify \emph{which} server or set of servers is Byzantine, thus opening the door for denial of service attacks as well as other potential anonymity compromises by Byzantine servers. 
 
-We then show a trivial modification to the original PynGP which allows for detection and identification of Byzantine nodes, with no weakening of the security model necessary, at the relatively affordable cost of greater bandwidth requirements during certain communication operations. We demonstrate that this adequately solves the problems raised by~\cite{cosic-2007-001}, and argue that it is the most suitable method of addressing the attack in question yet proposed.
+We show a trivial modification to the original PynGP which allows for detection and identification of Byzantine nodes, with no weakening of the security model necessary, at the relatively affordable cost of greater bandwidth requirements during certain communication operations. We demonstrate that this adequately solves the problems raised by~\cite{cosic-2007-001}, and argue that it is the most suitable method of addressing the attack in question yet proposed.
 
-%We then show that certain PIR algorithms, such as that proposed by Goldberg and previously by Beimel and Stahl~\cite{beimel-robust}, are unsuitable in their present form for use as the basis of a pseudonymity system, for they are vulnerable to a class of attacks known as \emph{tagging attacks}. We demonstrate that both the original and the revised PynGP is secure against these attacks.
+We then evaluate an alternate approach to solving the problem described in~\cite{cosic-2007-001}, proposed by Goldberg in his recent paper~\cite{goldberg-2007}. We compare the security and performance trade-offs made in that proposal, and find it less secure against anonymity attacks than the original (but flawed) Pynchon Gate Protocol (PynGP)~\cite{pynchon-spec} presented in the first Pynchon Gate paper. We show that this proposal is significantly weaker than the solution offered in this paper, which retains the security properties of the original Pynchon Gate Protocol. We then examine a flaw in the novel protocol presented in~\cite{goldberg-2007} that facilitates partitioning attacks by an adversary operating two or more of the user-selected nodes.
 
-%Lastly, we suggest an examination of other PIR protocols as to their suitability for use in anonymity systems, based on the additional security requirements inherent to anonymity systems compared to the other uses more commonly proposed in the literature.
-
 \end{abstract}
 
 \section{Introduction} 
@@ -46,9 +73,6 @@
 
 In distributed-trust information-theoretic PIR systems, the privacy of the system is predicated upon no single entity being able to gain access to sensitive data, be it the private information itself, or second-order information which can be used to obtain information about the private information. Of particular concern are the possibilities that a single adversary may operate multiple nodes under different identities, effectively ensuring node collusion~\cite{sybil}, or that significant amounts of the anonymity infrastructure may lack good \emph{location independence}~\cite{feamster:wpes2004}. Thus, systems which encourage participation by many unaffiliated operators of diverse backgrounds across a wide range of network providers can provide stronger services than those in which infrastructure operation is more tightly restricted. Rather than attempting to ensure that the adversary is unable to gain control of any part of the infrastructure, this \emph{laissez-faire} approach to anonymity service operation taken by some of the more successful anonymity services~\cite{tor-design,mixmaster-spec} simply accepts that some nodes will be controlled by an adversary, and accounts for this fact in the design of these systems.
 
-
-%In distributed-trust information-theoretic PIR systems, the privacy of the system is predicated upon no single entity being able to gain access to sensitive data, be it the private information itself, or a second-; of particular concern are the possibilities that a single adversary may operate multiple nodes under different identities, effectively ensuring node collusion~\cite{sybil}, or that significant amounts of the anonymity infrastructure may be lack good \emph{location independence}~\cite{feamster:wpes2004}. Thus, systems which encourage participation by many unaffiliated operators of diverse backgrounds across a wide range of network providers may provide stronger services than those in which infrastructure operation is more tightly restricted. Rather than attempt to ensure that the adversary is unable to gain control of any part of the infrastructure, this \emph{laissez-faire} approach to anonymity service operation taken by some of the more successful anonymity services~\cite{tor-design,mixmaster-spec} simply accepts that some nodes will be controlled by an adversary, and accounts for this fact in the design of these systems.
-
 \subsection{Background on Nym Servers}  % move this higher perhaps?
 Pseudonymous messaging services allow users to send
 messages that originate at a pseudonymous address (or ``nym'') unlinked to
@@ -60,16 +84,15 @@
 
 \subsection{The Pynchon Gate and The Byzantine Postman Problem}
 The most recent proposal for a nym server based on PIR with information-theoretic security, the Pynchon Gate~\cite{pynchon}, offers greater robustness, stronger anonymity assurances, and better traffic analysis resistance than previously proposed pseudonym systems. 
-However, as we have previously noted~\cite{cosic-2007-001}, it contains a flaw in its protocol which can be used to launch a denial of service attack against the system, rendering it unusable.\footnote{If one or more of the servers in the system is Byzantine, the protocol will detect that the results of the PIR request are not correct, and will treat the results as poisoned, thus eliminating potential direct attacks by Byzantine servers against the \emph{privacy properties} of the system. However, the specific PIR protocol used in the Pynchon Gate is designed in such a way that it is impossible to know which server was Byzantine, and therefore Byzantine servers can act with impunity, inevitably locking every user of the system into a state of constant denial of service.} Furthermore, the attack is not merely limited to decreased utility of the system; due to the network-effects properties of anonymity systems, denying service to one set of users can effectively weaken the anonymity provided to a different set of users~\cite{econymics}. This paper evaluates an independent attempt to address this problem~\cite{goldberg-2007}, and proposes an alternative solution. 
+However, as we have previously noted~\cite{cosic-2007-001}, it contains a flaw in its protocol which can be used to launch a denial of service attack against the system, rendering it unusable.\footnote{If one or more of the servers in the system is Byzantine, the protocol will detect that the results of the PIR request are not correct, and will treat the results as poisoned, thus eliminating potential direct attacks by Byzantine servers against the \emph{privacy properties} of the system. However, the specific PIR protocol used in the Pynchon Gate is designed in such a way that it is impossible to know which server was Byzantine, and therefore Byzantine servers can act with impunity, inevitably locking every user of the system into a state of constant denial of service.} Furthermore, the attack is not merely limited to decreased utility of the system; due to the network-effects properties of anonymity systems, denying service to one set of users can effectively weaken the anonymity provided to a different set of users~\cite{econymics}. 
 
-
 \section{Background on The Pynchon Gate}
-To address the reliability problems of silent node failure, as well as the serious security problems of statistical disclosure attacks~\cite{raymond00,statistical-disclosure,DanSer04} and end-to-end traffic analysis~\cite{e2e-traffic}, Sassaman et al. propose a complete architectural design of a PIR-based pseudonym service offering information-theoretic protection, called the Pynchon Gate~\cite{pynchon}.
+To address the reliability problems of silent node failure, as well as the serious security problems of statistical disclosure attacks~\cite{raymond00,statistical-disclosure,DanSer04} and end-to-end traffic analysis~\cite{e2e-traffic}, Sassaman, Cohen, and Mathewson propose a complete architectural design of a PIR-based pseudonym service offering information-theoretic protection, called the Pynchon Gate~\cite{pynchon}.
 
 \subsection{Architecture Overview}
-The architecture consists of an Internet-facing component referred to as the ``nym server'', which receives messages addressed to users of the system and acts as a gateway between the pseudonym service and other Internet services such as email. Behind the nym server is a component known as the ``collator'', which structures the incoming messages in the form of a three-level hash tree, which is then replicated to a series of mutually untrusted distribution databases referred to as ``distributors''. 
+The architecture of the Pynchon Gate consists of an Internet-facing component referred to as the ``nym server'', which receives messages addressed to users of the system and acts as a gateway between the pseudonym service and other Internet services such as email. Behind the nym server is a component known as the ``collator'', which structures the incoming messages in the form of a three-level hash tree, which is then replicated to a series of mutually untrusted distribution databases referred to as ``distributors''. 
 
-Email addressed to a specific pseudonym is stored in a specific location in the database, such that the owner of the pseudonym knows what information to request to obtain his message. Using the PIR protocol described in Section~\ref{subsec:PIR}, the user submits a PIR query to $\ell$ distributors, and his message is returned with none of the distributors able to deduce any information about the user's query unless all $\ell$ distributors collude. 
+Email addressed to a specific pseudonym is stored in a specific location in the database, such that the owner of the pseudonym knows what information to request to obtain his message. Using the PIR protocol described in Section~\ref{subsec:PIR}, the user submits a PIR query to $\ell$ distributors, and his message is returned with none of the distributors able to deduce any information about the user's query unless all $\ell$ distributors collude. This form of PIR is referred to as an \textbf{information-theoretic ($\ell-1$)-private $\ell$-server PIR} protocol.
 
 \subsection {The Pynchon Gate PIR Protocol}
 \label{subsec:PIR}
@@ -80,17 +103,15 @@
 to prevent man-in-the-middle attacks, and can be made sequentially or in
 parallel.
 
-%FIXME \vec exclusive-or
 The client sends a different ``random-looking'' bit vector
-${\nu}_{s\beta}$ to each distributor $s$ for each message block $\beta$ to be retrieved.  Each
+$\vec{{\nu}}_{s\beta}$ to each distributor $s$ for each message block $\beta$ to be retrieved.  Each
 bit vector has a length equal to the number of message blocks in the database.  Each
 distributor $s$ then
-computes $R(\nu_{s\beta})$ as the exclusive-\textsc{or} of all message blocks whose positions are set to $1$
-in $\nu_{s\beta}$.  The resulting value is then returned to the client.
+computes $R(\vec{\nu}_{s\beta})$ as the exclusive-\textsc{or} of all message blocks whose positions are set to $1$
+in $\vec{\nu}_{s\beta}$.  The resulting value is then returned to the client.
 
 Thus, in order to retrieve the $\beta$'th message block, the client need only choose
-the values of $\nu_{s\beta}$ such that when all $\nu_{s\beta}$ are \textsc{xor}ed together, all values are $0$ at every position except $\beta$.  (For security, $\ell-1$ of the vectors should be generated
-randomly, and the bit vectors should be sent in a random order so that the $\ell$'th, specially-crafted vector cannot be identified as such.) When the client receives the corresponding $R(\nu_{s\beta})$ values, she
+the values of $\vec{\nu}_{s\beta}$ such that when all $\vec{\nu}_{s\beta}$ are \textsc{xor}ed together, all values are $0$ at every position except $\beta$.  (For security, $\ell-1$ of the vectors should be generated randomly, and the bit vectors should be sent in a random order so that the $\ell$'th, specially-crafted vector cannot be distinguished.) When the client receives the corresponding $R(\vec{\nu}_{s\beta})$ values, she
 can \textsc{xor} them to compute the message block's contents.
 %need only choose the values of v_{sB} such that when all v_{sB} are \textsc{xor}ed together, all values are 0 at every position except B.
 
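[Editorial sketch, not part of the committed LaTeX: the retrieval protocol described in this hunk can be made concrete in a few lines of Python. All function and variable names here are ours, chosen for illustration; the sketch assumes a database of equal-length byte-string message blocks.]

```python
import secrets

def make_queries(n_blocks, beta, ell):
    """Build ell bit vectors whose XOR is the unit vector at position beta.

    ell - 1 vectors are uniformly random; the last is derived so the XOR of
    all of them selects only block beta. The vectors are shuffled so the
    specially crafted one cannot be distinguished from the random ones.
    """
    vecs = [[secrets.randbelow(2) for _ in range(n_blocks)]
            for _ in range(ell - 1)]
    last = []
    for j in range(n_blocks):
        bit = 1 if j == beta else 0
        for v in vecs:
            bit ^= v[j]
        last.append(bit)
    vecs.append(last)
    secrets.SystemRandom().shuffle(vecs)
    return vecs

def server_respond(database, vec):
    """R(v): XOR of all message blocks whose positions are set to 1 in v."""
    out = bytes(len(database[0]))
    for bit, block in zip(vec, database):
        if bit:
            out = bytes(a ^ b for a, b in zip(out, block))
    return out

def reconstruct(responses):
    """XOR the ell responses together to recover the beta'th block."""
    out = bytes(len(responses[0]))
    for r in responses:
        out = bytes(a ^ b for a, b in zip(out, r))
    return out
```

Correctness follows from linearity of XOR: the XOR of the $\ell$ responses equals $R$ applied to the XOR of the $\ell$ query vectors, which is the unit vector selecting block $\beta$.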
@@ -110,24 +131,11 @@
 
 \section{Byzantine Server Detection}
 
-Ideally, there would exist a way to identify a Byzantine server without modifying the existing threat model or positive security properties of the Pynchon Gate. This is a challenging problem to solve with the existing \textsc{xor}-based PIR protocol, which makes verifying the results of a PIR query returned by a specific distributor impossible. (The client does not know what a ``correct'' response $R(\nu_{s\beta})$ from any given distributor should look like; only that $$R(\nu_{s_1\beta}) \oplus R(\nu_{s_2\beta}) \oplus \cdots \oplus R(\nu_{s_\ell\beta}) = \beta'th~\text{message block}$$ and thus cannot identify which of the responses were invalid.)
+Ideally, there would exist a way to identify an individual Byzantine server without modifying the existing threat model or positive security properties of the Pynchon Gate. This is a challenging problem to solve with the existing \textsc{xor}-based PIR protocol, which makes verifying the results of a PIR query returned by a specific distributor impossible. (The client does not know what a ``correct'' response $R(\vec{\nu}_{s\beta})$ from any given distributor should look like; only that $$R(\vec{\nu}_{s_1\beta}) \oplus R(\vec{\nu}_{s_2\beta}) \oplus \cdots \oplus R(\vec{\nu}_{s_\ell\beta}) = \beta\text{'th message block}$$ and thus cannot identify which of the responses were invalid.)
 
-\subsection{An Attempt to Improve the Robustness of the Pynchon Gate}
-
-In his recent paper~\cite{goldberg-2007}, Goldberg suggests that detection of Byzantine servers in the Pynchon Gate should be addressed using a PIR protocol with a different underlying primitive than the one originally proposed. As a replacement to the \textsc{xor}-based Pynchon Gate Protocol (PynGP), he offers a novel Byzantine-robust PIR protocol based on Shamir secret sharing~\cite{shamir-secret}, similar to that proposed by Beimel and Stahl~\cite{beimel-robust} in that only $k$ out of a total $\ell$ servers in the network need to respond, and no server coalition of up to $t$ servers can learn anything about the query. Furthermore, as in the protocol presented by Beimel and Stahl, Goldberg's protocol offers \emph{Byzantine robustness}; some servers may respond with incorrect answers to a given query, and the user can still reconstruct the proper results of the query. The protocols of interest for our problem presented in~\cite{beimel-robust} and~\cite{goldberg-2007} are termed \textbf{$t$-private $v$-Byzantine-robust $k$-out-of-$\ell$ PIR}.\footnote{In discussion of this prior work, we limit our focus to the PIR protocols suggested in~\cite{goldberg-2007} as suitable alternatives to the protocol described in~\cite{pynchon,pynchon-spec}. \cite{beimel-robust} presents several different varieties of PIR protocols in addition to the type that~\cite{goldberg-2007} has proposed for use with the Pynchon Gate, and we will not address them here. \cite{goldberg-2007} presents an extension to its primary PIR protocol such that, if this extension is used, no coalition of up to \emph{$\tau$ servers} can know the contents of the database distributed among them. This property, termed \textbf{$\tau$-independent PIR} by Gertner et al.~\cite{tau-indy}, is not required in the case of the Pynchon Gate, for the security of the system is not predicated upon the contents of the database being kept a secret, as anyone may request any message block contained in the database. 
Secrecy of the database is instead handled by an encryption protocol at the nym-server and collator components; if a user requests a message block for which he does not hold the correct short-term shared secret, he will be unable to decrypt and obtain its contents. Similarly, distributors do not hold any of the shared secrets or derived subkeys necessary for decryption of the message block contents; therefore, collusion among distributors is irrelevant when considering message content secrecy.} Goldberg presents a strict improvement over the previous result for this type of PIR protocol (except, as he notes, when $(t,k) = (1,4)$, where it is the same) by presenting a $t$-private $v$-Byzantine-robust $k$-out-of-$\ell$ PIR protocol for any $0 < t < k$ and $v < k - \lfloor  \sqrt{kt} \rfloor$~\cite{goldberg-2007}.
-
-One advantage to using a $t$-private $k$-out-of-$\ell$ PIR scheme over using PynGP (itself an \textbf{information-theoretic ($\ell-1$)-private $\ell$-server PIR} protocol) is that the user need not know which databases are online at the time she makes her request. Indeed, $t$-private $k$-out-of-$\ell$ PIR schemes handle the situation where some servers $n$ (where $n < k$) crash while the request is being made, such that the user is still able to obtain the results to her query in an expedient fashion.\footnote{From a usability standpoint, this is superior to the existing PynGP, where the user is not permitted to re-request his mail, and encouraged to be patient -- undelivered mail will remain in the system for later retrieval.}
-
-Unfortunately, $t$-private $k$-out-of-$\ell$ PIR protocols such as those based on Shamir secret sharing force a trade-off between the number of servers that may collude before the system is compromised, and the number of servers that may be Byzantine without affecting the user's ability to obtain the results of her query. (In a $t$-private $k$-out-of-$\ell$ PIR scheme, any $t+1$ may collude to break the security of the system, as long as those $k$ servers all received a valid query from the user. Thus, the maximum collusion protection such a system can offer is where $t$ = ($k-1$).) An ($\ell-1$)-private $\ell$-server PIR protocol is equivalent to a $t$-private $k$-out-of-$\ell$ PIR scheme, where $t$ = ($\ell-1$) and $k$ = $\ell$. Since it is the case that when availability guarantees (inversely proportional to $k$) are increased, the threshold at which \emph{any} query at \emph{any} time may be compromised by colluding servers is diminished in response, the decision must be made as to how critical for a given use case it is that the system respond with a correct answer in the face of sudden node failure. This permits a safe trade-off between availability of service and the security of the system. In the case of a service offering persistent pseudonyms, we consider intermittent and infrequent service interruptions to be the lesser concern.
-
-%Unfortunately, $t$-private $k$-out-of-$\ell$ PIR protocols such as those based on Shamir secret sharing force a trade-off between the number of servers that may collude before the system is compromised, and the number of servers that may be Byzantine without affecting the user's ability to obtain the results of her query. (In a $t$-private $k$-out-of-$\ell$ PIR scheme, any $t+1$ may collude to break the security of the system, as long as those $k$ servers all received a valid query from the user. Thus, the maximum collusion protection such a system can offer is where $t$ = $k-1$.) An ($\ell-1$)-private $\ell$-server PIR protocol is equivalent to a $t$-private $k$-out-of-$\ell$ PIR scheme, where $t$ = ($\ell-1$) and $k$ = $\ell$. Therefore, the decision must be made as to how critical for a given use case it is that the system respond with a correct answer in the face of sudden node failure, so that a safe trade-off may be made between availability of service and the security of the system. Since it is the case that when availability guarantees are increased, the threshold at which \emph{any} query at \emph{any} time may be compromised by colluding servers is diminished in response. In the case of a service offering persistent pseudonyms, we consider intermittent and infrequent service interruptions to be the lesser concern.
-
-
-The author of~\cite{goldberg-2007} addresses this serious concern by introducing an extension to his $t$-private $v$-Byzantine-robust $k$-out-of-$\ell$ PIR protocol that provides \emph{computational security} protection for the results of a query, even in the case that \emph{all servers} collude, by proposing a hybrid privacy protection scheme which relies on the \emph{Paillier additive homomorphic cryptosystem}~\cite{pailli99}. This extension gives a PIR protocol with $t$-private $v$-Byzantine-robust $k$-out-of-$\ell$ information-theoretic protection and $\ell$-computational protection. However, as the author states, adding this modification is quite expensive. While the rest of the protocol performs rather efficiently (and better than previous results, under certain configurations), the CPIR hybrid extension suffers the same problem as all other currently existing CPIR solutions. For each $n$-bit word in the database, the server must perform a modular exponentiation operation. Based on the performance calculations in~\cite{goldberg-2007}, we do not believe this satisfies the goals of \emph{deployability} and \emph{usability} as set forth in Section 1.0.1 of the Pynchon Gate design paper~\cite{pynchon}. The concept of a hybrid information-theoretic/computational PIR scheme is quite promising, should a CPIR scheme be proposed with affordable computation requirements. Until that happens, though, we are forced to discount the $\ell$-computational protection extension to the protocol in~\cite{goldberg-2007}, as it is presently impractical given real-world resource constraints. This leaves the protocol of~\cite{goldberg-2007} able to identify Byzantine servers at the cost of reduced overall security as compared to PynGP. 
A decision to deploy the protocol as suggested in~\cite{goldberg-2007} would require considering this overall weakening of the security model.\footnote{Changing the security model from ($\ell-1$)-private $\ell$-server PIR to $t$-private $k$-out-of-$\ell$ PIR substantially violates one of the original design goals of the Pynchon Gate. While it is now known that communication through a mix network is unlikely to be secure in the scenario where all but one of the nodes are compromised~\cite{e2e-traffic}, at the time that the Pynchon Gate design paper was written, this was optimistically considered to be true under some threat models. Thus, when the authors of the Pynchon Gate design paper stated ``we want the Pynchon Gate to resist active and passive attacks at least as well as the state of the art of forward message anonymity''~\cite{pynchon} , it can safely be assumed that they had an ($\ell-1$)-private $\ell$-server PIR protocol in mind as a partial means to achieving this goal. As the Pynchon Gate does not provide a mechanism for forward message delivery, however, one should consider that the overall system is only as strong as its constituent parts, and ($\ell-1$)-private $\ell$-server information-theoretic PIR is arguably substantially more secure than mix networks~\cite{chaum-mix,remailer-history,hal-remailer,mixmaster-spec,fsmix03,mixminion,mixencrypt,minx}, for reasons too numerous to elaborate in this work~\cite{back01,mixmaster-reliable,limits-open,wright02,hitting-set04}.}
-
 \section{A Novel Solution to the Byzantine Postman Problem}
 
-We propose a solution to the Byzantine Postman Problem which preserves the original threat model established for the original Pynchon Gate Protocol (henceforth referred to as PynGP 1.0). Described below, our revised PynGP (PynGP 2.0) protocol relies only on additional sets of operations already performed by PynGP 1.0, yet this modified version of PynGP 1.0 allows for the detection and identification of Byzantine nodes with sufficient probability that our denial of service attack against the PynGP 1.0 is no longer feasible in practice. Furthermore, while it increases the amount of communication bandwidth needed to perform PynGP operations, the communcation overhead is still within the realm of affordability for the target user and operator demographic stated in~\cite{pynchon}.
+We propose a solution to the Byzantine Postman Problem which preserves the threat model established for the original Pynchon Gate Protocol (henceforth referred to as PynGP 1.0). Described below, our revised protocol (PynGP 2.0) relies only on additional sets of operations already performed by PynGP 1.0, yet allows for the detection and identification of Byzantine nodes with sufficient probability that our denial of service attack against PynGP 1.0 is no longer feasible in practice. Furthermore, while it increases the amount of communication bandwidth needed to perform PynGP operations, the communication overhead is still within the realm of affordability for the target user and operator demographic stated in~\cite{pynchon}.
 
 \subsection{PynGP 2.0}
 
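[Editorial sketch, not part of the committed LaTeX: the parenthetical claim in the hunk above -- that the client cannot tell \emph{which} response was invalid -- can be illustrated directly. Whichever single distributor XORs an error into its reply, the client reconstructs exactly the same corrupted block, so the cheater's identity is hidden. Names below are ours, for illustration only.]

```python
import secrets

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

# Stand-ins for the ell = 3 responses R(nu_1), R(nu_2), R(nu_3);
# to an observer each is indistinguishable from random bytes.
responses = [secrets.token_bytes(8) for _ in range(3)]

# A non-zero error that a single Byzantine distributor XORs into its reply.
garbage = b"\x01" + secrets.token_bytes(7)

# Whichever distributor cheats, the client's reconstruction is identical,
# so the client learns nothing about which server lied.
outcomes = set()
for cheater in range(3):
    tampered = list(responses)
    tampered[cheater] = xor_bytes(tampered[cheater], garbage)
    acc = tampered[0]
    for r in tampered[1:]:
        acc = xor_bytes(acc, r)
    outcomes.add(acc)

assert len(outcomes) == 1  # all three cheating scenarios look the same
```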
@@ -135,21 +143,25 @@
 
 Thus, PynGP 2.0 is identical to PynGP 1.0 as described in Subsection \ref{subsec:PIR}, with the following modifications to the protocol:
 
-Each time the protocol runs, the client prepares two sets of bit vectors to send to the chosen distributors. The first set, $\alpha$, is used to obtain the private mailbox data via the PIR protocol; the second set, $\eta$, consists entirely of randomly generated bit vectors and is used to challenge the honesty of each distributor.
+Each time the protocol runs, the client prepares two sets of bit vectors to send to the chosen distributors. The first set, $\{\vec{\alpha}\}$, is used to obtain the private mailbox data via the PIR protocol; the second set, $\{\vec{\eta}\}$, consists entirely of randomly generated bit vectors and is used to challenge the honesty of each distributor.
 
-At the step in the PynGP 1.0 protocol where the client would transmit the ``random-looking'' bit vector to each distributor, the client submits two ``random-looking'' bit vectors instead, one from $\{\alpha\}$ and one from $\{\eta\}$, transmitted in a random order. 
+At the step in the PynGP 1.0 protocol where the client would transmit the ``random-looking'' bit vector to each distributor, the client submits two ``random-looking'' bit vectors instead, one from $\{\vec{\alpha}\}$ and one from $\{\vec{\eta}\}$, transmitted in a random order. 
 
-Upon receiving these bit vectors, the distributor performs the operations as described in Subsection \ref{subsec:PIR}, and then returns two responses, in the same order which the requests were received (or otherwise in such a manner that the responses are linkable to the requests which generated them). The client caches the response for the $\{\eta\}$ request, then performs the PIR operation as previously described using the $\{\alpha\}$ results from all distributors.
+Upon receiving these bit vectors, the distributor performs the operations as described in Subsection \ref{subsec:PIR}, and then returns two responses, in the same order in which the requests were received (or otherwise in such a manner that the responses are linkable to the requests which generated them). The client caches the response for the $\{\vec{\eta}\}$ request, then performs the PIR operation as previously described using the $\{\vec{\alpha}\}$ results from all distributors.
 
-We now introduce a new component in the Pynchon Gate architecture, known as the \emph{validator}. This component is essentially a distributor, except that it only exists to confirm that the other distributors are not Byzantine. This specialized distributor validates the ``cut-and-chosen'' responses as being correct, deterring the operation of Byzantine nodes and probabilistically uncovering them should they exist. The validator must be operated by the same entity who operates the collator, since the operator of the collator is already empowered to perform a denial of service attack by turning off the server running the collator. Ergo, the balance of power in this distributed trust system is maintained by placing the validator (whose operator could also force a denial of service attack, though not as easily) under the control of the same entity. Note that communication with the validator occurs over an encrypted link, just as with normal distributors. 
+We now introduce a new component in the Pynchon Gate architecture, known as the \emph{validator}. This component is essentially a distributor, except that it only exists to confirm that the other distributors are not Byzantine. This specialized distributor validates the ``cut-and-chosen'' responses as being correct, deterring the operation of Byzantine nodes and probabilistically uncovering them should they exist. 
 
-The responses from set $\{\eta\}$ (while possibly able to return a valid mailbox) are not of interest to the client for their contents. Instead, these responses are cached by the client, along with the corresponding request that was sent to the distributor to generate them as well as an identifier for the distributor which returned the responses. To verify that a distributor is not attempting to behave in a Byzantine manner, the same bit vectors in $\{\eta\}$ that had been submitted to each distributor can subsequently be submitted to the validator, and the validator should return a response identical to that which the original distributor returned for each request. (The entire $\{\eta\}$ needs to be submitted, as there may be multiple Byzantine servers acting simultaneously.) Should the validator return a response that differs from the one received by the client from a given distributor, that distributor should be suspected of being Byzantine.\footnote{Strictly speaking, the $\{\eta\}$ vectors should always be sent to the validator, regardless of the outcome of the PIR operation using the $\{\alpha\}$ results, to guard against the scenario where an additional adversary not in collusion with the Byzantine server might learn additional information about the state of the network. However, this level of paranoia may not be affordable until bandwidth and computation become less expensive.}
+To ensure that additional trust is not required of this new component to the system, the validator \textbf{must} be operated by the same entity who operates the collator. The operator of the collator is already empowered to perform a denial of service attack by simply unplugging the power cord of the server running the collator. Ergo, the balance of power in this distributed trust system is maintained by placing the validator (whose operator could also force a denial of service attack, though not as easily) under the control of the same entity.\footnote{The validator is never provided the contents of $\{\vec{\alpha}\}$ queries, since the validator, collator, and nym server are \emph{not} considered trusted with regard to a user's privacy, and knowledge of the contents of $\{\vec{\alpha}\}$ queries (or responses) could provide information about the user's identity.} Note that communication with the validator occurs over an encrypted link, just as with normal distributors.  
 
+The responses from set $\{\vec{\eta}\}$ (while possibly able to return a valid mailbox) are not of interest to the client for their contents. Instead, these responses are cached by the client, along with the corresponding request that was sent to the distributor to generate them as well as an identifier for the distributor which returned the responses. To verify that a distributor is not attempting to behave in a Byzantine manner, the same bit vectors in $\{\vec{\eta}\}$ that had been submitted to each distributor can subsequently be submitted to the validator, and the validator should return a response identical to that which the original distributor returned for each request. (The entire $\{\vec{\eta}\}$ needs to be submitted, as there may be multiple Byzantine servers acting simultaneously.) Should the validator return a response that differs from the one received by the client from a given distributor, that distributor should be suspected of being Byzantine.\footnote{Strictly speaking, the $\{\vec{\eta}\}$ vectors should always be sent to the validator, regardless of the outcome of the PIR operation using the $\{\vec{\alpha}\}$ results, to guard against the scenario where an additional adversary not in collusion with the Byzantine server might learn additional information about the state of the network. However, this level of paranoia may not be affordable until bandwidth and computation become less expensive.}
+
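The client's bookkeeping for this check can be sketched as follows; the `audit` helper and the cache layout are a hypothetical illustration of ours, not part of PynGP.

```python
# The client caches, per distributor, the eta query it sent and the
# response it got back; later it replays each eta query against the
# validator and flags any distributor whose answer disagrees.

def audit(cached, validator_answer):
    """cached: {distributor_id: (eta_query, distributor_response)}.
    validator_answer(query) -> the validator's response to that query.
    Returns the set of distributors suspected of being Byzantine."""
    suspects = set()
    for dist_id, (eta, dist_resp) in cached.items():
        if validator_answer(eta) != dist_resp:
            suspects.add(dist_id)
    return suspects

# Toy demo over a 4-bucket replica of one-byte buckets.
db = [b"\x00", b"\x01", b"\x02", b"\x03"]

def answer(query):            # honest evaluation over the shared replica
    out = 0
    for bucket, bit in zip(db, query):
        if bit:
            out ^= bucket[0]
    return out

cached = {
    "dist-A": ([1, 0, 1, 0], answer([1, 0, 1, 0])),       # honest
    "dist-B": ([0, 1, 1, 0], answer([0, 1, 1, 0]) ^ 1),   # flipped a bit
}
assert audit(cached, answer) == {"dist-B"}
```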
 %The client should avoid sending the entire set at once, however. Instead, each element should be submitted at a metered pace to avoid excessive load on the valiator.
-As proposed above, this protocol gives a Byzantine server a 50\% chance of being discovered each time it attempts to behave in a Byzantine manner. That threshold can be increased at the expense of greater bandwidth overhead; however, we feel that a 50\% detection rate is sufficient to deter this sort of attack, due to the inherent reputation system involved with the distributor network. (One Byzantine action by a distributor \emph{verifiable as such} to the collator, validator, or nym server operator should be sufficient to blacklist a Byzantine distributor. Care must be taken to ensure that an attacker does not frame an honest server as Byzantine to have it blacklisted. Multiple reports identifying a given server as Byzantine might simply indicate a Sybil attack being performed to achieve the blacklisting.)
+As proposed above, this protocol gives a Byzantine server a 50\% chance of being discovered each time it attempts to behave in a Byzantine manner. That threshold can be increased at the expense of greater bandwidth overhead; however, we feel that a 50\% detection rate is sufficient to deter this sort of attack, due to the inherent reputation system involved with the distributor network.\footnote{A Byzantine server's chances of successfully providing a Byzantine response of unidentified origin decrease in a manner inversely proportional to its probability of detection.} This probability of detection is based on the assumption that the Byzantine server considers it acceptable that its Byzantine action may be \emph{ineffective}. If the server wishes to \emph{guarantee} a successful Byzantine operation, it can do so by providing Byzantine answers to all the client's requests, but the probability of detection in that case is 100\%.
 
+One Byzantine action by a distributor \emph{verifiable as such} to the collator, validator, or nym server operator should be sufficient to blacklist a Byzantine distributor. Care must be taken to ensure that an attacker does not frame an honest server as Byzantine to have it blacklisted. Multiple reports identifying a given server as Byzantine might simply indicate a Sybil attack being performed to achieve the blacklisting of an innocent server.
+
 \section{Remarks on Performance}
-As previously stated, PynGP 2.0 has a higher performance cost than its predecessor. We have attempted to strike a balance between security and excessive resource consumption when the security issues are unlikely to be problematic, or the resource requirements too great to qualify the protocol as ``deployable''. Areas where security could be increased, if performance cost were no object, include the addition of more than a single challenge-vector set to decrease the probability of a Byzantine server successfully avoiding detection.\footnote{For $n$ challenge-vector sets, the probability of avoiding detection is $\frac{1}{n+1}$.} Also, as previously noted, it would be ideal, were it affordable, for the challenge-vectors to be validated every time a PynGP protocol run was performed. This is likely cost-ineffective, though, given that knowledge of the user's failure to validate the hash root is unlikely to give an adversary any significant advantage, let alone lead to a user-level privacy vulnerability.
+As previously stated, PynGP 2.0 has a higher performance cost than its predecessor. We have attempted to strike a balance between security and excessive resource consumption when the security issues are unlikely to be problematic, or the resource requirements too great to qualify the protocol as ``deployable''. Areas where security could be increased, if performance cost were no object, include the addition of more than a single challenge-vector set to decrease the probability of a Byzantine server successfully avoiding detection.\footnote{For $n$ challenge-vector sets, the probability of avoiding detection is $\frac{1}{n+1}$, and consequently, the probability of failing to cause a Byzantine failure is $\frac{n}{n+1}$.} Also, as previously noted, it would be ideal, were it affordable, for the challenge-vectors to be validated every time a PynGP protocol run was performed. This is likely cost-ineffective, though, given that knowledge of the user's failure to validate the hash root is unlikely to give an adversary any significant advantage, let alone lead to a user-level privacy vulnerability.
 %This is likely too resource-intensive to be warranted given the low likelihood that knowledge of the user's failure to validate the hash root on the result returned by the system could lead to a user-level privacy vulnerability.
  It is far more important that all clients in the system behave according to the same policy in this regard. Also, if the number of challenge-vector sets is increased, the cost of validation increases proportionally.
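The footnote's $\frac{1}{n+1}$ figure is easy to check numerically. The following Monte Carlo sketch assumes our reading of the model: the $n+1$ queries a distributor receives are indistinguishable to it, and a Byzantine server corrupts exactly one response per run, escaping detection only if it happens to pick the real query.

```python
import random

def escape_rate(n_challenges, trials=100_000, rng=random.Random(7)):
    """Fraction of runs in which the server corrupts the one real query
    out of n_challenges + 1 indistinguishable queries."""
    escapes = sum(rng.randrange(n_challenges + 1) == 0
                  for _ in range(trials))
    return escapes / trials

for n in (1, 2, 4):
    rate = escape_rate(n)
    assert abs(rate - 1 / (n + 1)) < 0.01   # matches the 1/(n+1) figure
```

For $n = 1$ (the single challenge set PynGP 2.0 uses), the escape rate is the 50\% figure discussed above.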
 
@@ -157,28 +169,43 @@
 
 The ability to gracefully address the issue of Byzantine nodes is itself an anti-Byzantine measure; it has been suggested that when the Pynchon Gate is first deployed, it use PynGP 1.0 until evidence of the existence of Byzantine servers in the system is observed~\cite{bram-chat}. It is possible that merely having PynGP 2.0 implemented in the software and able to be enabled instantly could serve as a deterrent to a casual attacker whose goal is to deny service to the system, as the only effect such an attacker would have by deploying a Byzantine node would be to increase the network communication and verifier computation costs to a level that we have already deemed acceptable. 
 
-%\section{Attacking PIR-based Pseudonym Services via Tagging Attacks and User Oracles}
-%\subsection{Attacking Beimel-Stahl}
-%\subsection{Attacking Goldberg}
-%FIXME challenge-vector vs. challenge vector; cite bram-chat; figs.
+\section{A Prior Attempt to Improve the Robustness of the Pynchon Gate}
 
+In his recent paper~\cite{goldberg-2007}, Goldberg suggests that detection of Byzantine servers in the Pynchon Gate should be addressed using an \textbf{information-theoretic $t$-private $v$-Byzantine-robust $k$-out-of-$\ell$ PIR} protocol based on Shamir secret sharing~\cite{shamir-secret}, such as that proposed by Beimel and Stahl~\cite{beimel-robust}. The paper then presents a performance improvement upon the results of~\cite{beimel-robust}, and introduces a two-stage Byzantine recovery procedure for its protocol.
+
+\subsection{Usability Advantages in $t$-private $k$-out-of-$\ell$ PIR}
+
+One advantage of using a $t$-private $k$-out-of-$\ell$ PIR scheme over an ($\ell-1$)-private $\ell$-server PIR protocol such as PynGP is that the user need not know which databases are online at the time she makes her request. Indeed, $t$-private $k$-out-of-$\ell$ PIR schemes handle the situation where some number $n$ of servers (where $n \leq \ell - k$) crash while the request is being made, such that the user is still able to obtain the results of her query in an expedient fashion. From a usability standpoint, this is superior to the existing PynGP, where the user is not permitted to re-request her mail and is encouraged to be patient -- undelivered mail will remain in the system for later retrieval.\footnote{The security implications of permitting users to recover from Byzantine failures (and obtain their data, even when some number of responding distributors are known to be Byzantine) are not well studied. It is our intuition that an attacker, particularly one with access to at least two colluding distributors, could manipulate his responses such that some users were able to pass the recovery step and others were not, and in this manner the attacker could use the user as an oracle and gain information based on her actions. Therefore, PIR schemes which provide Byzantine \emph{recovery} properties (such as~\cite{beimel-robust} and~\cite{goldberg-2007}) should be used with caution if intended to serve as a building-block for an anonymity system.}
+
+\subsection{Security Degradation with $t$-private $k$-out-of-$\ell$ PIR}
+
+Unfortunately, $t$-private $k$-out-of-$\ell$ PIR protocols such as those based on Shamir secret sharing force a trade-off between the number of servers that may collude before the system is compromised and the number of servers that may be Byzantine without affecting the user's ability to obtain the results of her query. (In a $t$-private $k$-out-of-$\ell$ PIR scheme, any $t+1$ servers may collude to break the security of the system, as long as each of them received a valid query from the user. Thus, the maximum collusion protection such a system can offer is $t = k-1$.) An ($\ell-1$)-private $\ell$-server PIR protocol is equivalent to a $t$-private $k$-out-of-$\ell$ PIR scheme where $t = \ell-1$ and $k = \ell$. Since increasing the availability guarantee (which is inversely proportional to $k$) correspondingly lowers the threshold at which \emph{any} query at \emph{any} time may be compromised by colluding servers, one must decide how critical it is for a given use case that the system respond with a correct answer in the face of sudden node failure, and trade availability of service against the security of the system accordingly. In the case of a service offering persistent pseudonyms, we consider intermittent and infrequent service interruptions to be the lesser concern.
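To make the collusion threshold concrete, the following is a toy Shamir-style query-sharing sketch of our own, over a small prime field; it is not the actual Beimel--Stahl or Goldberg protocol. Each coordinate of the basis vector $e_i$ is shared with a random degree-$t$ polynomial, so any $t$ shares look uniformly random, while any $t+1$ colluding servers can interpolate at zero and recover exactly which index the user requested.

```python
import random

P = 2**13 - 1   # small Mersenne prime field, chosen only for the demo

def share(secret, t, xs, rng):
    """Shamir-share `secret` with a random degree-t polynomial,
    returning one evaluation per point in xs."""
    coeffs = [secret] + [rng.randrange(P) for _ in range(t)]
    return [sum(c * pow(x, j, P) for j, c in enumerate(coeffs)) % P
            for x in xs]

def interpolate_at_zero(xs, ys):
    """Lagrange-interpolate the sharing polynomial and evaluate it at 0."""
    total = 0
    for a, (xa, ya) in enumerate(zip(xs, ys)):
        num = den = 1
        for b, xb in enumerate(xs):
            if b != a:
                num = num * (-xb) % P
                den = den * (xa - xb) % P
        total = (total + ya * num * pow(den, P - 2, P)) % P
    return total

rng = random.Random(1)
t, ell, i, n_buckets = 2, 5, 3, 8
xs = list(range(1, ell + 1))                        # one point per server
e_i = [int(j == i) for j in range(n_buckets)]       # the query: basis vector
shares = [share(bit, t, xs, rng) for bit in e_i]    # shares[j][s] -> server s
# Any t+1 colluding servers can reconstruct the entire query vector:
subset_xs = xs[:t + 1]
recovered = [interpolate_at_zero(subset_xs, row[:t + 1]) for row in shares]
assert recovered == e_i
```

Lowering $k$ (and hence $t \le k-1$) improves availability, but as the reconstruction step shows, it also lowers the number of colluding servers needed to expose the query.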
+
+\subsection{An Infeasible Attempt to Rectify this Security Degradation}
+
+Goldberg addresses this serious concern by introducing an extension to his $t$-private $v$-Byzantine-robust $k$-out-of-$\ell$ PIR protocol that provides \emph{computational security} protection for the results of a query, even in the case that \emph{all servers} collude, by proposing a hybrid privacy protection scheme which relies on the \emph{Paillier additive homomorphic cryptosystem}~\cite{pailli99}. This extension gives a PIR protocol with $t$-private $v$-Byzantine-robust $k$-out-of-$\ell$ information-theoretic protection and $\ell$-computational protection. However, as the author states, adding this modification is quite expensive. While the rest of the protocol performs rather efficiently (and better than previous results, under certain configurations), the CPIR hybrid extension suffers from the same problem as all other currently existing CPIR solutions: for each $n$-bit word in the database, the server must perform a modular exponentiation operation. Based on the performance calculations in~\cite{goldberg-2007}, we do not believe this satisfies the goals of \emph{deployability} and \emph{usability} as set forth in Section 1.0.1 of the Pynchon Gate design paper~\cite{pynchon}. The concept of a hybrid information-theoretic/computational PIR scheme is quite promising, should a CPIR scheme be proposed with affordable computation requirements. Until that happens, though, we are forced to discount the $\ell$-computational protection extension to the protocol in~\cite{goldberg-2007}, as it is presently impractical given real-world resource constraints.
+
+These real-world limitations reduce Goldberg's protocol to a simple $t$-private $v$-Byzantine-robust $k$-out-of-$\ell$ PIR protocol such as that proposed in~\cite{beimel-robust}, which allows for the identification of Byzantine servers only at the cost of reduced overall security as compared to PynGP. PynGP 2.0 requires no such compromise in its security.
+
+\subsection{Partitioning attacks}
+
+In contrast to the Beimel and Stahl protocol, the Goldberg protocol presents a two-stage Byzantine recovery procedure. An attacker controlling two or more nodes can manipulate the responses given to certain users such that some users will be able to recover from the Byzantine action using the first-stage recovery operation while other users will require the second-stage recovery operation. This offers the possibility that an attacker may partition users based on their response to Byzantine actions, thus presenting a serious threat to user anonymity. Detecting which recovery step the user conducted is not trivial, but by using the temperature-induced clock skew caused by the recovery operations as an indicator of which recovery operation is being performed, an attacker can learn into which set a given host falls. (This approach is the same as that used by Murdoch to detect remote host activity in~\cite{HotOrNot}.)
+
 \section{Conclusions}
-We have reviewed the attack we first described in~\cite{cosic-2007-001}, and have show that it has significant impact on the deployability and potential success of the Pynchon Gate, as well as other PIR-based nym server systems which do not account for Byzantine servers. A denial or degradation of service attack would be nearly impossible to thwart, and would likely happen soon after the system became popular among users. This vulnerability must not be present in the public system if it is to be expected to achieve and maintain any level of popularity or substantial user-base.
+We have reviewed the attack as described in~\cite{cosic-2007-001}, and found that it has significant impact on the deployability and potential success of the Pynchon Gate, as well as other PIR-based nym server systems that do not account for Byzantine servers. A denial or degradation of service attack would be nearly impossible to thwart, and would likely happen soon after the system became popular among users. This vulnerability must not be present in the public system if it is to be expected to achieve and maintain any level of popularity or substantial user-base.
 
-We have examined the solution proposed by Goldberg in~\cite{goldberg-2007} to address this vulnerability. We show that the trade-offs made in Goldberg's proposal do not satisfy the the security requirements set forth for the Pynchon Gate in its original design paper, as Goldberg's core protocol weakens the security assumptions significantly compared to the original PynGP. We show, using the performance metrics provided in~\cite{goldberg-2007}, that the proposed extension offered by the author to sufficiently strengthen the protocol to an adequate security level, while a groundbreaking idea in the theory of PIR, fails the engineering requirements for the Pynchon Gate: namely, it is too costly for volunteer operators to be expected to operate, and thus in practice, the security of the Pynchon Gate would rely on the weaker core protocol presented in~\cite{goldberg-2007}, without the addition of the stronger computational PIR-based protocol extension.
+We have presented a subtle modification to PynGP 1.0, relying on nothing more than an additional set of operations already performed by the original PynGP 1.0, to enable the detection and identification of Byzantine nodes with sufficient probability that the denial of service attack against the PynGP 1.0 is no longer feasible. This modified protocol, PynGP 2.0, requires no weakening of the original Pynchon Gate security model, and although it increases the bandwidth communication overhead, the bandwidth costs are still reasonable enough to fall within the engineering requirements of the original Pynchon Gate design goals: namely, that the system's bandwidth requirements be inexpensive enough to be reasonable for both users and system operators.
 
-We then show that we can modify the original PynGP 1.0, relying on nothing more than an additional set of operations already performed by the original PynGP 1.0, and that this new version of PynGP (PynGP 2.0) allows for the detection and identification of Byzantine nodes with sufficient probability that the denial of service attack against the PynGP 1.0 is no longer feasible. PynGP 2.0 requires no weakening of the original security model, and while it increases the bandwidth communication overhead, the bandwidth costs are still reasonable enough to fall within the engineering requirements of the original Pynchon Gate design goals: namely, that the system's bandwidth requirements be inexpensive enough to be reasonable for both users and system operators.
+We have examined the prior solution proposed by Goldberg in~\cite{goldberg-2007} to address the Byzantine server vulnerability. We show that the trade-offs made in Goldberg's proposal do not satisfy the security requirements set forth for the Pynchon Gate in its original design paper, as Goldberg's core protocol weakens the security assumptions significantly compared to the original PynGP, and introduces new avenues of attack.
 
 With the addition of PynGP 2.0, we consider the Pynchon Gate design to be superior to any other high-latency pseudonym service offering strong privacy properties currently proposed in the literature.
 
 \section{Future Work}
 
-The ``cut-and-choose'' Byzantine protection in PynGP 2.0 provided by the set of challenge bit-vectors $\{\eta\}$ is possibly not as efficient as it could be. The security implications of a modified validator protocol should be explored, perhaps with inspiration from sister disciplines, such as digital cash authenticity verification~\cite{ChaumFiNa88}, secure secrecy-preserving auctions~\cite{parkes-auction}, or randomized partial checking in mix networks for voting applications~\cite{randomized-checking}.
+The ``cut-and-choose'' Byzantine protection in PynGP 2.0 provided by the set of challenge bit-vectors $\{\vec{\eta}\}$ is possibly not as efficient as it could be. The security implications of a modified validator protocol should be explored, perhaps with inspiration from sister disciplines, such as digital cash authenticity verification~\cite{ChaumFiNa88}, secure secrecy-preserving auctions~\cite{parkes-auction}, or randomized partial checking in mix networks for voting applications~\cite{randomized-checking}.
 
 Using the PynGP 2.0 protocol, users can detect when a given server has behaved in a Byzantine fashion. However, they have no irrefutable way of proving this to a third party. We have considered a scenario where the distributors always return a signed response to a PIR query, but there is no reason why a Byzantine server would not simply refuse to sign (or more likely, sign incorrectly) data it wished to later repudiate. Until an adequate solution which enables the user to prove a given flawed response came from a particular server is available, distributor reputation is likely to be dictated by consensus.\footnote{However, it may be a positive property that signing is not used in this way, as this permits a wider degree of repudiation on the part of the user with regard to her received messages.}
 
-The security implications of permitting users to recover from Byzantine failures (and obtain their data, even when some number of responding distributors are known to be Byzantine) are not well studied. It is our intuition that an attacker, particularly one with access to at least two colluding distributors, could manipulate his responses such that some users were able to pass the recovery step, and others unable to, and in this manner the attacker could use the user as an oracle and gain information based on his actions.
-
 The prototype Pynchon Gate system needs to be deployed and performance tested, and user-studies must be performed to evaluate its success from a deployability and usability standpoint.
 
 % The resource requirements for the new architectural component validator need to be more fully investigated, and won't truly be known until the system is tested in a live environment with actual Byzantine nodes. It is conceivable that the validator might actually be multiple validators in a 
@@ -188,10 +215,11 @@
 
 We would like to thank Brian Warner and Bryce Wilcox-O'Hearn for their helpful early discussions of the Byzantine Postman Problem; Ian Goldberg for suggesting the potential of polynomial interpolation-based PIR solutions; Orr Dunkelman, Meredith L. Patterson, and Julia Wolf for informative discussions of $t$-private $k$-out-of-$\ell$ PIR schemes; Elena Andreeva, Ian Goldberg and David Molnar for assistance in searching the literature for pertinent work; and David Chaum, Bram Cohen and Meredith L. Patterson for inspirational discussion of PynGP 2.0. 
 
-We would also like to especially thank Nikita Borisov, David Chaum, Shar van Egmond, and Meredith L. Patterson for their advice and support during the course of this research. 
-
+We would also like to especially thank Nikita Borisov, David Chaum, Shar van Egmond, Meredith L. Patterson, and Bram Cohen for their advice and support during the course of this research. 
+ 
 This work was supported in part by the Concerted Research Action (GOA) Ambiorics 2005/11 of the Flemish Government and by the IAP Programme P6/26 BCRYPT of the Belgian State (Belgian Science Policy). Len Sassaman's work was supported in part by the EU within the PRIME Project under contract IST-2002-507591. 
 
+
 \bibliographystyle{plain}
 \bibliography{byzantine-fix}
 \end{document}
\ No newline at end of file
