[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]

Re: [tor-dev] Proposal 288: Privacy-Preserving Statistics with Privcount in Tor (Shamir version)



Hi Tim,

Prio allows a client to prove an arbitrary statement about a secret-shared input (expressed as as a circuit on that input) to a set of servers. This could provide some robustness against malicious inputs. However, it is not as much protection as we might really need because the circuit is only over one input, which prevents us from making relative comparisons across inputs. For example, Prio can provide a guarantee that an input value is at most some max value, but it can’t guarantee that a value isn’t over 1.5 times the max value of the *other* inputs. Identifying “outliers” relative to other inputs is important if you don’t know what the inputs are supposed to look like, which in Tor would be the case when “normal" inputs change due to underlying changes in network and user behavior.

Also, in Prio, servers use a generic secure multi-party computation (MPC) protocol to compute the circuits. If Tor is going to do that, why not just run a generic MPC protocol over all of the inputs? Doing so would allow Tor statistics aggregations to be robust to inputs that are likely “incorrect” given the values of the other inputs (see “robust statistics” for a wide variety of useful such computations, including for example median, trimmed mean, least trimmed squares, maximum likelihood estimation). Applying MPC over all inputs would only require implementing the “offline” phase of the computation (e.g. producing the “multiplication triples”, which are supplied by the client in Prio). There are reasonably efficient protocols for doing so, including SDPZ and TinyOT [1].

Best,
Aaron

[0] Ivan Damgard and Valerio Pastro and Nigel P. Smart and Sarah Zakarias, "Multiparty Computation from Somewhat Homomorphic Encryption", CRYPTO 2012, <http://eprint.iacr.org/2011/535>.

[1] Enrique Larraia and Emmanuela Orsini and Nigel P. Smart, "Dishonest majority multi-party computation for binary circuits”, CRYPTO 2014, <https://eprint.iacr.org/2015/472>.

On Dec 12, 2017, at 9:04 PM, teor <teor2345@xxxxxxxxx> wrote:


On 2 Dec 2017, at 07:26, Nick Mathewson <nickm@xxxxxxxxxxxxxx> wrote:

Filename: 288-privcount-with-shamir.txt Title: Privacy-Preserving Statistics
with Privcount in Tor (Shamir version) Author: Nick Mathewson, Tim
Wilson-Brown, Aaron Johnson Created: 1-Dec-2017 Supercedes: 280 Status: Draft

0. Acknowledgments

Tariq Elahi, George Danezis, and Ian Goldberg designed and implemented the
PrivEx blinding scheme. Rob Jansen and Aaron Johnson extended PrivEx's
differential privacy guarantees to multiple counters in PrivCount:

https://github.com/privcount/privcount/blob/master/README.markdown#research-background

Rob Jansen and Tim Wilson-Brown wrote the majority of the experimental
PrivCount code, based on the PrivEx secret-sharing variant. This
implementation includes contributions from the PrivEx authors, and others:

https://github.com/privcount/privcount/blob/master/CONTRIBUTORS.markdown

This research was supported in part by NSF grants CNS-1111539, CNS-1314637,
CNS-1526306, CNS-1619454, and CNS-1640548.

The use of a Shamir secret-sharing-based approach is due to a suggestion by
Aaron Johnson (iirc); Carolin Zöbelein did some helpful analysis here.

Aaron Johnson and Tim Wilson-Brown made improvements to the draft proposal.

1. Introduction and scope

PrivCount is a privacy-preserving way to collect aggregate statistics about
the Tor network without exposing the statistics from any single Tor relay.

This document describes the behavior of the in-Tor portion of the PrivCount
system.  It DOES NOT describe the counter configurations, or any other
parts of the system. (These will be covered in separate proposals.)

2. PrivCount overview

Here follows an oversimplified summary of PrivCount, with enough
information to explain the Tor side of things.  The actual operation of the
non-Tor components is trickier than described below.

In PrivCount, a Data Collector (DC, in this case a Tor relay) shares
numeric data with N different Tally Reporters (TRs). (A Tally Reporter
performs the summing and unblinding roles of the Tally Server and Share
Keeper from experimental PrivCount.)

All N Tally Reporters together can reconstruct the original data, but no
(N-1)-sized subset of the Tally Reporters can learn anything about the
data.

(In reality, the Tally Reporters don't reconstruct the original data at
all! Instead, they will reconstruct a _sum_ of the original data across all
participating relays.)

In brief, the system works as follow:

To share data, for each counter value V to be shared, the Data Collector
first adds Gaussian noise to V in order to produce V', uses (K,N) Shamir
secret-sharing to generate N shares of V' (K<=N, K being the reconstruction
threshold), encrypts each share to a different Tally Reporter, and sends
each encrypted share to the Tally Reporter it is encrypted for.

The Tally Reporters then agree on the set S of Data Collectors that sent
data to all of them, and each Tally Reporter forms a share of the aggregate
value by decrypting the shares it received from the Data Collectors in S
and adding them together. The Tally Reporters then, collectively, perform
secret reconstruction, thereby learning the sum of all the different values
V'.

The use of Shamir secret sharing lets us survive up to N-K crashing TRs.
Waiting until the end to agree on a set S of surviving relays lets us
survive an arbitrary number of crashing DCs. In order to prevent bogus data
from corrupting the tally, the Tally Reporters can perform the aggregation
step multiple times, each time proceeding with a different subset of S and
taking the median of the resulting values.

Relay subsets should be chosen at random to avoid relays manipulating their
subset membership(s). If an shared random value is required, all relays
must submit their results, and then the next revealed shared random value
can be used to select relay subsets. (Tor's shared random value can be
calculated as soon as all commits have been revealed. So all relay results
must be received *before* any votes are cast in the reveal phase for that
shared random value.)

Below we describe the algorithm in more detail, and describe the data
format to use.

...

Chelsea and I had a conversation on another list about Prio.

Prio is similar to PrivCount, but would allow the Data Collectors to
prove that their responses are in range, using a zero-knowledge proof
technique.

I'm posting this here with her permission:

From: teor <teor2345@xxxxxxxxx>
Date: 13 December 2017 at 09:26:45 AEDT

On 13 Dec 2017, at 03:02, chelsea komlo <me@xxxxxxxxxxxxxxxx> wrote:

Hi All,

This came up in a slack channel I am a part of, and it seems relevant to
PrivCount work- I wanted to pass it along.

https://crypto.stanford.edu/prio/

"Prio is a privacy-preserving system for the collection of aggregate
statistics. Each Prio client holds a private data value (e.g., its
current location), and a small set of servers compute statistical
functions over the values of all clients (e.g., the most popular
location). As long as at least one server is honest, the Prio servers
learn nearly nothing about the clients’ private data, except what they
can infer from the aggregate statistics that the system computes"

I read the paper. I'm sure I didn't understand it all.
It's very new, but it shows promise.
And there's no real-world (even experimental) implementation yet.

Maybe we can architect PrivCount in Tor so it's possible to upgrade
to a system like this in future. In particular, we seem to be using
similar primitives to Prio, so that's useful.

Are there things we can change in the current proposal to make it easier
to upgrade to a system like Prio in future?

Perhaps it's enough to version all our data exchange formats.
I'm not sure if we should lock into one particular scheme, when
something better might come along.

T

--
Tim / teor

PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B
ricochet:ekmygaiu4rzgsk6n
------------------------------------------------------------------------

_______________________________________________
tor-dev mailing list
tor-dev@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev

_______________________________________________
tor-dev mailing list
tor-dev@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev