Hi Tim,
Prio allows a client to prove an arbitrary statement about a secret-shared input (expressed as a circuit on that input) to a set of servers. This could provide some robustness against malicious inputs. However, it is not as much protection as we might really need, because the circuit is only over one input, which prevents us from making relative comparisons across inputs. For example, Prio can provide a guarantee that an input value is at most some max value, but it can’t guarantee that a value isn’t over 1.5 times the max value of the *other* inputs. Identifying “outliers” relative to other inputs is important if you don’t know what the inputs are supposed to look like, which in Tor would be the case when “normal” inputs change due to underlying changes in network and user behavior.
Also, in Prio, servers use a generic secure multi-party computation (MPC) protocol to compute the circuits. If Tor is going to do that, why not just run a generic MPC protocol over all of the inputs? Doing so would allow Tor statistics aggregations to be robust to inputs that are likely “incorrect” given the values of the other inputs (see “robust statistics” for a wide variety of useful such computations, including for example median, trimmed mean, least trimmed squares, maximum likelihood estimation). Applying MPC over all inputs would only require implementing the “offline” phase of the computation (e.g. producing the “multiplication triples”, which are supplied by the client in Prio). There are reasonably efficient protocols for doing so, including SPDZ [0] and TinyOT [1].
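To make the multiplication-triple idea concrete, here is a minimal Python sketch of Beaver-triple multiplication over additive shares. The prime modulus, helper names, and in-the-clear triple generation are illustrative assumptions for exposition; in SPDZ/TinyOT the triple comes from the offline phase and no party sees it whole:

```python
import random

P = 2**61 - 1  # illustrative prime modulus, not specified by any of the protocols

def split(v, parties=2):
    """Additively share v: the shares sum to v mod P."""
    shares = [random.randrange(P) for _ in range(parties - 1)]
    shares.append((v - sum(shares)) % P)
    return shares

def beaver_mul(x_shares, y_shares):
    """Multiply secret-shared x and y using one precomputed triple (a, b, c=a*b).

    The triple is generated in the clear here purely for illustration; the
    whole point of the SPDZ/TinyOT offline phase is to produce it without
    any party learning a, b, or c."""
    a, b = random.randrange(P), random.randrange(P)
    c = (a * b) % P
    a_sh, b_sh, c_sh = split(a), split(b), split(c)
    # In the protocol, each party opens d_i = x_i - a_i and e_i = y_i - b_i;
    # the public sums d and e reveal nothing about x and y since a, b are random.
    d = (sum(x_shares) - a) % P
    e = (sum(y_shares) - b) % P
    # Each party locally computes z_i = c_i + d*b_i + e*a_i;
    # one designated party additionally adds the public term d*e.
    z = [(ci + d * bi + e * ai) % P for ai, bi, ci in zip(a_sh, b_sh, c_sh)]
    z[0] = (z[0] + d * e) % P
    return z  # additive shares of x*y
```

The identity being used is x*y = c + d*b + e*a + d*e, with d = x-a and e = y-b, which needs only local additions once d and e are public.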
Best, Aaron
[0] Ivan Damgard, Valerio Pastro, Nigel P. Smart, and Sarah Zakarias, "Multiparty Computation from Somewhat Homomorphic Encryption", CRYPTO 2012, <http://eprint.iacr.org/2011/535>.
[1] Enrique Larraia, Emmanuela Orsini, and Nigel P. Smart, "Dishonest majority multi-party computation for binary circuits", CRYPTO 2014, <https://eprint.iacr.org/2015/472>.
On 2 Dec 2017, at 07:26, Nick Mathewson <nickm@xxxxxxxxxxxxxx> wrote:
Filename: 288-privcount-with-shamir.txt
Title: Privacy-Preserving Statistics with Privcount in Tor (Shamir version)
Author: Nick Mathewson, Tim Wilson-Brown, Aaron Johnson
Created: 1-Dec-2017
Supersedes: 280
Status: Draft
0. Acknowledgments
Tariq Elahi, George Danezis, and Ian Goldberg designed and implemented the PrivEx blinding scheme. Rob Jansen and Aaron Johnson extended PrivEx's differential privacy guarantees to multiple counters in PrivCount:
https://github.com/privcount/privcount/blob/master/README.markdown#research-background
Rob Jansen and Tim Wilson-Brown wrote the majority of the experimental PrivCount code, based on the PrivEx secret-sharing variant. This implementation includes contributions from the PrivEx authors, and others:
https://github.com/privcount/privcount/blob/master/CONTRIBUTORS.markdown
This research was supported in part by NSF grants CNS-1111539, CNS-1314637, CNS-1526306, CNS-1619454, and CNS-1640548.
The use of a Shamir secret-sharing-based approach is due to a suggestion by Aaron Johnson (iirc); Carolin Zöbelein did some helpful analysis here.
Aaron Johnson and Tim Wilson-Brown made improvements to the draft proposal.
1. Introduction and scope
PrivCount is a privacy-preserving way to collect aggregate statistics about the Tor network without exposing the statistics from any single Tor relay.
This document describes the behavior of the in-Tor portion of the PrivCount system. It DOES NOT describe the counter configurations, or any other parts of the system. (These will be covered in separate proposals.)
2. PrivCount overview
Here follows an oversimplified summary of PrivCount, with enough information to explain the Tor side of things. The actual operation of the non-Tor components is trickier than described below.
In PrivCount, a Data Collector (DC, in this case a Tor relay) shares numeric data with N different Tally Reporters (TRs). (A Tally Reporter performs the summing and unblinding roles of the Tally Server and Share Keeper from experimental PrivCount.)
All N Tally Reporters together can reconstruct the original data, but no (N-1)-sized subset of the Tally Reporters can learn anything about the data.
(In reality, the Tally Reporters don't reconstruct the original data at all! Instead, they will reconstruct a _sum_ of the original data across all participating relays.)
In brief, the system works as follows:
To share data, for each counter value V to be shared, the Data Collector first adds Gaussian noise to V in order to produce V', uses (K,N) Shamir secret-sharing to generate N shares of V' (K<=N, K being the reconstruction threshold), encrypts each share to a different Tally Reporter, and sends each encrypted share to the Tally Reporter it is encrypted for.
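As a concrete illustration of the Data Collector side, here is a minimal Python sketch of the noise-then-share step. The field prime, the noise parameter sigma, and the function names are illustrative assumptions, not choices made by this proposal:

```python
import random

PRIME = 2**61 - 1  # illustrative prime field; the proposal does not fix one here

def noise_and_share(v, sigma, k, n, prime=PRIME):
    """Add Gaussian noise to counter value v, then produce n Shamir shares
    of the noised value V' with reconstruction threshold k (k <= n)."""
    v_noised = (v + round(random.gauss(0, sigma))) % prime
    # Random polynomial of degree k-1 whose constant term is V'.
    coeffs = [v_noised] + [random.randrange(prime) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):          # one share per Tally Reporter
        y = 0
        for c in reversed(coeffs):     # Horner evaluation of the polynomial at x
            y = (y * x + c) % prime
        shares.append((x, y))
    return shares  # share (x, y) would then be encrypted to Tally Reporter x
```

Any k of the n points determine the degree-(k-1) polynomial and hence V'; any k-1 points are consistent with every possible value of V'.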
The Tally Reporters then agree on the set S of Data Collectors that sent data to all of them, and each Tally Reporter forms a share of the aggregate value by decrypting the shares it received from the Data Collectors in S and adding them together. The Tally Reporters then, collectively, perform secret reconstruction, thereby learning the sum of all the different values V'.
The use of Shamir secret sharing lets us survive up to N-K crashing TRs. Waiting until the end to agree on a set S of surviving relays lets us survive an arbitrary number of crashing DCs. In order to prevent bogus data from corrupting the tally, the Tally Reporters can perform the aggregation step multiple times, each time proceeding with a different subset of S and taking the median of the resulting values.
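The aggregation step works because Shamir shares are additively homomorphic: summing shares pointwise yields shares of the sum of the secrets. A minimal self-contained Python sketch (the field prime and helper names are illustrative, not from the proposal):

```python
import random

PRIME = 2**61 - 1  # illustrative field prime

def share(v, k, n, prime=PRIME):
    """(k, n) Shamir-share v: evaluate a random degree-(k-1) polynomial
    with constant term v at x = 1..n."""
    coeffs = [v % prime] + [random.randrange(prime) for _ in range(k - 1)]
    return [(x, sum(c * pow(x, i, prime) for i, c in enumerate(coeffs)) % prime)
            for x in range(1, n + 1)]

def reconstruct(shares, prime=PRIME):
    """Lagrange interpolation at x = 0 over any k shares recovers the secret."""
    total = 0
    for xi, yi in shares:
        num = den = 1
        for xj, _ in shares:
            if xj != xi:
                num = (num * -xj) % prime
                den = (den * (xi - xj)) % prime
        total = (total + yi * num * pow(den, -1, prime)) % prime
    return total

# Two Data Collectors share their (noised) counters with 5 Tally Reporters;
# each TR adds the shares it holds, so no individual value is ever opened.
dc1 = share(10, k=3, n=5)
dc2 = share(25, k=3, n=5)
summed = [(x, (y1 + y2) % PRIME) for (x, y1), (_, y2) in zip(dc1, dc2)]
# Any 3 of the 5 TRs can reconstruct the sum, so up to 2 TRs may crash.
assert reconstruct(summed[:3]) == 35
```

The N-K crash tolerance above falls out directly: reconstruction needs only K of the N summed shares.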
Relay subsets should be chosen at random to avoid relays manipulating their subset membership(s). If a shared random value is required, all relays must submit their results, and then the next revealed shared random value can be used to select relay subsets. (Tor's shared random value can be calculated as soon as all commits have been revealed. So all relay results must be received *before* any votes are cast in the reveal phase for that shared random value.)
Below we describe the algorithm in more detail, and describe the data format to use.
...
Chelsea and I had a conversation on another list about Prio. Prio is similar to PrivCount, but would allow the Data Collectors to prove that their responses are in range, using a zero-knowledge proof technique. I'm posting this here with her permission:

From: teor <teor2345@xxxxxxxxx>
Date: 13 December 2017 at 09:26:45 AEDT
On 13 Dec 2017, at 03:02, chelsea komlo <me@xxxxxxxxxxxxxxxx> wrote:
Hi All,
This came up in a slack channel I am a part of, and it seems relevant to PrivCount work- I wanted to pass it along.
https://crypto.stanford.edu/prio/
"Prio is a privacy-preserving system for the collection of aggregate statistics. Each Prio client holds a private data value (e.g., its current location), and a small set of servers compute statistical functions over the values of all clients (e.g., the most popular location). As long as at least one server is honest, the Prio servers learn nearly nothing about the clients’ private data, except what they can infer from the aggregate statistics that the system computes"
I read the paper. I'm sure I didn't understand it all. It's very new, but it shows promise. And there's no real-world (even experimental) implementation yet.
Maybe we can architect PrivCount in Tor so it's possible to upgrade to a system like this in future. In particular, we seem to be using similar primitives to Prio, so that's useful.
Are there things we can change in the current proposal to make it easier to upgrade to a system like Prio in future? Perhaps it's enough to version all our data exchange formats. I'm not sure if we should lock into one particular scheme, when something better might come along.

T

--
Tim / teor

PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B
ricochet:ekmygaiu4rzgsk6n