Re: [tor-dev] PrivCount and Prio IRC Meeting

Hello,

I won’t be able to be at this meeting, but I would like to make some comments:

1. Prio’s zero-knowledge proofs (i.e. SNIPs) are not secure against a single malicious server. If you are using them to decide whether or not to include a given input, then a malicious server can cause good inputs to be excluded or bad inputs to be included. This could be used to exclude all good inputs except for one target input, or to repeatedly exclude-then-include the input of a target party over a sequence of measurement periods to see how much it tends to affect the aggregate. The SNIP protocols can no doubt be upgraded to provide security against malicious servers, but as of yet no such protocol has been published, implemented, or evaluated.
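
A toy sketch of the exclude-then-include attack from point 1, in Python. The per-period relay values, the target index, and the `accepted` flags are all made up for illustration; the only assumption carried over from the text is that one malicious server can decide whether the target's input is counted in a given period.

```python
from statistics import mean

# Hypothetical per-period inputs from three relays; index 2 is the target.
periods = [
    [10, 12, 30],
    [11, 12, 31],
    [10, 13, 29],
    [11, 11, 30],
]
TARGET = 2

def published_total(inputs, accepted):
    """Aggregate that is released: only inputs whose proof 'passed' are summed."""
    return sum(v for v, ok in zip(inputs, accepted) if ok)

included_totals, excluded_totals = [], []
for period_index, inputs in enumerate(periods):
    # The malicious server lets the target through in even periods only.
    include_target = period_index % 2 == 0
    accepted = [True] * len(inputs)
    accepted[TARGET] = include_target
    total = published_total(inputs, accepted)
    (included_totals if include_target else excluded_totals).append(total)

# Difference of the two averages estimates the target's typical contribution.
estimate = mean(included_totals) - mean(excluded_totals)
```

With these made-up numbers the estimate lands close to the target's real values (around 30), even though only aggregates were ever published.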

2. A main application of client-provided zero-knowledge proofs is to allow Boolean inputs to be added. A client's proof would guarantee that a given input is 0 or 1, even though the input is secret-shared using shares in a larger field (say, 32-bit values), so that the servers otherwise cannot learn anything about its value. The servers could then add up the inputs to determine how many clients had the Boolean flag set. This may well be useful for inputs from clients directly, which is the Mozilla case. In Tor’s case, there is no plan to have clients submit statistics themselves (e.g. from Tor Browser), because it raises obvious privacy/PR concerns (I believe these could be mitigated, but that discussion has yet to even seriously start as far as I can tell). In the Tor case, the inputs are coming from relays. To the extent that relays are reporting on client activity, the Boolean input case seems less useful, as the relays should really be reporting the total amount of activity they saw instead of just whether they ever saw something happen. I could imagine, however, that figuring out how many relays saw some weird event (like an error, or evidence of some attack) might be useful. Other than Boolean inputs, I’m not sure what we would want to be proved about the inputs. Of the examples in the Prio paper (Sec. 5.2), only frequency count and variance seem to use client proofs. Frequency count is the Boolean case I discussed. I’m not sure what would justify gathering the variance of the per-relay values.
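
A minimal sketch of the Boolean case in point 2, using plain additive secret sharing in Python. The modulus, the number of servers, and the function names are assumptions made for illustration and are not taken from the Prio or PrivCount code. It shows why a proof is wanted: a share of an out-of-range value looks exactly like a share of 0 or 1.

```python
import secrets

# Hypothetical parameters for illustration only.
MODULUS = 2**31 - 1   # a prime, so shares live in a field far larger than {0, 1}
NUM_SERVERS = 3

def share(value, num_servers=NUM_SERVERS, modulus=MODULUS):
    """Split value into additive shares that sum to value mod modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(num_servers - 1)]
    shares.append((value - sum(shares)) % modulus)
    return shares

def aggregate(all_shares, modulus=MODULUS):
    """Each server sums the shares it holds; the per-server sums are then combined."""
    per_server = [sum(column) % modulus for column in zip(*all_shares)]
    return sum(per_server) % modulus

flags = [1, 0, 1, 1, 0]
honest_total = aggregate([share(f) for f in flags])

# One submitter replaces its 0 with 1000. No single share reveals this,
# so without a proof the servers would simply add it in.
corrupted_total = aggregate([share(f) for f in flags[:-1]] + [share(1000)])
```

Here `honest_total` is the number of set flags, while `corrupted_total` is inflated by the single malformed input.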

3. PrivCount is compatible with Prio’s Affine Function Encodings, as such encodings compute aggregates simply by adding inputs.
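
To illustrate point 3, here is a rough Python sketch of a variance encoding in the style of the Prio paper, where each submitter sends the pair (x, x^2) and the aggregator only ever adds. Secret sharing and noise are left out, and the function names and sample values are made up for the example.

```python
def encode(x):
    """Encode one input as (x, x^2) so that variance can be recovered from sums."""
    return (x, x * x)

def aggregate(encodings):
    """Component-wise addition: the only operation the aggregator needs."""
    return tuple(sum(column) for column in zip(*encodings))

def decode_variance(total, count):
    """Population variance from the summed encoding: E[x^2] - E[x]^2."""
    sum_x, sum_x2 = total
    mean = sum_x / count
    return sum_x2 / count - mean * mean

values = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up per-relay values
total = aggregate([encode(v) for v in values])
variance = decode_variance(total, len(values))
```

Because aggregation is just addition, the same summed pair could in principle be produced by any additive counter scheme. What plain addition cannot check is that the second component really is the square of the first, which is where a client proof would come in.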

My overall opinion about Prio is that it could be very useful for collecting per-client statistics, such as from Tor Browser, but that doing so would require an upgraded version secure against malicious servers.

Best,

Aaron

On Nov 19, 2018, at 7:19 PM, teor <teor@xxxxxxxxxx> wrote:

Hi all,

We are meeting to discuss PrivCount and Prio at 2200 UTC on
Tuesday 20 November in #tor-meeting on irc.oftc.net.

We will log the meeting, so that people who can't attend can catch
up later.

Here's some background:

Henry Corrigan-Gibbs recently built a private statistics system
called Prio <https://crypto.stanford.edu/prio/> that is now used for
privately collecting telemetry at Mozilla
<https://hacks.mozilla.org/2018/10/testing-privacy-preserving-telemetry-with-prio/>.
It provides functionality similar to PrivCount
<https://ohmygodel.com/publications/privcount-ccs2016.pdf> that Tor is
planning to use, and also provides strong robustness against malformed or
malicious reports.

Some questions we'll discuss:

How can we design Tor's statistics to make it easy to:
* defend against corruption attacks, and
* support more complex aggregate statistics.

How does PrivCount in Tor's design handle aggregation
server failures?

Some background:

Here's my quick comparison of Prio and PrivCount in Tor:
* Prio servers can do complex calculations using linear data structures
* PrivCount is limited to additive totals (and histograms)

* Prio servers can defend against corruption attacks using SNIPs
(secret-shared non-interactive proofs)
* PrivCount in Tor has an optional scheme to defend against corruption,
but it requires adding additional noise

* Prio doesn't have differential privacy (yet)
* PrivCount guarantees differential privacy across the entire set of
statistics

* Prio increases security by failing when one server fails
* PrivCount in Tor is robust to server failure, and compensates
for the decreased security by adding more noise
(The PrivCount design used for our research papers was not
robust, and failed whenever any server or client failed.)
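
As a rough illustration of the robustness point above, here is a Python sketch of threshold (Shamir) secret sharing, which is the general technique named in the title of proposal 288. This is not the proposal's actual protocol: the prime, the 2-of-3 threshold, the counter values, and all function names are assumptions, and noise and blinding are omitted entirely.

```python
import secrets

PRIME = 2**61 - 1  # hypothetical field size (a Mersenne prime)

def make_shares(secret, threshold, num_servers, prime=PRIME):
    """Split secret so that any `threshold` of the shares can reconstruct it."""
    coeffs = [secret] + [secrets.randbelow(prime) for _ in range(threshold - 1)]

    def poly(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial
            acc = (acc * x + c) % prime
        return acc

    return [(x, poly(x)) for x in range(1, num_servers + 1)]

def reconstruct(points, prime=PRIME):
    """Lagrange interpolation at x = 0 over the prime field."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % prime
                den = (den * (xi - xj)) % prime
        total = (total + yi * num * pow(den, prime - 2, prime)) % prime
    return total

counters = [10, 20, 30]  # made-up per-relay counters
per_relay = [make_shares(c, threshold=2, num_servers=3) for c in counters]

# Each server adds the shares it received; a sum of shares is a share of the sum.
server_sums = [(column[0][0], sum(y for _, y in column) % PRIME)
               for column in zip(*per_relay)]

# Server 3 fails; the remaining two still recover the total.
total = reconstruct(server_sums[:2])
```

The trade-off described in the bullets shows up here as the threshold: lowering it tolerates more failed servers but also means fewer colluding servers can recover the data.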

Here are our latest specs, notes, and code for PrivCount in Tor:
https://gitweb.torproject.org/torspec.git/tree/proposals/288-privcount-with-shamir.txt
https://trac.torproject.org/projects/tor/wiki/org/meetings/2018MexicoCity/Notes/PrivCount
https://trac.torproject.org/projects/tor/wiki/org/meetings/2018MexicoCity/Notes/PrivCountTechnical
https://github.com/nmathewson/privcount_shamir

T

--
teor