
Re: [tor-dev] Update to Proposal 316: FlashFlow



Sorry, I had other things to juggle after the meeting and did not have
time to update the pad. I'm adding some additional things we touched on
below.

On 10/8/20 1:34 PM, Nick Mathewson wrote:
> 
> Hi!  We had a meeting about FlashFlow today, including several of the
> authors.  Here are the notes we wound up with for ideas and next
> steps.
> 
> Easy changes:
>     * Just use a PRNG; assume we can make them arbitrarily fast.
> (example candidates: chacha8, shake128.)
>     * Use relay identities as the identifiers for measurers, so that
> we won't need a novel authentication scheme.
>     * We can't call the list of measurer IDs a "network parameter",
> since technically speaking network parameters have to be integers.  It
> will have to be a different part of the consensus header.
>     * Make sure that all of the declared ranges for network parameters
> are as wide as they could possibly be; making these parameters take a
> wider range is hard to change later.
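
To make the "just use a PRNG" item concrete, here is a minimal sketch using
SHAKE128 from Python's standard library, one of the candidates named above.
The function name and the idea of deriving the stream from a per-measurement
seed are my assumptions for illustration, not something the proposal
specifies.

```python
import hashlib


def measurement_bytes(seed: bytes, length: int) -> bytes:
    """Return `length` pseudorandom bytes derived from `seed` via SHAKE128.

    Hypothetical helper: the seed format and how it would be agreed on
    are not defined by the proposal.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    # SHAKE128 is an extendable-output function, so asking for more
    # bytes simply extends the same deterministic stream.
    return hashlib.shake_128(seed).digest(length)


if __name__ == "__main__":
    payload = measurement_bytes(b"example-seed", 514)
    print(len(payload))
```

Both ends can regenerate the same bytes from the seed, so nothing has to be
stored or shipped ahead of time.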
> 
> Trickier but straightforward:
>     * Describe how to avoid collisions with multiple coordinators
>        - idea: exactly how it's specified in the paper ;) but a
> simpler idea ...
>        - idea: coord 1 measures on day 1, c2 on d2, ... etc. for all
> coords, then repeat
>     * Describe how to aggregate all background measurements over the
> full 30 seconds, and how to use that data.  (This may lower accuracy a
> little, but makes some kinds of the analysis harder.) Idea: relay
> reports *once* at end of measurement the total amount of bg traffic
> and the coord simply divides that by the length of the measurement to
> have a per-second average.
>     * Mention whether relays should reserve sockets in case they get measured
> 
> More thinking may be needed:
>     *  Summarize ideas for how multiple coordinators don't have to
> share full schedules with one another. Possibly divide up the network
> by days? [e.g., Coordinator 1 measures nodes in set X on Monday]
>     * Would it work if we declare a maximum measurement fraction (e.g.
> 75% of bandwidth) but measurers only use that fraction in a few
> measurements once in a while, and mostly they do less (e.g. 10% of
> bandwidth)?
      * Find ways we may use sbws and/or network utilization to decide
        on a safe flashflow measurement level that does not introduce
        traffic analysis side channels.
>     * Discuss migration: how do we use this data when not all relays
> support being measured in this way?
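
One way the "divide up the network by days" idea could work without
coordinators exchanging full schedules is sketched below. Hashing the relay
fingerprint to pick its set, and rotating sets by (coordinator + day), are
both my assumptions for illustration; we did not settle on any of this in
the meeting.

```python
import hashlib
from typing import Iterable, List


def relay_set(fingerprint: str, num_sets: int) -> int:
    """Map a relay fingerprint to one of `num_sets` sets (hypothetical scheme)."""
    digest = hashlib.sha256(fingerprint.upper().encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % num_sets


def set_for_coordinator(coord_index: int, day_index: int, num_sets: int) -> int:
    """Which set a coordinator measures on a given day (0-based indices)."""
    return (coord_index + day_index) % num_sets


def relays_to_measure(
    fingerprints: Iterable[str], coord_index: int, day_index: int, num_sets: int
) -> List[str]:
    """Relays this coordinator should measure today."""
    target = set_for_coordinator(coord_index, day_index, num_sets)
    return [fp for fp in fingerprints if relay_set(fp, num_sets) == target]


if __name__ == "__main__":
    # Made-up fingerprints, for demonstration only.
    fps = [format(i, "040X") for i in range(10)]
    for coord in range(3):
        print(coord, relays_to_measure(fps, coord, day_index=0, num_sets=3))
```

If the number of sets equals the number of coordinators, each coordinator
gets a different set on any given day, so each one only needs the consensus,
its own index, and the date.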

> 
> 
> In terms of implementation:
> - identify the python parts that differ from sbws, create sbws
> subpackages "ff measurer" and "ff coordinator", and add a config option
> to run in one mode or the other, so that we do not have yet another
> code base to maintain
> 
> In terms of deployment:
> - we currently don't have any automatic way to ensure the net is still
> "working properly", only some mostly-manual ways and some one-time
> experiments. This has made some relay operators unhappy, and it has
> taken quite some time to figure out the problem and solve it
> 
> In terms of coordination:
> We're deploying sbws only 1 dirauth at a time and trying to ensure the
> net is still "working properly".
> If we deploy ff before we have deployed sbws on all bwauths and
> ensured the net is still "working properly", it will be hard to tell
> whether a problem is an sbws bug, an ff bug, or both

In particular here, I want to run some additional live experiments to
ensure that long tail perf is improved:

https://gitlab.torproject.org/tpo/metrics/analysis/-/issues/33076#note_2569011
   aka
   https://trac.torproject.org/projects/tor/ticket/33076#comment:23

I do not yet have confidence that these issues are solved simply because
they did not appear in Shadow. Shadow does not simulate multi-instance
relays, CPU bound relays, or structural load imbalances in the network.

For this reason I think we should look for deployment strategies that
enable us to test a combination of Rob's experiment, FlashFlow, and
sbws in live testing (even if only for week-long test periods).

This way we can confirm these bugs and develop fixes for these issues
now, rather than waiting for the whole network to upgrade to FlashFlow
only to realize it still has problems that make its tail worse than sbws.

> FlashFlow, the python code for coordinator, measurer, etc.
> https://gitlab.torproject.org/pastly/flashflow
> 
> The rendered documentation for/from the above https://flashflow.pastly.xyz/
> 
> Tor repo with branch https://gitlab.torproject.org/pastly/tor/-/tree/ff-v2
> 
> The ticket with the concerning graphs attributable to "Rob's speedtest thing"
> https://trac.torproject.org/projects/tor/ticket/33076


-- 
Mike Perry
