
Re: [or-cvs] r19162: {projects} start making a 2009 todo list out of the performance ideas. (projects/performance)



-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hey Roger,

Your performance metrics TODOs raise a couple of questions. Maybe you
can help resolve some of them?

>   - 1.2, new circuit window sizes
>     - Conclude whether the transition period will hurt as much as it
>       seems like it will.
>     * Pick a lower number, and switch to it.
> Metrics: gather queue sizes from relays so we have a better sense of
> what's actually going on.

Can you give more details? Are there queues for circuits (1000 cells
max) and for streams (500 cells max)? Is relay.c a good place to start
looking for that code? Are we interested in the average number of cells
per stream, per circuit, both, or only in the sum of all cells waiting?
Should we write the number of cells waiting in those queues to disk
every 1 or 10 seconds and use those data for evaluation?

>   - 2.5, Default exit policies
>     * Change Vidalia's default exit policy to not click "other protocols".
>     D let exit relays specify some destination networks/ports that get
>       rate limited further.
> Metrics: At what fraction of exit relays allowing a given port out
> do connections to that port start to suffer? That is, if even 5%
> of the relays (by bandwidth) allowing a port to exit are enough for
> most connections to that port to work fine, then we're going to have
> a tough time pushing unwanted traffic off the network just by changing
> some exit policies. (Alas, this question is messy because it pretends
> that the amount of traffic generated for port x is independent of x.
> How to phrase it so it's more useful?)

Okay, I'm not sure if I understand that question. What exactly do you
want to have measured here?

>   - 3.6, incentives to relay
>     - Sort out how to do circuit priority in practice. I think the only
>       answer here is to make different TLS connections for different
>       priorities. (Otherwise other people can free-ride on your
>       high-priority conns.)
> Metrics: what period of time should the gold star status last? That is,
> What period of time, taken as a rolling snapshot of which relays are
> present in the network, guarantees a sufficiently large anonymity set
> for high-priority relays?

This goes in the direction of the churn measurements for my thesis. But
I'm unsure what exactly the question is. Do you want to know how many of
the relays running at time X are still running at time Y? Or maybe only
for a subset of relays (selected by which criteria)?
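
To make the first reading concrete, here is a minimal sketch of that
measurement. It assumes we already have the set of relay fingerprints
seen at each of the two times; the function name and the fingerprints
are made up for illustration and are not taken from any existing code.

```python
def surviving_fraction(relays_at_x, relays_at_y):
    """Fraction of the relays seen at time X that are also seen at time Y."""
    at_x = set(relays_at_x)
    if not at_x:
        return 0.0  # nothing was running at X, so nothing can survive
    return len(at_x & set(relays_at_y)) / len(at_x)


# Hypothetical fingerprints, for illustration only.
snapshot_x = {"A", "B", "C", "D"}
snapshot_y = {"B", "C", "E"}
print(surviving_fraction(snapshot_x, snapshot_y))  # 0.5
```

Restricting this to a subset of relays would only mean filtering
snapshot_x before the call.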

>   - 4.2, getting better bandwidth estimates
> Metrics: how accurate are the ten-second-bandwidth-burst advertised
> numbers anyway, in terms of guessing capacity? Steven says we're at 50%
> load, but is that just because our advertised bandwidth is a function
> of our recent load?

How would accuracy be measured? How do I learn what 100% of a relay's
capacity is?

>     - What is "true" capacity anyway?
> Metrics: What other algorithms can we use to produce a more accurate
> advertised bandwidth?

Is this a question that can be answered by metrics? Can you give more hints?

>   - 4.5, Older entry guards are overloaded
> Metrics: compare "how fast each relay should be based on its advertised
> capacity" with "how long the relay has had the guard flag", to see how
> big an issue this is really.

Okay, that means finding a possible correlation between advertised
capacity and time since getting the Guard flag for the first time.
Should be possible, but I need to think harder about doing this
efficiently with the current database.
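
As a sketch of what I would compute, assuming one pair of values per
relay (advertised capacity, time since it first got the Guard flag): a
plain Pearson correlation coefficient. The choice of Pearson is my
assumption; a rank-based measure might fit skewed bandwidth data
better.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)
```

Here xs would be the advertised capacities and ys the guard ages, both
in the same relay order.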

> Metrics: How many relays does a client touch over time x, given that they
> drop old guards y seconds after choosing them? Even if y is infinite,
> we have some number based on guards going away. How does x grow as we
> reduce y?

Hey, x has two meanings here. ;) The question should be "How many relays
z does a client touch over time x, given that they drop old guards y
seconds after choosing them?" Maybe we want to fix the time x to, say, 1
month?
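
A toy model of that question, only to pin down the variables. It
assumes (my simplifications, not how Tor actually behaves) that a
client holds num_guards guards, replaces all of them at once every y
seconds, picks uniformly at random from a fixed pool, and that guards
never leave the network.

```python
import math
import random


def guards_touched(period, lifetime, pool_size, num_guards=3, seed=0):
    """Count distinct guards z used within `period` seconds (x) when
    guards are dropped `lifetime` seconds (y) after being chosen."""
    if period <= 0 or lifetime <= 0:
        raise ValueError("period and lifetime must be positive")
    rng = random.Random(seed)
    # Number of guard sets the client goes through during the period;
    # an infinite lifetime yields exactly one set.
    generations = max(1, math.ceil(period / lifetime))
    touched = set()
    for _ in range(generations):
        # Uniform choice without replacement within one set; a later
        # set may pick a guard that was already used before.
        touched.update(rng.sample(range(pool_size), num_guards))
    return len(touched)


DAY = 86400
print(guards_touched(30 * DAY, float("inf"), pool_size=500))  # 3
```

Real numbers would have to come from replaying the archived network
statuses, since this ignores bandwidth weighting and relays going away.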

>     * Pick a conservative y like six months, and implement.
>     D Reduce y based on the results of the metrics. (don't partition
>       clients too far by tor version though.)
> Metrics: if we were more flexible in our Guard stability criteria, how
> many more relays would get the Guard flag? How would that influence the
> above numbers? I'd like to become very flexible so more than half of
> the relays get to be guards. Are there cutoffs that are reasonable and
> fit naturally into the data from the past few years?

I have started playing with the selection criteria to see how many
Fast/Stable/Guard nodes we'd have ended up with in the past. If the
stupid database finishes anytime soon, we'll have results tonight or
tomorrow. The question of how that would have influenced the above
numbers would be next.

> Metrics: if we're more flexible in our Guard speed criteria, how does
> that impact the speed that clients should expect? Originally we avoided
> 20KB/s relays as guards, because "then clients can't ever get more than
> 20KB/s". But they can't get that now anyway.

What metric would answer this question? The distribution of advertised
bandwidth for different Guard selection criteria?

>   - 5.2, better timeouts for giving up on circuits/streams
>     * clients gather data about circuit timeouts, and then abandon
>       circuits that take more than a std dev above that.
> Metrics: Right now we abandon the circuit after 10 seconds for the first
> try. What are the download stats if we don't abandon it? What "try a
> new one" timeouts will minimize the number of circuits we go through,
> while also minimizing the time-until-user-gets-website?

Hmm. We might completely remove the 10-second timeout and see how long
it takes until the stream is attached (or whether it fails). From these
data we could derive a better timeout. Is that what you had in mind?
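
One way to derive such a timeout, following the "std dev above" idea
from the TODO item: take the mean of the measured attach times and add
k standard deviations. The function name, the parameter k, and the
sample values are hypothetical; they are not measurements.

```python
import statistics


def timeout_from_samples(attach_times, k=1.0):
    """Mean attach time plus k population standard deviations."""
    if len(attach_times) < 2:
        raise ValueError("need at least two measurements")
    return statistics.mean(attach_times) + k * statistics.pstdev(attach_times)


# Hypothetical attach times in seconds, for illustration only.
samples = [2, 4, 4, 4, 5, 5, 7, 9]
print(timeout_from_samples(samples))  # 7.0 (mean 5, std dev 2)
```

A percentile of the measured times would be an alternative if the
distribution turns out to have a long tail.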

Thanks!
- --Karsten
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJ39w50M+WPffBEmURAsuIAJ9sYzNlh/l9vFmwsWQUcTGhkS2P6gCggoQg
zFpC5BQe1ObQnK9QqiUFOWE=
=hb7K
-----END PGP SIGNATURE-----