
Re: [tor-relays] More consensus weight problems

To: Speak Freely <when2plus2is5@xxxxxxxxxx>
Subject: Re: [tor-relays] More consensus weight problems
From: Aaron Gibson <aagbsn@xxxxxxxx>
Date: Mon, 29 Jun 2015 18:46:44 +0000
Cc: tor-relays@xxxxxxxxxxxxxxxxxxxx

On 2015-06-29 17:51, Speak Freely wrote:

Hello,
First of all, I love Tor. I love Tor Browser, and I love running relays.
When the problems are solved, I will most likely spin up more relays.
I'm leaving my fastest relay running, as a method of checking the status for myself. The rest have already started to expire, and within the next
week or so most of the other ones will have expired as well.

I'm going to try tor-dev-alpha 2.7.1 and change fingerprints, as per a
suggestion from s7r, seeing as how I have nothing to lose.

I just wish the bwauths could scan relays based off previous relative
consensus weights... If this particular relay was at 27000, it should be
higher on the list to check compared to another one I have that is at
487. My one relay was blazing fast with thousands of connections, my

Well, relays are ranked by capacity, and split over several scanner processes - they do get measured against their relative peers. But it seems that, when they fall out of the measurement process ('Unmeasured'), they must start again at the beginning. This is expected because all relays start Unmeasured, and gradually increase their position in the consensus (per relative capacity), in order to dampen sudden changes and limit Sybil attacks by requiring relays to stick around for a while, increasing the cost to an adversary. It likely should not be the case that historically long-running relays start at the bottom if they are unmeasured for a short period of time.

We are in the process of testing increasing the number of scanner and accompanying tor instances from 4 to 9 (double, plus one for currently unmeasured relays) in order to decrease the amount of time each fraction of the network takes to measure and ensure that new relays or unmeasured relays are measured often. There are additional patches that introduce extra exits into the slice of relays, if there are no suitable exits to measure with. This likely won't address the above behavior, but we hope it will reduce the number of relays that go missing. Currently we seem to have mixed results, with one Bandwidth Authority operator claiming minimal (50) unmeasured relays, and another claiming ~600 mixed relays. These numbers are not directly comparable because they were not sampled at the same time, and may not be representative of typical behavior - it's a little too soon to tell.
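As an illustration only, the split described above (relays ranked by capacity, divided over 8 scanners, plus a ninth scanner reserved for currently unmeasured relays) could be sketched as follows. This is not the bandwidth authority code; the `Relay` type, the `assign_to_scanners` function, and the equal-count slicing rule are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Relay:
    nickname: str
    bandwidth: Optional[int]  # None means the relay is currently unmeasured


def assign_to_scanners(relays: List[Relay],
                       measured_scanners: int = 8) -> Dict[int, List[Relay]]:
    """Hypothetical split: rank measured relays by capacity and cut the
    ranking into equal-count slices, one per scanner. Unmeasured relays
    all go to one extra scanner (index == measured_scanners)."""
    measured = sorted((r for r in relays if r.bandwidth is not None),
                      key=lambda r: r.bandwidth, reverse=True)
    unmeasured = [r for r in relays if r.bandwidth is None]

    slices: Dict[int, List[Relay]] = {i: [] for i in range(measured_scanners + 1)}
    for rank, relay in enumerate(measured):
        # rank < len(measured), so the index is always < measured_scanners
        slices[rank * measured_scanners // len(measured)].append(relay)
    slices[measured_scanners] = unmeasured
    return slices
```

With this kind of layout a relay that drops to unmeasured is handled by the dedicated scanner rather than waiting in a capacity slice, which is the motivation for the extra instance.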

It's a bit tricky to test these changes on the live Tor network, demonstrate that they produce sane results, and convince the directory authority operators and partner bandwidth authority operators to upgrade - nor do we want to do that all at once - gradual change is better. So, the goal is to produce results that will convince operators they should update, improve the situation for relay operators, and then start looking at longer term solutions for the measurement problem that are more maintainable and scalable in the long run.

other is painfully useless with dozens, but my fastest one lost its
consensus while the slowest one kept its consensus. It just seems
silly. That being said, I don't know how/if the bwauths scan in any
order or just willy-nilly, (that's not entirely true, I know it's
segmented to some degree as I recall reading a blog post about how it's
chopped up) but... I'd be much less upset if my best relays worked and
my worst relays didn't. More complaining... bleh.

I hope to have a testable hypothesis as to why your faster relays suffer(ed) more than the slower relays - it could be that the fraction of network by capacity allocated to a particular scanner is not well balanced, and that fraction is taking significantly longer to measure. In order to evaluate that statement I need to understand the common characteristics of relays that become unmeasured/lose rank and see if they are from a similar segment of the network, and whether or not that segment of the network takes longer to measure than other segments.

Another hypothesis is that your relays are on the boundary between two segments, and that a transition between scanner instances causes enough missed measurements to drop your relays. It would be helpful to know what rank your relays (or others) had at their last good measurement before becoming unmeasured.
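A rough way to check the boundary hypothesis, given the last known rank of a relay, is to see how close its rank percentile falls to a cut between two scanner segments. The sketch below assumes segments of equal size by rank, which may not match how the deployed scanners are configured; `near_boundary` and its `margin` parameter are hypothetical names for this example.

```python
def near_boundary(rank: int, total: int, scanners: int,
                  margin: float = 0.02) -> bool:
    """Return True if a relay's rank percentile lies within `margin` of an
    internal boundary between two equal-sized scanner segments (assumed)."""
    percentile = rank / total
    # Internal boundaries only: the cuts between adjacent segments
    boundaries = [i / scanners for i in range(1, scanners)]
    return any(abs(percentile - b) <= margin for b in boundaries)
```

Running this over the last good ranks of relays that became unmeasured would show whether they cluster near segment cuts more often than chance would suggest.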

It will require some cooperation with the existing deployed Bandwidth Authorities, in order to learn what their current scan times are - I will be writing some simple scripts to scrape these results so that we can collect and publish some useful heuristics about the scanner processes to better try and debug this problem.


One thing I would like to point out though... it appears... These
problems have at least a casual relationship with MyFamily.

One group of MyFamily is completely done - all of them stuck at 20.
Another group of MyFamily is working happily.

I've been doing some tests over the past few months trying to understand
why I keep having problems, and one thing has consistently popped up...
MyFamily.

That is very interesting, because MyFamily should have nothing to do with the scanner process at all - I'll need to think about this some more.


As one of MyFamily lost consensus, another family gained consensus back
on or around the same time.

Yes, especially nusenu, I know I'm supposed to have it all configured to
be under 1 MyFamily... But in a way I'm glad I didn't, as the casual
relationship I see really could only be seen having done what I did.

I say casual because I have no proof of causation. But... it is
interesting. If no one else has experienced similar problems, then I'd
chalk it up to a completely unexpected unrelated set of mysterious
circumstances that should not have happened for which there is no
explanation.

Aaron, if there is anything I can do to help you please let me know.


If anything that I said above sparks a thought, please let me know :)



So in conclusion, I'm not done, I'm just not happy.

This was supposed to be a short email, oops.


Matt
Speak Freely

