
[tor-dev] Load Balancing in 2.7 series - incompatible with OnionBalance ?

To: "tor-dev@xxxxxxxxxxxxxxxxxxxx" <tor-dev@xxxxxxxxxxxxxxxxxxxx>
Subject: [tor-dev] Load Balancing in 2.7 series - incompatible with OnionBalance ?
From: Alec Muffett <alecm@xxxxxx>
Date: Tue, 20 Oct 2015 23:18:59 +0000
Cc: Tim Wilson-Brown <twilsonb@xxxxxxx>

So I've just had a conversation with dgoulet on IRC, which I will reformat and subedit here as a conversation regarding OnionBalance and issues in 2.6 and 2.7 when a recently rebooted HS publishes a fresh descriptor:

[...]

alecm: consider OnionBalance which - being a bunch of daemons on a bunch of servers - will be a lot more prone to intermittent failures of 1+ daemons yielding a lot of republishing

alecm: we tend to move services around, and daemons will be killed in one place and resurrected elsewhere, and then we'll have to bundle up a new descriptor and ship it out

dgoulet: hrm so with that new 027 cache behavior, as long as the IP are usable, the descriptor will be kept, if they all become unusable, a new descriptor fetch is triggered and then those IPs will be tried
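
A minimal sketch of the cache behaviour dgoulet describes above, assuming a client simply tracks which introduction points (IPs) in a cached descriptor it still believes are usable. The class and method names are hypothetical and are not taken from the tor source:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CachedDescriptor:
    # Hypothetical model: intro point id -> "still believed usable".
    intro_points: Dict[str, bool] = field(default_factory=dict)

    def mark_failed(self, ip_id: str) -> None:
        """Record that a connection attempt to this intro point failed."""
        self.intro_points[ip_id] = False

    def usable(self) -> List[str]:
        """Intro points not yet marked as failed."""
        return [ip for ip, ok in self.intro_points.items() if ok]

    def needs_refetch(self) -> bool:
        # As described in the conversation: the descriptor is kept while
        # at least one intro point is usable, and a new fetch is triggered
        # only once all of them have become unusable.
        return not self.usable()
```

Under this model a descriptor listing A and F is kept after A fails, and is refetched only after F fails too.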

alecm: There's a mandatory refresh [of the descriptor] after N minutes?

dgoulet: we'll retry 3 times and after that all HSDir are in timeout for 15 minutes (I think, I'll have to validate) before retrying any HSDirs
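
The retry rule can be sketched roughly as follows. The figures (3 attempts, 15 minutes) are the ones dgoulet quotes and flags as unverified; the class, its methods, and the decision to reset the failure count when the timeout starts are assumptions made for illustration only:

```python
MAX_FETCH_ATTEMPTS = 3             # "we'll retry 3 times"
HSDIR_TIMEOUT_SECONDS = 15 * 60    # "15 minutes (I think...)", unverified


class HsDirFetchState:
    """Hypothetical per-service bookkeeping for descriptor fetches."""

    def __init__(self) -> None:
        self.failures = 0
        self.blocked_until = 0.0

    def can_fetch(self, now: float) -> bool:
        """True if the client is allowed to try the HSDirs at time `now`."""
        return now >= self.blocked_until

    def record_failure(self, now: float) -> None:
        """Count a failed fetch; after the third, back off for the timeout."""
        self.failures += 1
        if self.failures >= MAX_FETCH_ATTEMPTS:
            self.blocked_until = now + HSDIR_TIMEOUT_SECONDS
            self.failures = 0  # assumption: the counter restarts afterwards
```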

alecm: I wonder if descriptors should publish a recommended TTL - [number of seconds to live before refresh]

dgoulet: yeah we have an idea for a "revision-counter" in the descriptor being incremented at each new version for the 24 hours period

dgoulet: a TTL could be useful for load balancing though!
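
To make the two ideas concrete, here is a sketch of how a client might combine a recommended TTL with a revision counter. Neither field exists in the descriptor format discussed here; `ttl_seconds`, `revision_counter`, and both helper functions are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    revision_counter: int  # hypothetical: incremented on each republish
    ttl_seconds: int       # hypothetical: publisher's recommended lifetime
    fetched_at: float      # client-side timestamp of the fetch


def is_stale(desc: Descriptor, now: float) -> bool:
    """True once the publisher's recommended lifetime has elapsed."""
    return now - desc.fetched_at >= desc.ttl_seconds


def pick_newer(cached: Descriptor, fetched: Descriptor) -> Descriptor:
    """Keep whichever descriptor carries the higher revision counter."""
    if fetched.revision_counter > cached.revision_counter:
        return fetched
    return cached
```

With a TTL of, say, 600 seconds, a client holding a descriptor from before an outage would refetch within ten minutes rather than waiting for every intro point to fail.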

alecm: so, here's a scenario: imagine that we run 10 daemons,

alecm: call these daemons: A B C D E F G H I J - they all have random onion addresses

alecm: we steal one IP from each daemon, and bundle the 10 stolen IPs together to make an onionbalance site descriptor and publish it

alecm: people pull that descriptor, it's quite popular

alecm: we then lose power in a datacentre, which takes out half of our onions - say, A through E

alecm: we reboot the datacentre and restart A-E merely 10 minutes later

alecm: everyone who has already loaded our onionbalance site descriptor tests A B C D E and finds them all dead, because the old IPs for A-E are invalid

alecm: so they all move to F G H I J - which get overloaded even though (new) A B C D E are back up

alecm: and this persists for up to 24h, even though the outage was only 10 minutes

alecm: net result: large chunks of the world (anyone with an old descriptor + anyone randomly choosing F-J) have a shitty experience, which is not what high-availability is all about :-)
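
A back-of-envelope model of the scenario above. It assumes clients pick uniformly at random among the intro points they believe are alive, and that clients holding the old descriptor never refetch while F-J still answer. The function name and parameters are hypothetical:

```python
from typing import Tuple


def backend_load(stale_fraction: float,
                 n_backends: int = 10,
                 n_restarted: int = 5) -> Tuple[float, float]:
    """Return (share per restarted backend, share per surviving backend).

    stale_fraction is the fraction of clients still holding the
    pre-outage descriptor; shares are fractions of total client traffic.
    """
    fresh_fraction = 1.0 - stale_fraction
    survivors = n_backends - n_restarted
    # Fresh clients spread evenly over all backends.
    per_restarted = fresh_fraction / n_backends
    # Stale clients can only reach the survivors (F-J in the example).
    per_survivor = fresh_fraction / n_backends + stale_fraction / survivors
    return per_restarted, per_survivor
```

For example, if 80% of clients hold the old descriptor, each of F-J carries 18% of the traffic while each restarted daemon carries 2%, a ninefold imbalance.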

dgoulet: that will be what's going to happen - having a TTL in the desc. would help here indeed, I see the issue

dgoulet: TTL would be one thing to add, here we could also add a mechanism for a client retrying IPs that failed in the situation where some of the IPs are still working, or making clients balance themselves randomly could also be an idea

dgoulet: definitely there is some content here for tor-dev - I don't have a good answer but it should definitely be addressed

alecm: proper random selection of IP would be beneficial for load-balancing; not perfect, but in the long run, helpful
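
One way the two client-side ideas (random selection, plus occasionally retrying intro points that previously failed) might fit together is sketched below. This is not how tor selects intro points; the function and the `retry_failed_probability` knob are hypothetical:

```python
import random
from typing import Iterable, List, Set


def choose_intro_point(intro_points: Iterable[str],
                       failed: Set[str],
                       rng: random.Random,
                       retry_failed_probability: float = 0.1) -> str:
    """Pick an intro point uniformly at random, sometimes retrying a failed one."""
    points: List[str] = list(intro_points)
    if not points:
        raise ValueError("descriptor lists no intro points")
    healthy = [ip for ip in points if ip not in failed]
    previously_failed = [ip for ip in points if ip in failed]
    # Retry a failed intro point if nothing else is left, or occasionally
    # at random so that restarted backends get rediscovered.
    if previously_failed and (
        not healthy or rng.random() < retry_failed_probability
    ):
        return rng.choice(previously_failed)
    return rng.choice(healthy)
```

The occasional retry matters in the datacentre scenario: without it, a client that has marked A-E as dead never notices whether they have come back.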

--
Alec Muffett
Security Infrastructure
Facebook Engineering
London

