Subject: Re: [tor-relays] inet_csk_bind_conflict

From: Christopher Sheats <yawnbox@xxxxxxxxxxxxxxxx>

Date: Mon, 5 Dec 2022 20:48:55 +0000

Cc: "tor-relays@xxxxxxxxxxxxxxxxxxxx" <tor-relays@xxxxxxxxxxxxxxxxxxxx>

May I ask what your setup is?
Are you running your relays on separate VMs on the main system, or are
you using a different setup, such as having all IP addresses on the same OS
and using OutboundBindAddress, routing, etc. to separate them? If I
know more, I might be able to make a script specific to your setup.

Thank you. Yes, of course.

Ubuntu server 22.04 runs on bare metal. Ansible-relayor manages 20 exit relays on each system. Netplan has each IP individually listed (sub-divided as a /25 per server from within a dedicated /24, similarly for v6 addresses). I believe an available IP is randomly picked by ansible-relayor and used statically in each torrc file.
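
For readers following along, the addressing described above can be sketched with Python's standard ipaddress module. The /24 used here is inferred from the addresses that appear in this thread and is an assumption, not a statement of the actual allocation.

```python
import ipaddress

# Assumption: the dedicated /24 is inferred from the relay address in the torrc.
block = ipaddress.ip_network("23.129.64.0/24")

# Split the /24 into two /25s, one per server.
halves = list(block.subnets(new_prefix=25))
for net in halves:
    print(net, "holds", net.num_addresses, "addresses")

# Find which /25 a given relay address belongs to.
relay_ip = ipaddress.ip_address("23.129.64.130")
owner = next(net for net in halves if relay_ip in net)
print(relay_ip, "belongs to", owner)
```

Each /25 holds 128 addresses, which leaves plenty of room for 20 relays per server.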

Here is an example torrc:

# ansible-relayor generated torrc configuration file

# Note: manual changes will be OVERWRITTEN on the next ansible-playbook run

OfflineMasterKey 1

RunAsDaemon 0

Log notice syslog

OutboundBindAddress 23.129.64.130

SocksPort 0

User _tor-23.129.64.130_443

DataDirectory /var/lib/tor-instances/23.129.64.130_443

ORPort 23.129.64.130:443

ORPort [2620:18c:0:192::130]:443

OutboundBindAddress [2620:18c:0:192::130]

DirPort 23.129.64.130:80

Address 23.129.64.130

SyslogIdentityTag 23.129.64.130_443

ControlSocket /var/run/tor-instances/23.129.64.130_443/control GroupWritable RelaxDirModeCheck

Nickname ageis

ContactInfo url:emeraldonion.org proof:uri-rsa ciissversion:2 tech@xxxxxxxxxxxxxxxx

Sandbox 1

NoExec 1

# we are an exit relay!

ExitRelay 1

IPv6Exit 1

DirPort [2620:18c:0:192::130]:80 NoAdvertise

DirPortFrontPage /etc/tor/instances/tor-exit-notice.html

ExitPolicy reject 23.129.64.128/25:*,reject6 [2613:18c:0:192::]/64:*,accept *:*,accept6 *:*

MyFamily <snip>

# end of torrc
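
As a quick sanity check on a torrc like the one above, the following minimal sketch tests whether each relay address falls inside one of the prefixes that the ExitPolicy rejects. The address and prefix strings are copied from the example; the helper name `covered` is hypothetical.

```python
import ipaddress

# Values copied verbatim from the example torrc.
relay_addrs = ["23.129.64.130", "2620:18c:0:192::130"]
reject_nets = ["23.129.64.128/25", "2613:18c:0:192::/64"]

def covered(addr, nets):
    """Return True if addr lies inside any network of the same IP version."""
    ip = ipaddress.ip_address(addr)
    for net in map(ipaddress.ip_network, nets):
        if ip.version == net.version and ip in net:
            return True
    return False

results = {addr: covered(addr, reject_nets) for addr in relay_addrs}
for addr, ok in results.items():
    print(addr, "covered by a reject rule" if ok else "NOT covered by a reject rule")
```

With the values as posted, the IPv4 address is covered but the IPv6 address is not, because the reject6 prefix begins with 2613 while the relay address begins with 2620. That may be a typo in the pasted example or it may be intentional; it seems worth verifying against the live configuration.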

Christopher Sheats (yawnbox)

Executive Director

Emerald Onion

Signal: +1 206.739.3390

Website: https://emeraldonion.org/

Mastodon: https://digitalcourage.social/@EmeraldOnion/

On Dec 4, 2022, at 10:08 PM, Chris <tor@xxxxxxxxxxxxxxx> wrote:

Sorry to hear it wasn't much help. Even though the additions I suggested
didn't help, they certainly couldn't cause any harm and can't be
responsible for the drops in traffic.

As for the torutils scripts, I'm sure toralf would be better able to
investigate that, but I have a feeling you have a setup that
might not have worked with the script. May I ask what your setup is?
Are you running your relays on separate VMs on the main system, or are
you using a different setup, such as having all IP addresses on the same OS
and using OutboundBindAddress, routing, etc. to separate them? If I
know more, I might be able to make a script specific to your setup.

On 12/3/2022 2:07 PM, Christopher Sheats wrote:
Hello,

Thank you for this information. After 24 hours of testing, these
configurations brought Tor to a halt.

At first I started with the sysctl modifications. After a few hours
with just that, there was no improvement in ~75%
inet_csk_bind_conflict utilization. I then installed Torutils for both
IPv4 and IPv6. After only a couple of hours, Tor dropped to below 15
Mbps across both servers (40 relays). 16 hours later, Tor dropped
below 2 Mbps.

I've removed all of these new settings and restarted.


On Dec 2, 2022, at 7:30 AM, Chris <tor@xxxxxxxxxxxxxxx> wrote:

Hi,

As I'm sure you've already gathered, your system is maxing out trying to
deal with all the connection requests. When inet_csk_get_port is called
and the port is found to be occupied then inet_csk_bind_conflict is
called to resolve the conflict. So in normal circumstances you shouldn't
see it in perf top, much less at 79%. There are two ways to deal with it,
and each method should be complemented by the other. One way is to
increase the number of available ports and reduce the wait time, which you
have already partly tried. I would add the following:

net.ipv4.tcp_fin_timeout = 20

net.ipv4.tcp_max_tw_buckets = 1200

net.ipv4.tcp_keepalive_time = 1200

net.ipv4.tcp_syncookies = 1

net.ipv4.tcp_max_syn_backlog = 8192
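
Before changing anything, it can help to compare the values currently in effect with the suggested ones. The sketch below reads them from /proc/sys on Linux. The helper names are hypothetical, and the dot-to-slash mapping is an assumption that holds for the keys listed here.

```python
from pathlib import Path

# Suggested values copied from the list above.
SUGGESTED = {
    "net.ipv4.tcp_fin_timeout": "20",
    "net.ipv4.tcp_max_tw_buckets": "1200",
    "net.ipv4.tcp_keepalive_time": "1200",
    "net.ipv4.tcp_syncookies": "1",
    "net.ipv4.tcp_max_syn_backlog": "8192",
}

def sysctl_path(key, root="/proc/sys"):
    """Map a dotted sysctl key to its file under /proc/sys."""
    return Path(root) / key.replace(".", "/")

def read_sysctl(key, root="/proc/sys"):
    """Return the current value as a string, or None if it cannot be read."""
    try:
        return " ".join(sysctl_path(key, root).read_text().split())
    except OSError:
        return None

if __name__ == "__main__":
    for key, wanted in SUGGESTED.items():
        print(f"{key}: current={read_sysctl(key)} suggested={wanted}")
```

This only reads values. Applying them would still be done through /etc/sysctl.conf as usual.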

The complementary method is to lower the number of connection requests
by filtering out the frivolous ones with a few iptables rules.

I'm assuming the increased load you're experiencing is due to the
current DDoS attacks. I'm not sure whether you're using anything to
mitigate that, but you should consider it.

You may find something useful at the following links:

[1](https://github.com/Enkidu-6/tor-ddos)

[2](https://github.com/toralf/torutils)

[background](https://gitlab.torproject.org/tpo/community/support/-/issues/40093)

Cheers.

On 12/1/2022 3:35 PM, Christopher Sheats wrote:
Hello tor-relays,

We are using Ubuntu server currently for our exit relays.
Occasionally, exit throughput will drop from ~4Gbps down to ~200Mbps
and the only observable data point that we have is a significant
increase in inet_csk_bind_conflict, as seen via 'perf top', where it
will hit 85% [kernel] utilization.

A while back we thought we had solved this with two /etc/sysctl.conf
settings:
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_tw_reuse = 1

However we are still experiencing this problem.
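
To put the port range in perspective, here is a minimal sketch that computes how many local ports the range above provides and counts sockets per local IPv4 address from /proc/net/tcp. It assumes a little-endian Linux host, and the function names are hypothetical. The per-address figure is only a rough upper bound for the case where each outbound socket needs its own local port on a given source address.

```python
import socket
import struct
from collections import Counter

def port_range_size(low, high):
    """Number of ports in an inclusive ip_local_port_range."""
    return high - low + 1

def decode_ipv4(hex_addr):
    """Decode the hex IPv4 form used in /proc/net/tcp (little-endian host assumed)."""
    return socket.inet_ntoa(struct.pack("<I", int(hex_addr, 16)))

def sockets_per_local_ip(proc_net_tcp_text):
    """Count sockets per local IPv4 address in /proc/net/tcp-style text."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        local_hex = fields[1].split(":")[0]
        counts[decode_ipv4(local_hex)] += 1
    return counts

if __name__ == "__main__":
    print("ports per source address:", port_range_size(1024, 65535))
    try:
        with open("/proc/net/tcp") as fh:
            for ip, n in sockets_per_local_ip(fh.read()).most_common(5):
                print(ip, n)
    except OSError:
        print("/proc/net/tcp is not readable on this system")
```

The range 1024 to 65535 gives 64512 ports, so a source address whose socket count approaches that number is a likely candidate for bind conflicts.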

Both of our (currently, two) relay servers suffer from the same
problem, at the same time. They are AMD Epyc 7402P bare-metal servers
each with 96GB RAM, each has 20 exit relays on them. This issue
persists after upgrading to 0.4.7.11.

Screenshots of perf top are shared
here: https://digitalcourage.social/@EmeraldOnion/109440197076214023

Does anyone have experience troubleshooting and/or fixing this problem?

Cheers,


_______________________________________________
tor-relays mailing list
tor-relays@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays
