Re: [tor-bugs] #33406 [Internal Services/Tor Sysadmin Team]: automate reboots
#33406: automate reboots
-------------------------------------------------+---------------------
Reporter: anarcat | Owner: tpa
Type: project | Status: new
Priority: Low | Milestone:
Component: Internal Services/Tor Sysadmin Team | Version:
Severity: Major | Resolution:
Keywords: tpa-roadmap-march | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
-------------------------------------------------+---------------------
Comment (by anarcat):
I did more work on the reboot procedures today and rebooted the Ganeti
cluster using the reboot command. There were some issues with the initrd
interfering with the `wait_for_boot` (now called `wait_for_ping`) checks,
so I did some refactoring, but I'm still confused about the exception
Fabric raises in this case.
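The wait steps in the log below ("waiting up to 30 seconds for host to go down", then "waiting 300 seconds for host to go up") come down to polling the host until it changes state. Here is a minimal sketch of such a loop. It assumes reachability is tested with a plain TCP connection to the SSH port rather than an ICMP ping. `host_is_reachable` and `wait_for_state` are hypothetical names, not the functions in `fabric_tpa/reboot.py`:

```python
import socket
import time


def host_is_reachable(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Assumption: port 22 (SSH) is used as the liveness probe instead of ICMP.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # refused connections, timeouts and DNS failures are all OSError subclasses
        return False


def wait_for_state(host, up, delay, port=22, interval=5.0):
    """Poll `host` until its reachability equals `up` or `delay` seconds elapse.

    Returns True once the expected state is seen, False on timeout.
    """
    deadline = time.monotonic() + delay
    while True:
        if host_is_reachable(host, port, timeout=min(interval, 3.0)) == up:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A reboot would call `wait_for_state(host, up=False, delay=30)` after scheduling the shutdown, then `wait_for_state(host, up=True, delay=300)` before checking uptime. A host can also answer on port 22 before the real system is up, for example if its initrd runs its own SSH server. For that reason a successful connect alone does not prove the reboot finished.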
The exception I got is:
{{{
All instances migrated successfully.
Shutdown scheduled for Thu 2020-04-02 18:30:55 UTC, use 'shutdown -c' to cancel.
waiting 0 minutes for reboot to happen
waiting up to 30 seconds for host to go down
waiting 300 seconds for host to go up
host fsn-node-01.torproject.org should be back online, checking uptime
Traceback (most recent call last):
  File "./reboot", line 132, in <module>
    logging.getLogger(mod).setLevel('WARNING')
  File "./reboot", line 116, in main
    delay_up=args.delay_up,
  File "/usr/lib/python3/dist-packages/invoke/tasks.py", line 127, in __call__
    result = self.body(*args, **kwargs)
  File "/home/anarcat/src/tor/tsa-misc/fabric_tpa/reboot.py", line 197, in shutdown_and_wait
    res = con.run('uptime', watchers=[responder], pty=True, warn=True)
  File "<decorator-gen-3>", line 2, in run
  File "/usr/lib/python3/dist-packages/fabric/connection.py", line 29, in opens
    self.open()
  File "/home/anarcat/src/tor/tsa-misc/fabric_tpa/__init__.py", line 106, in safe_open
    Connection.open_orig(self)
  File "/usr/lib/python3/dist-packages/fabric/connection.py", line 634, in open
    self.client.connect(**kwargs)
  File "/usr/lib/python3/dist-packages/paramiko/client.py", line 349, in connect
    retry_on_signal(lambda: sock.connect(addr))
  File "/usr/lib/python3/dist-packages/paramiko/util.py", line 280, in retry_on_signal
    return function()
  File "/usr/lib/python3/dist-packages/paramiko/client.py", line 349, in <lambda>
    retry_on_signal(lambda: sock.connect(addr))
TimeoutError: [Errno 110] Connection timed out
}}}
Maybe the exception is raised *above* our code, in the Fabric task
handler itself. If so, we shouldn't use a `@task` for this at all, at
least in our code.
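In the traceback above, the `TimeoutError` is raised by paramiko's `sock.connect` while Fabric opens the SSH connection for `con.run('uptime', ...)` in `shutdown_and_wait`. The invoke `tasks.py` frame appears because the task body is called through `Task.__call__` (`result = self.body(*args, **kwargs)`). Also, `warn=True` only stops invoke from raising when the remote command exits with a non-zero status, so it does not help when the connection never opens. One possible workaround, assuming a few retries are acceptable while the host finishes booting, is to catch `OSError` (the parent class of `TimeoutError`) around the check and retry. `run_with_retries` is a hypothetical helper, not part of Fabric or invoke:

```python
import time


def run_with_retries(func, attempts=3, delay=10.0, retry_on=(OSError,)):
    """Call func() until it succeeds, retrying on connection-level errors.

    TimeoutError, ConnectionRefusedError and socket.timeout are all
    subclasses of OSError, so the default covers the error seen above.
    Other exceptions propagate immediately; if every attempt fails,
    the last exception is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on as exc:
            if attempt == attempts:
                raise
            print(f"attempt {attempt}/{attempts} failed: {exc!r}, retrying in {delay}s")
            time.sleep(delay)
```

With the names from the traceback, the call would look something like `res = run_with_retries(lambda: con.run('uptime', watchers=[responder], pty=True, warn=True))`. Because the `try`/`except` lives inside the function body, it behaves the same whether or not the function is decorated with `@task`.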
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/33406#comment:9>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs