Re: ShutdownWaitLength vs. 'restart' in init scripts



On Thu, Jun 25, 2009 at 09:50:16PM -0400, Bill McGonigle wrote:
> > Is this with the tor rpm shipped by Fedora?
> 
> yes
> 
> > We don't support (or use,
> > or like, or recommend) the fedora tor rpm.
> 
> OK.  Is this ideological, or because it's no good?

Mostly the latter.

There was a little issue at the beginning where there were two people
offering to be maintainer, and I had been working with one and he had
a fine rpm spec file, and for some reason they picked the other and
ignored us both. I haven't interacted with them at all since; they seem
happy in their world ignoring upstream. :(

>  I can work to fix
> the latter (I don't think anybody at Fedora wants a bad RPM).  In my
> experience automatic security updates [aside: which are currently borked
> for the Fedora tor package, but that's another story, which I think I've
> gotten the right people to resolve at this point] are worth many
> trade-offs.  People just don't do security updates the way we wish they
> would.

Right. We've been thinking of setting up an rpm.torproject.org
repository, and putting our rpms into it. That would be similar to the
mirror.noreply.org deb repository that our Debian maintainer maintains.

Then we would have better control over what people think of as our rpms.

But the even better answer would be to somehow get the Fedora folks
to improve their spec file. It needs to set ulimit -n like the Debian
init script does; understand how to shut down relays cleanly; create
a separate user and run Tor as that user; and I really haven't looked
deeply enough lately to know what else it's missing.

> > The right answer imo is how the deb package does it:
> > https://git.torproject.org/checkout/tor/master/debian/tor.init
> > Check out the wait_for_deaddaemon function: it basically checks each
> > second whether the process is still around, and returns when it's gone
> > (or 60 seconds have passed).
> 
> this makes sense.
> 
> > So I guess if you raise your ShutdownWaitLength, you'll want to tweak
> > the script. But that still seems better than the
> > "kill -INT, sleep 1, kill -9" strategy the rpm uses.
> 
> agreed, do you see any reason not to extract ShutdownWaitLength from the
> config file?

The main reason against would be 'complexity'.

For example, if you run a controller that 'setconf's a new
ShutdownWaitLength value via the control port, but the Tor process can't
saveconf because it can't write to its torrc (arguably a feature not a
bug), then Tor would be using a different value of ShutdownWaitLength
than you'd find in its torrc file. That's an unlikely edge case, but it
illustrates how it might not be that simple.
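
For the simple case, reading the value out of torrc might look like the
sketch below. The function name shutdown_wait_from_torrc and its fallback
argument are made up for illustration. It only understands a bare number
of seconds, it assumes the last matching line is the one that counts, and
it cannot see a value that was changed at runtime through the control port.

```shell
# Hypothetical helper: print the ShutdownWaitLength found in a torrc-style
# file, or a fallback when the option is absent or the file is unreadable.
# Usage: shutdown_wait_from_torrc FILE [FALLBACK_SECONDS]
shutdown_wait_from_torrc() {
    file=$1
    fallback=${2:-30}   # assumed fallback; pass your own if you prefer
    value=$(awk '
        # Option names are compared case-insensitively; only plain
        # integers are accepted, so "1 minute" style values are ignored.
        tolower($1) == "shutdownwaitlength" && $2 ~ /^[0-9]+$/ { v = $2 }
        END { if (v != "") print v }
    ' "$file" 2>/dev/null)
    echo "${value:-$fallback}"
}
```

An init script could add a few seconds of slack to the printed number and
use the result as its wait limit.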

I would say that just taking Debian's "up to 60 seconds while it exits"
strategy would get us most of the way there.
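
A minimal sketch of that polling idea follows. It is not the Debian
wait_for_deaddaemon function itself; wait_for_exit is a hypothetical name,
and the 60-second default simply mirrors the figure mentioned above.

```shell
# Hypothetical helper: poll once per second until a process is gone.
# Usage: wait_for_exit PID [MAX_SECONDS]
# Returns 0 once the process has exited, 1 if it is still alive after
# MAX_SECONDS (default 60).
wait_for_exit() {
    pid=$1
    max=${2:-60}
    waited=0
    # kill -0 delivers no signal; it only tests whether the pid exists
    # and may be signalled by the caller.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$max" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

The stop action would send the INT first and then call something like
wait_for_exit "$pid" 60, treating a non-zero return as "still shutting
down" rather than going straight to kill -9 after one second.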

> >> or, perhaps even better: fixing the server shutdown process so the old
> >> server can't take out the new server.
> > 
> > Can you clarify what happens here? 'tor stop' finishes but Tor is still
> > running, so then 'tor start' fails to launch a new Tor, and then the
> > old Tor exits, and then you have no Tor running but you think you do?
> 
> OK, so to be more clear:
> 
> Let's call the old tor process we're taking down torA and the new one we
> want torB.
> 
> 1) torA is sent an INT to tell it to stop.  It begins its shutdown process.
> 2) The init script isn't waiting or watching, so it starts torB.
> Because torA is no longer bound to its listener port, torB can start up
> just fine. The init script is out of the picture now.
> 3) torA reaches ShutdownWaitLength time.  It kills itself.  <---guess
> 4) torB gets taken out by torA's final shutdown.
> 
> At this point the init script's lockfiles reflect torB running, when
> actually no tor is running.  So it behaves inappropriately.

Ah. I'm not sure whether that's the exact order of events (I don't think
torA can kill torB just by exiting), but in any case it's certainly a
big mess. The thing to do is to make sure that torA has exited before
you launch torB, and that has to be done by the init script.

> I realize there's no point in addressing the current Fedora RPM init
> script here, but assuming 4) is correct, it would seem that however torA
> is finding torB, it shouldn't do it that way.

Tor just exits cleanly when it's counted through its ShutdownWaitLength
time. It doesn't go hunting for other instances of Tor to kill them too.

Perhaps torB dies early on (but after writing its pid file) when it
realizes that torA is still around?

In any case, launching Tor while Tor is still running is not a supported
operation. We should make it not do that. :)
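
One way the init script could refuse to do that is a guard in its start
action, sketched below. The name start_if_not_running and the pid-file
argument are assumptions for illustration, not something taken from the
Fedora or Debian scripts, and the check trusts that the pid file names a
Tor process (a recycled pid would fool it).

```shell
# Hypothetical guard for a "start" action: run the start command only if
# the pid recorded in PIDFILE no longer refers to a live process.
# Usage: start_if_not_running PIDFILE COMMAND [ARGS...]
start_if_not_running() {
    pidfile=$1
    shift
    if [ -r "$pidfile" ]; then
        oldpid=$(cat "$pidfile")
        # kill -0 only probes for existence; it delivers no signal.
        if [ -n "$oldpid" ] && kill -0 "$oldpid" 2>/dev/null; then
            echo "process $oldpid from $pidfile is still running; not starting" >&2
            return 1
        fi
    fi
    # No pid file, or a stale one: go ahead and start.
    "$@"
}
```

A restart would then be "stop, wait until the old process is really gone,
start", with this guard catching the case where the wait gave up early.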

Can you take the lead in making a patch, and in either getting Fedora to
believe in it, or helping us maintain our own rpms better?

Thanks!
--Roger