[tor-bugs] #30880 [Internal Services/Tor Sysadmin Team]: document backup/restore procedures
#30880: document backup/restore procedures
-------------------------------------------------+-------------------------
Reporter: anarcat | Owner: anarcat
Type: task | Status: assigned
Priority: Medium | Milestone:
Component: Internal Services/Tor Sysadmin Team | Version:
Severity: Normal | Keywords:
Actual Points: | Parent ID:
Points: | Reviewer:
Sponsor: |
-------------------------------------------------+-------------------------
Backup system design and restore procedures are currently not well
documented in our wiki. Try a few restores and document the heck out of
this. The [http://opsreportcard.com/section/11 ops report card] recommends
services be documented with a template like this:
1. Overview: Overview of the service: what is it, why do we have it, who
are the primary contacts, how to report bugs, links to design docs and
other relevant information.
2. Build: How to build the software that makes the service. Where to
download it from, where the source code repository is, steps for building
and making a package or other distribution mechanisms. If it is software
that you modify in any way (open source project you contribute to or a
local project) include instructions for how a new developer gets started.
Ideally the end result is a package that can be copied to other machines
for installation.
3. Deploy: How to deploy the software. How to build a server from
scratch: RAM/disk requirements, OS version and configuration, what
packages to install, and so on. If this is automated with a configuration
management tool like cfengine/puppet/chef (and it should be), then say so.
4. Common Tasks: Step-by-step instructions for common things like
provisioning (add/change/delete), common problems and their solutions, and
so on.
5. Pager Playbook: A list of every alert your monitoring system may
generate for this service and a step-by-step "what to do when..." for each
of them.
6. DR: Disaster Recovery plans and procedures. If a service machine died,
how would you fail over to the hot/cold spare?
7. SLA: Service Level Agreement. The (social or real) contract you make
with your customers. Typically things like Uptime Goal (how many 9s), RPO
(Recovery Point Objective) and RTO (Recovery Time Objective).
While we don't use that template anywhere yet (and it somehow conflicts
with the [https://www.divio.com/blog/documentation/ documentation best
practices]), we can probably find a middle ground of some sort...
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/30880>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online