Re: [tor-bugs] #33785 [Internal Services/Tor Sysadmin Team]: cannot create new machines in ganeti cluster
#33785: cannot create new machines in ganeti cluster
-------------------------------------------------+-------------------------
Reporter: anarcat | Owner: anarcat
Type: defect | Status: assigned
Priority: High | Milestone:
Component: Internal Services/Tor Sysadmin Team | Version:
Severity: Major | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
-------------------------------------------------+-------------------------
Comment (by anarcat):
some feedback from a ganeti maintainer:
{{{
03:40:48 <apoikos> failure reasons: FailMem: 1, FailN1: 4
03:41:18 <apoikos> part indicates that there's no N+1 redundancy, probably
due to not enough memory being available on the cluster to accommodate it
03:42:05 <apoikos> You can try a manual allocation, or passing flags like
--ignore-soft-errors and --no-capacity-checks to hail
[...]
10:36:12 <apoikos> I doubt rebalancing will fix it
10:36:31 <apoikos> The thing is, the whole htools logic was built around
Xen which does hard commit on memory
[...]
10:37:08 <apoikos> That's the -14GB of RAM you're seeing
10:37:11 <anarchat> so what you're saying is that i *am* effectively using
too much memory
10:37:13 <anarchat> oh weird
10:37:24 <anarchat> like the memory use from /proc doesn't match what
ganeti expects?
10:37:28 <apoikos> no, I'm saying you're using less memory than Ganeti
thinks
10:37:34 <apoikos> exactly
10:37:41 <apoikos> because KVMs VSZ != RSS
[...]
10:38:17 <apoikos> Let's say it computes the worst-case scenario
10:38:46 <apoikos> And in the worst-case scenario, where each instance
will indeed use all of its configured memory and KSM won't save you, you
don't have N+1
10:39:08 <apoikos> As for the 162GB of disk, these are probably your root
LVs, if they live on the same LVM VG as the Ganeti instance disks
10:39:39 <anarchat> well there's also a secondary VG (vg_ganeti_hdd) for
spinning rust that we don't see in gnt-node-list
10:39:48 <anarchat> i wonder if that's related
10:39:52 <apoikos> nope
10:40:08 <apoikos> If your primary VG has anything else than Ganeti VMs on
it, you'll see that message
10:40:20 <anarchat> darn
10:40:27 <anarchat> so i'd need to rebuild my nodes to fix this
10:40:34 <apoikos> the good news is, you can tell ganeti to ignore
specific LVs using gnt-cluster modify --reserved-lvs
10:40:41 <anarchat> oh cool
10:41:19 <anarchat> so i'd ignore what... vg_ganeti/root and
vg_ganeti/swap i guess
10:41:29 <apoikos> I guess
10:41:50 <apoikos> The option --reserved-lvs specifies a list
(comma-separated) of logical volume group names (regular expressions) that
will be ignored by the cluster verify operation
10:41:53 <anarchat> i already have lvm reserved volumes: vg_ganeti/root,
vg_ganeti/swap
10:42:19 <anarchat> oh but maybe i have extra LVs on those nodes, that's
true
10:43:47 <anarchat> on fsn-node-03 and fsn-node-05, but not fsn-node-04
}}}
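to make the "worst-case scenario" above concrete, here is a minimal Python sketch of an N+1 memory check. it is a simplified model written for this ticket, not htools' actual code: it assumes DRBD instances with exactly one primary and one secondary node, counts each instance's *configured* memory (the Xen-style hard commit apoikos describes, not the smaller RSS a KVM process really uses), and assumes KSM saves nothing. the `Instance` class, the `n1_failures` helper, and the node names and sizes are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    primary: str    # node the instance normally runs on
    secondary: str  # DRBD secondary it fails over to (hypothetical model)
    mem_mib: int    # configured memory, NOT the KVM process RSS


def n1_failures(node_mem_mib, instances):
    """Return {node: MiB short} for nodes that fail the worst-case N+1 check.

    Worst case: every instance really uses all of its configured memory
    and nothing is reclaimed by KSM or overcommit.
    """
    committed = defaultdict(int)  # memory of instances running as primary
    # failover[s][p]: memory that would land on node s if node p died
    failover = defaultdict(lambda: defaultdict(int))
    for inst in instances:
        committed[inst.primary] += inst.mem_mib
        failover[inst.secondary][inst.primary] += inst.mem_mib

    failing = {}
    for node, total in node_mem_mib.items():
        free = total - committed[node]
        # only one peer is assumed to fail at a time, so take the worst one
        needed = max(failover[node].values(), default=0)
        if needed > free:
            failing[node] = needed - free
    return failing


# hypothetical three-node cluster with 64 GiB per node
nodes = {"node-a": 64 * 1024, "node-b": 64 * 1024, "node-c": 64 * 1024}
instances = [
    Instance("vm1", "node-a", "node-b", 40 * 1024),
    Instance("vm2", "node-b", "node-a", 30 * 1024),
    Instance("vm3", "node-c", "node-b", 20 * 1024),
]
print(n1_failures(nodes, instances))  # {'node-a': 6144, 'node-b': 6144}
```

in this toy cluster the instances fit while every node is up, yet two nodes could not absorb a single peer's failure if all configured memory were really in use, which is the kind of situation hail reports as FailN1 even when actual usage in /proc looks fine.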
they also pointed to [https://github.com/ganeti/ganeti/issues/1399 upstream
issue 1399]: the Sinst field in `gnt-node list` is incorrect.
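for the `--reserved-lvs` option discussed in the log above, here is a small Python sketch of how the patterns could be checked against the LVs on a node, to spot extra LVs like the ones on fsn-node-03 and fsn-node-05. it assumes, as the option description quoted in the log says, that each comma-separated entry is a regular expression, and further assumes each pattern must match the whole `vg/lv` name. the helper functions, the Ganeti-owned disk name, and the `vg_ganeti/srv` LV are hypothetical.

```python
import re


def parse_reserved_lvs(option_value):
    """Split a comma-separated --reserved-lvs value into compiled regexes."""
    return [re.compile(p.strip()) for p in option_value.split(",") if p.strip()]


def unexpected_lvs(node_lvs, ganeti_lvs, reserved_patterns):
    """Return LVs that are neither Ganeti instance disks nor reserved.

    Assumption: a reserved pattern has to match the full "vg/lv" name.
    """
    unexpected = []
    for lv in node_lvs:
        if lv in ganeti_lvs:
            continue  # an instance disk Ganeti knows about
        if any(p.fullmatch(lv) for p in reserved_patterns):
            continue  # explicitly ignored via --reserved-lvs
        unexpected.append(lv)
    return unexpected


reserved = parse_reserved_lvs("vg_ganeti/root,vg_ganeti/swap")
node_lvs = [
    "vg_ganeti/root",
    "vg_ganeti/swap",
    "vg_ganeti/0b1c.disk0_data",  # hypothetical instance disk
    "vg_ganeti/srv",              # hypothetical extra LV
]
ganeti_lvs = {"vg_ganeti/0b1c.disk0_data"}
print(unexpected_lvs(node_lvs, ganeti_lvs, reserved))  # ['vg_ganeti/srv']
```

since the entries are regular expressions, a single pattern could cover a family of extra LVs instead of listing each one; on a real node the LV list might come from `lvs --noheadings -o vg_name,lv_name`, whose output separates VG and LV with whitespace and would need joining with `/` first.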
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/33785#comment:3>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs