belkin.binbang.vm.gnt kvm debootstrap z2-2.host.gnt ERROR_nodedown (node down)
cspoker-bot.vm.gnt kvm debootstrap z2-2.host.gnt ERROR_nodedown (node down)
drac5.binbang.vm.gnt kvm debootstrap z2-2.host.gnt ERROR_nodedown (node down)
esclick.binbang.vm.gnt kvm debootstrap z2-2.host.gnt ERROR_nodedown (node down)
ganeti-manager.vm.gnt kvm debootstrap z2-2.host.gnt ERROR_nodedown (node down)
intel.binbang.vm.gnt kvm debootstrap z2-2.host.gnt ERROR_nodedown (node down)
proxy.vm.gnt kvm debootstrap z2-2.host.gnt ERROR_nodedown (node down)
rheincode.pokersource.vm.gnt kvm debootstrap z2-2.host.gnt ERROR_nodedown (node down)
speedtouch-716g.binbang.vm.gnt kvm debootstrap z2-2.host.gnt ERROR_nodedown (node down)
trac.pokersource.vm.gnt kvm debootstrap z2-2.host.gnt ERROR_nodedown (node down)
tsunami.vm.gnt kvm debootstrap z2-2.host.gnt ERROR_nodedown (node down)
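The listing above can be grouped by node to see at a glance which instances are affected. The following is only a sketch: it assumes one instance per line and that the first five columns are name, hypervisor, OS, primary node and status (as `gnt-instance list` usually prints them). The function name and the shortened sample are illustrative.

```python
def instances_on_down_node(listing):
    """Group instances whose status is ERROR_nodedown by primary node."""
    down = {}
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or truncated lines
        # Assumed column order: name, hypervisor, OS, primary node, status
        name, _hypervisor, _os, node, status = fields[:5]
        if status == "ERROR_nodedown":
            down.setdefault(node, []).append(name)
    return down


# Two rows copied from the listing above
SAMPLE_INSTANCES = """\
belkin.binbang.vm.gnt kvm debootstrap z2-2.host.gnt ERROR_nodedown (node down)
proxy.vm.gnt kvm debootstrap z2-2.host.gnt ERROR_nodedown (node down)
"""

affected = instances_on_down_node(SAMPLE_INSTANCES)
for node, names in affected.items():
    print(node, len(names), "instance(s):", ", ".join(names))
```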
1) update shorewall config on z2-3 to use z2-21 ip for http proxy
diff -r c3e11f9f27f1 params
 VM_SAVANNAH=10.10.1.3
 VM_SPEEDTOUCH_716G=10.10.1.14
 IP_SAVANNAH=87.98.156.150
+IP_FAILOVER_Z21=87.98.243.123

 VM_Z2WORK=10.10.1.15
 VM_CSPOKER=10.10.1.41
1bis) gnt-cluster masterfailover on z2-3 (to become master)
z2-3:/etc/shorewall# gnt-cluster masterfailover
z2-3:/etc/shorewall# gnt-cluster getmaster
z2-3.host.gnt
2) start vpn for fsffrance on z2-3, and stop it on z2-2
z2-3:/etc/openvpn# mv fsf-vpn.conf.old fsf-vpn.conf
z2-2:/etc/openvpn# mv fsf-vpn.conf fsf-vpn.conf.desactivated
z2-2:/etc/openvpn# /etc/init.d/openvpn restart
z2-3:/etc/openvpn# /etc/init.d/openvpn restart
check :
z2-2:/etc/openvpn# ip r|grep -v "10.1"
91.121.220.0/24 dev eth0 proto kernel scope link src 91.121.220.73
blackhole 10.0.0.0/8
default via 91.121.220.254 dev eth0 src 87.98.243.123
z2-3:/etc/openvpn# ip r|grep -v "10.1"
192.168.181.16 via 10.8.0.117 dev tun0
10.8.0.117 dev tun0 proto kernel scope link src 10.8.0.118
192.168.29.17 via 10.8.0.117 dev tun0
192.168.5.0/24 via 10.8.0.117 dev tun0
91.121.83.0/24 dev eth0 proto kernel scope link src 91.121.83.129
192.168.35.0/24 via 10.8.0.117 dev tun0
192.168.67.0/24 via 10.8.0.117 dev tun0
192.168.17.0/24 via 10.8.0.117 dev tun0
10.8.0.0/24 via 10.8.0.117 dev tun0
192.168.181.0/24 via 10.8.0.117 dev tun0
192.168.170.0/24 via 10.8.0.117 dev tun0
192.168.14.0/24 via 10.8.0.117 dev tun0
192.168.29.0/24 via 10.8.0.117 dev tun0
192.168.25.0/24 via 10.8.0.117 dev tun0
blackhole 10.0.0.0/8
default via 91.121.83.254 dev eth0 src 87.98.243.39
maxence@call:~$ ssh root@z2-1.host.gnt
[...]
Last login: Wed Nov 25 09:07:07 2009 from 10.8.0.6
z2-1:~# exit
maxence@call:~$ ssh root@proxy2.vm.gnt
[...]
Last login: Wed Nov 25 08:06:42 2009 from 10.8.0.6
proxy2:~#
2bis) migrate failover ip, check it works
ip migrated, checked some services.
3) shut down all vms (kill needed for rheincode, because even the kvm monitor console doesn't answer)
can't kill rheincode, its process is marked as "defunct"
z2-2:/etc/openvpn# ps aux|grep kvm
root 7678 0.0 0.0 9384 804 pts/1 S+ 09:33 0:00 grep kvm
root 29667 2.2 0.0 0 0 ? Zl Nov12 418:45 [kvm] <defunct>
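A process whose STAT column starts with "Z" is a zombie: it has already exited, so sending it signals changes nothing. A small sketch for picking such entries out of `ps aux` output follows. It assumes the standard eleven-column `ps aux` layout, and the function name is illustrative.

```python
def defunct_pids(ps_output, name="kvm"):
    """Return PIDs of zombie processes whose command contains `name`."""
    pids = []
    for line in ps_output.splitlines():
        # ps aux: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or malformed row
        stat, command = fields[7], fields[10]
        if stat.startswith("Z") and name in command:
            pids.append(int(fields[1]))
    return pids


# The two rows shown in the transcript above
SAMPLE_PS = """\
root 7678 0.0 0.0 9384 804 pts/1 S+ 09:33 0:00 grep kvm
root 29667 2.2 0.0 0 0 ? Zl Nov12 418:45 [kvm] <defunct>
"""

zombies = defunct_pids(SAMPLE_PS)
print(zombies)
```

The `grep kvm` row is skipped because its state is `S+`, not `Z`.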
4) reboot z2-2
reboot hangs, doing hard reboot
[...]
Last login: Wed Nov 25 08:44:57 2009 from arennes-357-1-112-202.w90-12.abo.wanadoo.fr
z2-2:~# uptime
09:37:49 up 0 min, 1 user, load average: 0.05, 0.01, 0.00
z2-2:~#
4bis) when rebooted, restart ganeti
Already restarted :
z2-1.host.gnt   1.3T 608.9G  3.9G  2.4G  1.3G 10 11
z2-2.host.gnt   1.3T   1.0T  3.8G  134M  3.7G 10 11
z2-3.host.gnt   1.3T   1.1T  3.9G  2.4G  1.2G 10  8
z2-4.host.gnt   1.3T 625.0G  7.6G  1.3G  6.7G  5 17
z2-5.host.gnt   1.3T   1.1T  3.9G  2.7G  369M 13  2
z2-6.host.gnt   1.8T   1.7T 11.8G  2.9G  9.4G  4  4
z2-7.host.gnt 911.0G 335.5G 15.7G  9.3G 14.5G  1  0
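The node listing above has no header, so the sketch below rests on assumptions. It takes the columns to be the usual `gnt-node list` ones (Node, DTotal, DFree, MTotal, MNode, MFree, Pinst, Sinst) and the size suffixes to be binary multiples. The 512 MiB threshold is arbitrary, chosen only to single out a node that currently uses almost no memory, as z2-2 does right after its reboot.

```python
# Assumed column names; the listing itself does not show a header
COLUMNS = ("node", "dtotal", "dfree", "mtotal", "mnode", "mfree", "pinst", "sinst")
UNITS = {"M": 1, "G": 1024, "T": 1024 * 1024}  # assumed binary multiples, in MiB


def to_mib(size):
    """Convert a size such as '3.9G' or '134M' to MiB."""
    return float(size[:-1]) * UNITS[size[-1]]


def parse_nodes(listing):
    """Turn each eight-column row into a dict keyed by COLUMNS."""
    rows = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue  # skip anything that is not a full row
        row = dict(zip(COLUMNS, fields))
        row["pinst"] = int(row["pinst"])
        row["sinst"] = int(row["sinst"])
        rows.append(row)
    return rows


# Two rows copied from the listing above
SAMPLE_NODES = """\
z2-1.host.gnt 1.3T 608.9G 3.9G 2.4G 1.3G 10 11
z2-2.host.gnt 1.3T 1.0T 3.8G 134M 3.7G 10 11
"""

nodes = parse_nodes(SAMPLE_NODES)
# Hypothetical threshold: nodes using less than 512 MiB
nearly_idle = [n["node"] for n in nodes if to_mib(n["mnode"]) < 512]
print(nearly_idle)
```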
5) gnt-cluster verify
Wed Nov 25 09:19:12 2009 * Verifying node z2-2.host.gnt (master candidate)
Wed Nov 25 09:19:12 2009 - ERROR: file '/var/lib/ganeti/ssconf_master_node' has wrong checksum
Wed Nov 25 09:19:12 2009 - ERROR: file '/var/lib/ganeti/config.data' has wrong checksum
Wed Nov 25 09:19:12 2009 - ERROR: drbd minor 0 of instance home.binbang.vm.gnt is not active
Wed Nov 25 09:19:12 2009 - ERROR: drbd minor 1 of instance munin.vm.gnt is not active
Wed Nov 25 09:19:12 2009 - ERROR: drbd minor 4 of instance trac.pokersource.vm.gnt is not active
Wed Nov 25 09:19:12 2009 - ERROR: drbd minor 7 of instance z2work.vm.gnt is not active
Wed Nov 25 09:19:12 2009 - ERROR: drbd minor 10 of instance tsunami.vm.gnt is not active
Wed Nov 25 09:19:12 2009 - ERROR: drbd minor 11 of instance ligamen.vm.gnt is not active
Wed Nov 25 09:19:12 2009 - ERROR: drbd minor 13 of instance pioneer.binbang.vm.gnt is not active
Wed Nov 25 09:19:12 2009 - ERROR: drbd minor 14 of instance drupal-z2.pokersource.vm.gnt is not active
Wed Nov 25 09:19:12 2009 - ERROR: drbd minor 15 of instance proxy2.vm.gnt is not active
Wed Nov 25 09:19:12 2009 - ERROR: drbd minor 19 of instance lamp.pokersource.vm.gnt is not active
Wed Nov 25 09:19:12 2009 - ERROR: drbd minor 20 of instance harvest.vm.gnt is not active
after a "gnt-cluster redist-conf":
z2-3:/etc/shorewall# gnt-cluster redist-conf
z2-3:/etc/shorewall# gnt-cluster verify
Wed Nov 25 09:20:54 2009 * Verifying global settings
Wed Nov 25 09:20:54 2009 * Gathering data (7 nodes)
Wed Nov 25 09:21:01 2009 * Verifying node z2-1.host.gnt (master candidate)
Wed Nov 25 09:21:01 2009 * Verifying node z2-2.host.gnt (master candidate)
Wed Nov 25 09:21:01 2009 * Verifying node z2-3.host.gnt (master)
Wed Nov 25 09:21:01 2009 * Verifying node z2-4.host.gnt (master candidate)
Wed Nov 25 09:21:01 2009 * Verifying node z2-5.host.gnt (master candidate)
Wed Nov 25 09:21:01 2009 * Verifying node z2-6.host.gnt (master candidate)
Wed Nov 25 09:21:01 2009 * Verifying node z2-7.host.gnt (master candidate)
Wed Nov 25 09:21:01 2009 - ERROR: unallocated drbd minor 0 is in use
Wed Nov 25 09:21:01 2009 - ERROR: unallocated drbd minor 3 is in use
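The two kinds of DRBD errors in the verify runs above ("is not active" and "unallocated ... is in use") can be pulled out with a couple of regular expressions. This is a minimal sketch: the patterns are written against the exact wording of the messages quoted here, and the function name is illustrative.

```python
import re

# Patterns follow the wording of the log lines quoted above
INACTIVE = re.compile(r"ERROR: drbd minor (\d+) of instance (\S+) is not active")
UNALLOCATED = re.compile(r"ERROR: unallocated drbd minor (\d+) is in use")


def drbd_errors(verify_output):
    """Return ([(minor, instance), ...], [unallocated_minor, ...])."""
    inactive = [(int(minor), instance)
                for minor, instance in INACTIVE.findall(verify_output)]
    unallocated = [int(minor) for minor in UNALLOCATED.findall(verify_output)]
    return inactive, unallocated


# Lines copied from the two verify runs above
SAMPLE_VERIFY = """\
Wed Nov 25 09:19:12 2009 - ERROR: drbd minor 4 of instance trac.pokersource.vm.gnt is not active
Wed Nov 25 09:19:12 2009 - ERROR: drbd minor 10 of instance tsunami.vm.gnt is not active
Wed Nov 25 09:21:01 2009 - ERROR: unallocated drbd minor 0 is in use
Wed Nov 25 09:21:01 2009 - ERROR: unallocated drbd minor 3 is in use
"""

inactive, unallocated = drbd_errors(SAMPLE_VERIFY)
print(inactive)
print(unallocated)
```

Because `findall` scans the whole text, the same function works whether the output is kept one message per line or pasted as a single run.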
z2-3:/etc/shorewall# gnt-cluster repair-disk-sizes
Wed Nov 25 09:41:45 2009 - INFO: Disk 0 of instance booken.binbang.vm.gnt has mismatched size, correcting: recorded 20480, actual 5120
Wed Nov 25 09:41:46 2009 - WARNING: Failure in blockdev_getsizes call to node z2-2.host.gnt, ignoring
Wed Nov 25 09:41:46 2009 - INFO: Disk 0 of instance aviosys.binbang.vm.gnt has mismatched size, correcting: recorded 20480, actual 10240
Wed Nov 25 09:41:46 2009 - WARNING: Disk 0 of instance neufbox.binbang.vm.gnt did not return size information, ignoring
Wed Nov 25 09:41:47 2009 - WARNING: Disk 0 of instance neufbox-fc.binbang.vm.gnt did not return size information, ignoring
Wed Nov 25 09:41:47 2009 - INFO: Disk 0 of instance dtv09ut.binbang.vm.gnt has mismatched size, correcting: recorded 10240, actual 5120
Wed Nov 25 09:41:48 2009 - WARNING: Failure in blockdev_getsizes call to node z2-3.host.gnt, ignoring
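All three corrections in the output above lower the recorded size, which may be worth a second look given that DRBD was unhealthy at the time. A sketch for listing the corrections and flagging the ones that shrink a disk follows. The pattern matches the message wording quoted above; the function name is illustrative, and the log does not state the unit of the numbers.

```python
import re

# Pattern follows the INFO lines quoted above
MISMATCH = re.compile(
    r"Disk (\d+) of instance (\S+) has mismatched size, "
    r"correcting: recorded (\d+), actual (\d+)"
)


def size_corrections(output):
    """Return a list of (instance, disk, recorded, actual) tuples."""
    return [(instance, int(disk), int(recorded), int(actual))
            for disk, instance, recorded, actual in MISMATCH.findall(output)]


# Lines copied from the repair-disk-sizes run above
SAMPLE_REPAIR = """\
Wed Nov 25 09:41:45 2009 - INFO: Disk 0 of instance booken.binbang.vm.gnt has mismatched size, correcting: recorded 20480, actual 5120
Wed Nov 25 09:41:46 2009 - WARNING: Failure in blockdev_getsizes call to node z2-2.host.gnt, ignoring
Wed Nov 25 09:41:47 2009 - INFO: Disk 0 of instance dtv09ut.binbang.vm.gnt has mismatched size, correcting: recorded 10240, actual 5120
"""

corrections = size_corrections(SAMPLE_REPAIR)
# Corrections where the new size is smaller than the recorded one
shrunk = [c for c in corrections if c[3] < c[2]]
for instance, disk, recorded, actual in shrunk:
    print("%s disk %d: %d -> %d" % (instance, disk, recorded, actual))
```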
6) gnt-instance remove ganeti-manager.vm.gnt
z2-3:/etc/shorewall# gnt-instance remove ganeti-manager.vm.gnt
This will remove the volumes of the instance ganeti-manager.vm.gnt
(including mirrors), thus removing all the data of the instance.
Continue?
y/[n]/?: y
z2-3:/etc/shorewall#
7) restart vms (trac.pokersource, rheincode, tsunami, proxy)
z2-3:/etc/shorewall# gnt-instance startup rheincode.pokersource.vm.gnt
z2-3:/etc/shorewall# gnt-instance startup trac.pokersource.vm.gnt
z2-3:/etc/shorewall# gnt-instance startup tsunami.vm.gnt
8) move failover ip back to z2-2
done
9) check everything is ok: all hosts are back to normal, but "replace-disks" needs to be run for trac.pokersource.vm.gnt and proxy.ligamen.vm.gnt. Both had their secondary on z2-3; their primaries were on z2-2 and z2-5 respectively. The error was: "BlockDeviceError: blockdev failed (exited with exit code 1): /dev/drbd12: Wrong medium type" during a gnt-cluster repair-disk-sizes.
The "Wrong medium type" error seems to appear when the underlying volume (here, an LVM LV) is missing, but it still appears after a vgscan or after a stop/start of the vm...
Some vms were "split-brained" and need to be resynced with:
drbdsetup /dev/drbdXY secondary (on the slave node)
drbdsetup /dev/drbdXY invalidate
shown in syslog :
Nov 25 14:41:12 z2-1 kernel: [6070861.392773] block drbd10: helper command: /bin/true split-brain minor-10
Nov 25 14:41:12 z2-1 kernel: [6070861.393235] block drbd10: helper command: /bin/true split-brain minor-10 exit code 0 (0x0)
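To find which minors need the resync described above, the syslog can be scanned for the split-brain helper messages. The sketch below matches the kernel message format quoted here and then prints the two `drbdsetup` commands from the text with the minor filled in. Function names are illustrative, and nothing is executed: the commands are only printed, since `invalidate` discards the local copy and should run only on the node whose data is to be thrown away.

```python
import re

# Pattern follows the kernel messages quoted above
SPLIT_BRAIN = re.compile(
    r"block drbd(\d+): helper command: \S+ split-brain minor-\d+"
)


def split_brain_minors(syslog):
    """Return the sorted, de-duplicated DRBD minors reported as split-brain."""
    return sorted({int(minor) for minor in SPLIT_BRAIN.findall(syslog)})


def resync_commands(minor):
    """Build the two commands from the text for one minor (not executed)."""
    device = "/dev/drbd%d" % minor
    return ["drbdsetup %s secondary" % device,
            "drbdsetup %s invalidate" % device]


# The two syslog lines shown above
SAMPLE_SYSLOG = """\
Nov 25 14:41:12 z2-1 kernel: [6070861.392773] block drbd10: helper command: /bin/true split-brain minor-10
Nov 25 14:41:12 z2-1 kernel: [6070861.393235] block drbd10: helper command: /bin/true split-brain minor-10 exit code 0 (0x0)
"""

minors = split_brain_minors(SAMPLE_SYSLOG)
for minor in minors:
    for command in resync_commands(minor):
        print(command)
```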
- tsunami.vm.gnt