Current state
unique:~# virsh list --all
 Id Name                 State
----------------------------------
 19 proxy                running
 47 dedipoker            running
 48 lamp                 running
 49 pwm                  running
 52 pokersource02        running

unique:~# lvs
  LV              VG   Attr   LSize    Origin Snap%  Move Log Copy%  Convert
  civicrm         all  -wi-ao  10.00G   REMOVE
  civicrm-drbd    all  -wi-ao 128.00M
  dedipoker       all  -wi-ao  10.00G
  dedipoker-copy  all  -wi-a-  10.00G   REMOVE
  dedipoker-drbd  all  -wi-ao 128.00M
  drupal          all  -wi-ao  10.00G   KEEP (check drupal.pokersource.info)
  drupal-drbd     all  -wi-ao 128.00M
  home            all  -wi-ao  50.00G   REMOVE
  lamp            all  -wi-ao   5.00G
  lenny           all  -wi-a-   1.00G
  opensocial      all  -wi-ao   5.00G   KEEP (if check social.pokersource.info otherwise ask)
  opensocial-drbd all  -wi-ao 128.00M
  pokersource02   all  -wi-ao  10.74G
  proxy           all  -wi-ao   1.00G
  pwm             all  -wi-ao  10.00G
memory used (MB):
- proxy 256 (could be set to 128)
- lamp 128
- pwm 512
- dedipoker 512
- pokersource02 256
total to migrate (lamp + pokersource02): 384 MB
Note: pwm runs in full virtualization, the others in paravirtualization.
Only lamp and pokersource02 (production server for pokermaniaworld.com, but static data only) need to be migrated.
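The memory figures can be cross-checked with a few lines of shell. This is only a sketch: the helper names `mem_of` and `total_mem` are made up for illustration, and it assumes the 384 total refers to the two guests being migrated (lamp and pokersource02) rather than to every running guest.

```shell
# Memory (in MB) per guest, copied from the list above.
mem_of() {
    case "$1" in
        proxy)         echo 256 ;;
        lamp)          echo 128 ;;
        pwm)           echo 512 ;;
        dedipoker)     echo 512 ;;
        pokersource02) echo 256 ;;
        *)             return 1 ;;   # unknown guest
    esac
}

# Sum the memory of the guests given as arguments.
total_mem() {
    total=0
    for vm in "$@"; do
        m=$(mem_of "$vm") || return 1
        total=$((total + m))
    done
    echo "$total"
}

total_mem lamp pokersource02                       # guests to migrate: 384
total_mem proxy lamp pwm dedipoker pokersource02   # all running guests: 1664
```

The second call shows what has to fit on the target nodes if every guest were moved instead of only two.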
backups
- archive pwm as pwm-2009-10, dedipoker as p4m-2009-10 on z2-7
z2-7:~# ssh unique.tld "dd if=/dev/all/dedipoker" | dd of=/dev/all/p4m-2009-10
- backup pwm, p4m, lamp and pokersource02 on rosiers
gw:/mnt/pwm/2009-09-21-pwm.unique.tld# nohup rsync --delete -avHz --numeric-ids --exclude=*.iso --exclude=*.img --exclude=/proc --exclude=/sys --link-dest=/mnt/pwm/2009-09-21-pwm.unique.tld/ root@pwm.unique.tld:/ /mnt/pwm/2009-11-20-pwm.unique.tld/ > /home/loic/pwm.out 2>&1

gw:/mnt# nohup rsync --delete -avHz --numeric-ids --exclude=*.iso --exclude=*.img --exclude=/proc --exclude=/sys -e "ssh -p 22001" root@unique.pokersource.info:/mnt/lamp/ /mnt/lamp/2009-11-20-lamp.unique.tld/ > /home/loic/lamp.out 2>&1

some errors :
rsync: readlink "/mnt/lamp/usr/share/doc/python-poker-engine/changelog.Debian.gz" failed: Stale NFS file handle (116)
rsync: readlink "/mnt/lamp/usr/share/doc/python-poker-engine/copyright" failed: Stale NFS file handle (116)
rsync: readlink "/mnt/lamp/usr/share/doc/python-poker-engine/README" failed: Stale NFS file handle (116)
rsync: readlink "/mnt/lamp/usr/share/doc/python-poker-engine/NEWS.gz" failed: Stale NFS file handle (116)
rsync: readlink "/mnt/lamp/usr/share/doc/python-poker-engine/AUTHORS" failed: Stale NFS file handle (116)
IO error encountered -- skipping file deletion

unique:~# umount /mnt/lamp/
unique:~# fsck /dev/all/lamp
fsck 1.41.3 (12-Oct-2008)
e2fsck 1.41.3 (12-Oct-2008)
/dev/all/lamp: clean, 124152/327680 files, 699775/1310720 blocks

unique:~# fsck -f -y /dev/all/lamp
fsck 1.41.3 (12-Oct-2008)
e2fsck 1.41.3 (12-Oct-2008)
Pass 1: Checking inodes, blocks, and sizes
Inodes that were part of a corrupted orphan linked list found. Fix? yes
Inode 188656 was part of the orphaned inode list. FIXED.
Inode 188658 was part of the orphaned inode list. FIXED.
Inode 196625 was part of the orphaned inode list. FIXED.
Inode 231576 was part of the orphaned inode list. FIXED.
Deleted inode 237884 has zero dtime. Fix? yes
Inode 239326 was part of the orphaned inode list. FIXED.
Inode 239327 was part of the orphaned inode list. FIXED.
Pass 2: Checking directory structure
Entry 'AUTHORS' in /usr/share/doc/python-poker-engine (238639) has deleted/unused inode 239326. Clear? yes
Entry 'NEWS.gz' in /usr/share/doc/python-poker-engine (238639) has deleted/unused inode 239331. Clear? yes
Entry 'README' in /usr/share/doc/python-poker-engine (238639) has deleted/unused inode 239327. Clear? yes
Entry 'changelog.Debian.gz' in /usr/share/doc/python-poker-engine (238639) has deleted/unused inode 239329. Clear? yes
Entry 'copyright' in /usr/share/doc/python-poker-engine (238639) has deleted/unused inode 239330. Clear? yes
Pass 3: Checking directory connectivity
/lost+found not found. Create? yes
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: -779761 -779769 -779778 -(970713--970717) -986822
Fix? yes
Free blocks count wrong for group #4 (18106, counted=18107). Fix? yes
Free blocks count wrong for group #23 (7549, counted=7552). Fix? yes
Free blocks count wrong for group #28 (2800, counted=2954). Fix? yes
Free blocks count wrong for group #29 (5, counted=10). Fix? yes
Free blocks count wrong for group #30 (4325, counted=4326). Fix? yes
Free blocks count wrong (610944, counted=611108). Fix? yes
Inode bitmap differences: -188626 -188656 -188658 -196625 -237884 -(238511--238514) -(239326--239327) -(239329--239331)
Fix? yes
Free inodes count wrong for group #4 (7776, counted=7777). Fix? yes
Directories count wrong for group #4 (90, counted=89). Fix? yes
Free inodes count wrong for group #23 (5877, counted=5880). Fix? yes
Free inodes count wrong for group #24 (7228, counted=7229). Fix? yes
Free inodes count wrong for group #27 (1648, counted=1649). Fix? yes
Free inodes count wrong for group #28 (4992, counted=4993). Fix? yes
Free inodes count wrong for group #29 (2329, counted=2346). Fix? yes
Free inodes count wrong (203527, counted=203551). Fix? yes
/dev/all/lamp: ***** FILE SYSTEM WAS MODIFIED *****
/dev/all/lamp: 124129/327680 files (3.7% non-contiguous), 699612/1310720 blocks

unique:~# mount -o ro /dev/all/lamp /mnt/lamp/

gw:/mnt/lamp# nohup rsync --delete -avHz --numeric-ids --exclude=*.iso --exclude=*.img --exclude=/proc --exclude=/sys --link-dest=/mnt/lamp/2009-11-20-lamp.unique.tld/ -e "ssh -p 22001" root@unique.pokersource.info:/mnt/lamp/ /mnt/lamp/2009-11-21-lamp.unique.tld/ > /home/loic/lamp2.out 2>&1

unique:~# fdisk -l /dev/drbd1
Disk /dev/drbd1: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000
Device Boot Start End Blocks Id System

unique:~# mount /dev/drbd1 /mnt/dedipoker/
unique:~# ls !$
ls /mnt/dedipoker/
bin boot dev etc home lib lib64 lost+found media mnt opt proc root sbin selinux srv sys tmp usr var

gw:/mnt/p4m# mkdir 2009-11-20-dedipoker.unique.tld
gw:/mnt/p4m# nohup rsync --delete -avHz --numeric-ids --exclude=*.iso --exclude=*.img --exclude=/proc --exclude=/sys --link-dest=/mnt/p4m/2009-10-17-dedipoker.unique.tld/ -e "ssh -p 22001" root@unique.pokersource.info:/mnt/dedipoker/ /mnt/p4m/2009-11-20-dedipoker.unique.tld/ > /home/loic/dedipoker.out 2>&1
- backup node on rosiers
- shutdown pwm and dedipoker
bandwidth: 8 MB/s between z2-3 and unique.tld
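For planning, the raw copy time of a volume over this link can be estimated from its size. The sketch below assumes 8 Mo/s means 8 megabytes per second and ignores ssh and dd overhead, so treat the result as a lower bound; the plan itself budgets 30 to 45 min for lamp. The function name `transfer_seconds` is made up for illustration.

```shell
# Seconds needed to copy SIZE_MB megabytes at RATE_MB_S megabytes per second,
# rounded up to a whole second.
transfer_seconds() {
    size_mb=$1
    rate_mb_s=$2
    echo $(( (size_mb + rate_mb_s - 1) / rate_mb_s ))
}

transfer_seconds 5120 8    # 5G volume (lamp): 640 s, about 11 min
transfer_seconds 11264 8   # 11G volume: 1408 s, about 23 min
```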
Part I : vm migration
Estimated downtimes (only for this part):
- lamp : < 1hour
- pokersource02 : < 5 min (only during ip migration)
lamp
1) Rename the proxy-reference instance on z2-3 to proxy3.vm.gnt and make sure its config is shared with the other proxies. Copy the proxy configuration from proxy.unique.tld to the proxy on z2-3.host.gnt (all vms will go there), then reload apache.
2) create dhcp + dns entries for :
- lamp.pokersource.vm.gnt : 52:54:a0:61:8a:39
2bis) update dns pokersource config to use z2-31.pokersource.info
3) reload dhcp + dns on z2 cluster :
for i in 1 2 3 4 5 6 7;do echo "WORKING ON $i" ; ssh z2-$i "cd /etc/dhcp3; hg pull; hg update; /etc/init.d/dhcp3-server restart; rm /var/cache/bind/* && /etc/init.d/bind9 restart";done
4) create instance :
gnt-instance add --no-start -t drbd -s 5G -B memory=128M -n z2-3.host.gnt:z2-2.host.gnt -o debootstrap --net 0:mac=52:54:a0:61:8a:39 lamp.pokersource.vm.gnt
5) For the instance :
- mount it :
./mount_instance.sh lamp.pokersource.vm.gnt 11111
6) For lamp instance, virsh shutdown the vm on unique
6bis) from primary node (z2-3) :
ssh unique.tld "dd if=/dev/all/lamp" | dd of=/dev/drbdXX
Estimated dd time: between 30 and 45 min (if bandwidth remains 8 MB/s)
down drbd devices
start the vm and check it works
gnt-instance start lamp.pokersource.vm.gnt
7) check http://lamp.pokersource.info with "ip.of.z2-3.proxy lamp.pokersource.info" in your /etc/hosts
8) update nagios checks
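The reload loop of step 3 keeps going when a node fails, which is easy to miss in the output. A possible variant that stops at the first failing node is sketched below. `SSH` and `RELOAD_CMD` are hypothetical names introduced here, and `rm -f` is a deliberate deviation from the original loop so that an empty bind cache does not prevent the bind9 restart.

```shell
# SSH is a hypothetical knob: leave it as "ssh" for a real run,
# set it to "echo" to print what would be executed.
SSH=${SSH:-ssh}

# Same commands as the loop in step 3, chained with && so that
# a failing command stops the sequence on that node.
RELOAD_CMD='cd /etc/dhcp3 && hg pull && hg update && /etc/init.d/dhcp3-server restart && rm -f /var/cache/bind/* && /etc/init.d/bind9 restart'

# Reload dhcp + dns on each node number given as argument,
# stopping at the first node that reports an error.
reload_cluster() {
    for i in "$@"; do
        echo "WORKING ON $i"
        $SSH "z2-$i" "$RELOAD_CMD" || { echo "FAILED ON z2-$i" >&2; return 1; }
    done
}
```

A dry run would be `SSH=echo; reload_cluster 1 2 3 4 5 6 7`, and the real run uses the same call with `SSH=ssh`.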
pokersource02
0) Contact pokermaniaworld.com (which address?) to explain the migration process (not in detail) and let them know about downtimes, wait for ack.
1) currently, all traffic uses 91.121.30.203, which seems to be the main ip of unique (not a failover ip). In this case:
remap services to the failover ip in shorewall (94.23.160.57), update dns for pokersource02 (create pmw instead of pwm) and wait for dns propagation. There will be no downtime, because both ips will handle / redirect http traffic.
CHECKPOINT : check pmw.pokersource.info works
1bis) update pokermaniaworld dns to use failover IP : 94.23.160.57 , wait for dns propagation (at least 2 days)
Will have no downtime, because both ips will handle / redirect http traffic.
CHECKPOINT : check pokermaniaworld still works.
2) double check proxy3.vm.gnt has good config for pokermaniaworld.com
2bis) check that a failover slot is available on z2-3 (yes, 2 slots free)
2ter) copy the shorewall config to z2 for the failover ip, reload shorewall.
3) create dhcp + dns entries for :
- pmw.pokersource.vm.gnt : 52:54:74:30:c5:9f
3bis) reload dhcp + dns on z2 cluster :
for i in 1 2 3 4 5 6 7;do echo "WORKING ON $i" ; ssh z2-$i "cd /etc/dhcp3; hg pull; hg update; /etc/init.d/dhcp3-server restart; rm /var/cache/bind/* && /etc/init.d/bind9 restart";done
4) create instance :
gnt-instance add --no-start -t drbd -s 11G -B memory=256M -n z2-3.host.gnt:z2-2.host.gnt -o debootstrap --net 0:mac=52:54:74:30:c5:9f pmw.pokersource.vm.gnt
4bis) update config for pmw :
gnt-instance modify -H kernel_path=,initrd_path= pmw.pokersource.vm.gnt
5) For the instance :
- mount it :
./mount_instance.sh pmw.pokersource.vm.gnt 11112
5bis) ssh on appropriate node (z2-3)
WARN : the service will be unavailable until the ip is migrated for lamp.
6) Create a lvm snapshot for pokersource02 :
lvcreate -L5G -s -npokersource02-snap /dev/all/pokersource02
6bis) from z2-3
ssh unique.tld "dd if=/dev/all/pokersource02-snap" | dd of=/dev/drbdXX
down drbd devices
6quintet) start the vm and check it works
gnt-instance start pmw.pokersource.vm.gnt
Check it works by testing http://pmw.pokersource.vm.gnt and http://pokermaniaworld.com after adding "ip.of.z2-3.proxy pokermaniaworld.com" in your /etc/hosts
9) migrate the failover ip on ovh interface
Check services
10) Update needed nagios checks
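The pmw instance is created with an 11G disk while the pokersource02 volume is 10.74G, so a checksum of the whole DRBD device would not match the snapshot even after a perfect copy. One way to verify the dd of step 6bis is to compare digests over the source size only. This is a sketch with made-up helper names (`image_digest`, `same_prefix`); it assumes `sha256sum` and a `head` that supports `-c` are available on both hosts.

```shell
# Digest of the first BYTES bytes of a file or block device
# (of the whole file when BYTES is omitted).
image_digest() {
    if [ -n "${2:-}" ]; then
        head -c "$2" "$1" | sha256sum | cut -d' ' -f1
    else
        sha256sum < "$1" | cut -d' ' -f1
    fi
}

# Succeed when the first BYTES bytes of both images are identical.
# Arguments: image A, image B, BYTES.
same_prefix() {
    [ "$(image_digest "$1" "$3")" = "$(image_digest "$2" "$3")" ]
}
```

In practice the two sides live on different hosts: take the source size with `ssh unique.tld "blockdev --getsize64 /dev/all/pokersource02-snap"`, compute the source digest remotely with `sha256sum`, and compare it with `image_digest /dev/drbdXX <size>` on z2-3. The comparison is only meaningful while the snapshot still exists and before the new vm is started.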
Part II : server installation
Estimated downtimes (only for this part):
- lamp : < 10min
- pokersource02 : < 10min
1) reinstall unique from scratch following http://trac.dunnewind.net/dunnewind/wiki/GanetiOspfHowto , will probably be z2-8, will need to add a failover ip !
2) when it is integrated on the cluster, change secondary of all vms except proxy :
gnt-instance replace-disks -n z2-8.host.gnt lamp.pokersource.vm.gnt
gnt-instance replace-disks -n z2-8.host.gnt pmw.pokersource.vm.gnt
3) create a new proxy using the exported one (don't forget to define dns/dhcp for it) :
gnt-backup import --src-node=z2-2.host.gnt --src-dir=/var/lib/ganeti/export/proxy-reference.vm.gnt -t drbd -s 10G -B memory=256M -n z2-8.host.gnt:z2-3.host.gnt --net 0:mac=CHANGEME proxy8.vm.gnt
3bis) sync its config with the others and start it
3ter) check the new proxy works by adding "z2-8failoverip lamp.pokersource.info" in /etc/hosts and checking that the website works (same for pmw)
4) switch the IP from z2-3 to z2-8
WARN there will be a little downtime (<5min) during ip migration
5) failover the 2 vms on z2-8 :
gnt-instance failover lamp.pokersource.vm.gnt
gnt-instance failover pmw.pokersource.vm.gnt
WARN there will be a little downtime (<5min) during ip migration
Done in this order, downtime should be minimal (only during ip failover + vm failover), because the new proxy will forward requests.
Test everything works as expected.
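Since the order of the steps is what keeps downtime low, it can help to print the Ganeti commands of steps 2 and 5 in sequence and review them before running anything. The sketch below only echoes the commands, using exactly the forms written in this plan; the function name `part2_plan` is made up, and the proxy import of step 3 is left out because its mac address is still to be defined.

```shell
# Print (do not run) the Part II commands for the given instances.
# Arguments: new node name, then one or more instance names.
part2_plan() {
    new_node=$1
    shift
    # Step 2: move the secondary of every instance to the new node.
    for vm in "$@"; do
        echo "gnt-instance replace-disks -n $new_node $vm"
    done
    # Step 4 is a manual action on the provider interface.
    echo "# switch the failover ip to $new_node (step 4)"
    # Step 5: fail every instance over to its new secondary.
    for vm in "$@"; do
        echo "gnt-instance failover $vm"
    done
}

part2_plan z2-8.host.gnt lamp.pokersource.vm.gnt pmw.pokersource.vm.gnt
```

Printing all replace-disks lines before any failover line mirrors the plan: the disks must be in sync on z2-8 before an instance can be failed over to it.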