Current state

unique:~# virsh list --all
 Id Name                 State
----------------------------------
 19 proxy                running
 47 dedipoker            running
 48 lamp                 running
 49 pwm                  running
 52 pokersource02        running
unique:~# lvs
  LV              VG   Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  civicrm         all  -wi-ao  10.00G           REMOVE                          
  civicrm-drbd    all  -wi-ao 128.00M                                      
  dedipoker       all  -wi-ao  10.00G                                      
  dedipoker-copy  all  -wi-a-  10.00G           REMOVE                           
  dedipoker-drbd  all  -wi-ao 128.00M                                      
  drupal          all  -wi-ao  10.00G           KEEP    (check drupal.pokersource.info)                        
  drupal-drbd     all  -wi-ao 128.00M                                      
  home            all  -wi-ao  50.00G           REMOVE                         
  lamp            all  -wi-ao   5.00G                                      
  lenny           all  -wi-a-   1.00G                                      
  opensocial      all  -wi-ao   5.00G           KEEP (if social.pokersource.info still works, otherwise ask)
  opensocial-drbd all  -wi-ao 128.00M                                      
  pokersource02   all  -wi-ao  10.74G                                      
  proxy           all  -wi-ao   1.00G                                      
  pwm             all  -wi-ao  10.00G       

Memory used (MB):

  • proxy 256 (could be set to 128)
  • lamp 128
  • pwm 512
  • dedipoker 512
  • pokersource02 256

Total for the two VMs to migrate (lamp + pokersource02): 384 MB
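
A quick way to recheck the totals, as a bash sketch. The `vm_mem` table and the `sum_mem` helper are hypothetical names, filled from the list above. The 384 MB figure covers lamp plus pokersource02, while all VMs currently on unique add up to 1664 MB.

```shell
#!/bin/bash
# Memory per VM in MB, copied from the list above.
declare -A vm_mem
vm_mem=(
  [proxy]=256
  [lamp]=128
  [pwm]=512
  [dedipoker]=512
  [pokersource02]=256
)

# Print the total memory (MB) of the VMs named as arguments.
sum_mem() {
  local total=0 vm
  for vm in "$@"; do
    if [ -z "${vm_mem[$vm]}" ]; then
      echo "unknown VM: $vm" >&2
      return 1
    fi
    total=$(( total + ${vm_mem[$vm]} ))
  done
  echo "$total"
}

sum_mem lamp pokersource02                      # 384, the VMs to migrate
sum_mem proxy lamp pwm dedipoker pokersource02  # 1664, all VMs on unique
```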

Note: pwm runs fully virtualized, the others are paravirtualized.

Only lamp and pokersource02 (production server for pokermaniaworld.com, but with static data) need to be migrated.

backups

  • archive pwm as pwm-2009-10, dedipoker as p4m-2009-10 on z2-7
    z2-7:~# ssh unique.tld "dd if=/dev/all/dedipoker" | dd of=/dev/all/p4m-2009-10 
    
  • back up pwm, p4m, lamp and pokersource02 on rosiers
    gw:/mnt/pwm/2009-09-21-pwm.unique.tld# nohup rsync --delete -avHz --numeric-ids --exclude=*.iso --exclude=*.img --exclude=/proc --exclude=/sys  --link-dest=/mnt/pwm/2009-09-21-pwm.unique.tld/ root@pwm.unique.tld:/ /mnt/pwm/2009-11-20-pwm.unique.tld/ > /home/loic/pwm.out 2>&1
    gw:/mnt# nohup rsync --delete -avHz --numeric-ids --exclude=*.iso --exclude=*.img --exclude=/proc --exclude=/sys   -e "ssh -p 22001" root@unique.pokersource.info:/mnt/lamp/ /mnt/lamp/2009-11-20-lamp.unique.tld/ > /home/loic/lamp.out 2>&1
    Some errors:
    rsync: readlink "/mnt/lamp/usr/share/doc/python-poker-engine/changelog.Debian.gz" failed: Stale NFS file handle (116)
    rsync: readlink "/mnt/lamp/usr/share/doc/python-poker-engine/copyright" failed: Stale NFS file handle (116)
    rsync: readlink "/mnt/lamp/usr/share/doc/python-poker-engine/README" failed: Stale NFS file handle (116)
    rsync: readlink "/mnt/lamp/usr/share/doc/python-poker-engine/NEWS.gz" failed: Stale NFS file handle (116)
    rsync: readlink "/mnt/lamp/usr/share/doc/python-poker-engine/AUTHORS" failed: Stale NFS file handle (116)
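    IO error encountered -- skipping file deletion
    
    The errors point to an inconsistent lamp filesystem (the fsck run below finds these entries pointing to deleted inodes), so unmount it, force a check, remount it read-only and run rsync again:
    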
    unique:~# umount /mnt/lamp/
    unique:~# fsck /dev/all/lamp 
    fsck 1.41.3 (12-Oct-2008)
    e2fsck 1.41.3 (12-Oct-2008)
    /dev/all/lamp: clean, 124152/327680 files, 699775/1310720 blocks
    unique:~# fsck -f -y /dev/all/lamp 
    fsck 1.41.3 (12-Oct-2008)
    e2fsck 1.41.3 (12-Oct-2008)
    Pass 1: Checking inodes, blocks, and sizes
    Inodes that were part of a corrupted orphan linked list found.  Fix? yes
    
    Inode 188656 was part of the orphaned inode list.  FIXED.
    Inode 188658 was part of the orphaned inode list.  FIXED.
    Inode 196625 was part of the orphaned inode list.  FIXED.
    Inode 231576 was part of the orphaned inode list.  FIXED.
    Deleted inode 237884 has zero dtime.  Fix? yes
    
    Inode 239326 was part of the orphaned inode list.  FIXED.
    Inode 239327 was part of the orphaned inode list.  FIXED.
    Pass 2: Checking directory structure
    Entry 'AUTHORS' in /usr/share/doc/python-poker-engine (238639) has deleted/unused inode 239326.  Clear? yes
    
    Entry 'NEWS.gz' in /usr/share/doc/python-poker-engine (238639) has deleted/unused inode 239331.  Clear? yes
    
    Entry 'README' in /usr/share/doc/python-poker-engine (238639) has deleted/unused inode 239327.  Clear? yes
    
    Entry 'changelog.Debian.gz' in /usr/share/doc/python-poker-engine (238639) has deleted/unused inode 239329.  Clear? yes
    
    Entry 'copyright' in /usr/share/doc/python-poker-engine (238639) has deleted/unused inode 239330.  Clear? yes
    
    Pass 3: Checking directory connectivity
    /lost+found not found.  Create? yes
    
    Pass 4: Checking reference counts
    Pass 5: Checking group summary information
    Block bitmap differences:  -779761 -779769 -779778 -(970713--970717) -986822
    Fix? yes
    
    Free blocks count wrong for group #4 (18106, counted=18107).
    Fix? yes
    
    Free blocks count wrong for group #23 (7549, counted=7552).
    Fix? yes
    
    Free blocks count wrong for group #28 (2800, counted=2954).
    Fix? yes
    
    Free blocks count wrong for group #29 (5, counted=10).
    Fix? yes
    
    Free blocks count wrong for group #30 (4325, counted=4326).
    Fix? yes
    
    Free blocks count wrong (610944, counted=611108).
    Fix? yes
    
    Inode bitmap differences:  -188626 -188656 -188658 -196625 -237884 -(238511--238514) -(239326--239327) -(239329--239331)
    Fix? yes
    
    Free inodes count wrong for group #4 (7776, counted=7777).
    Fix? yes
    
    Directories count wrong for group #4 (90, counted=89).
    Fix? yes
    
    Free inodes count wrong for group #23 (5877, counted=5880).
    Fix? yes
    
    Free inodes count wrong for group #24 (7228, counted=7229).
    Fix? yes
    
    Free inodes count wrong for group #27 (1648, counted=1649).
    Fix? yes
    
    Free inodes count wrong for group #28 (4992, counted=4993).
    Fix? yes
    
    Free inodes count wrong for group #29 (2329, counted=2346).
    Fix? yes
    
    Free inodes count wrong (203527, counted=203551).
    Fix? yes
    
    
    /dev/all/lamp: ***** FILE SYSTEM WAS MODIFIED *****
    /dev/all/lamp: 124129/327680 files (3.7% non-contiguous), 699612/1310720 blocks
    unique:~# mount -o ro /dev/all/lamp /mnt/lamp/
    gw:/mnt/lamp#  nohup rsync --delete -avHz --numeric-ids --exclude=*.iso --exclude=*.img --exclude=/proc --exclude=/sys --link-dest=/mnt/lamp/2009-11-20-lamp.unique.tld/   -e "ssh -p 22001" root@unique.pokersource.info:/mnt/lamp/ /mnt/lamp/2009-11-21-lamp.unique.tld/ > /home/loic/lamp2.out 2>&1
    
    unique:~# fdisk -l /dev/drbd1
    
    Disk /dev/drbd1: 10.7 GB, 10737418240 bytes
    255 heads, 63 sectors/track, 1305 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
    Disk identifier: 0x00000000
    
          Device Boot      Start         End      Blocks   Id  System
    unique:~# mount /dev/drbd1 /mnt/dedipoker/
    unique:~# ls !$
    ls /mnt/dedipoker/
    bin  boot  dev  etc  home  lib  lib64  lost+found  media  mnt  opt  proc  root  sbin  selinux  srv  sys  tmp  usr  var
    gw:/mnt/p4m# mkdir 2009-11-20-dedipoker.unique.tld
    gw:/mnt/p4m#  nohup rsync --delete -avHz --numeric-ids --exclude=*.iso --exclude=*.img --exclude=/proc --exclude=/sys --link-dest=/mnt/p4m/2009-10-17-dedipoker.unique.tld/   -e "ssh -p 22001" root@unique.pokersource.info:/mnt/dedipoker/ /mnt/p4m/2009-11-20-dedipoker.unique.tld/ > /home/loic/dedipoker.out 2>&1
    
    
  • back up node on rosiers
  • shut down pwm and dedipoker
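
The backups above all follow one pattern: one directory per date, hard-linked to the previous snapshot with --link-dest so that only changed files use new space. Below is a minimal bash sketch that builds that command, assuming the YYYY-MM-DD-name directory layout used above. The names `snapshot_rsync`, `RUN`, `SSH_PORT` and `SNAP_DATE` are hypothetical. By default it only prints the command; set RUN to empty to execute it. It also quotes the --exclude patterns, which the commands above leave unquoted for the shell to glob.

```shell
#!/bin/bash
# Dry run by default: commands are printed. Use RUN= to execute them.
RUN=${RUN-echo}

# snapshot_rsync SRC BASE NAME [PREV]
#   SRC   rsync source, e.g. root@pwm.unique.tld:/
#   BASE  backup directory on the backup host, e.g. /mnt/pwm
#   NAME  suffix of the snapshot directory, e.g. pwm.unique.tld
#   PREV  previous snapshot directory under BASE (optional)
# Optional environment: SSH_PORT (for hosts reached with ssh -p 22001),
# SNAP_DATE (defaults to today's date, YYYY-MM-DD).
snapshot_rsync() {
  local src=$1 base=$2 name=$3 prev=$4
  local day=${SNAP_DATE:-$(date +%F)}
  local -a opts
  # Quoted so the local shell never expands the patterns against local files.
  opts=(--delete -avHz --numeric-ids
        --exclude='*.iso' --exclude='*.img'
        --exclude=/proc --exclude=/sys)
  if [ -n "$SSH_PORT" ]; then
    opts+=(-e "ssh -p $SSH_PORT")
  fi
  if [ -n "$prev" ]; then
    # Unchanged files become hard links to the previous snapshot.
    opts+=(--link-dest="$base/$prev/")
  fi
  $RUN rsync "${opts[@]}" "$src" "$base/$day-$name/"
}
```

For example, `SSH_PORT=22001 SNAP_DATE=2009-11-21 RUN= snapshot_rsync root@unique.pokersource.info:/mnt/lamp/ /mnt/lamp lamp.unique.tld 2009-11-20-lamp.unique.tld` roughly matches the second lamp run above, without the nohup and log redirection.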

Bandwidth: 8 MB/s between z2-3 and unique.tld
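
At that bandwidth, the raw copy time of an LV is easy to bound. Below is a small bash helper (`transfer_minutes` is a hypothetical name) that treats 1 GB as 1024 MB and reads 8 Mo/s as 8 MB/s. It ignores ssh and dd overhead, so real copies will take longer than the figure it prints.

```shell
#!/bin/bash
# transfer_minutes SIZE_GB RATE_MB_S
# Raw copy time in minutes, rounded, with 1 GB = 1024 MB and no ssh/dd overhead.
transfer_minutes() {
  awk -v s="$1" -v r="$2" 'BEGIN { printf "%.0f\n", s * 1024 / r / 60 }'
}

transfer_minutes 5 8      # lamp LV (5.00G): 11
transfer_minutes 10.74 8  # pokersource02 LV (10.74G): 23
```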

Part I: VM migration

Estimated downtimes (for this part only):

  • lamp: < 1 hour
  • pokersource02: < 5 min (only during the IP migration)

lamp

1) Rename the proxy-reference instance on z2-3 to proxy3.vm.gnt and make sure its config is shared with the other proxies. Copy the proxy configuration from proxy.unique.tld to the proxy on z2-3.host.gnt (all VMs will go there), then reload Apache.

2) Create DHCP + DNS entries for:

  • lamp.pokersource.vm.gnt : 52:54:a0:61:8a:39
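
For the DHCP side, the following bash sketch prints an ISC dhcpd `host` declaration from a name and a MAC address after a basic syntax check on the MAC. Several details are assumptions: the helper name, the use of the first label as the declaration name, and the absence of a `fixed-address` (this page gives no IPs). This page does not describe the actual layout of the /etc/dhcp3 repository on z2.

```shell
#!/bin/bash
# dhcp_host_stanza FQDN MAC
# Print an ISC dhcpd host declaration for a VM, after a basic MAC syntax check.
dhcp_host_stanza() {
  local fqdn=$1 mac=$2
  local re='^([0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}$'
  if [[ ! $mac =~ $re ]]; then
    echo "invalid MAC address: $mac" >&2
    return 1
  fi
  # Use the first label (lamp, pmw, ...) as the declaration name.
  printf 'host %s {\n  hardware ethernet %s;\n  option host-name "%s";\n}\n' \
    "${fqdn%%.*}" "$mac" "$fqdn"
}

dhcp_host_stanza lamp.pokersource.vm.gnt 52:54:a0:61:8a:39
```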

2bis) Update the pokersource DNS config to use z2-31.pokersource.info.

3) Reload DHCP + DNS on the z2 cluster:

for i in 1 2 3 4 5 6 7;do echo "WORKING ON $i" ; ssh z2-$i "cd /etc/dhcp3; hg pull; hg update; /etc/init.d/dhcp3-server restart; rm /var/cache/bind/* && /etc/init.d/bind9 restart";done
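
The same reload can be written as a bash function that reports the nodes where something failed, so an error does not simply scroll past. The function name `reload_dhcp_dns` is hypothetical, and it is a dry run by default through `RUN`. It differs from the loop above in two ways. The steps are chained with &&, so a failed hg pull does not restart the daemons. The cache is cleared with rm -f, so an empty /var/cache/bind no longer prevents the bind9 restart.

```shell
#!/bin/bash
# Dry run by default: commands are printed. Use RUN= to execute them.
RUN=${RUN-echo}

# reload_dhcp_dns [N]: run the reload on z2-1 .. z2-N (default 7) and report
# the nodes where it failed instead of silently moving on.
reload_dhcp_dns() {
  local n=${1:-7} i
  local -a failed=()
  # Same steps as the loop above, chained with &&; rm -f so an empty bind
  # cache does not stop the chain before bind9 is restarted.
  local cmd='cd /etc/dhcp3 && hg pull && hg update && /etc/init.d/dhcp3-server restart && rm -f /var/cache/bind/* && /etc/init.d/bind9 restart'
  for i in $(seq 1 "$n"); do
    echo "WORKING ON z2-$i"
    $RUN ssh "z2-$i" "$cmd" || failed+=("z2-$i")
  done
  if [ "${#failed[@]}" -gt 0 ]; then
    echo "FAILED ON: ${failed[*]}" >&2
    return 1
  fi
}
```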

4) Create the instance:

 gnt-instance add --no-start -t drbd -s 5G -B memory=128M -n z2-3.host.gnt:z2-2.host.gnt -o debootstrap --net 0:mac=52:54:a0:61:8a:39  lamp.pokersource.vm.gnt

5) For the instance:

  • mount it:
    ./mount_instance.sh lamp.pokersource.vm.gnt 11111
    

6) Shut down the lamp VM on unique (virsh shutdown lamp).

6bis) From the primary node (z2-3):

ssh unique.tld "dd if=/dev/all/lamp" | dd of=/dev/drbdXX

Estimated dd time: between 30 and 45 min (if bandwidth stays at 8 MB/s).
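
Before overwriting the DRBD device, it is worth checking that the device is at least as large as the source LV. It is also worth making sure that a failure on the unique side is not hidden by the pipe. Below is a bash sketch using the hypothetical helpers `fits` and `copy_lv`, with `blockdev --getsize64` from util-linux. It assumes root ssh access to unique.tld, as in the command above, and /dev/drbdXX stays a placeholder.

```shell
#!/bin/bash
# fits SRC_BYTES DST_BYTES: succeed if the destination can hold the source.
fits() {
  [ "$2" -ge "$1" ]
}

# copy_lv HOST SRC_DEV DST_DEV
# Compare sizes first, then stream the device over ssh. pipefail makes the copy
# fail if the remote dd or ssh fails, not only if the local dd does.
copy_lv() {
  local host=$1 src=$2 dst=$3 src_bytes dst_bytes
  src_bytes=$(ssh "$host" "blockdev --getsize64 $src") || return 1
  dst_bytes=$(blockdev --getsize64 "$dst") || return 1
  if ! fits "$src_bytes" "$dst_bytes"; then
    echo "$dst ($dst_bytes bytes) is smaller than $src ($src_bytes bytes)" >&2
    return 1
  fi
  (
    set -o pipefail
    ssh "$host" "dd if=$src bs=1M" | dd of="$dst" bs=1M
  )
}

# Example, with the placeholder device from step 6bis:
# copy_lv unique.tld /dev/all/lamp /dev/drbdXX
```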

Bring the DRBD devices down.

Start the VM and check that it works:

gnt-instance start lamp.pokersource.vm.gnt

7) Check http://lamp.pokersource.info after adding "ip.of.z2-3.proxy lamp.pokersource.info" to your /etc/hosts.
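
Instead of editing /etc/hosts, a recent curl (7.21.3 or later) can pin a hostname to an IP for a single request with --resolve HOST:PORT:ADDRESS. Below is a bash sketch; `resolve_arg` and `check_via_proxy` are hypothetical names, and ip.of.z2-3.proxy remains the placeholder from the step above.

```shell
#!/bin/bash
# resolve_arg HOST IP [PORT]: build the HOST:PORT:ADDRESS value for --resolve.
resolve_arg() {
  local host=$1 ip=$2 port=${3:-80}
  echo "$host:$port:$ip"
}

# check_via_proxy HOST IP: print the HTTP status of http://HOST/ when HOST is
# resolved to IP for this request only.
check_via_proxy() {
  local host=$1 ip=$2
  curl -sS -o /dev/null -w '%{http_code}\n' \
    --resolve "$(resolve_arg "$host" "$ip")" "http://$host/"
}

# check_via_proxy lamp.pokersource.info ip.of.z2-3.proxy
```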

8) Update the Nagios checks.

pokersource02

0) Contact pokermaniaworld.com (which address?) to explain the migration process (without going into detail) and tell them about the downtimes, then wait for their acknowledgement.

1) Currently, all traffic uses 91.121.30.203, which seems to be the main IP of unique (not a failover IP). In this case:

Remap the services to the failover IP (94.23.160.57) in shorewall, update DNS for pokersource02 (create pmw, not pwm) and wait for DNS propagation. There will be no downtime, because both IPs will handle or redirect HTTP traffic.

CHECKPOINT: check that pmw.pokersource.info works.

1bis) Update the pokermaniaworld DNS to use the failover IP 94.23.160.57 and wait for DNS propagation (at least 2 days).

There will be no downtime, because both IPs will handle or redirect HTTP traffic.

CHECKPOINT: check that pokermaniaworld.com still works.
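
To decide when propagation is complete, compare the answers from a resolver with the failover IP. Below is a bash sketch (`all_answers_match` is a hypothetical name) that reads `dig +short` output on stdin. It succeeds only if there is at least one answer and every answer equals the expected IP. Empty output counts as not propagated. A CNAME line in the answer also makes it fail, so in that case query the final name.

```shell
#!/bin/bash
# all_answers_match EXPECTED_IP < answers
# Succeed only if stdin has at least one answer and every non-empty line equals
# EXPECTED_IP (e.g. the output of dig +short).
all_answers_match() {
  local expected=$1 line seen=0
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue
    seen=1
    [ "$line" = "$expected" ] || return 1
  done
  [ "$seen" -eq 1 ]
}

# dig +short pokermaniaworld.com | all_answers_match 94.23.160.57 && echo propagated
```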

2) Double-check that proxy3.vm.gnt has the right config for pokermaniaworld.com.

2bis) Check that a failover slot is available on z2-3 (yes, 2 slots are free).

2ter) Copy the shorewall config for the failover IP to z2 and reload shorewall.

3) Create DHCP + DNS entries for:

  • pmw.pokersource.vm.gnt : 52:54:74:30:c5:9f

3bis) Reload DHCP + DNS on the z2 cluster:

for i in 1 2 3 4 5 6 7;do echo "WORKING ON $i" ; ssh z2-$i "cd /etc/dhcp3; hg pull; hg update; /etc/init.d/dhcp3-server restart; rm /var/cache/bind/* && /etc/init.d/bind9 restart";done

4) Create the instance:

 gnt-instance add --no-start -t drbd -s 11G -B memory=256M -n z2-3.host.gnt:z2-2.host.gnt -o debootstrap --net 0:mac=52:54:74:30:c5:9f  pmw.pokersource.vm.gnt

4bis) Update the config for pmw:

gnt-instance modify -H kernel_path=,initrd_path= pmw.pokersource.vm.gnt

5) For the instance:

  • mount it:
    ./mount_instance.sh pmw.pokersource.vm.gnt 11112
    

5bis) SSH to the appropriate node (z2-3).

WARN: the service will be unavailable until the IP is migrated for lamp.

6) Create an LVM snapshot of pokersource02:

lvcreate -L5G -s -npokersource02-snap /dev/all/pokersource02
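
A 5G snapshot of a 10.74G origin can only absorb 5G of changes. If it fills up during the copy, LVM invalidates it and the copy is useless. With mostly static data that is unlikely, but it is cheap to watch. Below is a bash sketch that reads the Snap% column of lvs. The helper names `snap_below` and `snap_usage` are hypothetical, and the 80% threshold is an arbitrary choice. Once the copy is done, the snapshot can be removed with lvremove.

```shell
#!/bin/bash
# snap_below USED [LIMIT]: succeed while snapshot usage (percent) is below
# LIMIT (default 80).
snap_below() {
  local used=$1 limit=${2:-80}
  awk -v u="$used" -v l="$limit" 'BEGIN { exit !(u + 0 < l + 0) }'
}

# snap_usage: current Snap% of the snapshot created above, spaces stripped.
snap_usage() {
  lvs --noheadings -o snap_percent /dev/all/pokersource02-snap | tr -d ' '
}

# While dd is running on z2-3:
# snap_below "$(snap_usage)" 80 || echo "snapshot almost full" >&2
```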

6bis) From z2-3:

ssh unique.tld "dd if=/dev/all/pokersource02-snap" | dd of=/dev/drbdXX

Bring the DRBD devices down.

6ter) Start the VM and check that it works:

gnt-instance start pmw.pokersource.vm.gnt

Check that it works by testing http://pmw.pokersource.vm.gnt and http://pokermaniaworld.com after adding "ip.of.z2-3.proxy pokermaniaworld.com" to your /etc/hosts.

7) Migrate the failover IP through the OVH interface.

Check the services.

8) Update the Nagios checks as needed.

Part II: server installation

Estimated downtimes (for this part only):

  • lamp: < 10 min
  • pokersource02: < 10 min

1) Reinstall unique from scratch following http://trac.dunnewind.net/dunnewind/wiki/GanetiOspfHowto. It will probably become z2-8 and will need a failover IP.

2) Once it is part of the cluster, change the secondary node of all VMs except the proxy:

gnt-instance replace-disks  -n z2-8.host.gnt lamp.pokersource.vm.gnt
gnt-instance replace-disks  -n z2-8.host.gnt pmw.pokersource.vm.gnt
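
With more instances, this becomes a loop. Below is a bash sketch (`replace_secondaries` is a hypothetical name, dry run by default through `RUN`) that moves the secondary node one instance at a time. It stops at the first gnt-instance failure instead of going on with the next instance.

```shell
#!/bin/bash
# Dry run by default: commands are printed. Use RUN= to execute them.
RUN=${RUN-echo}

# replace_secondaries NEW_NODE INSTANCE...
# Move the secondary of each instance to NEW_NODE, one after the other,
# stopping at the first failure.
replace_secondaries() {
  local new_node=$1 inst
  shift
  for inst in "$@"; do
    $RUN gnt-instance replace-disks -n "$new_node" "$inst" || return 1
  done
}

# replace_secondaries z2-8.host.gnt lamp.pokersource.vm.gnt pmw.pokersource.vm.gnt
```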

3) Create a new proxy from the exported one (don't forget to define its DNS/DHCP entries):

 gnt-backup import --src-node=z2-2.host.gnt --src-dir=/var/lib/ganeti/export/proxy-reference.vm.gnt -t drbd -s 10G -B memory=256M -n z2-8.host.gnt:z2-3.host.gnt   --net 0:mac=CHANGEME proxy8.vm.gnt

3bis) Sync its config with the other proxies and start it.

3ter) Check that the new proxy works: add "z2-8failoverip lamp.pokersource.info" to /etc/hosts and check that the website works (do the same for pmw).

4) Switch the IP from z2-3 to z2-8.

WARN: there will be a short downtime (< 5 min) during the IP migration.

5) Fail over the 2 VMs to z2-8:

gnt-instance failover lamp.pokersource.vm.gnt
gnt-instance failover pmw.pokersource.vm.gnt

WARN: there will be a short downtime (< 5 min) during the failover.

Done in this order, downtime should be minimal (only during the IP failover and the VM failover), because the new proxy will forward requests.

Test that everything works as expected.