We received our new VM host today, so the first thing I wanted to try was live migration. I started with the virsh migrate --help command, then looked up the official documentation to find out what exactly I should do.
What command to run?
As of today, the official libvirt: Guest migration page is a bit outdated and describes a lot of different transport modes and configuration options. To be honest, I do not understand the libvirt migration models, and I did not spend much time trying to understand them.
I want to migrate a VM without any interruption and without using any shared storage. So basically the new host will receive a running virtual machine that it has never heard about and whose virtual disks it does not have. Apart from this, I want the persistent VM to remain persistent on the receiving host.
The command I came up with is:
virsh migrate --live --persistent --undefinesource --copy-storage-all \
    --verbose --desturi <DESTINATION> <VM>
Where <DESTINATION> is the libvirt connection URI of the receiving host and <VM> is the name of the virtual machine to be migrated. I entered this command on the current VM host, as I do not want to depend on any client machine.
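For completeness, here are two common forms <DESTINATION> can take with the qemu driver (newhost is a hypothetical host name, not one of mine):

qemu+ssh://root@newhost/system    # libvirt connection tunnelled over SSH
qemu+tls://newhost/system         # TLS transport, needs certificates set up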
First problem: copying the disks
After issuing the migrate command, I got the following error:
Failed to open file '/ssd/vmstorage/migratetest2.swap': No such file or directory
Strange, as I enabled --copy-storage-all, so I thought the help (saying "migration with non-shared storage with full disk copy") did not lie. The log files did not say anything about this, so I started reading old and not very useful comments, posts and mailing list archives. There was one useful hint I thought I could try:
Create an empty disk with the same geometry before migrating
You can view the disk image information with qemu-img info:
# qemu-img info /ssd/vmstorage/migratetest2.swap
image: /ssd/vmstorage/migratetest2.swap
file format: raw
virtual size: 10G (10737418240 bytes)
disk size: 8.0K
# qemu-img info /ssd/vmstorage/migratetest2.raw
image: /ssd/vmstorage/migratetest2.raw
file format: raw
virtual size: 10G (10737418240 bytes)
disk size: 1.5G
Now I could have created two images of exactly 10737418240 bytes, but I used the same sparse file method I used when I created the original images:
# dd if=/dev/null of=/ssd/vmstorage/migratetest2.swap bs=1M seek=10240
0+0 records in
0+0 records out
0 bytes (0 B) copied, 2,223e-05 s, 0,0 kB/s
# dd if=/dev/null of=/ssd/vmstorage/migratetest2.raw bs=1M seek=10240
0+0 records in
0+0 records out
0 bytes (0 B) copied, 2,4834e-05 s, 0,0 kB/s
# l -h /ssd/vmstorage/migratetest2*
-rw-r--r-- 1 root root 10G szept 13 22:07 /ssd/vmstorage/migratetest2.raw
-rw-r--r-- 1 root root 10G szept 13 22:07 /ssd/vmstorage/migratetest2.swap
# du /ssd/vmstorage/migratetest2.raw
0       /ssd/vmstorage/migratetest2.raw
# qemu-img info /ssd/vmstorage/migratetest2.raw
image: /ssd/vmstorage/migratetest2.raw
file format: raw
virtual size: 10G (10737418240 bytes)
disk size: 0
As you can see, I ended up with two disk images that occupy 0 bytes on disk but have an apparent size of 10 GB, and qemu-img does recognise them as 10 GB disks. Let's try the migration again:
# virsh migrate --live --persistent --undefinesource --copy-storage-all \
    --verbose --desturi <DESTINATION> migratetest2
Migration: [100 %]
After checking the running machines list, it did seem to be working.
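By the way, dd is not the only way to pre-create the target images. qemu-img should produce an equivalent sparse raw file in one step; this is a sketch of the idea, not the command I actually ran:

# creates a sparse 10 GB raw image on the receiving host,
# equivalent to the dd seek trick above
qemu-img create -f raw /ssd/vmstorage/migratetest2.swap 10G
qemu-img create -f raw /ssd/vmstorage/migratetest2.raw 10G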
How good is this libvirt + qemu + kvm migration?
Network configuration
The VM has two network interfaces:
eth0 is connected to a host internal network with DHCP and NAT, so the machines can access the internet without any special configuration. This network has the same name on all hosts and the same IP configuration, so a migration can only cause problems if two machines end up using the same DHCP address. I assign static addresses to "long life" machines for this reason.
eth1 is connected to a bridge interface on the host, which in turn is connected to a switch with all the VM hosts plugged in. This network also has the same name and configuration on all hosts, with strictly static IP addresses. This way, all the VMs should see each other no matter which host they are running on.
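For reference, the two libvirt network definitions look something like this (names, addresses and the bridge device are made up for this sketch, not copied from my hosts):

<!-- eth0's network: NAT with DHCP; a <host> entry pins a static lease -->
<network>
  <name>internal</name>
  <forward mode='nat'/>
  <bridge name='virbr1' stp='on' delay='0'/>
  <ip address='192.168.100.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.100.100' end='192.168.100.200'/>
      <host mac='52:54:00:aa:bb:cc' ip='192.168.100.10'/>
    </dhcp>
  </ip>
</network>

<!-- eth1's network: plain bridge to the physical switch -->
<network>
  <name>vmlan</name>
  <forward mode='bridge'/>
  <bridge name='br0'/>
</network>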
Did anyone notice the migration?
I wanted to see two things:
- How many pings do we lose while migrating?
- What does the VM see while migrating?
Well, it turned out pretty well. Pinging the machine from a third computer (migration started at the 15th and ended at the 34th packet):
> ping xx.xx.xx.xx
PING xx.xx.xx.xx (xx.xx.xx.xx) 56(84) bytes of data.
64 bytes from xx.xx.xx.xx: icmp_seq=1 ttl=64 time=0.618 ms
64 bytes from xx.xx.xx.xx: icmp_seq=2 ttl=64 time=0.620 ms
64 bytes from xx.xx.xx.xx: icmp_seq=3 ttl=64 time=0.589 ms
64 bytes from xx.xx.xx.xx: icmp_seq=4 ttl=64 time=0.502 ms
64 bytes from xx.xx.xx.xx: icmp_seq=5 ttl=64 time=0.500 ms
64 bytes from xx.xx.xx.xx: icmp_seq=6 ttl=64 time=0.628 ms
64 bytes from xx.xx.xx.xx: icmp_seq=7 ttl=64 time=0.662 ms
64 bytes from xx.xx.xx.xx: icmp_seq=8 ttl=64 time=0.664 ms
64 bytes from xx.xx.xx.xx: icmp_seq=9 ttl=64 time=0.374 ms
64 bytes from xx.xx.xx.xx: icmp_seq=10 ttl=64 time=0.596 ms
64 bytes from xx.xx.xx.xx: icmp_seq=11 ttl=64 time=0.540 ms
64 bytes from xx.xx.xx.xx: icmp_seq=12 ttl=64 time=0.553 ms
64 bytes from xx.xx.xx.xx: icmp_seq=13 ttl=64 time=0.543 ms
64 bytes from xx.xx.xx.xx: icmp_seq=14 ttl=64 time=0.542 ms
64 bytes from xx.xx.xx.xx: icmp_seq=15 ttl=64 time=0.474 ms
64 bytes from xx.xx.xx.xx: icmp_seq=16 ttl=64 time=0.576 ms
64 bytes from xx.xx.xx.xx: icmp_seq=17 ttl=64 time=0.494 ms
64 bytes from xx.xx.xx.xx: icmp_seq=18 ttl=64 time=0.600 ms
64 bytes from xx.xx.xx.xx: icmp_seq=19 ttl=64 time=0.653 ms
64 bytes from xx.xx.xx.xx: icmp_seq=20 ttl=64 time=0.610 ms
64 bytes from xx.xx.xx.xx: icmp_seq=21 ttl=64 time=0.349 ms
64 bytes from xx.xx.xx.xx: icmp_seq=22 ttl=64 time=0.618 ms
64 bytes from xx.xx.xx.xx: icmp_seq=23 ttl=64 time=0.536 ms
64 bytes from xx.xx.xx.xx: icmp_seq=24 ttl=64 time=0.548 ms
64 bytes from xx.xx.xx.xx: icmp_seq=25 ttl=64 time=0.497 ms
64 bytes from xx.xx.xx.xx: icmp_seq=26 ttl=64 time=0.617 ms
64 bytes from xx.xx.xx.xx: icmp_seq=27 ttl=64 time=0.640 ms
64 bytes from xx.xx.xx.xx: icmp_seq=28 ttl=64 time=0.471 ms
64 bytes from xx.xx.xx.xx: icmp_seq=29 ttl=64 time=0.539 ms
64 bytes from xx.xx.xx.xx: icmp_seq=30 ttl=64 time=0.507 ms
64 bytes from xx.xx.xx.xx: icmp_seq=31 ttl=64 time=0.529 ms
64 bytes from xx.xx.xx.xx: icmp_seq=32 ttl=64 time=0.709 ms
64 bytes from xx.xx.xx.xx: icmp_seq=33 ttl=64 time=0.609 ms
64 bytes from xx.xx.xx.xx: icmp_seq=34 ttl=64 time=0.729 ms
64 bytes from xx.xx.xx.xx: icmp_seq=35 ttl=64 time=0.697 ms
64 bytes from xx.xx.xx.xx: icmp_seq=36 ttl=64 time=0.658 ms
64 bytes from xx.xx.xx.xx: icmp_seq=37 ttl=64 time=0.652 ms
64 bytes from xx.xx.xx.xx: icmp_seq=38 ttl=64 time=0.720 ms
64 bytes from xx.xx.xx.xx: icmp_seq=39 ttl=64 time=0.574 ms
64 bytes from xx.xx.xx.xx: icmp_seq=40 ttl=64 time=0.681 ms
64 bytes from xx.xx.xx.xx: icmp_seq=41 ttl=64 time=0.482 ms
64 bytes from xx.xx.xx.xx: icmp_seq=42 ttl=64 time=0.722 ms
64 bytes from xx.xx.xx.xx: icmp_seq=43 ttl=64 time=0.642 ms
64 bytes from xx.xx.xx.xx: icmp_seq=44 ttl=64 time=0.701 ms
64 bytes from xx.xx.xx.xx: icmp_seq=45 ttl=64 time=0.751 ms
64 bytes from xx.xx.xx.xx: icmp_seq=46 ttl=64 time=0.624 ms
64 bytes from xx.xx.xx.xx: icmp_seq=47 ttl=64 time=0.634 ms
64 bytes from xx.xx.xx.xx: icmp_seq=48 ttl=64 time=0.602 ms
64 bytes from xx.xx.xx.xx: icmp_seq=49 ttl=64 time=0.669 ms
64 bytes from xx.xx.xx.xx: icmp_seq=50 ttl=64 time=0.634 ms
64 bytes from xx.xx.xx.xx: icmp_seq=51 ttl=64 time=0.644 ms
64 bytes from xx.xx.xx.xx: icmp_seq=52 ttl=64 time=0.683 ms
64 bytes from xx.xx.xx.xx: icmp_seq=53 ttl=64 time=0.734 ms
64 bytes from xx.xx.xx.xx: icmp_seq=54 ttl=64 time=0.623 ms
^C
--- xx.xx.xx.xx ping statistics ---
54 packets transmitted, 54 received, 0% packet loss, time 53002ms
rtt min/avg/max/mdev = 0.349/0.599/0.751/0.088 ms
Running a little script on the machine being migrated (migration started at the 44th second and finished at the 2nd second of the next minute):
migratetest2:~ # while true; do date -Iseconds; sleep 1; done | tee migrate.log
2013-09-13T22:48:27+0200
2013-09-13T22:48:28+0200
2013-09-13T22:48:29+0200
2013-09-13T22:48:30+0200
2013-09-13T22:48:31+0200
2013-09-13T22:48:32+0200
2013-09-13T22:48:33+0200
2013-09-13T22:48:34+0200
2013-09-13T22:48:35+0200
2013-09-13T22:48:36+0200
2013-09-13T22:48:37+0200
2013-09-13T22:48:38+0200
2013-09-13T22:48:39+0200
2013-09-13T22:48:40+0200
2013-09-13T22:48:41+0200
2013-09-13T22:48:42+0200
2013-09-13T22:48:43+0200
2013-09-13T22:48:44+0200
2013-09-13T22:48:45+0200
2013-09-13T22:48:46+0200
2013-09-13T22:48:47+0200
2013-09-13T22:48:48+0200
2013-09-13T22:48:49+0200
2013-09-13T22:48:50+0200
2013-09-13T22:48:51+0200
2013-09-13T22:48:52+0200
2013-09-13T22:48:53+0200
2013-09-13T22:48:54+0200
2013-09-13T22:48:55+0200
2013-09-13T22:48:56+0200
2013-09-13T22:48:57+0200
2013-09-13T22:48:58+0200
2013-09-13T22:48:59+0200
2013-09-13T22:49:00+0200
2013-09-13T22:49:01+0200
2013-09-13T22:49:02+0200
2013-09-13T22:49:03+0200
2013-09-13T22:49:04+0200
2013-09-13T22:49:05+0200
2013-09-13T22:49:06+0200
2013-09-13T22:49:07+0200
2013-09-13T22:49:08+0200
2013-09-13T22:49:09+0200
2013-09-13T22:49:10+0200
2013-09-13T22:49:11+0200
^C
migratetest2:~ #
Please note that this is an "easy job", as the infrastructure is pretty fast and the test virtual machine was not doing any work at all (except running the above script). The hosts have professional SSDs in hardware RAID 0, a direct Gigabit Ethernet connection and not much load on their blazing fast processors.
Additional random error messages you may encounter
The receiving host cannot satisfy your CPU configuration needs
Although the two VM hosts have the exact same hardware configuration, I got an error stating that the receiving host did not have the capability to provide the exact same CPU configuration to the VM.
I could not reproduce the problem, but I did solve it when it came up. I just had to replace the exotic CPU configuration the VM had:
<cpu mode='custom' match='exact'>
  <model fallback='allow'>SandyBridge</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='pbe'/>
  <feature policy='require' name='tm2'/>
  <feature policy='require' name='est'/>
  <feature policy='require' name='vmx'/>
  <feature policy='require' name='osxsave'/>
  <feature policy='require' name='smx'/>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='ds'/>
  <feature policy='require' name='vme'/>
  <feature policy='require' name='dtes64'/>
  <feature policy='require' name='ht'/>
  <feature policy='require' name='dca'/>
  <feature policy='require' name='pcid'/>
  <feature policy='require' name='tm'/>
  <feature policy='require' name='pdcm'/>
  <feature policy='require' name='pdpe1gb'/>
  <feature policy='require' name='ds_cpl'/>
  <feature policy='require' name='xtpr'/>
  <feature policy='require' name='acpi'/>
  <feature policy='require' name='monitor'/>
</cpu>
with plain kvm:
<cpu match='exact'>
  <model fallback='allow'>kvm64</model>
  <topology sockets='1' cores='1' threads='1'/>
</cpu>
or an even more basic setup:
<cpu>
  <topology sockets='1' cores='1' threads='1'/>
</cpu>
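If you run into this, virsh can tell you in advance whether a host can provide a given CPU definition. A sketch, assuming you have saved the <cpu> element to a file called cpu.xml on the receiving host:

# compares the CPU definition in cpu.xml with the host CPU; prints whether
# the host CPU is identical to, a superset of, or incompatible with it
virsh cpu-compare cpu.xml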
"End of file from monitor" and "not processing incoming migration"
These errors came up randomly, and I could not figure out anything about them. They both disappeared after I decided to set up the receiving VM host completely (networks, the kvm kernel module, etc.) before playing around with migration like a child who cannot wait till dessert.
Good luck with migration! :)