We received our new VM host today, so the first thing I wanted to try was live migration. I started with the virsh migrate --help command, then looked up the official documentation to find out what exactly I should do.
What command to run?
As of today, the official libvirt: Guest migration page is a bit outdated and describes a lot of different transport modes and configuration options. To be honest, I do not understand the libvirt migration models, and I did not spend much time trying to understand them.
I want to migrate a VM without any interruption and without using any shared storage. So basically the new host will receive a running virtual machine that it has never heard about and whose virtual disks it does not have. Apart from this, I want the persistent VM to remain persistent on the receiving host.
The command I came up with is:
virsh migrate --live --persistent --undefinesource --copy-storage-all \
    --verbose --desturi <DESTINATION> <VM>
Where <DESTINATION> is the libvirt connection URI of the receiving host and <VM> is the name of the virtual machine to be migrated. I entered this command on the current VM host, as I do not want to depend on any client machine.
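For completeness, here are two common forms <DESTINATION> can take with the qemu driver (newhost is a hypothetical host name, not one of mine):

qemu+ssh://root@newhost/system    # libvirt connection tunnelled over SSH
qemu+tls://newhost/system         # TLS transport, needs certificates set up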
First problem: copying the disks
After issuing the migrate command, I got the following error:
Failed to open file '/ssd/vmstorage/migratetest2.swap': No such file or directory
Strange, as I enabled --copy-storage-all, so I thought the help (saying "migration with non-shared storage with full disk copy") did not lie. The log files did not say anything about this, so I started reading old and not very useful comments, posts and mailing list archives. There was one useful hint I thought I could try:
Create an empty disk with the same geometry before migrating
You can view the disk image information with qemu-img info:
# qemu-img info /ssd/vmstorage/migratetest2.swap
image: /ssd/vmstorage/migratetest2.swap
file format: raw
virtual size: 10G (10737418240 bytes)
disk size: 8.0K
# qemu-img info /ssd/vmstorage/migratetest2.raw
image: /ssd/vmstorage/migratetest2.raw
file format: raw
virtual size: 10G (10737418240 bytes)
disk size: 1.5G
Now I could have created two images of exactly 10737418240 bytes, but I used the same sparse file method I used when I created the original images:
# dd if=/dev/null of=/ssd/vmstorage/migratetest2.swap bs=1M seek=10240
0+0 records in
0+0 records out
0 bytes (0 B) copied, 2,223e-05 s, 0,0 kB/s
# dd if=/dev/null of=/ssd/vmstorage/migratetest2.raw bs=1M seek=10240
0+0 records in
0+0 records out
0 bytes (0 B) copied, 2,4834e-05 s, 0,0 kB/s
# l -h /ssd/vmstorage/migratetest2*
-rw-r--r-- 1 root root 10G szept 13 22:07 /ssd/vmstorage/migratetest2.raw
-rw-r--r-- 1 root root 10G szept 13 22:07 /ssd/vmstorage/migratetest2.swap
# du /ssd/vmstorage/migratetest2.raw
0       /ssd/vmstorage/migratetest2.raw
# qemu-img info /ssd/vmstorage/migratetest2.raw
image: /ssd/vmstorage/migratetest2.raw
file format: raw
virtual size: 10G (10737418240 bytes)
disk size: 0
As you can see, I ended up with two disk images that occupy 0 bytes on disk but have an apparent size of 10 GB, and qemu-img does recognise them as 10 GB disks. Let's try the migration again:
# virsh migrate --live --persistent --undefinesource --copy-storage-all \
    --verbose --desturi <DESTINATION> migratetest2
Migration: [100 %]
After checking the running machines list, it did seem to be working.
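By the way, dd is not the only way to pre-create the target images. qemu-img should produce an equivalent sparse raw file in one step; this is a sketch of the idea, not the command I actually ran:

# creates a sparse 10 GB raw image on the receiving host,
# equivalent to the dd seek trick above
qemu-img create -f raw /ssd/vmstorage/migratetest2.swap 10G
qemu-img create -f raw /ssd/vmstorage/migratetest2.raw 10G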
How good is this libvirt + qemu + kvm migration?
Network configuration
The VM has two network interfaces:
eth0 is connected to a host internal network with DHCP and NAT, so the machines can access the internet without any special configuration. This network has the same name on all hosts and the same IP configuration, so a migration can only cause problems if two machines end up using the same DHCP address. I assign static addresses to "long life" machines for this reason.
eth1 is connected to a bridge interface on the host, which in turn is connected to a switch with all the VM hosts plugged in. This network also has the same name and configuration on all hosts, with strictly static IP addresses. This way, all the VMs should see each other no matter which host they are running on.
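For reference, the two libvirt network definitions look something like this (names, addresses and the bridge device are made up for this sketch, not copied from my hosts):

<!-- eth0's network: NAT with DHCP; a <host> entry pins a static lease -->
<network>
  <name>internal</name>
  <forward mode='nat'/>
  <bridge name='virbr1' stp='on' delay='0'/>
  <ip address='192.168.100.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.100.100' end='192.168.100.200'/>
      <host mac='52:54:00:aa:bb:cc' ip='192.168.100.10'/>
    </dhcp>
  </ip>
</network>

<!-- eth1's network: plain bridge to the physical switch -->
<network>
  <name>vmlan</name>
  <forward mode='bridge'/>
  <bridge name='br0'/>
</network>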
Did anyone notice the migration?
I wanted to see two things:
- How many pings do we lose while migrating?
- What does the VM see while migrating?
Well, it turned out pretty well. Pinging the machine from a third computer (migration started at the 15th and ended at the 34th packet):
> ping xx.xx.xx.xx
PING xx.xx.xx.xx (xx.xx.xx.xx) 56(84) bytes of data.
64 bytes from xx.xx.xx.xx: icmp_seq=1 ttl=64 time=0.618 ms
64 bytes from xx.xx.xx.xx: icmp_seq=2 ttl=64 time=0.620 ms
64 bytes from xx.xx.xx.xx: icmp_seq=3 ttl=64 time=0.589 ms
64 bytes from xx.xx.xx.xx: icmp_seq=4 ttl=64 time=0.502 ms
64 bytes from xx.xx.xx.xx: icmp_seq=5 ttl=64 time=0.500 ms
64 bytes from xx.xx.xx.xx: icmp_seq=6 ttl=64 time=0.628 ms
64 bytes from xx.xx.xx.xx: icmp_seq=7 ttl=64 time=0.662 ms
64 bytes from xx.xx.xx.xx: icmp_seq=8 ttl=64 time=0.664 ms
64 bytes from xx.xx.xx.xx: icmp_seq=9 ttl=64 time=0.374 ms
64 bytes from xx.xx.xx.xx: icmp_seq=10 ttl=64 time=0.596 ms
64 bytes from xx.xx.xx.xx: icmp_seq=11 ttl=64 time=0.540 ms
64 bytes from xx.xx.xx.xx: icmp_seq=12 ttl=64 time=0.553 ms
64 bytes from xx.xx.xx.xx: icmp_seq=13 ttl=64 time=0.543 ms
64 bytes from xx.xx.xx.xx: icmp_seq=14 ttl=64 time=0.542 ms
64 bytes from xx.xx.xx.xx: icmp_seq=15 ttl=64 time=0.474 ms
64 bytes from xx.xx.xx.xx: icmp_seq=16 ttl=64 time=0.576 ms
64 bytes from xx.xx.xx.xx: icmp_seq=17 ttl=64 time=0.494 ms
64 bytes from xx.xx.xx.xx: icmp_seq=18 ttl=64 time=0.600 ms
64 bytes from xx.xx.xx.xx: icmp_seq=19 ttl=64 time=0.653 ms
64 bytes from xx.xx.xx.xx: icmp_seq=20 ttl=64 time=0.610 ms
64 bytes from xx.xx.xx.xx: icmp_seq=21 ttl=64 time=0.349 ms
64 bytes from xx.xx.xx.xx: icmp_seq=22 ttl=64 time=0.618 ms
64 bytes from xx.xx.xx.xx: icmp_seq=23 ttl=64 time=0.536 ms
64 bytes from xx.xx.xx.xx: icmp_seq=24 ttl=64 time=0.548 ms
64 bytes from xx.xx.xx.xx: icmp_seq=25 ttl=64 time=0.497 ms
64 bytes from xx.xx.xx.xx: icmp_seq=26 ttl=64 time=0.617 ms
64 bytes from xx.xx.xx.xx: icmp_seq=27 ttl=64 time=0.640 ms
64 bytes from xx.xx.xx.xx: icmp_seq=28 ttl=64 time=0.471 ms
64 bytes from xx.xx.xx.xx: icmp_seq=29 ttl=64 time=0.539 ms
64 bytes from xx.xx.xx.xx: icmp_seq=30 ttl=64 time=0.507 ms
64 bytes from xx.xx.xx.xx: icmp_seq=31 ttl=64 time=0.529 ms
64 bytes from xx.xx.xx.xx: icmp_seq=32 ttl=64 time=0.709 ms
64 bytes from xx.xx.xx.xx: icmp_seq=33 ttl=64 time=0.609 ms
64 bytes from xx.xx.xx.xx: icmp_seq=34 ttl=64 time=0.729 ms
64 bytes from xx.xx.xx.xx: icmp_seq=35 ttl=64 time=0.697 ms
64 bytes from xx.xx.xx.xx: icmp_seq=36 ttl=64 time=0.658 ms
64 bytes from xx.xx.xx.xx: icmp_seq=37 ttl=64 time=0.652 ms
64 bytes from xx.xx.xx.xx: icmp_seq=38 ttl=64 time=0.720 ms
64 bytes from xx.xx.xx.xx: icmp_seq=39 ttl=64 time=0.574 ms
64 bytes from xx.xx.xx.xx: icmp_seq=40 ttl=64 time=0.681 ms
64 bytes from xx.xx.xx.xx: icmp_seq=41 ttl=64 time=0.482 ms
64 bytes from xx.xx.xx.xx: icmp_seq=42 ttl=64 time=0.722 ms
64 bytes from xx.xx.xx.xx: icmp_seq=43 ttl=64 time=0.642 ms
64 bytes from xx.xx.xx.xx: icmp_seq=44 ttl=64 time=0.701 ms
64 bytes from xx.xx.xx.xx: icmp_seq=45 ttl=64 time=0.751 ms
64 bytes from xx.xx.xx.xx: icmp_seq=46 ttl=64 time=0.624 ms
64 bytes from xx.xx.xx.xx: icmp_seq=47 ttl=64 time=0.634 ms
64 bytes from xx.xx.xx.xx: icmp_seq=48 ttl=64 time=0.602 ms
64 bytes from xx.xx.xx.xx: icmp_seq=49 ttl=64 time=0.669 ms
64 bytes from xx.xx.xx.xx: icmp_seq=50 ttl=64 time=0.634 ms
64 bytes from xx.xx.xx.xx: icmp_seq=51 ttl=64 time=0.644 ms
64 bytes from xx.xx.xx.xx: icmp_seq=52 ttl=64 time=0.683 ms
64 bytes from xx.xx.xx.xx: icmp_seq=53 ttl=64 time=0.734 ms
64 bytes from xx.xx.xx.xx: icmp_seq=54 ttl=64 time=0.623 ms
^C
--- xx.xx.xx.xx ping statistics ---
54 packets transmitted, 54 received, 0% packet loss, time 53002ms
rtt min/avg/max/mdev = 0.349/0.599/0.751/0.088 ms
Running a little script on the machine being migrated (migration started at the 44th second and finished at the 2nd second of the next minute):
migratetest2:~ # while true; do date -Iseconds; sleep 1; done | tee migrate.log
2013-09-13T22:48:27+0200
2013-09-13T22:48:28+0200
2013-09-13T22:48:29+0200
2013-09-13T22:48:30+0200
2013-09-13T22:48:31+0200
2013-09-13T22:48:32+0200
2013-09-13T22:48:33+0200
2013-09-13T22:48:34+0200
2013-09-13T22:48:35+0200
2013-09-13T22:48:36+0200
2013-09-13T22:48:37+0200
2013-09-13T22:48:38+0200
2013-09-13T22:48:39+0200
2013-09-13T22:48:40+0200
2013-09-13T22:48:41+0200
2013-09-13T22:48:42+0200
2013-09-13T22:48:43+0200
2013-09-13T22:48:44+0200
2013-09-13T22:48:45+0200
2013-09-13T22:48:46+0200
2013-09-13T22:48:47+0200
2013-09-13T22:48:48+0200
2013-09-13T22:48:49+0200
2013-09-13T22:48:50+0200
2013-09-13T22:48:51+0200
2013-09-13T22:48:52+0200
2013-09-13T22:48:53+0200
2013-09-13T22:48:54+0200
2013-09-13T22:48:55+0200
2013-09-13T22:48:56+0200
2013-09-13T22:48:57+0200
2013-09-13T22:48:58+0200
2013-09-13T22:48:59+0200
2013-09-13T22:49:00+0200
2013-09-13T22:49:01+0200
2013-09-13T22:49:02+0200
2013-09-13T22:49:03+0200
2013-09-13T22:49:04+0200
2013-09-13T22:49:05+0200
2013-09-13T22:49:06+0200
2013-09-13T22:49:07+0200
2013-09-13T22:49:08+0200
2013-09-13T22:49:09+0200
2013-09-13T22:49:10+0200
2013-09-13T22:49:11+0200
^C
migratetest2:~ #
Please note that this is an "easy job", as the infrastructure is pretty fast and the test virtual machine was not doing any work at all (except running the above script). The hosts have professional SSDs in hardware RAID 0, a direct Gigabit Ethernet connection and not much load on their blazing fast processors.
Additional random error messages you may encounter
The receiving host cannot satisfy your CPU configuration needs
Although the two VM hosts have the exact same hardware configuration, I got an error stating that the receiving host did not have the capability to provide the exact same CPU configuration to the VM.
I could not reproduce the problem, but I did solve it when it came up. I just had to replace the exotic CPU configuration the VM had:
<cpu mode='custom' match='exact'>
  <model fallback='allow'>SandyBridge</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='pbe'/>
  <feature policy='require' name='tm2'/>
  <feature policy='require' name='est'/>
  <feature policy='require' name='vmx'/>
  <feature policy='require' name='osxsave'/>
  <feature policy='require' name='smx'/>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='ds'/>
  <feature policy='require' name='vme'/>
  <feature policy='require' name='dtes64'/>
  <feature policy='require' name='ht'/>
  <feature policy='require' name='dca'/>
  <feature policy='require' name='pcid'/>
  <feature policy='require' name='tm'/>
  <feature policy='require' name='pdcm'/>
  <feature policy='require' name='pdpe1gb'/>
  <feature policy='require' name='ds_cpl'/>
  <feature policy='require' name='xtpr'/>
  <feature policy='require' name='acpi'/>
  <feature policy='require' name='monitor'/>
</cpu>
with plain kvm:
<cpu match='exact'>
  <model fallback='allow'>kvm64</model>
  <topology sockets='1' cores='1' threads='1'/>
</cpu>
or an even more basic setup:
<cpu>
  <topology sockets='1' cores='1' threads='1'/>
</cpu>
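If you run into this, virsh can tell you in advance whether a host can provide a given CPU definition. A sketch, assuming you have saved the <cpu> element to a file called cpu.xml on the receiving host:

# compares the CPU definition in cpu.xml with the host CPU; prints whether
# the host CPU is identical to, a superset of, or incompatible with it
virsh cpu-compare cpu.xml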
"End of file from monitor" and "not processing incoming migration"
These errors came up randomly, and I could not figure out anything about them. They both disappeared after I decided to set up the receiving VM host completely (networks, the kvm kernel module, etc.) before playing around with migration like a child who cannot wait till dessert.
Good luck with migration! :)