AoE root for KVM guests


Intro

So. I’m trying to get familiar with libvirt and friends. To this end, I’ve set up a Lucid virtual machine booting from PXE into an initrd environment which does a pivot_root to an AoE block device.

The #virt channel on irc.oftc.net told me that in order to have libvirt provide PXE capability, I would have to install a recent version of libvirt. I built version 0.7.5-3 from sid on my karmic laptop and it seems to be working okay.

I set up the PXE root directory in /var/lib/tftproot, simply because that’s what the example code used.

Configure the Virtual Network

I had to manually configure a virtual network. Here is the XML config file:

$ sudo virsh net-dumpxml netboot
<network>
  <name>netboot</name>
  <uuid>81ff0d90-c91e-6742-64da-4a736edb9a9b</uuid>
  <forward mode='nat'/>
  <bridge name='virbr1' stp='off' delay='1' />
  <domain name='example.com'/>
  <ip address='192.168.123.1' netmask='255.255.255.0'>
    <tftp root='/var/lib/tftproot' />
    <dhcp>
      <range start='192.168.123.2' end='192.168.123.254' />
      <bootp file='pxelinux.0' />
    </dhcp>
  </ip>
</network>
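
If you are starting from scratch rather than dumping an existing network, a definition like the one above can be loaded with virsh. This is only a sketch: it assumes the XML has been saved as netboot.xml in the current directory, and setup_netboot and the VIRSH override are hypothetical names I am using for illustration.

```shell
# Assumption: the network XML shown above is saved as netboot.xml.
# VIRSH can be overridden (for example VIRSH=echo) to do a dry run.
VIRSH=${VIRSH:-sudo virsh}

# Hypothetical helper: define, start and autostart the "netboot" network.
setup_netboot() {
    xml=$1
    $VIRSH net-define "$xml" &&
    $VIRSH net-start netboot &&
    $VIRSH net-autostart netboot
}

# Usage: setup_netboot netboot.xml
```

The autostart step is optional; it just brings the network up again when libvirtd restarts.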

Install syslinux

The bootp entry above depends on the pxelinux.0 file. Luckily, it is packaged in syslinux, so installing it and copying it into the TFTP root takes just a few commands:


$ sudo apt-get install syslinux
$ sudo mkdir /var/lib/tftproot
$ sudo cp /usr/lib/syslinux/pxelinux.0 /var/lib/tftproot

Configure PXE boot parameters

I had to create a pxelinux config file for the virtual machine. PXELINUX looks it up by MAC address: the file name is 01 (the ARP hardware type for Ethernet) followed by the MAC in lowercase hex, with dashes in place of colons. Note that I put a console=ttyS0,115200 argument on the kernel command line so that I can attach to the serial port from the host system for copy/paste debugging. Also important is the root=/dev/etherd/e0.1p1 argument, which specifies the block device we’ll eventually pivot_root to.


$ sudo mkdir /var/lib/tftproot/pxelinux.cfg/
$ cat /var/lib/tftproot/pxelinux.cfg/01-52-54-00-44-34-67
DEFAULT linux
LABEL linux
SAY Now booting the kernel from PXELINUX...
KERNEL vmlinuz-lucid0
APPEND ro root=/dev/etherd/e0.1p1 console=ttyS0,115200 initrd=initrd.img-lucid0
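
The config file name is derived mechanically from the guest’s MAC address, so it can be computed rather than typed by hand. A minimal sketch; pxe_cfg_name is a hypothetical helper name, not part of syslinux:

```shell
# Derive the pxelinux.cfg file name for a given MAC address.
# "01" is the ARP hardware type for Ethernet; the MAC itself is
# written in lowercase hex with dashes instead of colons.
pxe_cfg_name() {
    printf '01-%s\n' "$(printf '%s' "$1" | tr 'A-F' 'a-f' | tr ':' '-')"
}

pxe_cfg_name 52:54:00:44:34:67   # prints 01-52-54-00-44-34-67
```

The MAC used here is the one from the domain definition of the guest, so the two have to stay in sync if you ever regenerate the VM.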

I decided to use the karmic kernel for lucid initially. I’ll eventually switch over to the lucid kernel ;)


$ sudo cp /boot/vmlinuz-2.6.31-17-generic /var/lib/tftproot/vmlinuz-lucid0

Customize initramfs-tools

I copied /etc/initramfs-tools to ~/tmp/lucid so that I didn’t mess up the system initrd scripts:


$ mkdir -p ~/tmp/lucid && cp -r /etc/initramfs-tools ~/tmp/lucid/

Since mkinitramfs doesn’t currently support an AoE root, I had to do a bit of fiddling. I copied the NFS root boot script and made a couple of modifications.

$ diff -u /usr/share/initramfs-tools/scripts/nfs ~/tmp/lucid/initramfs-tools/scripts/aoe
--- /usr/share/initramfs-tools/scripts/nfs	2008-06-23 23:10:21.000000000 -0700
+++ /home/cjac/tmp/lucid/initramfs-tools/scripts/aoe	2010-01-15 14:56:28.098298027 -0800
@@ -5,59 +5,25 @@
 retry_nr=0
 
 # parse nfs bootargs and mount nfs 
-do_nfsmount()
+do_aoemount()
 {
-
 	configure_networking
 
-	# get nfs root from dhcp
-	if [ "x${NFSROOT}" = "xauto" ]; then
-		# check if server ip is part of dhcp root-path
-		if [ "${ROOTPATH#*:}" = "${ROOTPATH}" ]; then
-			NFSROOT=${ROOTSERVER}:${ROOTPATH}
-		else
-			NFSROOT=${ROOTPATH}
-		fi
-
-	# nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>]
-	elif [ -n "${NFSROOT}" ]; then
-		# nfs options are an optional arg
-		if [ "${NFSROOT#*,}" != "${NFSROOT}" ]; then
-			NFSOPTS="-o ${NFSROOT#*,}"
-		fi
-		NFSROOT=${NFSROOT%%,*}
-		if [ "${NFSROOT#*:}" = "$NFSROOT" ]; then
-			NFSROOT=${ROOTSERVER}:${NFSROOT}
-		fi
-	fi
+        ip link set up dev eth0
 
-	if [ -z "${NFSOPTS}" ]; then
-		NFSOPTS="-o retrans=10"
-	fi
+        ls /dev/etherd/
 
-	[ "$quiet" != "y" ] && log_begin_msg "Running /scripts/nfs-premount"
-	run_scripts /scripts/nfs-premount
-	[ "$quiet" != "y" ] && log_end_msg
+        echo > /dev/etherd/discover
 
-	if [ ${readonly} = y ]; then
-		roflag="-o ro"
-	else
-		roflag="-o rw"
-	fi
+        ls /dev/etherd/
 
-	nfsmount -o nolock ${roflag} ${NFSOPTS} ${NFSROOT} ${rootmnt}
+        mount ${ROOT} ${rootmnt}
 }
 
-# NFS root mounting
+# AoE root mounting
 mountroot()
 {
-	[ "$quiet" != "y" ] && log_begin_msg "Running /scripts/nfs-top"
-	run_scripts /scripts/nfs-top
-	[ "$quiet" != "y" ] && log_end_msg
-
-	modprobe nfs
-	# For DHCP
-	modprobe af_packet
+	modprobe aoe
 
 	# Default delay is around 180s
 	# FIXME: add usplash_write info
@@ -67,17 +33,13 @@
 		delay=${ROOTDELAY}
 	fi
 
-	# loop until nfsmount succeds
+	# loop until aoemount succeds
 	while [ ${retry_nr} -lt ${delay} ] && [ ! -e ${rootmnt}${init} ]; do
 		[ ${retry_nr} -gt 0 ] && \
-		[ "$quiet" != "y" ] && log_begin_msg "Retrying nfs mount"
-		do_nfsmount
+		[ "$quiet" != "y" ] && log_begin_msg "Retrying AoE mount"
+		do_aoemount
 		retry_nr=$(( ${retry_nr} + 1 ))
 		[ ! -e ${rootmnt}${init} ] && /bin/sleep 1
 		[ ${retry_nr} -gt 0 ] && [ "$quiet" != "y" ] && log_end_msg
 	done
-
-	[ "$quiet" != "y" ] && log_begin_msg "Running /scripts/nfs-bottom"
-	run_scripts /scripts/nfs-bottom
-	[ "$quiet" != "y" ] && log_end_msg
 }

(The full file follows, in case the unified diff is less convenient.)

$ cat ~/tmp/lucid/initramfs-tools/scripts/aoe
# NFS filesystem mounting			-*- shell-script -*-

# FIXME This needs error checking

retry_nr=0

# parse nfs bootargs and mount nfs 
do_aoemount()
{
	configure_networking

        ip link set up dev eth0

        ls /dev/etherd/

        echo > /dev/etherd/discover

        ls /dev/etherd/

        mount ${ROOT} ${rootmnt}
}

# AoE root mounting
mountroot()
{
	modprobe aoe

	# Default delay is around 180s
	# FIXME: add usplash_write info
	if [ -z "${ROOTDELAY}" ]; then
		delay=180
	else
		delay=${ROOTDELAY}
	fi

	# loop until aoemount succeds
	while [ ${retry_nr} -lt ${delay} ] && [ ! -e ${rootmnt}${init} ]; do
		[ ${retry_nr} -gt 0 ] && \
		[ "$quiet" != "y" ] && log_begin_msg "Retrying AoE mount"
		do_aoemount
		retry_nr=$(( ${retry_nr} + 1 ))
		[ ! -e ${rootmnt}${init} ] && /bin/sleep 1
		[ ${retry_nr} -gt 0 ] && [ "$quiet" != "y" ] && log_end_msg
	done
}
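
The script carries a FIXME about error checking. One gap is that mount runs immediately after the write to /dev/etherd/discover, before the device node has necessarily appeared. A minimal sketch of a polling helper that could sit in front of the mount, assuming a POSIX shell such as the one in the initramfs; wait_for_path is a hypothetical name, not something initramfs-tools provides:

```shell
# Hypothetical helper: poll until a path exists, checking once per second.
# $1 = path to wait for, $2 = maximum number of seconds (default 10).
# Returns 0 if the path exists, 1 if it never showed up.
wait_for_path() {
    path=$1
    tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        [ -e "$path" ] && return 0
        sleep 1
        tries=$(( tries - 1 ))
    done
    [ -e "$path" ]
}

# Possible use inside do_aoemount (untested in an actual initramfs):
#   wait_for_path "${ROOT}" 5 && mount ${ROOT} ${rootmnt}
```

Since mountroot already retries do_aoemount in a loop, a short timeout is enough here; the point is only to avoid a mount attempt that is guaranteed to fail.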

There was also a small modification to the initramfs.conf file:

$ diff -u /etc/initramfs-tools/initramfs.conf ~/tmp/lucid/initramfs-tools/initramfs.conf
--- /etc/initramfs-tools/initramfs.conf	2008-07-08 18:37:42.000000000 -0700
+++ /home/cjac/tmp/lucid/initramfs-tools/initramfs.conf	2010-01-15 14:33:38.088295207 -0800
@@ -47,14 +47,16 @@
 #
 
 #
-# BOOT: [ local | nfs ]
+# BOOT: [ local | nfs | aoe]
 #
 # local - Boot off of local media (harddrive, USB stick).
 #
 # nfs - Boot using an NFS drive as the root of the drive.
 #
+# aoe - Boot using an AoE drive as the root of the drive.
+#
 
-BOOT=local
+BOOT=aoe
 
 #
 # DEVICE: ...

I also needed to add aoe to the list of modules included in the initramfs:


$ echo aoe >> ~/tmp/lucid/initramfs-tools/modules

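To generate the initrd.img file from this new config, I ran the following. With no version argument, mkinitramfs builds for the running kernel, so this assumes the host is running the same 2.6.31-17-generic kernel that was copied into the TFTP root: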


$ sudo mkinitramfs -d ~/tmp/lucid/initramfs-tools/ -o /var/lib/tftproot/initrd.img-lucid0
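
By this point the TFTP root should hold four files: the boot loader, the per-MAC config, the kernel, and the initrd. A quick sanity check can save a confusing PXE failure later. This is a sketch using the file names from this post; check_tftproot is a hypothetical helper name:

```shell
# Report files missing from the TFTP root; returns non-zero if any are absent.
# $1 = TFTP root directory (defaults to the one used in this post).
check_tftproot() {
    root=${1:-/var/lib/tftproot}
    missing=0
    for f in pxelinux.0 \
             pxelinux.cfg/01-52-54-00-44-34-67 \
             vmlinuz-lucid0 \
             initrd.img-lucid0; do
        if [ ! -r "$root/$f" ]; then
            echo "missing: $root/$f"
            missing=1
        fi
    done
    return "$missing"
}

# Usage: check_tftproot /var/lib/tftproot
```

Note that this only checks readability for the user running it; the TFTP server itself may run as a different user.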

Install OS to virtual block device

I created a lucid VM by installing from the desktop install disk. You can grab the ISO here:

http://cdimage.ubuntu.com/daily-live/current/

I’ll leave the creation of the virtual machine and the installation as an exercise for the reader. I put the filesystem on an LVM volume group called vg0, in a logical volume called lucid0 (i.e., /dev/vg0/lucid0).

Create virtual machine definition with virsh

At this point, I created a new virtual machine called lucid0. Here is the XML for the domain:

$ sudo virsh dumpxml lucid0
<domain type='kvm' id='1'>
  <name>lucid0</name>
  <uuid>96fbad21-4f25-5700-ddd8-1a565c7170ee</uuid>
  <memory>524288</memory>
  <currentMemory>524288</currentMemory>
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64' machine='pc-0.11'>hvm</type>
    <boot dev='network'/>
  </os>
  <features>
    <pae/>
  </features>
  <clock offset='localtime'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/bin/kvm</emulator>
    <interface type='network'>
      <mac address='52:54:00:44:34:67'/>
      <source network='netboot'/>
      <target dev='vnet0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/4'/>
      <target port='0'/>
    </serial>
    <console type='pty' tty='/dev/pts/4'>
      <source path='/dev/pts/4'/>
      <target port='0'/>
    </console>
    <input type='tablet' bus='usb'/>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='5901' autoport='yes' listen='127.0.0.1' keymap='en-us'/>
    <sound model='es1370'/>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
    </video>
  </devices>
</domain>

Start AoE target

Now we’re ready to start the AoE target and launch the virtual machine. If you don’t have vblade installed, do so now:


$ sudo apt-get install vblade

Start the target up with the following command:


$ sudo vbladed 0 1 virbr1 /dev/vg0/lucid0
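
The two numbers passed to vbladed are the AoE shelf (0) and slot (1), which is where the e0.1 in the root= argument comes from; the p1 suffix is the first partition on the exported volume. A small sketch of that mapping, with aoe_dev as a hypothetical helper name:

```shell
# Map an AoE shelf/slot (and optional partition number) to the device
# node the Linux aoe driver creates under /dev/etherd.
aoe_dev() {
    shelf=$1
    slot=$2
    part=${3:-}
    printf '/dev/etherd/e%s.%s%s\n' "$shelf" "$slot" "${part:+p$part}"
}

aoe_dev 0 1      # prints /dev/etherd/e0.1   (the whole exported volume)
aoe_dev 0 1 1    # prints /dev/etherd/e0.1p1 (the partition used in root=)
```

If you export a second guest’s volume from the same host, give it a different shelf/slot pair and adjust that guest’s root= argument to match.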

Boot the virtual machine

Now, if all goes well, you should be able to watch the virtual machine boot up and do its thing like so:


$ sudo virsh start lucid0 && sudo screen -S lucid0 `sudo virsh ttyconsole lucid0` 115200

If you get errors about /dev/etherd/e0.1p1 not existing, they might look like this:

Begin: Retrying AoE mount ...
err         discover    interfaces  revalidate  flush
err         discover    interfaces  revalidate  flush
mount: mounting /dev/etherd/e0.1p1 on /root failed: No such file or directory
Done.

then you might want to try restarting vbladed like this:


$ sudo kill -9 `ps auwx | grep vblade | grep -v grep | awk '{print $2}' ` && sudo vbladed 0 1 virbr1 /dev/vg0/lucid0

Questions? Comments?

So. Now you should have a lucid gdm in your virt-manager console. Any questions? Ask in #virt on irc.oftc.net.

Also, feel free to email me.


One response to “AoE root for KVM guests”

  1. (08:20:10) cj: MikeN: hey, look who’s the second result for ‘aoe root’ in google search results
    (08:53:02) MikeN: aww
    (08:53:13) MikeN: we’re like AoE root filesystem BFFs
    (09:11:13) MikeN: jlott is #4 there
    (09:38:21) cj: we could be like super heroes and save the universe with our aoe root awesomeness.
    (09:38:40) MikeN: yep
    (09:38:43) MikeN: it would rule
