This is a description of how to set up diskless cluster nodes for a small-scale Beowulf cluster using Debian GNU/Linux.
Although this was done in the context of using the nodes for an HPC cluster, the technique described here should work for diskless nodes used for any other purpose (although things like setting up X11 on the nodes, which would be required for diskless workstations, are not described).
There is essentially only one hardware requirement for a diskless system: a network interface capable of PXE netboot. PXE is the de facto standard protocol for handling netboot. Although it isn't ubiquitous, most modern NICs support PXE, and all modern motherboards that I've come across will recognize the network adapter as a valid boot device.
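Three services are required for a diskless system: DHCP (to assign addresses and point the nodes at the boot loader), TFTP (to serve the boot loader, kernel and initrd), and NFS (to serve the root filesystem).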
These three services are independent, so there's no reason that they need to be on the same machine, but there's really no reason not to put them on the same machine either, and that's what's done in the system described here. Since all of these services need information about the nodes that are going to be serviced, keeping track of things is much easier if the services are all on the same machine anyway.
debootstrap --verbose --resolve-deps $RELEASE $NFSROOTDIR $MIRROR
where
$RELEASE="sarge"
$MIRROR="http://debian.csail.mit.edu/debian-amd64/debian/"
The initrd is built by mkinitrd, which runs the scripts in /etc/mkinitrd/scripts.
install lessdisks dhcp3-server atftpd syslinux
cp /usr/share/doc/lessdisks-doc/pxe/default /var/lib/lessdisks/boot/pxelinux.cfg/
setup atftpd to serve $TFTPBOOT
ln -s /usr/lib/syslinux/pxelinux.0 $TFTPBOOT/
create $TFTPBOOT/pxelinux.cfg/default
optionally, configure serial console during boot (modify pxelinux.cfg/default):
first line: serial 0 115200 0x303
add to initrd line: console=tty0 console=ttyS0,115200n8
configure dhcp3-server to point filename to /$TFTPBOOT/pxelinux.0
once lessdisks is configured:
ln -s $LESSDISKS/boot/vmlinuz $TFTPBOOT/
ln -s $LESSDISKS/boot/initrd.img $TFTPBOOT/
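The pxelinux.cfg/default file from the steps above could be generated roughly as follows. This is only a sketch: the function name write_pxelinux_default and the label name "diskless" are made up for illustration, and any further kernel parameters that a real lessdisks setup needs are not shown (the default file shipped in /usr/share/doc/lessdisks-doc/pxe/ is the better starting point for those).

```shell
#!/bin/sh
# Sketch: print a minimal pxelinux.cfg/default with a serial console.
# write_pxelinux_default and the label "diskless" are hypothetical names.
# $1 = kernel file name relative to the TFTP root, $2 = initrd file name.
write_pxelinux_default() {
    kernel=$1
    initrd=$2
    # The serial directive has to be the first line of the file.
    cat <<EOF
serial 0 115200 0x303
default diskless
label diskless
  kernel $kernel
  append initrd=$initrd console=tty0 console=ttyS0,115200n8
EOF
}

# Example: print the config for the vmlinuz and initrd.img symlinks.
write_pxelinux_default vmlinuz initrd.img
```

Redirecting the output to $TFTPBOOT/pxelinux.cfg/default would put the file where pxelinux.0 looks for it.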
/sbin/ifconfig
/sbin/dhclient
/sbin/route
This should obviously also include their dependencies.
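One way to collect the library dependencies of those executables is to filter the output of ldd. The sketch below is an assumption about how this could be done, not the code lessdisks itself uses; list_lib_paths is a made-up helper name. It reads ldd output on stdin, so it can be tried on saved output as well.

```shell
#!/bin/sh
# Sketch: turn ldd output (read from stdin) into a unique list of library paths.
# list_lib_paths is a hypothetical helper name.
list_lib_paths() {
    awk '
        # skip the "/path/to/binary:" headers printed when ldd gets several files
        NF == 1 && $1 ~ /:$/       { next }
        # lines of the form: libfoo.so => /lib/libfoo.so (0x...)
        $2 == "=>" && $3 ~ /^\//   { print $3; next }
        # lines of the form: /lib/ld-linux.so.2 (0x...)
        $1 ~ /^\//                 { print $1 }
    ' | LC_ALL=C sort -u
}

# Example (left commented out so the sketch has no side effects):
# ldd /sbin/ifconfig /sbin/dhclient /sbin/route | list_lib_paths
```

Compared with simply printing the third field, this also picks up the dynamic loader line, which has no "=>", and ignores virtual entries that have no path.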
/etc/dhclient-script
$CONFDIR/config-file
# network card modules
nic_modules="via-velocity"
# dependencies of network card modules
nic_dep_modules="crc_ccitt"
# nfs and related dependencies
net_modules="nfs af_packet sunrpc unix lockd"
/etc/hosts
The nodes will require some host-specific files:
/etc/hostname
There should probably also be a single host-specific file that records each node's hostname, IP address, MAC address, etc.
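A host-specific file of the kind suggested above (hostname, IP, MAC) could be a plain table with one node per line, from which the other files are derived. The following is a sketch under that assumption; the table format, the function names and any addresses used with it are invented for illustration.

```shell
#!/bin/sh
# Sketch: derive per-node config from one table of "hostname ip mac" lines.
# The table format and both function names are assumptions for illustration.

# Print /etc/hosts lines for every node in the table read from stdin.
nodes_to_hosts() {
    while read -r name ip mac; do
        case $name in ''|'#'*) continue ;; esac   # skip blank lines and comments
        printf '%s\t%s\n' "$ip" "$name"
    done
}

# Print ISC dhcpd host stanzas that pin each MAC address to its IP address.
nodes_to_dhcpd() {
    while read -r name ip mac; do
        case $name in ''|'#'*) continue ;; esac
        printf 'host %s {\n' "$name"
        printf '  hardware ethernet %s;\n' "$mac"
        printf '  fixed-address %s;\n' "$ip"
        printf '}\n'
    done
}

# Example (commented out; nodes.txt is a hypothetical file name):
# nodes_to_hosts < nodes.txt
# nodes_to_dhcpd < nodes.txt
```

Keeping one table as the only place where a node is described means that adding a node is a one-line change followed by regenerating the derived files.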
Each node will also require its own NFS mount for /var.
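The per-node /var exports could be written out along these lines. This is a sketch only: the function name var_exports, the one-directory-per-hostname layout and the export options are assumptions, not taken from the setup described here.

```shell
#!/bin/sh
# Sketch: print one /etc/exports line per node so each gets a private /var.
# The layout ($1/<hostname>), the options and the function name are hypothetical.
# $1 = directory holding the per-node /var trees; "hostname ip" pairs on stdin.
var_exports() {
    base=$1
    while read -r name ip _rest; do
        [ -n "$name" ] || continue                # skip blank lines
        printf '%s/%s %s(rw,sync,no_root_squash)\n' "$base" "$name" "$ip"
    done
}

# Example (commented out; the path and address are placeholders):
# echo "node01 192.168.1.101" | var_exports /srv/cluster/nodes/var
```

Exporting each directory to a single address keeps one node from mounting another node's /var by accident.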
To use, run update-cluster-regenerate to update various config files (using
/usr/lib/update-cluster/[configuration].updatelist).
update-cluster-regenerate all
Way to issue commands to multiple nodes. (updated using update-cluster).
admin@zajos:~$ cd /srv/cluster/nodes/kernel
tar xjf /usr/src/[kernel-source].tar.bz2
cd [kernel-source]
make-kpkg --rootcmd fakeroot --config menuconfig --append-to-version +abit-kv8-cluster --initrd --revision 0.1 kernel_image
admin@zajos:/srv/cluster/nodes/kernel#
Installed and configured package "lessdisks". This places a diskless root in /var/lib/lessdisks.
Creates the basic Debian system root for the nodes using debootstrap:
debootstrap --verbose --resolve-deps $RELEASE $DISKLESSROOT $MIRROR
where
$RELEASE="sarge"
$MIRROR="http://debian.csail.mit.edu/debian-amd64/debian/"
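Written as real shell assignments, the debootstrap call above might look like the sketch below. It only prints the command (a dry run) rather than running it, and the default value for DISKLESSROOT is a placeholder rather than the path used on this cluster; debootstrap_cmd is a made-up function name.

```shell
#!/bin/sh
# Sketch: assemble the debootstrap call from variables and print it (dry run).
# The DISKLESSROOT default is a placeholder; debootstrap_cmd is a hypothetical name.
RELEASE=${RELEASE:-sarge}
MIRROR=${MIRROR:-http://debian.csail.mit.edu/debian-amd64/debian/}
DISKLESSROOT=${DISKLESSROOT:-/tmp/diskless-root}

debootstrap_cmd() {
    printf 'debootstrap --verbose --resolve-deps %s %s %s\n' \
        "$RELEASE" "$DISKLESSROOT" "$MIRROR"
}

# Print the command that would be run.
debootstrap_cmd
```

Once the printed line looks right, it can be run as root to populate the target directory.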
lessdisks-install (configures lessdisks and installs diskless root in /var/lib/lessdisks/)
debug:
$LDROOT/etc/mkinitrd/scripts/netboot sources $LDROOT/etc/lessdisks/mkinitrd/initrd-netboot.conf
script_dir="$CONFDIR/install_scripts/"
scripts in $INITRD/scripts/ get run by....
**************************
initrd needs /var/lib/dhcp for leases!
**************************
added to $LDROOT/etc/lessdisks/mkinitrd/install_scripts/
60_mount_var
#!/bin/sh
# give dhclient a place for its leases
echo " Creating tmpfs /var..."
mount -nt tmpfs tmpfs /var || mount -nt ramfs ramfs /var
mkdir -p /var/lib/dhcp
chown root:lessdisks $LDROOT/etc/lessdisks/mkinitrd/install_scripts/60_mount_var
chmod 755 $LDROOT/etc/lessdisks/mkinitrd/install_scripts/60_mount_var
modified /etc/inittab in nfs root:
setup serial console in initrd single virtual console
TODO:
what about "diskless"?
add "condor" client packages
during mkinitrd:
---------------
run /etc/mkinitrd/scripts/netboot
sources /etc/lessdisks/mkinitrd/initrd-netboot.conf (variables)
cp all /etc/lessdisks/mkinitrd/install_scripts/ to $INITRDDIR/scripts/
bring in extra executables specified in initrd-netboot.conf (initrd_exe)
/usr/bin/cut
/sbin/ifconfig
/bin/grep
/usr/bin/head
/bin/hostname
/usr/bin/tr
/sbin/route
/sbin/udhcpc
bring in extra files specified in initrd-netboot.conf (initrd_files)
/dev/ttyS0
/dev/console
/etc/lessdisks/mkinitrd/network_script
/etc/lessdisks/mkinitrd/initrd-netboot.conf
bring in executable dependencies (ldd $initrd_exe | sort -u | awk '{print $3}')
copy in modules:
***************************
if [ -z "$MODULEDIR" ]; then
    echo "MODULEDIR not defined, exiting..."
    exit 1
fi
if [ -r "$MODULEDIR/modules.dep" ]; then
    modules_dep_file="$MODULEDIR/modules.dep"
fi
if [ -z "$modules_dep_file" ]; then
    modules_dep_file="/lib/modules/$(uname -r)/modules.dep"
fi
# TODO use "modprobe -nv $module" instead of module dependencies in variables
for m in $nic_modules $nic_dep_modules $net_modules ; do
    # the module's path is the part before the colon on its modules.dep line
    module=$(grep -E "/$m\.o:|/$m\.ko:" "$modules_dep_file" | cut -d : -f 1)
    if [ -n "$module" ]; then
        install_modules="$install_modules $module"
    fi
done
# copy found modules into initrd dir
for m in $install_modules ; do
    if [ -r "$m" ]; then
        mdir="$(dirname "$m")"
        mkdir -p "$INITRDDIR/$mdir"
        cp -a "$m" "$INITRDDIR/$m"
    fi
done
***************************
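The lookup at the heart of the loop above can be tried in isolation. The sketch below wraps it in a small function so it can be pointed at any modules.dep file; find_module_path is a made-up name, and the sketch assumes the modules.dep format in which each line starts with the module's path followed by a colon.

```shell
#!/bin/sh
# Sketch: look up one module's file path in a modules.dep file.
# find_module_path is a hypothetical helper; $1 = module name, $2 = modules.dep.
find_module_path() {
    # Match ".../<name>.o:" or ".../<name>.ko:"; only the module named at the
    # start of a line is followed by a colon, its dependencies are not.
    grep -E "/$1\.k?o:" "$2" | cut -d : -f 1
}

# Example (commented out so the sketch does not depend on the running kernel):
# find_module_path nfs "/lib/modules/$(uname -r)/modules.dep"
```

An empty result means the module is not listed, which is the case the loop silently skips.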
add /etc/init.d/lessdisk /etc/init.d/lessdisk-session to nfsroot
loading initrd /scripts:
---------------