I’m currently in the process of modernizing my home lab environment. I decided to go with Terraform, and the first step was getting to the point where I can spin up a new host in my vSphere cluster just by adding a few lines of terraform.

This should be simple enough, right?

The Goal

In order to make things easy, I have the following requirements for my lab environment:

  • Base all VMs off of a cloned image for reproducibility.
  • No manual DNS management; terraform should do that.
  • No manual IP management; the DHCP server should do that.

There are other requirements in other areas of my overall environment; for example, in some VLANs I don’t have dynamic DHCP and everything is static (but still assigned by the DHCP server via fixed mappings). That actually makes things a bit more complex, and is out of scope for this particular adventure.

For the lab, this is sufficient to make my life very easy.

Sussing It Out

My lab environment is fairly simple. It runs on a vmware cluster, and exists in its own VLAN. In my case, this has already been set up; I have a lab, it’s just not modern. It even has a dynamic DHCP address pool available (though currently unused). DNS is served by ISC BIND 9, and the lab lives in its own subdomain.

With that much already done, reaching the goal can be accomplished with a handful of smaller tasks:

  • Enable dynamic DNS on the nameserver.
  • Create an image to clone from.
    • Make it as small as reasonable on disk.
    • Ensure that it’s prepped to boot as a new system without sticky network interfaces and other fun things from a previous boot.
    • Teach it to automagically grow the root partition into whatever space is available when the new VM is allocated.
  • Teach terraform to create the VM.
  • Teach terraform to automatically configure DNS entries.

Note that my environment here is entirely on Scientific Linux 7, so that’s what this is geared to. If you try to do this on a different distro, you might find a few differences (especially where config files live, and in how to create golden images).

Prerequisite: Enable Dynamic DNS

If I’m going to have terraform manage the DNS, then the DNS server needs to allow for it. BIND 9 supports dynamic DNS natively, so this isn’t terribly difficult. In my case, it involved adding access credentials for terraform and a few lines of configuration to enable dynamic updates to the lab zone.

The access key is simple enough to set up. First you create a key for terraform to use (crux is my primary nameserver):

 GTU:S  steve@crux:/tmp$ dnssec-keygen -a hmac-md5 -b 128 -n USER terraform
Kterraform.+157+62058
 GTU:S  steve@crux:/tmp$ cat Kterraform.+157+62058.key
terraform. IN KEY 0 3 157 S9tpIsXK/7w2q0SbJ2DRoQ==

The dnssec-keygen command generates the key and prints the basename of the two output files it creates on standard out. The .key file ( Kterraform.+157+62058.key in this example) contains the value we’re looking for, stored in BIND zone format. For this purpose, we only need the content of the last field on that line ( S9tpIsXK/7w2q0SbJ2DRoQ== in this example).
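
If you just want the secret by itself, something like this does the trick (awk simply prints the last whitespace-separated field):

 GTU:S  steve@crux:/tmp$ awk '{ print $NF }' Kterraform.+157+62058.key
S9tpIsXK/7w2q0SbJ2DRoQ==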

Next, we add that key to /etc/named.conf like so:

key terraform {
  algorithm hmac-md5;
  secret "S9tpIsXK/7w2q0SbJ2DRoQ==";
};

Finally, we need to alter the target zone(s) to allow updates from terraform. In my case, I simply added an allow-update directive to the zones in question:

zone "lab.gtu.floating.io" {
     type master;
     file "zone.io.floating.gtu.lab.db";
     allow-update { key terraform; };
};

zone "0.24.10.in-addr.arpa" {
     type master;
     file "zone.arpa.in-addr.10.24.0.db";
     allow-update { key terraform; };
};

Then restart the nameserver and verify that it came back up correctly.
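
On SL7 with the stock named service, that looks roughly like this (substitute named-chroot if you run the chrooted variant):

named-checkconf                 # sanity-check /etc/named.conf before restarting
systemctl restart named
rndc status                     # the last line should read "server is up and running"

Once the server is back up, you can test dynamic updates with the nsupdate utility: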

 GTU:S  steve@crux:/tmp$ nsupdate
> server crux.s.gtu.floating.io
> key terraform S9tpIsXK/7w2q0SbJ2DRoQ==
> zone lab.gtu.floating.io
> update add foohost.lab.gtu.floating.io 3600 IN A 1.2.3.4
> send

In another window, you can pause here to test this with host or nslookup. You should see results similar to this:

prometheus:~ steve$ host foohost.lab.gtu.floating.io
foohost.lab.gtu.floating.io has address 1.2.3.4

When you’re done testing, you can go back to the first window and remove the test entry:

> update delete foohost.lab.gtu.floating.io
> send
> quit

And that’s it. You’ve enabled dynamic updates for BIND. It will now keep a journal file next to the zone (with the .jnl extension) containing updates, which it will eventually fold back into the main zone file on its own schedule.

A couple of important notes about using dynamic DNS with BIND:

  • Once dynamic DNS is enabled, you cannot simply view or edit the zone file.
    • If you want to view it, you should first run rndc sync; this will force all updates to be written out to the zone file.
    • If you want to edit the file, you must first freeze the zone with rndc freeze <zone>. Note that this also implies an rndc sync. When you’re done editing, you can rndc thaw <zone>, and BIND will reload the zone file. Dynamic updates will not be processed between the freeze and the thaw (see the example after this list).
    • Unless specified in the BIND documentation, doing this any other way will likely result in wonky behavior.
  • Dynamic updates will automatically increment your zone’s serial number.
  • While BIND will happily import pre-existing zone data, it will ignore any comments or other manual formatting that were present in the initial zone file, and the first update will cause them to vanish. This is very annoying; I like neatly formatted and commented zone files, and BIND doesn’t even try to preserve them. The only way around this is to dedicate an entire zone to dynamic updates, which won’t work out well in my case.
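
For reference, the sync/freeze/thaw dance looks something like this (zone and file names per the examples above, assuming the stock /var/named layout):

# View only: flush journaled updates into the zone file first
rndc sync lab.gtu.floating.io

# Editing: freeze, edit, thaw (freeze also syncs the journal)
rndc freeze lab.gtu.floating.io
vi /var/named/zone.io.floating.gtu.lab.db
rndc thaw lab.gtu.floating.io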

Prerequisite: Configure Terraform’s DNS Provider

Now that you have dynamic DNS enabled, you need to teach terraform how to talk to it. This is fairly simple to do; terraform has a built-in provider for it, so you just need to add the configuration to one of your tf files in your terraform source:

provider "dns" {
  update {
    server        = "10.0.10.5"
    key_name      = "terraform."
    key_algorithm = "hmac-md5"
    key_secret    = "S9tpIsXK/7w2q0SbJ2DRoQ=="
  }
}

Change the IP address to point at your nameserver, of course. You should also note the extra . character at the end of the key name. This is important; it won’t work without it. As for the key, it should be the same one you created when setting up dynamic DNS above.

Once done, terraform’s DNS provider should happily talk to your BIND server.

Prerequisite: Set Up Terraform’s vSphere Provider

Terraform includes a provider specifically for vmware vSphere. Note that it won’t work right unless you have the commercial version (and vCenter may be required). Users of free ESXi instances need not apply, unfortunately.

Luckily, I already have licenses and a vCenter instance. If you don’t, check out VMUG. It’s about $200/year last I checked, and gives you licenses for most of vmware’s stack for personal use. It’s a very good deal.

Once you’ve got vmware all ready to go (which is beyond the scope of this article), you can go ahead and configure Terraform to talk to it. This is fairly straightforward; simply define the provider in one of your tf files:

provider "vsphere" {
  vsphere_server       = "${var.vsphere_server}"
  user                 = "${var.vsphere_user}"
  password             = "${var.vsphere_password}"
  allow_unverified_ssl = true
}

You can then add the relevant variables to your terraform.tfvars file:

#
# vSphere Credentials
#
vsphere_server   = "vcenter.s.gtu.floating.io"
vsphere_user     = "vsphere.local\\terraform"
vsphere_password = "password"
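
These also need matching variable declarations somewhere in your terraform source; a minimal set (names matching the tfvars above) looks like this:

variable "vsphere_server" {}
variable "vsphere_user" {}
variable "vsphere_password" {}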

See the provider documentation for more information on other options. Also note that your chosen account must have sufficient privileges to do what you want it to do.

And that’s pretty much it; you’re ready to go.

The Golden Image

There are any number of ways to build images for the cloud or various hypervisors; one of the most popular seems to be HashiCorp’s Packer utility. In my case, however, I’ve historically built machines with Kickstart, and I see no particular reason to change. For my purposes, I’m going to build my image from a kickstart file; YMMV.

Kickstart is already available via PXE in my environment, and I’m not going to detail that. Bottom line: if a vmware VM doesn’t have a valid boot partition, it will PXE boot from the network. My environment is set up to automatically deliver the Scientific Linux installer, along with parameters that specify where to get the kickstart file. How to do this is a topic for another day; see the RedHat documentation on Kickstart for more information.

If you have that set up, though, it’s fairly trivial to build an image to clone from.

A slight digression first, however: my existing kickstart isn’t going to work out of the box, because we’re going to need a few extra toys to make this work well. In my case, I’ve made the following changes to my standard kickstart file to set it up for terraforming:

  • Switched out the reboot directive for poweroff. This causes the VM to shut down immediately after the build finishes.
  • Added the jq, cloud-init, cloud-init-vmware-guestinfo, and cloud-init-growpart packages to the install list. The first one is just to make things easier on me when scripting host setup, and the last three are for cloud-init (duh!), which makes a few things easier if we have it around. Don’t forget to add the requisite yum repos also! (See the fragment after this list.)
  • Added a post-install snippet to clean the image in preparation for booting as a brand new machine after cloning.
  • Added a post-install snippet to configure cloud-init to grow our root partition and filesystem automatically for us so that it expands to fill up the disk.
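
On the first two points, the relevant kickstart fragments are only a few lines (package names exactly as listed above; the extra yum repo definitions are environment-specific and omitted):

# Power off instead of rebooting when the install finishes
poweroff

%packages
jq
cloud-init
cloud-init-vmware-guestinfo
cloud-init-growpart
%end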

On the latter two points, my %post section looks something like this:

%post
# Remove persistent network device settings so things don't break
# in vmware templates (sigh)
sed -i -e '/HWADDR/d' -e '/UUID/d' /etc/sysconfig/network-scripts/ifcfg-eth0
rm -f /etc/udev/rules.d/70-persistent-net.rules

# Drop the hostname and ssh host keys so the clone generates its own
rm -f /etc/hostname /etc/ssh/ssh_host_*
echo -n > /etc/machine-id

#
# Cloud configuration
#
# The heredoc is unquoted, so the $(...) substitutions run here at build
# time and the literal device names get baked into cloud.cfg.
cat >>/etc/cloud/cloud.cfg <<CLOUD
bootcmd:
 - growpart $(/usr/sbin/pvs --reportformat json | jq -r '.["report"][0]["pv"][0]["pv_name"]' | sed -e "s/\([0-9]*\)$/ \1/") -u auto
 - /usr/sbin/pvresize $(/usr/sbin/pvs --reportformat json | jq -r '.["report"][0]["pv"][0]["pv_name"]')
 - /usr/sbin/lvextend -r -l +100%FREE system/root
CLOUD

%end

The first part is easy enough; it just erases all the instance-specific data that I’m aware of so the newly-cloned machine will boot as if for the first time. This process is explained in RedHat’s documentation. The big thing is not getting your NIC locked to the MAC address it had when the image was created. That way lies much badness and hair-pulling.

Note that I use the old-style device naming; this is set earlier in my config. You could probably just run that sed on /etc/sysconfig/network-scripts/ifcfg-*, which might be more appropriate anyway.

The second part is a little more interesting. We use cloud-init’s bootcmd facility to execute a sequence of commands that grow the partition backing the LVM physical volume to fill the disk, resize the physical volume to claim the new space, and then extend the root logical volume (and, thanks to -r, its filesystem) into whatever is free. Seems simple enough.

There is probably a great deal of room for improvement, but that’s for another day – and don’t forget to adjust the device names. My kickstart sets different defaults for the volume group and logical volume name for the root filesystem. YMMV.
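
If you’re not sure which names your own kickstart uses, you can check on an existing host and substitute them into the lvextend line above (mine come out as system/root):

# Print the volume group and logical volume names known to LVM
lvs --noheadings -o vg_name,lv_name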

Once this is done, you create a new VM in vSphere, giving it the name of your soon-to-be-new image, and let it boot up. When the process is complete it will shut back down again, and that virtual machine is now a pristine image that you can clone from. You can also mark it as a template in vmware, which will prevent anyone from turning it on as a normal virtual machine; this is highly recommended just to avoid future problems.

A few notes on the VM you create to become your template:

  • The anaconda installer requires ~2GB of RAM. Set it to at least 2GB, even if you’re going to have cloned machines with less. You can set the cloned host’s RAM size in terraform; it isn’t tied to the image’s RAM size setting.
  • The disk can be as small or as large as you want. I keep a minimum 2GB swap partition on my machines, so I need a minimum of 6GB to complete the install. Make it as small as you can get away with, especially if your vmware environment doesn’t have VAAI support on remote datastores. The cloning times will be much better for it. Like RAM, you can set a different disk size on the machines you clone; they must be at least this large though.
  • It doesn’t matter what network your image VM was built on; if you have multiple networks in vmware, you can set a different one in terraform when you build the cloned host.

The only major improvement I want to pursue so far is to eliminate that 2GB swap partition in the image. Since we’re already mucking about with the disk during cloud-init’s boot, it should be fairly trivial to create the swap space there instead. That would kill roughly a third of the minimum required disk size, which would be a good thing.

Building New VMs With Terraform

With all of that set up, you’re now ready to write a terraform manifest to clone up a new machine. If you’re familiar with terraform, this part is pretty straightforward. There are only a few things you need to do to make it work cleanly with what we’ve built here.

First, you need a number of resources that you’ll want to reference when you create your virtual machine. A lot of this can be done directly without other objects in the mix, but this makes it more future-proof. It’s a bit of extra typing now for a much easier transition to the next provider version later. In theory, at least.

The prerequisite items you’ll need might look something like this:

data "vsphere_datacenter" "GTU" {
  name = "GTU"
}

data "vsphere_compute_cluster" "cluster" {
  name          = "Lab"
  datacenter_id = "${data.vsphere_datacenter.gtu.id}"
}

data "vsphere_datastore" "datastore" {
  name          = "datastore-name"
  datacenter_id = "${data.vsphere_datacenter.gtu.id}"
}

data "vsphere_network" "lab" {
  name          = "Lab"
  datacenter_id = "${data.vsphere_datacenter.gtu.id}"
}

resource "vsphere_folder" "lab" {
     path          = "Floating.IO Lab"
     type          = "vm"
     datacenter_id = "${data.vsphere_datacenter.gtu.id}"
}

Next, you need to add a data provider to reference the golden image we just created:

# Get data about the image you're going to clone from.
data "vsphere_virtual_machine" "my-image" {
    name = "my-image"
    datacenter_id = "${data.vsphere_datacenter.gtu.id}"
}

Once you have all this stuff defined, you can build your virtual machine. Note that we’re going to pass it a bit of extra information for cloud-init; this will be passed via the guestinfo facility.

# Configure the cloud-init data you're going to use
data "template_cloudinit_config" "cloud-config" {
    gzip          = true
    base64_encode = true

    # This is your actual cloud-config document.  You can actually have more than
    # one, but I haven't much bothered with it.
    part {
      content_type = "text/cloud-config"
      content      = <<-EOT
                     #cloud-config
                     packages:
                       - my-interesting-application
                       - rpmdevtools
                     EOT
    }
}

# And the virtual machine specification
resource "vsphere_virtual_machine" "vm" {
     name             = "my-hostname"
     folder           = "${vsphere_folder.lab.path}"
     resource_pool_id = "${data.vsphere_compute_cluster.cluster.resource_pool_id}"
     datastore_id     = "${data.vsphere_datastore.datastore.id}"

     num_cpus               = 2
     memory                 = 1024
     cpu_hot_add_enabled    = true
     memory_hot_add_enabled = true

     disk {
       size             = 64
       label            = "disk0"
       thin_provisioned = true
     }

     network_interface {
       network_id   = "${data.vsphere_network.lab.id}"
       adapter_type = "${data.vsphere_virtual_machine.image.network_interface_types[0]}"
     }

     guest_id = "${data.vsphere_virtual_machine.image.guest_id}"
     clone {
       template_uuid = "${data.vsphere_virtual_machine.image.id}"
     }

     extra_config {
       "guestinfo.userdata"          = "${data.template_cloudinit_config.cloud-config.rendered}"
       "guestinfo.userdata.encoding" = "gzip+base64"
       "guestinfo.metadata"          = <<-EOT
          { "local-hostname": "${var.hostname}.${var.domain}" }
       EOT
     }
}

Note that we use the cloud-init metadata to configure the local hostname; that’s the primary reason cloud-init exists in the image at all. This is due to a race condition: when the system boots, it won’t yet have access to its reverse DNS entries, because terraform won’t have added them yet; they can only be set once the VM is created and an IP address assigned. Setting the hostname in the cloud-init metadata ensures that it’s always correct, even early on in the boot process.

DNS And Terraform

So now you have your VM in terraform; all we have left is ensuring that the host is in DNS. The forward side of this is easy in standard terraform methodology:

resource "dns_a_record_set" "dns-forward" {
  zone      = "lab.gtu.floating.io."
  name      = "my-hostname"
  addresses = [ "${vsphere_virtual_machine.vm.default_ip_address}" ]
  ttl       = 3600
}

It’s that simple; the only thing you need to pay close attention to is the extra dot ( .) on the end of the zone name. This is required; the zone must be given as a fully-qualified name, trailing dot and all.

While forward was easy, however, reverse DNS requires a little more creative trickery. IN-ADDR.ARPA zones use the IP address in reverse-octet order, so we’ll need to make the individual octets available for use:

locals {
  octets = "${split(".", vsphere_virtual_machine.vm.default_ip_address)}"
}
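
For example, if the VM comes up as 10.24.0.17, local.octets is ["10", "24", "0", "17"]: the first three octets in reverse give the 0.24.10.in-addr.arpa. zone, and the last octet (17) becomes the record name.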

From there, we can construct a reverse DNS entry for this host; it’s simple, but a bit verbose:

resource "dns_ptr_record" "dns-reverse" {
  zone = "${local.octets[2]}.${local.octets[1]}.${local.octets[0]}.in-addr.arpa."
  name = "${local.octets[3]}"
  ptr  = "my-hostname.lab.gtu.floating.io."
  ttl  = 3600
}

And that’s it; DNS is done, man.
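
If you want the assigned address printed at the end of a run, you can also expose it as an output (the output name here is arbitrary):

output "my_hostname_ip" {
  value = "${vsphere_virtual_machine.vm.default_ip_address}"
}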

Application

With that, you have a fairly complete implementation. It clones your VM from an image, sets data to be passed to cloud-init, and when the VM finishes initializing, it will even add the DNS records for you. That’s all there is to it.

So, assuming you’re following along and adjusting hostnames and whatnot accordingly, you can now just run terraform apply to create your VM.
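
If this is a fresh working directory, remember to initialize it first so the dns, vsphere, and template providers get pulled down:

terraform init
terraform plan    # review what will be created
terraform apply   # clone the VM and add the DNS records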

Go ahead, I’ll wait.

Conclusion

Implementing this in my lab is going to save me huge amounts of time in the future. With a few other components added – such as a provisioning script to join the IPA domain – I can go from zero to a new, configured host in roughly two minutes. The only reason it even takes that long is because I don’t have VAAI support via NFS on the Synology that contains my datastores. Cloning can take time.

I’m planning on picking up a better NAS and redoing things a bit to solve that issue, but that’s a topic for another day.

There are a number of directions this can be taken in. I’ll probably add puppet to the environment next, as I find puppet and terraform to be a very powerful combination. Terraform does great for building the initial host in a reproducible state, but puppet is much more attractive for managing the overall configuration of the host on an ongoing basis; I work with some things that don’t fit well with the “nuke&pave” update model that is currently so prevalent in the cloud.

Now I just need terraform 0.12 to be production-ready (or at least for the beta 0.12-compatible vSphere provider to work correctly!). It introduces features that will make it vastly easier to modularize and simplify a large terraform codebase.

I’m chomping at the bit…