Infrastructure testing for Ansible playbooks and roles: an introductory guide

This article demonstrates how you can do infrastructure testing for Ansible roles and playbooks. I explain how the tools Vagrant and Molecule+Docker let you easily provision temporary VMs or Docker containers in which you can experimentally run your Ansible roles/playbooks, or even run unit tests in Continuous Integration.

Introduction to infrastructure testing

Ansible is a CLI tool that provisions other machines. If you are new to Ansible, check out my introduction guide.

When creating Ansible playbooks (or writing your own roles), it is easy to make mistakes, so you need a way to test your playbooks. Welcome to the world of infrastructure testing, which applies what you know about software testing to infrastructure tools, making sure that these tools really do what you think they do. Relating software testing to infrastructure testing, there are two levels of testing we often want:

  1. Smoke testing: the minimal level of testing you can get by with (“better than nothing”)
    • In software testing, this means just compiling and starting the application, then shutting it down in an orderly fashion, and checking that there are no errors
    • In infrastructure testing, this means simply running the Ansible playbook, maybe even running it twice (to check its idempotency), and checking that Ansible does not raise any errors
  2. Unit/System testing: a more fail-safe way of testing
    • In software testing, we run individual components (unit tests) or the whole system (system tests), put it into a defined initial state, then run some functionality, and compare the output of the system with some hand-crafted expected output, the “test oracle”. The test fails if the output does not match the expected one, or if the component or system crashed.
    • In infrastructure testing, we can do the same as in software testing: a unit test exercises a single Ansible role, and a system test exercises an entire playbook.
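
To make the idempotency check from the smoke-testing bullet more concrete, here is a small sketch of two tasks that aim for the same result. The file name, path and variable are made up for illustration. The first task reports "changed" on every run and would fail an idempotency check; the second one reports "changed" only when it actually had to modify the file.

```yaml
# idempotency-demo.yml (hypothetical file name, for illustration only)
---
- name: Illustrate non-idempotent vs. idempotent tasks
  hosts: all
  tasks:
    # Not idempotent: the shell module reports "changed" on every run,
    # and the file grows by one line each time the playbook runs.
    - name: Append an environment variable via shell
      ansible.builtin.shell: echo 'export EDITOR=vim' >> /etc/profile.d/editor.sh

    # Idempotent: lineinfile reports "changed" only if the line was missing.
    - name: Ensure the environment variable line is present
      ansible.builtin.lineinfile:
        path: /etc/profile.d/editor.sh
        line: export EDITOR=vim
        create: true
        mode: "0644"
```

In a real playbook you would keep only the second task. Running the playbook twice and looking for "changed" results in the second run is exactly the smoke test described above.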

So the question is: what tooling can I use to write such tests, and how do I set up test infrastructure easily (locally or in CI), so that I do not need to use the (real) hardware of my production system?

There are two tools at your disposal:

  1. Vagrant: Vagrant lets you do smoke testing of your Ansible playbook or role, by creating and destroying locally-running virtual machines (VMs), using various hypervisors (e.g. VirtualBox)
  2. Molecule & Docker: Molecule is a testing framework (like JUnit, pytest, etc.) for Ansible roles and playbooks. Molecule uses Docker containers to provision temporary environments, which are faster to spin up than VMs, but require a few tricks to get systemd-based services to work. Molecule can be used locally and in CI/CD environments.

Using Vagrant for infrastructure testing

Vagrant is a CLI tool that uses recipes (a Vagrantfile) to download, start and provision VMs. See Vagrant introduction and use cases to learn more about Vagrant and how to use it. The basic idea is that with the right Vagrantfile, all you need to do is run “vagrant up”, and a few minutes later you have one (or more) VMs with a bare Linux distro installed, which you can then provision with Ansible.

There are two approaches to integrate Vagrant with Ansible:

  1. Install Ansible and Vagrant on the host, and use Vagrant’s Ansible provisioner, as explained in the Ansible docs here.
    • What happens here is that once “vagrant up” has completed, Vagrant will automatically call the ansible-playbook CLI (on the host) for you, pointing it to an inventory file that Vagrant generated for that VM.
    • This approach is the one I would generally recommend. It is easy to get working on any UNIX-based host, e.g. macOS or Linux. On Windows, getting Vagrant and Ansible to work is more complicated, because you must use WSL (Ansible cannot run natively on Windows as the control node). You then also have to get Vagrant to work inside WSL, which is considerably more complicated, and official support is experimental. There are guides such as this one, but your mileage may vary.
  2. Install only Vagrant on the host, and install Ansible into one of the VMs
    • There are two sub-variants.
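      • Create a dedicated control VM that contains only Ansible, see here for pointers. Make sure to set config.ssh.insert_key to false for all the VMs that are not the control node, so that they keep Vagrant’s default (insecure) key pair, which the control VM can then use to connect to them.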
      • Have Vagrant install Ansible into the VM that shall be provisioned, using the ansible_local provisioner.
    • This approach is the most “platform-independent” approach, and is probably the most stable approach if some members of your team use Windows on the host, where approach #1 is difficult to set up.
    • However, these approaches are more resource intensive (extra control VM), or “unclean” when installing Ansible into the same VM that Ansible is supposed to provision (installing Ansible “taints” the VM).

Using Molecule for infrastructure testing

While the Molecule docs focus on describing how to test Ansible roles, you can also test complete playbooks, as described next.

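To get started, you need to install the Python packages for Molecule and Docker, e.g. via pip3 install "molecule[docker]". Newer Molecule releases ship the Docker driver in the separate molecule-plugins package, in which case the command becomes pip3 install molecule "molecule-plugins[docker]".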

In your project directory (where your playbook.yml is stored), run molecule init scenario to create a new “molecule” folder.

Delete the files create.yml, destroy.yml and INSTALL.rst in the molecule/default directory, because they are generally not necessary.

Open the molecule/default/converge.yml file and make it look like this:

# converge.yml
---
- name: Converge
  hosts: all
# Tell Molecule not to test a role, but run our playbook
- import_playbook: ../../playbook.yml
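
For reference, here is a minimal sketch of what the imported playbook.yml could look like. It is a made-up example (installing and starting nginx), not a playbook from this article, and only serves to show what kind of content converge.yml pulls in.

```yaml
# playbook.yml (hypothetical example; package and service name are assumptions)
---
- name: Install and start a web server
  hosts: all
  tasks:
    - name: Install nginx
      ansible.builtin.package:
        name: nginx
        state: present

    # This task is the kind that needs a working init system in the container
    - name: Start and enable nginx
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: true
```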

Next, we have to change the molecule/default/molecule.yml file. The platforms section of that file defines which Docker container(s) to spin up. You could have multiple entries there, which tells Molecule to spin up a container for each defined image, and run your playbook inside it.

Here is an example:

# molecule.yml
---
dependency:
  name: galaxy
driver:
  name: docker
platforms:  # Configures the list of environments to which Molecule applies our playbook/role
  - name: instance
    image: "geerlingguy/docker-rockylinux8-ansible:latest"
    pre_build_image: true
    # The following 4 lines are needed only for making systemd work
    command: ""  # keeps Molecule from overriding the container's start command, so the image's init system binary runs instead
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    privileged: true
provisioner:
  name: ansible
verifier:
  name: ansible

There is one big caveat: your playbook will most likely use Ansible modules/roles that ensure that some Linux service is started (or that restart one or more services). This requires a working init system (e.g. upstart, systemd, …). Normally, Docker images do not contain an init system, because init systems require a very high degree of Linux privileges, which containers (by default) should not be given. But here we do need an init system, which is why privileged: true is set in the above file. We also need a Docker image that contains an init system. See here for a list of available images maintained by a very active Ansible community member, Jeff Geerling. On that page, scroll down to the “Container Images for Ansible Testing” section, and check the Maintained column, to find which images are (still) actively maintained.
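
As mentioned earlier, the platforms section may contain several entries. The following sketch shows how that could look with two systemd-capable images. The second image name follows the naming scheme of the images listed by Jeff Geerling, but treat it as an assumption and check the list for the images that are currently maintained.

```yaml
# molecule.yml (platforms section only; instance names are arbitrary)
platforms:
  - name: rocky8
    image: "geerlingguy/docker-rockylinux8-ansible:latest"
    pre_build_image: true
    command: ""
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    privileged: true
  - name: ubuntu2204
    image: "geerlingguy/docker-ubuntu2204-ansible:latest"  # assumed image name
    pre_build_image: true
    command: ""
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    privileged: true
```

With such a configuration, Molecule would start one container per entry and apply the playbook to each of them, which is a cheap way to check a playbook against several distributions.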

Systemd containers and WSL2

If you are on Windows and use WSL2, you may not be able to use Docker containers that internally run systemd, because the WSL2 environment itself has traditionally not come with systemd. It seems you need a “host” OS with a properly working systemd. Newer WSL releases can enable systemd via the [boot] section of /etc/wsl.conf (systemd=true), which may lift this restriction. Otherwise, you can use a Hyper-V or VirtualBox VM on Windows, into which you install Docker, Ansible and Molecule.

You can now use the following commands (make sure the working directory of your terminal is the project root):

  • molecule converge: creates a new Docker container and runs your playbook or role in it (applying the playbook/role is what Molecule calls “converging”)
  • molecule destroy: destroys a possibly-existing Docker container
  • molecule test: destroys and recreates the container, and runs your playbook in it
  • molecule login: gives you a shell into the running Docker container, which is useful for debugging failing playbooks/roles

There are quite a few more commands, see docs. You can deduce what these commands do in detail, by looking at the scenario-default list here.

  • The content of the scenario-default-list snippet (that starts with scenario:) has the format <name-of-command>_sequence. For instance, whatever steps are shown in create_sequence are what Molecule does when you run molecule create.
  • A few details about these steps:
    • dependency: if you were to test a role, which requires other roles, you could add a requirements.yml file in the default scenario directory of the role, and Molecule would install them in this step
    • create: creates the Docker container
    • prepare: if you create a prepare.yml playbook in the default scenario directory of a role, Molecule runs it
    • cleanup: if you create a cleanup.yml playbook, Molecule runs it. This step cleans up test infrastructure that may not be present on the instance that will be destroyed. Its primary use case is “cleaning up” changes that were made outside of Molecule’s test environment, for example remote database connections or user accounts. It is intended to be used in conjunction with the prepare step, to modify external resources when required.
    • converge: Molecule runs your role or playbook (whatever you defined in the converge.yml file) in the running Docker container
    • destroy: destroys the running Docker container
    • idempotence: runs converge a second time, and complains if Ansible reports that something has changed
    • verify: runs your verify.yml playbook, which corresponds to the “test oracle” of software testing. Here you write Ansible tasks that verify whether all the services and files installed by the role/playbook are really working. For instance, for testing a playbook that installs a web server, you would have a file such as this.
  • You can change the scenario sequence as you see fit in your molecule.yml file!
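
As an illustration of the verify step described above, here is a minimal verify.yml sketch. It assumes a playbook that installs and starts nginx on a systemd-based image; the service name, port and URL are assumptions for this example and need to be adapted to whatever your playbook actually sets up.

```yaml
# molecule/default/verify.yml (sketch; assumes the playbook installs nginx)
---
- name: Verify
  hosts: all
  gather_facts: false
  tasks:
    - name: Collect the state of all services
      ansible.builtin.service_facts:

    - name: Check that nginx is known to the init system and running
      ansible.builtin.assert:
        that:
          - "'nginx.service' in ansible_facts.services"
          - ansible_facts.services['nginx.service'].state == 'running'

    # Runs inside the container and fails unless the server answers with HTTP 200
    - name: Request the start page
      ansible.builtin.uri:
        url: http://localhost:80/
        status_code: 200
```

The asserts play the role of the test oracle: if the service is missing or stopped, or the web server does not answer, molecule verify (and therefore molecule test) fails.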

If you want to know how to run Molecule in CI (GitHub Actions), see here.
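
To give a rough idea, a workflow along the following lines might be a starting point. This is a sketch under assumptions, not the setup from the linked guide: the workflow file name, the action versions and the pip package names (which match recent Molecule releases) are my choices, and systemd-based containers may need further tweaks on hosted runners.

```yaml
# .github/workflows/molecule.yml (hypothetical file name)
---
name: Molecule
on:
  push:
  pull_request:

jobs:
  molecule:
    runs-on: ubuntu-latest  # hosted Ubuntu runners come with Docker installed
    steps:
      - name: Check out the repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.x"

      # Package names are assumptions; older Molecule releases used molecule[docker]
      - name: Install Ansible, Molecule and the Docker driver
        run: pip3 install ansible molecule "molecule-plugins[docker]"

      - name: Run the default scenario
        run: molecule test
        env:
          PY_COLORS: "1"
          ANSIBLE_FORCE_COLOR: "1"
```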

Conclusion

The presented tools, Vagrant and Molecule, improve the life of an automation engineer significantly. During the experimentation phase, where you are still figuring out the structure of the role or playbook, you can now work faster and save money, because you neither need to pay for real hardware nor wait for virtual hardware to be provisioned. In addition, Molecule’s ability to run in CI and to express unit/system tests (as you know them from software testing) improves the quality of your infrastructure code, and catches problems early.
