Ansible introduction: the definitive guide to provisioning with Ansible

This Ansible introduction explains what Ansible is and how it works. I explain the most important concepts and demonstrate them by example, and I cover common pitfalls around installing Ansible, handling secrets, and keeping your projects reproducible.

Ansible introduction

Ansible is a CLI tool that provisions other machines. Here, provisioning refers to installing and configuring software on other machines. A concrete example from the Linux world would be to install a Linux package via apt/yum, or to make sure that a systemd service exists and is started, or that specific files are present on the machine. Another frequently used term for provisioning tool is Configuration Management tool.

In general, provisioning machines means that you typically have many machines that shall be provisioned (let’s call them target machines), and one machine that knows the provisioning instructions, like a “recipe” (let’s call it the source machine).

Provisioning tools, such as Ansible, Salt or Puppet, apply either an agent-based or an agentless approach. Ansible uses the agentless approach, where the source machine needs a technical way to connect directly to the target machines (e.g. via SSH or WinRM). The target machines need no specific agent pre-installed (but they do need a listening service, e.g. an SSH daemon, and a Python interpreter, see managed node in the docs). In contrast, agent-based software, such as Puppet / Chef / Salt, requires that the target machines already have some kind of (software) agent installed, which maintains a persistent connection to the source machine, downloads the provisioning instructions, and then applies them to the very same target machine on which it is installed. Which approach is better depends on the circumstances, but the industry is moving towards favoring agentless tools like Ansible. Even though it can be challenging to establish the network connectivity, the advantage of not requiring a dedicated agent on the target machines makes agentless tools the clear winner.

Why use a provisioning tool like Ansible?

Once you have to provision many machines, the only viable solution is some kind of automated scripting. It’s good practice to document such scripts and put them into version control, for better transparency in your team. However, you may wonder why using Ansible is better than, say, a hand-crafted Bash script that connects to all your servers via SSH and runs a set of commands?

Ansible has two main advantages over such hand-crafted solutions:

  1. With Ansible, you write files that define the final state of the target machines in a declarative way, not using imperative (modifying) statements. With Ansible, you define tasks such as “make sure package X is installed”, which is different to “install package X”. The former is idempotent, meaning that you can run Ansible tasks repeatedly, and always get the same result.
  2. Ansible has a huge community, which has built thousands of reusable recipes (referred to as Ansible “modules” or “roles”) that do all kinds of often-needed things. This saves you a lot of time. For instance, if you want to install Docker engine on a target machine with Ansible, you could hand-craft an Ansible recipe that runs all the terminal commands explained in Docker’s installation manual. But a much better, faster and easier way is to simply use the available Docker Ansible role, which figures out how to install Docker for you, and it supports all common Linux distros.

Required prerequisite: YAML

Ansible makes heavy use of YAML files. Therefore, you must acquire a solid working knowledge of the YAML file format. Otherwise, you will pull your hair out over “weird errors”, or worse, there are no errors at all, but Ansible does not behave the way you expect it to.

If any of the following things sound unfamiliar to you, you should definitely brush up your knowledge first, e.g. with an introductory guide such as this one (or the one by the Ansible team, see here):

  • White space / indentation matters and often produces no parser errors (but instead changes the meaning)
  • There are several alternative ways of expressing arrays
  • There are different approaches to express multi-line values
  • When values need to be quoted, and why quotes are necessary for something like ENV_VAR: "true" (illustrated below)
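
For illustration, here is a small (Ansible-agnostic) snippet that shows why quoting matters:

# Without quotes, the YAML parser converts these values to booleans/numbers,
# which may not be what the consuming program expects:
unquoted_flag: true       # parsed as the boolean true
quoted_flag: "true"       # parsed as the string "true"
version: 1.10             # parsed as the number 1.1
version_string: "1.10"    # parsed as the string "1.10"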

While you are still learning (and assuming that you understand JSON syntax well), it can be helpful to test your knowledge, by converting files between JSON and YAML, e.g. using this online converter.

You should also be aware that Ansible has a lot of history using the INI format, due to its heavy use of Python’s ConfigParser module (docs). This leads to some extra “magic-looking” syntax which lets you insert INI-style key=value pairs into YAML, which are interpreted by Ansible. The following example demonstrates the concept:

- hosts: webservers
  tasks:
    - name: create custom directory
      # From a pure YAML perspective, "file" refers to a string
      file: state=directory recurse=yes path=/custom
---
# This is how Ansible interprets it (and how you could also have written it)
- hosts: webservers
  tasks:
    - name: create custom directory
      file:
        state: directory
        recurse: yes
        path: /custom

I personally prefer the latter variant that uses “pure” YAML, and avoid the key=value shortcuts.

Installation

Ansible is a Python package (see the ansible package on PyPi), and therefore requires a Python interpreter on your machine. The installation is operating-system dependent:

  • On macOS, the easiest method is to install Ansible via Brew (brew install ansible). Under the hood, Brew makes sure that a suitable Python interpreter is installed, along with Ansible itself.
  • On Linux, you first need to install Python (and pip). How you do that depends on your Linux distribution. Then you use pip to install the ansible package.
  • On Windows, you must use a WSL Linux distro, or set up a Linux VM (e.g. with Vagrant, see also this article). Ansible does not support Windows natively on the source machine (where you run Ansible). However, the target machines may run Windows, to which Ansible connects via WinRM.
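
For instance, on a Debian/Ubuntu-based Linux system, the installation could look like this (a sketch; the package manager commands depend on your distro):

# Install Python 3 and pip via the distro's package manager (Debian/Ubuntu example)
sudo apt update && sudo apt install -y python3 python3-pip

# Install the ansible meta package for the current user
python3 -m pip install --user ansible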

Outdated versions

A major caveat to be aware of is that you may accidentally install a very outdated Ansible version, without even realizing it!

Caveat #1: The command ansible --version (which you might think verifies whether you have an up-to-date Ansible version) does not show the version number you expect. It shows the version of the ansible-core Python package, which has a completely different versioning scheme. The version you actually want to know (and which is also shown at the top left in the Ansible documentation menu, e.g. version 6) is the version of the ansible Python package, which is a meta package that references other packages (such as ansible-core). You can determine its version via python3 -m pip show ansible.

Caveat #2: The version that pip installs depends on the version of the Python interpreter. If your Linux distro comes with an older Python interpreter, such as Python 3.7, then pip will happily install an old, no-longer-maintained Ansible version, without warning you. Check the left sidebar of the ansible PyPi package (under “Programming language”) to see the list of currently supported Python interpreters. At the time of writing, this is Python 3.8 or newer. The exact Python version depends on the Ansible core version, see the Ansible docs here.

This also means that if you use WSL, you need to carefully choose your WSL distro. For instance, avoid Debian, which is known for shipping very old packages that are close to their “security-updates” expiry date (like Python 3.7).
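
To see what you actually installed, compare the two version numbers:

# Shows the ansible-core version (probably not the number you expect):
ansible --version

# Shows the version of the "ansible" meta package (the number used in the docs):
python3 -m pip show ansible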

Ansible concepts by example

Let’s look at a simple practical example which provisions two CentOS/RHEL Linux machines to become web servers, by installing and configuring the Apache HTTP Server (httpd). You simply place the following two files in a directory and trigger the provisioning via ansible-playbook --inventory inventory.yml playbook.yml

# inventory.yml
---
all:
  children:
    webservers:
      hosts:
        foo.example.com:
        bar.example.com:
# playbook.yml
---
- hosts: all
  become: yes
  tasks:
    - name: Install Apache (httpd)
      ansible.builtin.yum:
        name:
          - httpd
          - httpd-devel
        state: present
        update_cache: yes
 
    - name: Copy config files
      ansible.builtin.copy:
        src: "{{ item.src }}"
        dest: "{{ item.dest }}"
        owner: root
        group: root
        mode: 0644
      with_items:
        - src: httpd.conf
          dest: /etc/httpd/conf/httpd.conf
        - src: httpd-vhosts.conf
          dest: /etc/httpd/conf/httpd-vhosts.conf
 
    - name: Ensure that Apache is started at boot time and right now
      ansible.builtin.service:
        name: httpd
        state: started  # start it now
        enabled: yes  # start it at boot time

Let’s look at the concepts in these files in detail.

Let’s start with playbook / play / task / module:

  • The playbook.yml file contains an Ansible playbook, which is simply an array of plays. The “playbook” term is an analogy to sports, e.g. of an American football team that follows pre-built playbooks during a game. The example playbook.yml file shown above defines a playbook array with only one play, to keep the example small.
  • Each play is a recipe that describes one or more steps that achieve a common goal (here: installing & configuring a web server). A play defines two things:
    • hosts, which is a filter term that dictates on which group of known target machines (a.k.a. the “inventory”, explained below) this play should be executed. If you set the value to all (the top-level group in the inventory.yml file), there is no filter → the play will be executed on all known target machines
    • The recipe, that is, the individual steps to be executed: this can be defined in many different ways. The example file above defines an array of tasks right in the playbook. At some point this becomes unmaintainable, because the file grows too large, and you should break it down into several smaller files. For this, you can reference an existing role (a horribly misleading term that does not mean what you think it means, see below for details), or include/import tasks or even playbooks (see docs).
  • Tasks are where the actual work gets done. A task always has a name (printed to the console while the task is running) and a module (as in “Python module”), e.g. ansible.builtin.copy. Modules are “low-level” pieces of functionality, that can do all kinds of things, e.g. copying files from the source machine to the target machine, or configuring a systemd service, etc. Because there are thousands of modules, they are namespaced (e.g. ansible.builtin...). Let’s look at the tasks of the above example and see how they make our lives easier:
    • Task 1 tells the YUM Linux package manager to ensure that the httpd and httpd-devel packages are installed, making sure that the package manager cache is up-to-date (“yum check-update“).
      • Note: we could even have used the Linux-distro-independent ansible.builtin.package module, which automatically figures out which package manager (yum, apt, …) your target machine uses – this makes the task even more portable across Linux distros!
    • Task 2 copies two files from the source machine (where Ansible is running) to the target machines, applying the defined owner, group and mode, without you having to write any “chown/chmod ...” commands.
    • Task 3 ensures that the httpd service is running from now on, and is started on boot. The ansible.builtin.service module automatically figures out which init system (upstart, systemd, …) your target machine uses.
    • Ansible will run the tasks in the order they are defined (top to bottom). Ansible also lets you dynamically skip executing tasks (using “when:“), depending on conditions such as the value of a variable. And since tasks can also write their result into a variable (using “register:“), you can build dependencies between tasks (not shown in the above example, but see the sketch after this list).
    • Note: because all 3 tasks require root user permissions, the play sets “become: yes” (right below “hosts: all”). It tells Ansible to auto-detect how to become the root user (e.g. via sudo). When you set it at the play level, it applies to all tasks of that play, but you could also specify it for each task individually.
  • Many modules are idempotent, meaning that they don’t repeat work that is not necessary. In other words: you can run them repeatedly and the effect is the same. In fact, while a play is running, each module not only reports whether it ran successfully, but also whether it actually had to change anything on the target machine. Idempotence lets you run playbooks on already-provisioned machines, to make sure that they (still) conform to the state defined in the playbook.
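
As referenced above, here is a minimal sketch of how “register:” and “when:” can be combined to build a dependency between two tasks (the file path is just an illustration):

- hosts: webservers
  become: yes
  tasks:
    - name: Check whether the vhosts config already exists
      ansible.builtin.stat:
        path: /etc/httpd/conf/httpd-vhosts.conf
      register: vhosts_conf

    - name: Copy the vhosts config only if it is missing
      ansible.builtin.copy:
        src: httpd-vhosts.conf
        dest: /etc/httpd/conf/httpd-vhosts.conf
        owner: root
        group: root
        mode: 0644
      when: not vhosts_conf.stat.exists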

Now we need to understand the Ansible inventory, the first file shown above:

  • The inventory is a file that defines a hierarchically-organized pool of hosts (what we call “target machines” in this article) that your Ansible playbook can run against. In other words: it specifies where a playbook can run, but not how to authenticate with the targets (more details below).
  • The above example defines the meta-group “all”, and inside it, it defines a single group called “webservers” with two hosts (which are not defined as strings but YAML objects, because they could have sub-keys). It is possible to define many groups, or hierarchically nested groups, or define a list of hosts right below the all group.
  • By default, Ansible loads the inventory from /etc/ansible/hosts, unless you override the location, e.g. by specifying the --inventory argument, which we did above.
  • In the playbook.yml file, the “hosts: ...” part specifies a group name (or individual host) from our inventory. In our example, we specified the top-level “all” group, but we could also have specified “webservers” or “foo.example.com“.
  • Historically, Ansible used the INI format (not YAML) to define the inventory, as shown in the inventory docs. Ansible now officially supports INI and YAML, but I personally like the YAML format better, because it is more descriptive.
  • An important concept (not shown in the example) are host variables and group variables. These are variables (=key-value pairs that are consumed somewhere in the playbooks) that are set either for individual hosts or for host groups. For instance, in our example, we might want each web server host to serve a slightly different index.html page (e.g. containing the host name). Host/group variables can either be defined right in the inventory file (docs), or in a pre-defined directory structure (e.g. group_vars/<group-name>.yml), see docs. A small sketch follows after this list.
  • If you use ephemeral (short-lived) cloud infrastructure (e.g. cloud-based VMs) for your target hosts, where maintaining such a static inventory would be impossible, Ansible also supports dynamic inventories, see docs.
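
As referenced above, a small sketch of a group variable and a task that consumes it (the variable name is made up for illustration):

# group_vars/webservers.yml (applies to all hosts in the “webservers” group)
---
welcome_message: "Hello from the webservers group"

# Task inside a play that targets the “webservers” group:
- name: Write an index.html containing the welcome message and host name
  ansible.builtin.copy:
    content: "{{ welcome_message }} - served by {{ inventory_hostname }}"
    dest: /var/www/html/index.html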

Authentication

Ansible not only needs to know which target machines to connect to (defined in the inventory), but also how to authenticate against these target machines. Let’s assume you use SSH, which is the most common connection approach anyway. Ansible has multiple options to specify the authentication:

  • You can tell Ansible to ask you for the password upon connecting to the target machines (when using password-based SSH authentication), by providing the --ask-pass argument to ansible-playbook
  • For password-based authentication, you could also specify the ansible_user and ansible_password host variable (or group variable), as shown in the example below.
  • If you use SSH key-based authentication, which does not require typing any passwords, there are multiple ways to configure the private key location in Ansible:
    1. Configure your local SSH configuration to already know which private keys to use, depending on the target machine, by adding the following two lines to your ~/.ssh/config file for each target machine:
      Host somehost.org (or some IP)
      IdentityFile ~/path/to/private_key
    2. Provide the command line argument --private-key=<path> to ansible-playbook
    3. Set ansible_ssh_private_key_file in the inventory file, either as host variable (if the private key only works for a specific target machine) or as group variable (if the private key works for all target machines that belong to that group).
  • With SSH’s key-based authentication you still need to tell Ansible which username to use on the target machine. If you don’t specify anything, it will use the name of your local UNIX user. Just specify the ansible_user host or group variable in the inventory file.

Example that illustrates host and group variables that configure authentication:

# inventory.yml
---
all:
  # define a group variable that specifies that all target machines use the same SSH username
  vars:
    ansible_user: some-user
  children:
    webservers:
      hosts:
        foo.example.com:
          # Specify a host variable: this host supports private key authentication,
          # we provide a relative path to the private key file
          ansible_ssh_private_key_file: ./private-key-file
        bar.example.com:
          ansible_password: foobar

Handling of secrets

Often, there are “secrets” involved when provisioning a machine. For instance, suppose you want to provision a CI/CD build machine which includes installing a build-agent (e.g. a GitLab CI/CD runner, or Azure DevOps agent). This agent must be given an access token that it uses to connect to the CI/CD platform. You certainly don’t want to commit this secret token to your (Git) repository that stores your Ansible playbook.

To avoid leaking secrets, there are two basic approaches:

  • Make your Ansible tasks require the value of the secret as variable (e.g. {{ gitlab_runner_access_token }} ), but don’t commit the value of the variable. Instead, whoever calls the ansible-playbook CLI must specify the variable’s value as an argument (--extra-vars "variable-name=value"). Alternatively, you can use vars_files in your playbook, to reference a file (that defines the secret’s variable and value) that is not committed to Git (and is even on your .gitignore list). The user must manually create and fill that file with the secret. There is also this plugin that can help you store secrets in your OS-dependent credentials store (e.g. keychain on macOS).
  • Commit encrypted secrets to Git: Ansible comes with an Ansible Vault feature (docs), which lets you define a self-chosen vault password that is used to encrypt either entire YAML files (which you include/import in your playbook, e.g. with vars_files), or individual inline variables. You commit the encrypted content to Git and communicate the vault password in other (secure) ways to your team. When calling ansible-playbook --ask-vault-pass, you will be prompted for the vault password. You could also store the vault’s password in plain text in a file (outside of version control) and reference the path to it via --vault-password-file. Finally, you can store the vault password in your OS-dependent credentials store, and use a helper script (such as vault-keyring-client.py) to load it at runtime.
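
A quick sketch of the Vault workflow (the file and variable names are just examples):

# Encrypt an entire vars file (you will be prompted to choose a vault password):
ansible-vault encrypt group_vars/all/secrets.yml

# Or encrypt a single value and paste the output into any vars file:
ansible-vault encrypt_string 'my-secret-token' --name 'gitlab_runner_access_token'

# At runtime, provide the vault password interactively:
ansible-playbook --ask-vault-pass --inventory inventory.yml playbook.yml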

Roles

An Ansible role is simply a bundle of folders and files, arranged in a predefined directory structure. It consists of Ansible tasks and a few related files. The goal is to achieve better modularity for your playbooks. While you could get some level of modularity by using Ansible’s import or include mechanisms (e.g. on the task or playbook level), the main advantage of using a role is its predefined directory structure. The idea is that a role is a reusable artifact. In this context, “reusability” means that the role is highly customizable (via input variables), and thus can be used by many kinds of projects (that might need this customization). Typically, a role defines sane default values for most of these variables.

A typical example is the Docker Ansible role, which installs Docker Engine on your Linux machine. It comes with many input variables, but most of them have sane defaults. But you could, if necessary, customize the Docker engine’s version the role installs, the Docker daemon options, the package repository URL, and much more.

You can download hundreds of roles from Ansible Galaxy (details below), or create your own role, stored in the directory <project-root>/roles/<name-of-role>. The role’s predefined directory structure is documented here. Only the tasks/main.yml and the meta/main.yml files are necessary, the other folders and files are optional.

As documented here, calling a role can be achieved in various ways. You only specify the name of the role – there is no need to explicitly point Ansible to individual YAML files, such as tasks/main.yml. Thanks to the predefined directory structure that roles conform to, Ansible already (implicitly) knows where to look for these files.
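
For example, a play that applies a role could look like this (assuming the community Docker role has been installed from Ansible Galaxy under the name geerlingguy.docker):

- hosts: all
  become: yes
  roles:
    - geerlingguy.docker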

Roles are a great concept. Unfortunately, for newcomers, the name “role” is horribly misleading, because it has nothing to do with “roles”, as in “role-based access control” etc.

Ansible Galaxy

Ansible Galaxy is a repository of reusable content, maintained and contributed to by the Ansible community. It is somewhat similar to repositories such as PyPi for Python, or NPM for Node.js. Ansible Galaxy manages two kinds of artifacts:

  • Role: as explained above
  • Collection: a newer package format that bundles multiple roles, playbooks, Ansible modules and plugins into one artifact

You can download and install roles or collections via a CLI tool:
ansible-galaxy [role|collection] install <artifact-namespace>.<artifact-name>[:<version>]
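
For example (artifact names taken from Ansible Galaxy):

# Install a role:
ansible-galaxy role install geerlingguy.docker

# Install a collection:
ansible-galaxy collection install community.docker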

Alternatively, you can define a requirements.yml file that explicitly lists the roles and collections, as documented here, which is then used by ansible-galaxy install -r requirements.yml

Reproducible Ansible projects for your team

When working on professional projects, reproducibility is important: you want your playbooks to work, no matter who executes them, or where they are executed. In other words: you want to avoid phrases like “provisioning worked on my machine” at all costs! All the information Ansible uses should come from version control (except for secrets, of course).

If you don’t know the trick presented next, it is easy to run into all kinds of problems with Ansible: Ansible looks for roles, collections, or inventory in machine-specific folders, such as /etc/ansible. It is easy to see how this could wreak havoc. The content of /etc/ansible is not under version control, and if the user forgets to specify the path to the inventory file when calling ansible-playbook, they would provision the wrong machines (those defined in /etc/ansible/hosts). Or they might use an incorrect version of a role or plugin that they installed on their machine (for a different project).

To avoid such problems, add the following files to version control, to the same folder that also stores your playbook.yml file:

ansible.cfg

See the docs. This INI file configures Ansible’s behavior. This is a good starting point:

[defaults]
# The following two lines force Ansible to look for collections and roles only in the project directory
# This also means that the "ansible-galaxy install ..." command installs them into your project directory,
# which avoids plugin version clashes with other Ansible projects
collections_paths = ./collections
roles_path = ./roles
inventory = ./inventory.yml
 
# Optional, if using the same private key for all hosts
# private_key_file = ./relative/path/to/id_rsa

requirements.yml

This file contains your required, version-pinned Ansible Galaxy roles and collections; make sure to mention the ansible-galaxy install -r requirements.yml command in your project’s README file. Example:

---
roles:
  - name: geerlingguy.java
    version: 1.9.6

collections:
  - name: geerlingguy.php_roles
    version: 0.9.3

inventory.yml

Contains your inventory, see above for details and examples.

provision.sh

This is a simple Bash (or similar) script, named provision.sh or similarly, that makes sure your team won’t forget to specify the correct playbook file or to install the required dependencies. Your project’s README should instruct your team to only use this script, rather than calling Ansible CLI commands directly. Example:

#!/bin/bash

ansible-galaxy install -r requirements.yml
ansible-playbook playbook.yml

When the user changes the working directory to the folder that contains playbook.yml and runs provision.sh, Ansible will detect the ansible.cfg file and use only this file (without merging it with any user-specific Ansible config from ~/.ansible.cfg or /etc/ansible/ansible.cfg).

Finally, you might want to ensure that the version of Ansible itself is pinned (or at least limited to some minimum version), to avoid negative side effects when users try to use too old (or new) Ansible versions. There are two approaches:

  1. Use Docker to run a pinned Ansible version, e.g. using the images by jauderho or haxorof. You need to bind-mount your project directory into the Docker container and make sure to run the command that installs any Galaxy dependencies before running ansible-playbook (see the sketch after this list).
  2. Use the Ansible CLI on the user’s machine, but add a task to your playbook that ensures a certain Ansible version: see here for pointers
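
Here is a minimal sketch of the Docker-based approach (the image name, tag and entrypoint are assumptions; adapt them to the image you choose):

#!/bin/bash
# Run a pinned Ansible version inside a container, with the project directory bind-mounted
docker run --rm -it \
  --entrypoint /bin/sh \
  -v "$(pwd)":/work -w /work \
  jauderho/ansible:<pinned-tag> \
  -c "ansible-galaxy install -r requirements.yml && ansible-playbook playbook.yml"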

Conclusion

Ansible is a fantastic tool for provisioning machines. This Ansible introduction only explained the basics, but there are many more things you can do with Ansible, such as:

  • Provision not only “normal” Linux machines, but also network devices (such as routers), which run stripped-down Linux distros (or sometimes do not run Linux at all). See the docs for details.
  • Use Ansible not just for provisioning, but also for deploying software. There are, of course, a thousand ways to deploy a packaged piece of software, but Ansible covers many approaches. You could write plays that directly copy files (e.g. a “war” archive to a JEE server), or use Ansible’s Docker or Kubernetes collections to deploy Docker containers or Kubernetes YAML files.
  • Ansible can be used in an ad-hoc mode (docs). While running playbooks is about repeatable provisioning, ad-hoc mode is about running one or two (non-repeatable) commands, because you simply need them right now, e.g. rebooting all target machines in your inventory, or copying a specific folder to the target machines. This article provides a few examples.
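
For instance, an ad-hoc reboot of all hosts in the webservers group could look like this (a sketch, assuming the inventory from above and sudo rights on the target machines):

ansible webservers -i inventory.yml -b -m ansible.builtin.reboot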

For further resources, consider the well-written official Ansible user guide, an awesome-ansible list, or the fantastic Ansible for DevOps book!
