Ansible

Introduction

Ansible is an open-source software provisioning, configuration management, and application-deployment tool enabling infrastructure as code. It runs on many Unix-like systems, and can configure both Unix-like systems and Microsoft Windows. It includes its own declarative language to describe system configuration. Ansible was written by Michael DeHaan and acquired by Red Hat in 2015. Ansible is agentless, temporarily connecting remotely via SSH or Windows Remote Management (allowing remote PowerShell execution) to do its tasks.

In other words, through simple declarative files describing actions, ansible lets you automate the deployment, provisioning and configuration of servers.

Documentation

Rather than rewriting documentation that already exists and is quite good, we strongly recommend that you take a look at the official documentation for Ansible.

We especially recommend that you read the following sections in order to be able to understand and practice the tutorials provided below:

User Guide, especially:
- Ansible Concepts
- How to build your inventory
- Connection methods and details
- Patterns: targeting hosts and groups
- Working With Playbooks, especially:
  - Intro to Playbooks
  - Create Reusable Playbooks, especially:
    - Roles
  - Using Variables
  - Best Practices
- Working with modules, especially:
  - Introduction to modules
  - Return Values
YAML Syntax

If you want to go deeper and understand the ansible roles used to deploy a PAGoDA instance and, even better, contribute to PAGoDA development then we also recommend that you read the following sections:

User Guide, especially:
- Working With Playbooks, especially:
  - Templating (Jinja2)
  - Conditionals
  - Loops
  - Advanced Playbooks Feature, especially:
Module Index

REMARK

The last link above is not required reading: it is only provided as an entry point to the documentation of each ansible module. Indeed, when developing an ansible role, you will often need to refer to the documentation of the modules you wish to use.

Tutorials

Assumptions

The following ansible tutorials all assume that:

You have fulfilled the install requirements described in Requirements - Local Software Requirements.
You have read (or at least browsed) the documentation links provided above: a complete understanding is not strictly required and you can catch up while practicing the tutorials.
You have read and practiced the tutorial related to your cloud provider as described in Supported Cloud Providers. You should thus be familiar with the usage of your cloud provider and at least know how to create (OpenStack) servers/instances.
You are familiar with connecting to a remote computer with SSH, using SSH keys (i.e. creating an SSH key pair, distinguishing the private key from the public key, registering a public key on a remote server, starting a local SSH agent as well as loading an SSH key to be able to connect to some remote server).
You already have at least one server up and running.
This server is accessible via SSH with properly set authentication SSH keys¹.

Goals

The following tutorials aim at familiarizing you with the basic usage of ansible and especially with the usage of existing ansible roles, since this knowledge is a strong requirement for deploying a PAGoDA instance. The following tutorials should teach you how to:

Set up ansible requirements,
Create an inventory,
Create a basic playbook defining some tasks,
Create a role holding tasks and using basic variables,
Set up variables for some specific (single) server (or host) or for a group of servers.

These tutorials will not teach you how to develop a complete ansible role. They are here to help you understand how ansible works and to enable you to read existing ansible configurations (roles, playbooks, inventories, host and group variables, etc.).

Convention

The following tutorials will use the following convention:

The servers (or managed nodes) that you will use for the tutorials should be configured as follows (otherwise you will need to adapt the tutorials to your own setup, e.g. by changing the IP addresses of the servers or the username used to log in):
- they are accessible through SSH using the standard port 22,
- their SSH keys are properly configured (an SSH connection should not interactively prompt for a password).
The manager (or the control node), i.e. the computer on which you will run the ansible commands, SHOULD NOT BE ONE OF THE SERVERS: in other words, the control node should not be one of the managed nodes.

Important

The following tutorials are here to help you understand the basics of ansible in order to be able to use the roles and playbooks that were developed for PAGoDA. We strongly recommend that, when practicing the following tutorials, you work "from within" a temporary folder in order to avoid "polluting" your cloned PAGoDA repository.

Setting up ansible

REMINDER Before starting this tutorial, you must be comfortable with the basic usage of your cloud provider: in particular, you must be able to create simple servers/instances.

Setting up ansible

The first thing to do is to install the required binary/command enabling you to use ansible. There are two possibilities:

Your OS has an ansible package (not recommended because system-wide).

In this case, you can directly perform a system-wide package installation. For instance, on Ubuntu:
```
sudo apt-get install ansible
```
You should be done: ansible should be installed on your computer.

REMARK: if you prefer a smaller installation footprint (at the user and directory level as opposed to system-wide) then use the Python virtual environment approach described below.

Install ansible in a Python virtual environment (recommended: small footprint).

We assume here that you have installed the Local software requirements.

First create and activate a python3 virtual environment:
```
# Assuming we will just test ansible, we need a temporary folder
$ mkdir ~/temp
$ cd ~/temp
# Create a python virtual environment in folder `.env`
$ python3 -m venv .env
# Activate the python virtual environment
$ source .env/bin/activate
(venv)
```
REMARK Depending on your cloud provider (and its corresponding documentation, refer to Supported Cloud Providers), you might already have a temporary virtual environment in ~/temp, in which case you can reuse that same virtual environment.

Then, within this activated python virtual environment, install ansible and its dependencies:
```
(venv) pip3 install ansible
```
You should be done with the install and now have the ansible command available.

In order to deactivate the python virtual environment, simply type the following command:
```
(venv) deactivate
$
```

Inventory File

Building the inventory file

When using ansible, the first task at hand consists in creating an inventory that lists the set of servers that will be managed by ansible (the managed nodes). This task is documented in the How to build your inventory section of the official documentation.

For this tutorial we will create a simple file called hosts with the following content:

[tuto]
tuto_ansible    ansible_host=192.168.1.42

Having read the previously mentioned How to build your inventory section, you should be able to understand the structure of this file, which boils down to:

this inventory declares a group (of hosts) labeled tuto
this tuto group is composed of a single host aliased to become tuto_ansible and whose IP address is 192.168.1.42.

The tuto_ansible alias is just here to give the user more explicit information when running ansible tasks or playbooks. Indeed, without aliases, ansible's output would refer to managed hosts by their IP addresses, which makes the logs obscure (unless you happen to know the IP addresses of your servers perfectly).

Assumption

This tutorial assumes that all the hosts declared in the inventory are accessible through SSH as user test_tuto without any password prompt. If this is not the case, just adapt the argument of the -u flag in the following ansible commands to match the configuration of your servers.
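
Ansible also accepts inventories written in YAML. As a minimal sketch, the inventory above could be expressed as follows. The file name hosts.yaml is an assumption (it is not used elsewhere in these tutorials), and the ansible_user line is an optional addition that stores the SSH user in the inventory instead of passing it on the command line.

```yaml
# hosts.yaml (hypothetical file name): YAML equivalent of the `hosts` file
all:
  children:
    # The `tuto` group
    tuto:
      hosts:
        # Alias of the single host of the group
        tuto_ansible:
          # IP address used to reach the host
          ansible_host: 192.168.1.42
          # Optional: SSH user, which makes the `-u` flag unnecessary
          ansible_user: test_tuto
```

With such a file, the inventory would be selected with -i ./hosts.yaml, and the -u flag of the ansible commands could be omitted.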

Testing the inventory file

Now, let us test our inventory file.

First we will ping our server. This can be done with the following command:

ansible -i <path/to/inventory> <group_in_inventory> -m ping -u <ssh_username>

Applied to our example, this command will be:

ansible -i ./hosts tuto -m ping -u test_tuto

Normally, you should see the following output:

tuto_ansible | SUCCESS => {
  "ansible_facts": {
    "discovered_interpreter_python": "/usr/bin/python"
  },
  "changed": false,
  "ping": "pong"
}

Such a result indicates that your inventory is ready to be used with ansible playbooks and that you can proceed with the next tutorial.

If you do not get such an output, this often means that ansible is unable to connect to your server and some troubleshooting is required (refer to the Troubleshooting section below).

Troubleshooting

tuto_ansible | UNREACHABLE! => {
  "changed": false,
  "msg": "Failed to connect to the host via ssh: ssh: connect to host 192.168.1.42 port 22: No route to host",
  "unreachable": true
}

This means ansible is unable to connect to your server (trouble with IP connectivity). The most frequent reasons for such an error are:

The IP address you provided in your inventory is erroneous,
The SSH service on the server is not started,
There is a firewall between your control node and your managed nodes.

tuto_ansible | UNREACHABLE! => {
  "changed": false,
  "msg": "Failed to connect to the host via ssh: test_tuto@192.168.1.42: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).",
  "unreachable": true
}

This means ansible is unable to authenticate on your server with ssh (trouble at the ssh protocol level). The most frequent reasons for such a failure are:

The specified user (test_tuto in this tutorial) does not exist on your server or you provided a wrong username in the command,
User test_tuto does not allow SSH access using the SSH key mechanism: you can review the content of the ${HOME}/.ssh/authorized_keys file on your managed server in order to ensure that the content of this file holds the public key associated to the private key you want to use to connect to your server.
The private SSH key used to connect to your server is not the one whose public key is registered in the file ${HOME}/.ssh/authorized_keys on your managed server. You can try to:
- check the keys that you registered with your ssh agent (command ssh-add -l) or
- check your ~/.ssh/config or
- add the --private-key PRIVATE_KEY_FILE argument to the ansible command in order to designate the proper private-key file.

For further troubleshooting options, please refer to the Network Debug and Troubleshooting Guide section of the official ansible documentation.

Basic playbook with tasks

In the previous tutorial, you set up a basic inventory configuration and ensured that your manager (control node) was able to communicate with your servers (managed nodes).

In this tutorial you will learn how to describe a playbook with some basic tasks in order to configure your group of servers. The aim of this tutorial is twofold:

to make you familiar with the ansible playbook documentation and playbook usage,
to get you acquainted with the ansible module documentation and basic usages of such modules.

What is a playbook?

From the ansible online documentation - Working with playbook:

Playbooks record and execute Ansible’s configuration, deployment, and orchestration functions. They can describe a policy you want your remote systems to enforce, or a set of steps in a general IT process.

If Ansible modules are the tools in your workshop, playbooks are your instruction manuals, and your inventory of hosts are your raw material.

In other terms, playbooks will allow you to run multiple tasks defined by ansible modules on a set of specific servers defined by your inventory.

What is a module?

In order to execute advanced configuration/deployment tasks on some server, ansible makes use of well known modules which are able to execute specific tasks. For instance:

The package module will allow you to manage packages on your server, e.g. to install them (and of course specify which specific version to use) or conversely to ensure that some packages are not installed.
The file module will allow you to manage the state of files (e.g. files installed by some package) like ensuring that some file does exist (or does not) or some other file is indeed a symbolic link etc.
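
As an illustration, tasks relying on these two modules could be written as sketched below. The package name and the paths under /tmp are hypothetical values chosen for the example. How such tasks are embedded in a playbook is described in the next sections.

```yaml
# Ensure that a package is installed (hypothetical choice: curl)
- name: Ensure the curl package is installed
  package:
    name: curl
    state: present

# Ensure that a directory exists with the given permissions
- name: Ensure an example directory exists
  file:
    path: /tmp/tuto_example
    state: directory
    mode: "0755"

# Ensure that a path is a symbolic link pointing to that directory
- name: Ensure a symbolic link points to the example directory
  file:
    src: /tmp/tuto_example
    dest: /tmp/tuto_link
    state: link
```

Each task only describes a desired state: when the server already matches it, the module has nothing to change.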

Basic playbook and tasks

First, let us start with the following very simple playbook (that we shall place in a file named playbook_1.yaml):

  # Select the group of server on which this playbook will be run
- hosts: tuto
  # Tell ansible to inhibit its default behavior that implicitly consists 
  # in first collecting information concerning the servers of the group:
  gather_facts: false
  # Define the user name to be used for (ssh) connection to the servers
  remote_user: test_tuto
  # Describe a list of tasks to be applied on this group of servers
  tasks:
      # Name the task
    - name: Try to ping servers of the group
      ping:

This playbook_1.yaml playbook will execute the ansible ping module on the servers of the tuto group (described in the inventory). In other words, this playbook executes the same task that the previous tutorial performed through command-line arguments.

To execute this playbook, type the following command:

ansible-playbook -i ./hosts playbook_1.yaml

The output should look like:

PLAY [tuto] ************************************************************************************************************

TASK [ping] ************************************************************************************************************
ok: [tuto_ansible]

PLAY RECAP *************************************************************************************************************
tuto_ansible               : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

If you need detailed output information (e.g. for debugging purposes), you can use the options -v, -vv, -vvv or -vvvv to progressively increase the verbosity of the ansible-playbook command. Be warned that using more than one -v verbose flag generally produces a lot of textual output.

For example, you can try the following command:

ansible-playbook -i ./hosts playbook_1.yaml -v

PLAY [tuto] ************************************************************************************************************

TASK [ping] ************************************************************************************************************
ok: [tuto_ansible] => {
      "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
      },
      "changed": false,
      "ping": "pong"
    }

PLAY RECAP *************************************************************************************************************
tuto_ansible               : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

A little bit further with playbooks and tasks

Now let us go a little bit further and use the package module, which requires more parameters, in order to ensure that some packages are properly installed on your managed server(s).

Below is an example of such a playbook, stored in file playbook_2.yaml:

  # Select the group of server on which this playbook will be run
- hosts: tuto
  # Require ansible to gather facts i.e. basic information concerning the 
  # managed servers like their respective operating system, hardware 
  # resources, the installed version of python, the name of the package 
  # manager etc.
  gather_facts: true
  # Define the user name to be used for (ssh) connection to the servers
  remote_user: test_tuto
  # Since the package manager will be run on the servers, ansible must be
  # permitted to execute a command as the superuser on the server
  become: true
  # Specify the command allowing to "become superuser" on the servers (this
  # assumes that the above given `remote_user` is allowed to run the
  # become_method command).
  become_method: sudo
  # Describe a list of tasks to be applied on this group of servers
  tasks:
      # Name of first task
    - name: Ensure some packages are installed
      # Name of the module
      package:
        # Parameters of the module
        name:
          - vim
          - emacs
          - curl
          - wget
        state: present
      # Name of second task
    - name: Ensure some packages are not installed
      # Name of the module
      package:
        # Parameters of the module
        name:
          - upower
        state: absent

Let us describe the differences between this playbook (playbook_2.yaml) and the previous one (playbook_1.yaml):

gather_facts: true: the successful execution of playbook_1.yaml showed us that the servers are reachable through SSH. This playbook can thus go one step further and gather basic facts about the servers. Note that this option can be omitted when set to true, because this is its default value.
become: true: indicates to ansible that the execution of the playbook tasks requires privilege escalation.
become_method: sudo: indicates how ansible will escalate privileges.
tasks: the list has been updated to describe two tasks, both using the package module. The first task ensures that some packages are installed. Conversely, the second task ensures that the upower package is NOT installed. Note that if a package is listed in the parameters of the second task and was previously installed, then ansible will uninstall it.

You can run this playbook as we did previously, with the following command:

ansible-playbook -i ./hosts playbook_2.yaml

The output will of course depend on the initial state of the packages on the respective managed servers. For instance, when running the playbook for the first time, your output will look like:

PLAY [tuto] *******************************************************************

TASK [Gathering Facts] ********************************************************
ok: [tuto_ansible]

TASK [Ensure some packages are installed] *************************************
changed: [tuto_ansible]

TASK [Ensure some packages are not installed] *********************************
ok: [tuto_ansible]

PLAY RECAP ********************************************************************
tuto_ansible               : ok=3    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

But when running it for the second time, the output should look like:

PLAY [tuto] *******************************************************************

TASK [Gathering Facts] ********************************************************
ok: [tuto_ansible]

TASK [Ensure some packages are installed] *************************************
ok: [tuto_ansible]

TASK [Ensure some packages are not installed] *********************************
ok: [tuto_ansible]

PLAY RECAP ********************************************************************
tuto_ansible               : ok=3    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Note that the resulting state of the Ensure some packages are installed task switched from changed to ok. This is because, when running the playbook for the second time, ansible had nothing left to do (the packages were already installed during the first execution), so the task reports ok instead of changed.

Tasks in roles

We now have some hints on how to use playbooks and modules. The next step concerns the execution of multiple tasks in order to provision a server into a prescribed state.

For instance, if you want to set up a web server, you will need to:

Install the required packages
Set up some database
Configure the web server
Etc.

Additionally, if you wish your servers to be "behind" a proxy, then you will further need to:

Refer to the proxy through some default environment variable
Set up the proxy in the package manager
Etc.

Or you may want to install a set of packages with predefined configurations. Then you may need to install:

Generic packages, such as vim, emacs, etc.
Specific package versions, such as python 2.6 or python 2.7

In ansible, such tasks are usually gathered into a role. For example, the three sets of tasks described above would be implemented as three roles: webserver, proxy and base_packages. Moreover, you may want to be able to change some parameters/values of such roles. For instance, you know that some of your servers are behind proxy1 while others are behind proxy2. To tackle this situation, roles enable you to use/define variables. The concrete values of such variables can then be defined later, for instance in some specific playbook(s).
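
To make this more concrete, here is a minimal sketch of a playbook applying these three roles. The roles themselves are not defined in this tutorial, and the proxy_url variable as well as its value are hypothetical names chosen for illustration only.

```yaml
# Hypothetical playbook applying three roles to the `tuto` group
- hosts: tuto
  remote_user: test_tuto
  become: true
  roles:
    # Role gathering the installation of generic packages
    - role: base_packages
    # Role gathering the proxy related tasks; `proxy_url` is an assumed
    # variable name that such a role could expose
    - role: proxy
      vars:
        proxy_url: "http://proxy1.example.org:3128"
    # Role gathering the web server related tasks
    - role: webserver
```

A playbook targeting the servers that sit behind proxy2 would only differ by the value given to that variable.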

This tutorial illustrates how to write and test a basic role that makes use of variables and enables the setup of packages for specific OSes, and then shows how to define such variables later.

The aim of this tutorial is thus to get you acquainted with the usual definition of roles (that make use of variables) and also to let you understand how to use, within your playbooks, roles that you will find in online libraries.

Role architecture

Ansible roles are usually placed in the "root" directory of your ansible project (the directory hierarchy) in a sub-directory named roles. In the case of this tutorial, this is the directory where the hosts file (refer above) is located.

Let us first create the main roles folder, which will hold the locally defined ansible roles, "next" to the hosts file.

# Create the role directory
mkdir roles

The next thing to know is the standard directory structure used to define an ansible role, which is well described in the Ansible Online Documentation - Roles (go ahead and browse this documentation).

In addition to these well known role sub-directories (tasks, handlers, library…) you might sometimes encounter sub-directories that are not part of ansible naming conventions. Such role sub-directories, that are simply ignored by ansible, allow the author of the role to store complementary information. This possible "decoration" of a role directory is used by the ansible roles that deploy a PAGoDA instance, where you will find sub-directories like:

docs: sub-directory holding the mkdocs based documentation of the role used to render this role documentation on this website.
molecule: sub-directory holding the molecule based test suite of the role.

The usage of such complementary sub-directories is mainly for developers and does not concern the usage of the role. They will thus not be described in this tutorial.

Create a really basic role

Let us now create a really basic role (at first without variables, handlers, templates, etc) defining only tasks. For instance, let us wrap up our previously mentioned package tasks into a role. We will call this role packages.

For that, in the roles folder (next to the hosts file), create a folder named packages which will hold the role definition. Then, within this folder, create a sub-folder named tasks that holds a file named main.yaml.

# Assuming you are in the folder where you have your `hosts` file
# Create the `packages` role directory with its `tasks` sub-folder
mkdir -p roles/packages/tasks
# Create the file tasks/main.yaml
touch roles/packages/tasks/main.yaml

Now we will just need to place our previously defined tasks (refer to the above described playbook_2.yaml file) within this roles/packages/tasks/main.yaml file whose content will thus be:

# Name the task
- name: Ensure some packages are installed
  # Name of the module
  package:
    # Parameters of the module
    name:
      - vim
      - emacs
      - curl
      - wget
    state: present
  # Name the task
- name: Ensure some packages are not installed
  # Name of the module
  package:
    # Parameters of the module
    name:
      - upower
    state: absent

The definition of this basic role, holding two tasks ensuring the state of installed packages, is now complete. Finally, in order to use this role, we need to adapt the previous playbook_2.yaml playbook to become:

  # Select the group of server on which this playbook will be run
- hosts: tuto
  # Require ansible to gather facts i.e. basic information concerning the 
  # managed servers like their respective operating system, hardware 
  # resources, the installed version of python, the name of the package 
  # manager etc.
  gather_facts: true
  # Define the user name to be used for (ssh) connection to the servers
  remote_user: test_tuto
  # Since the package manager will be run on the servers, ansible must be
  # permitted to execute a command as the superuser on the server
  become: true
  # Specify the command allowing to "become superuser" on the servers (this
  # assumes that the above given `remote_user` is allowed to run the
  # become_method command).
  become_method: sudo
  # Invoke the `packages` role that by default realizes the `main.yaml`
  # task as encountered in the `tasks` sub-directory of the role (that
  # is the `roles/packages/tasks/main.yaml` file)
  roles:
    - packages

We can now use this redefined playbook as we did before with the ansible-playbook command:

ansible-playbook -i ./hosts playbook_2.yaml

You should obtain an output that is similar to the one described in the previous tutorial.

Roles with variables

Let us now assume that we wish the usage of our role to be more flexible by not hard-wiring the two lists of packages. Instead we wish to allow a playbook to hand over those two lists of packages when invoking the role.

This is possible through role variables. Let us first mention some conventions (borrowed from the common practices of the Python language) that are adopted by the ansible community of developers (as opposed to being enforced by the ansible engine). Such conventions concern the naming of role variables:

Variable names prefixed with _ (underscore) e.g. _foo: these variables are generally not intended for the end user of the role. Instead, such variables store static information (constants) that is used in multiple locations within the role, like the path to a configuration file. The values of underscore-prefixed variables are usually defined by the vars/main.yaml file within the role (sub-directory).
Variable names not prefixed with _ (but starting with a letter) e.g. foo: such variables are intended for the end user, for instance in order to let users provide their own parameters to be written in a configuration file (or, in the case of this tutorial example, in order to let the role user define their own list of packages). The default values of such (role) user variables are most often defined by the defaults/main.yaml file within the role (sub-directory).

Another common convention consists in prefixing the variable name with the name of the role. For instance, in this tutorial, the names of the variables of the packages role will start with packages or _packages. This convention makes it possible to quickly identify the role to which a given variable belongs and also to avoid variable name collisions. For instance:

A variable called config_path could be used to define the path to an nginx configuration or to the configuration of another service.
A variable named nginx_config_path is more explicit and indicates to the user that this variable is bound to the nginx role.
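
As a sketch of how both conventions combine, the two variable files of an nginx role could look as follows. This role, its variable names and its values are all hypothetical. The two files are shown in a single block and separated by `---` for brevity.

```yaml
# roles/nginx/defaults/main.yaml (hypothetical role)
# User-facing variables: no underscore, prefixed with the role name,
# and meant to be overridden by the role user
nginx_listen_port: 80
nginx_worker_processes: 2
---
# roles/nginx/vars/main.yaml (hypothetical role)
# Internal constants: underscore prefix followed by the role name,
# not meant to be changed by the role user
_nginx_config_path: /etc/nginx/nginx.conf
```

Reading a playbook, a name such as nginx_listen_port then immediately tells which role consumes the variable and that it is meant to be customized.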

For this tutorial, we wish to let users of the packages role provide their own lists of packages. Let us adapt the tasks definition (within the roles/packages/tasks/main.yaml file) in order to introduce the usage of variables:

# Name the task
- name: Ensure some packages are installed
  # Name of the module
  package:
    # Parameters of the module
    name: "{{ packages_installed }}"
    state: present
  # Name the task
- name: Ensure some packages are not installed
  # Name of the module
  package:
    # Parameters of the module
    name: "{{ packages_not_installed }}"
    state: absent

Notice that the name entries were respectively adapted with:

"{{ packages_installed }}" in order to use the value of the packages_installed variable which will hold the list of package names that need to be installed
"{{ packages_not_installed }}" in order to use the value of the packages_not_installed variable which will hold the list of the package names that we do not want to be installed.

We can now propose default values for those two variables which, as mentioned above, are to be defined in the defaults/main.yaml file of the role (that is the roles/packages/defaults/main.yaml file):

packages_installed:
  - vim
  - emacs
  - curl
  - wget
packages_not_installed:
  - upower

Yet, if you want to provide your own lists of packages within your playbook (playbook_2.yaml file), then you can adapt your playbook as illustrated below:

  # Select the group of server on which this playbook will be run
- hosts: tuto
  # Require ansible to gather facts i.e. basic information concerning the 
  # managed servers like their respective operating system, hardware 
  # resources, the installed version of python, the name of the package 
  # manager etc.
  gather_facts: true
  # Define the user name to be used for (ssh) connection to the servers
  remote_user: test_tuto
  # Since the package manager will be run on the servers, ansible must be
  # permitted to execute a command as the superuser on the server
  become: true
  # Specify the command allowing to "become superuser" on the servers (this
  # assumes that the above given `remote_user` is allowed to run the
  # become_method command).
  become_method: sudo
  # Invoke the `packages` role that by default realizes the `main.yaml`
  # task as encountered in the `tasks` sub-directory of the role (that
  # is the `roles/packages/tasks/main.yaml` file)
  roles:
    - packages
  # Define a (playbook level) custom lists of packages
  vars:
    packages_installed:
      - vim
      - curl
      - htop
      - tmux
    packages_not_installed:
      - emacs

REMARK: variable values defined by the user override the default values defined in the role. In the above example, wget (defined in the defaults/main.yaml file) will not be installed (unless it was already installed previously).

As for the previous invocations, and in order to apply the changes, you need to use the ansible-playbook command:

ansible-playbook -i ./hosts playbook_2.yaml

You should obtain an output similar to those described in the previous tutorial.

Host and groups variables

You now know how to use playbooks, modules and basic roles with variables. Let us proceed with some further ansible notions and ways of doing things.

Assume you have two groups of servers, labelled group_1 and group_2, and that

you want to install specific packages only on the servers of group_1 and other packages only on the servers of group_2,
or you want to be sure that, on a specific server within group_1, some packages are not installed.

How does one proceed to achieve such results? The above situations can be respectively handled with group_vars and host_vars.
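
As a minimal sketch, and assuming that the packages role of the previous tutorial is applied to both groups, the corresponding variable files could look as follows. The package choices are hypothetical and my_server_1 is assumed to be a host of group_1. The three files are shown in a single block and separated by `---` for brevity.

```yaml
# group_vars/group_1/vars.yaml: applies to every host of group_1
packages_installed:
  - vim
  - curl
---
# group_vars/group_2/vars.yaml: applies to every host of group_2
packages_installed:
  - htop
  - tmux
---
# host_vars/my_server_1/vars.yaml: applies to the host my_server_1 only
packages_not_installed:
  - emacs
```

The playbook itself stays unchanged: ansible picks up the values that match the group or the host on which the role is executed.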

Let us first mention that, in order to accommodate various types of ansible usages and types of practices, Ansible allows many different locations (files within an ansible project) for variables to be defined.

And depending on where the variables definition are located within an ansible project, variables get to be "attached" to (or in relation to) different ansible notions like an inventory, a specific host (host_vars), a set of (managed) servers (group_vars), a role, a playbook (refer to the Ansible Online Documentation - Variable precedence).

Notice that when the variable name is used in multiple locations (that is the variable is defined at different levels) then the definitions follows a precedence order as explained in that same Ansible Online Documentation - Variable precedence.

Let us also mention that, because a single variable can be defined at several levels, the directory layout of an Ansible project (i.e. the structure of the set of files constituting that project) is not unique but highly flexible. This flexibility gives rise to many Ansible "dialects" (driven by the project use cases), each of which adopts its own directory structure and its own conventional locations for defining each type of variable.

Among this diversity, two mainstream conventions for organizing the files of an ansible project have emerged:

the classic/default directory layout, suited to projects where group_vars/host_vars are quite similar across environments (e.g. production and staging)
the alternative directory layout, where each inventory file is co-located with its associated group_vars/host_vars in a separate directory.

The PAGoDA project:

adopts the alternative directory layout, which clearly separates the inventory variables of the different environments (at the cost of having more files to handle).
only defines what Ansible calls playbook group_vars/host_vars, as opposed to e.g. inventory file vars or inventory group_vars/host_vars (or some other type of variable as documented in understanding variable precedence). The goal of this practice is to limit the confusion that can arise from misunderstanding precedence.

The typical directory structure of a PAGoDA instance (as proposed by the PAGoDA template) is thus of the form:

├── hosts                     # Inventory file
├── group_vars                # Folder defining group vars
│   ├── group_1               # Folder to store variable for group group_1
│   │   └── vars.yaml         # Define variable for server in group_1
│   └── group_2               # Folder to store variable for group group_2
│       └── vars.yaml         # Define variable for server in group_2
├── host_vars                 # Folder defining host vars
│   ├── my_server_1           # Folder to store variable for host my_server_1
│   │   └── vars.yaml         # Define variable for server my_server_1
│   └── my_other_servers_2    # Folder to store variable for host my_other_servers_2
│       └── vars.yaml         # Define variable for server my_other_servers_2
├── roles                     # Folder which store roles
...

All PAGoDA instances, as well as the following tutorial, follow this "PAGoDA" convention, where variables are defined within the group_vars/ and host_vars/ sub-directories.

In summary, group_vars is the set of variables associated with a specific group of servers and host_vars is the set of variables associated with a specific host.

Setup variables per server group

For this tutorial you will first need to update your inventory file in order to create the two groups of managed servers described in the introductory use case above. For example you might define:

[group_1]
my_server_1 ansible_host=192.168.1.42
my_server_2 ansible_host=192.168.1.142
my_server_3 ansible_host=192.168.1.242

[group_2]
my_other_servers_1 ansible_host=192.168.100.42
my_other_servers_2 ansible_host=192.168.100.142

The above inventory defines the two groups of servers:

group_1, with three servers
group_2, with two servers

Before proceeding you should verify that the corresponding managed servers (the ones matching the IP addresses provided in your hosts inventory file) were properly deployed and that ansible can access them.

In the previous tutorial, we learned how to use variables in playbooks with the example role packages. Now let us assume we want some packages on the servers of group_1 but other packages on the servers of group_2. This is done by filling in the corresponding files:
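One possible way to check this is a small playbook relying on the ping module, which tests that ansible can log in to a host and find a usable python there. The following is only a minimal sketch: the file name playbook.check.yaml is hypothetical, and the test_tuto remote user is assumed to be the same as in the other playbooks of these tutorials.

```yaml
# playbook.check.yaml (hypothetical file name, not part of the PAGoDA template)
- hosts: all
  # Facts are not needed for a simple reachability check
  gather_facts: false
  # Assumption: same remote user as in the other playbooks of this tutorial
  remote_user: test_tuto
  tasks:
    - name: Check that ansible can connect to the host and use python on it
      ping:
```

Running it with ansible-playbook -i ./hosts playbook.check.yaml should report every host of the inventory as ok; a host reported as unreachable points to an SSH or inventory problem to fix before going further.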

group_vars/group_1/vars.yaml for servers of group_1
group_vars/group_2/vars.yaml for servers of group_2

Below are examples of such files:

Content of group_vars/group_1/vars.yaml

packages_installed:
  - vim
  - tmux
packages_not_installed:
  - emacs
  - screen

Content of group_vars/group_2/vars.yaml

packages_installed:
  - emacs
  - screen
packages_not_installed:
  - vim
  - tmux

You may have noticed that there is no vars: key in these files. This key is implicit, since the location of the files unambiguously tells ansible that their entries are variables.

Now, how do we use them?

Contrary to the previous tutorials, we need to create two playbooks, one for group_1 and one for group_2. These playbooks are very similar to each other and to the previous playbooks. The main difference lies in the value of the hosts key. The content of these playbooks is given here:

playbook.group_1.yaml

  # Choose the group of servers on which this playbook will be run
- hosts: group_1
  # As we know that our servers are reachable, let us gather facts, i.e. let
  # ansible gather basic information from our servers such as operating system,
  # hardware resources, installed python version, etc.
  gather_facts: true
  # Define the user used to connect to the servers
  remote_user: test_tuto
  # As we will use the package module, we need to allow ansible to become
  # root on our servers; this assumes that user test_tuto is able to use the
  # become_method
  become: true
  # Tell ansible how to become root
  become_method: sudo
  # Describe the list of roles to use
  roles:
    - packages
  # No need to define variables here anymore as they are defined in
  # `group_vars/group_1/vars.yaml`
  # Any variable set here will override variables in `group_vars/group_1/vars.yaml`
  # see Variable precedence:
  # https://docs.ansible.com/ansible/2.9/user_guide/playbooks_variables.html#variable-precedence-where-should-i-put-a-variable

playbook.group_2.yaml

  # Choose the group of servers on which this playbook will be run
- hosts: group_2
  # As we know that our servers are reachable, let us gather facts, i.e. let
  # ansible gather basic information from our servers such as operating system,
  # hardware resources, installed python version, etc.
  gather_facts: true
  # Define the user used to connect to the servers
  remote_user: test_tuto
  # As we will use the package module, we need to allow ansible to become
  # root on our servers; this assumes that user test_tuto is able to use the
  # become_method
  become: true
  # Tell ansible how to become root
  become_method: sudo
  # Describe the list of roles to use
  roles:
    - packages
  # No need to define variables here anymore as they are defined in
  # `group_vars/group_2/vars.yaml`
  # Any variable set here will override variables in `group_vars/group_2/vars.yaml`
  # see Variable precedence:
  # https://docs.ansible.com/ansible/2.9/user_guide/playbooks_variables.html#variable-precedence-where-should-i-put-a-variable

Remark

As described in the comments of the above playbooks, you do not need to define variables in your playbook (files) anymore since variables are described per group in their respective folders.

Nevertheless if you happen to define (the same) variable in your playbook, then this variable (value) will override the one encountered in the per group file.
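To illustrate that override, here is a hedged sketch of a variant of playbook.group_1.yaml. The file name playbook.group_1.override.yaml and the chosen package list are illustrative assumptions, not part of the tutorial files. With Ansible's default settings, the play-level vars: entry takes precedence over group_vars/group_1/vars.yaml, and the list is replaced as a whole rather than merged.

```yaml
# playbook.group_1.override.yaml (hypothetical file name)
- hosts: group_1
  gather_facts: true
  remote_user: test_tuto
  become: true
  become_method: sudo
  roles:
    - packages
  vars:
    # Replaces (does not extend) the packages_installed list coming from
    # group_vars/group_1/vars.yaml; the package choice is only an example
    packages_installed:
      - vim
      - tmux
      - htop
```

In this sketch packages_not_installed is not redefined, so it keeps the value given in group_vars/group_1/vars.yaml.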

Now, to apply your playbooks, you can run the following commands:

# For group_1
ansible-playbook -i ./hosts playbook.group_1.yaml
# For group_2
ansible-playbook -i ./hosts playbook.group_2.yaml

Setup variables for a specific server

Now that we have briefly presented how to use group vars, let us see how to use host vars (you may have already guessed how).

Let us assume we want some specific packages on one specific server, for instance the server my_other_servers_1. This is done by filling in the corresponding file:

host_vars/my_other_servers_1/vars.yaml

Below is an example of such a file:

Content of host_vars/my_other_servers_1/vars.yaml

packages_installed:
  - vim
  - tmux
  - emacs
  - screen
  - htop
packages_not_installed: []

As for group_vars, you may have noticed that there is no vars: key present in this file (since the location of the file makes it implicit).

Now, how do we use host_vars?

For now we proceed the same way as for group_vars and create a playbook for this specific server. Below is the content of such a playbook:

playbook.my_other_servers_1.yaml

  # Choose the host on which this playbook will be run
- hosts: my_other_servers_1
  # As we know that our server is reachable, let us gather facts, i.e. let
  # ansible gather basic information from our server such as operating system,
  # hardware resources, installed python version, etc.
  gather_facts: true
  # Define the user used to connect to the server
  remote_user: test_tuto
  # As we will use the package module, we need to allow ansible to become
  # root on our server; this assumes that user test_tuto is able to use the
  # become_method
  become: true
  # Tell ansible how to become root
  become_method: sudo
  # Describe the list of roles to use
  roles:
    - packages
  # No need to define variables here anymore as they are defined in
  # `host_vars/my_other_servers_1/vars.yaml`
  # Any variable set here will override variables in `host_vars/my_other_servers_1/vars.yaml`
  # see Variable precedence:
  # https://docs.ansible.com/ansible/2.9/user_guide/playbooks_variables.html#variable-precedence-where-should-i-put-a-variable

Remark

As described in comments of the above playbook, you do not need to define variables in that playbook (file) anymore since variables are now defined per host (in their respective host sub-folder).

Nevertheless if you happen to define (the same) variable in your playbook, then this variable (value) will override the one encountered in the per host file.

Now in order to apply your playbook, you can run the following command:

ansible-playbook -i ./hosts playbook.my_other_servers_1.yaml

Mixed usage of host and group vars

Now that host_vars are defined for the specific my_other_servers_1 host (through the host_vars/my_other_servers_1/vars.yaml file), what would happen if we were to run playbook.group_2.yaml again? Since my_other_servers_1 is part of group_2, will Ansible load the host_vars of my_other_servers_1 although it is invoked with group_2 as target? And if host_vars/my_other_servers_1/vars.yaml is loaded, which values will the variables of my_other_servers_1 take: the values given for the whole group_2 or the specific values given for my_other_servers_1?

The answer is that ansible loads all the files (it encounters within a project) and uses a specific order to do so, as described in Ansible Online Documentation - Variable precedence.

Since the variables defined in group_vars/group_2/vars.yaml are overridden by the ones encountered in host_vars/my_other_servers_1/vars.yaml, running playbook.group_2.yaml (with the ansible-playbook -i ./hosts playbook.group_2.yaml command) gives the following result for this tutorial example:

my_other_servers_1 will have the packages described in host_vars/my_other_servers_1/vars.yaml, whereas
my_other_servers_2 will have the packages described in group_vars/group_2/vars.yaml
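A way to observe which value ansible resolves for each host, without installing anything, is to print the variable with the debug module. The playbook below is only a sketch, and its file name playbook.show_vars.yaml is a hypothetical choice:

```yaml
# playbook.show_vars.yaml (hypothetical file name)
- hosts: group_2
  # The debug module only prints values, so gathering facts is not needed
  gather_facts: false
  tasks:
    - name: Show the packages_installed value resolved for this host
      debug:
        var: packages_installed
```

With the files of this tutorial, ansible-playbook -i ./hosts playbook.show_vars.yaml should print the host_vars list for my_other_servers_1 and the group_2 list for my_other_servers_2.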

The special group all

You may have noticed that there are very few differences between the above playbooks (say between playbook.group_1.yaml and playbook.my_other_servers_1.yaml), since only the "hosts" entry changes. Having to create such a playbook for each host or group quickly proves tedious.

Now that you know about the mixed usage of host_vars and group_vars, we can introduce a special group of servers: the all group.

The all group is one of the default ansible groups. It gathers every server encountered in the inventory and can be used like any other group. In particular you can define variables that are valid for all servers at once by placing them in the group_vars/all/vars.yaml file. Note however that if a host also gets values for the same variables from its own host_vars or from the group_vars of a more specific group, then those values override the ones defined in group_vars/all/vars.yaml.
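As an illustration, a group_vars/all/vars.yaml file could look like the sketch below. Its content is purely an assumption made for this example (the tutorial does not provide such a file):

```yaml
# group_vars/all/vars.yaml (example content, the package choice is an assumption)
# Values used by any host that gets no more specific definition
packages_installed:
  - curl
packages_not_installed: []
```

In this tutorial every host belongs to group_1 or group_2, and both groups define these two variables, so the values above would only take effect for a host added to the inventory outside of those groups.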

So in our example, provisioning all servers at once (i.e. installing selected packages via role packages) can be easily done with the following single playbook:

playbook.all.yaml

  # Choose the group of servers on which this playbook will be run
- hosts: all
  # As we know that our servers are reachable, let us gather facts, i.e. let
  # ansible gather basic information from our servers such as operating system,
  # hardware resources, installed python version, etc.
  gather_facts: true
  # Define the user used to connect to the servers
  remote_user: test_tuto
  # As we will use the package module, we need to allow ansible to become
  # root on our servers; this assumes that user test_tuto is able to use the
  # become_method
  become: true
  # Tell ansible how to become root
  become_method: sudo
  # Describe the list of roles to use
  roles:
    - packages
  # No need to define variables here anymore as they are defined in
  # the files of the `group_vars/` and `host_vars/` sub-directories
  # Any variable set here will override the variables defined in those files
  # see Variable precedence:
  # https://docs.ansible.com/ansible/2.9/user_guide/playbooks_variables.html#variable-precedence-where-should-i-put-a-variable

Then, we will just need to run the following command:

ansible-playbook -i ./hosts playbook.all.yaml

In our example, this will install packages such that:

my_server_1, my_server_2 and my_server_3 will have the packages described in group_vars/group_1/vars.yaml
my_other_servers_1 will have the packages described in host_vars/my_other_servers_1/vars.yaml
my_other_servers_2 will have the packages described in group_vars/group_2/vars.yaml

Going further

Now that you have basic knowledge of ansible, we strongly recommend that you practice. To do so, do not hesitate to create and delete temporary VMs in order to avoid messing up your own computer or your production environment.

Moreover, you should practice with the roles used to deploy a PAGoDA instance, which are briefly described in Ansible Roles.

Although ansible is able to communicate through SSH without SSH keys, this is not addressed/used in these tutorials. Please refer to the Ansible Official Documentation.

Last update: May 20, 2021