Taming the cloud: Provisioning with Terraform


Terraform is open source software that enables sysadmins and developers to write, plan and create infrastructure as code. It is a no-frills software package, which is very simple to set up. It uses a simple configuration language or JSON, if you wish.

Terraform is a tool to create and manage infrastructure, and it works with various IaaS, PaaS and SaaS service providers. It is very simple to set up and use, as there are no multiple packages, agents or servers involved. You just declare your infrastructure in one or more files using a simple configuration language (or JSON), and that’s it. Terraform takes your configurations, evaluates their building blocks to create a dependency graph, and presents you with a plan to create the infrastructure. When you are satisfied with the plan, you apply the configurations and Terraform creates independent resources in parallel. On subsequent runs, Terraform compares the current state of the infrastructure with the declared configurations, and only acts upon the part that has changed. Essentially, it is a CRUD (Create, Read, Update, Delete) tool that acts on the infrastructure in an idempotent manner.

Installation and set-up

Terraform is written in Go, and is provided as a static binary without any install dependencies. You just pick the correct binary (for GNU/Linux, Mac OS X, Windows, FreeBSD, OpenBSD and Solaris) from its download site, unzip it anywhere in your executable search path, and it is ready to run. The following script could be used to download, unzip and verify the set-up on your GNU/Linux or Mac OS X nodes:

HCTLSLOC='/usr/local/bin'
HCTLSURL='https://releases.hashicorp.com'
# use latest version shown on https://www.terraform.io/downloads.html
TRFRMVER='x.y.z'
# pick the OS part of the archive name from the kernel name
if [ "$(uname -s)" = 'Darwin' ]
then
    OS='darwin'
else
    OS='linux'
fi
# download the release archive into /tmp
wget -P /tmp --tries=5 -q "${HCTLSURL}/terraform/${TRFRMVER}/terraform_${TRFRMVER}_${OS}_amd64.zip"
# unpack the binary into the install location, then remove the archive
sudo unzip -o "/tmp/terraform_${TRFRMVER}_${OS}_amd64.zip" -d "${HCTLSLOC}"
rm -fv "/tmp/terraform_${TRFRMVER}_${OS}_amd64.zip"
# verify the set-up
terraform version

Concepts that you need to know

You only need to know a few concepts to start using Terraform to create the infrastructure you desire. ‘Providers’ are the building blocks in Terraform that abstract different cloud services and back-ends in order to actually CRUD various resources. Terraform gives you different providers to target different service providers and back-ends, e.g., AWS, Google Cloud, Digital Ocean, Docker and a lot of others. You need to supply the attributes applicable to the targeted service/back-end, like the access/secret keys, regions, endpoints, etc, to enable Terraform to create and manage its resources. Each provider offers various ‘resources’, which correspond to different building blocks, e.g., VMs, storage, networking, managed services, etc. So a single provider declaration is enough to use all the resources that Terraform implements for that service or back-end. There are also ‘provisioners’, which initialise and configure resources after their creation. The provisioners mainly do tasks like uploading files, executing remote/local commands/scripts, running configuration management clients, etc.
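To make the three concepts concrete, here is a minimal sketch of a provider, a resource and a provisioner working together. It is written in the pre-0.12 interpolation style used throughout this article (newer releases also accept bare expressions such as var.aws_access_key). The variable names, the resource name web, the region, the instance type and the AMI placeholder are assumptions made for illustration and are not taken from the article's example files.

```hcl
# Hypothetical input variables; their values are supplied at runtime.
variable "aws_access_key" {}
variable "aws_secret_key" {}

# Provider: tells Terraform which service to talk to and how to authenticate.
provider "aws" {
  access_key = "${var.aws_access_key}"
  secret_key = "${var.aws_secret_key}"
  region     = "us-east-1" # assumed region
}

# Resource: one building block offered by the provider (a VM in this case).
resource "aws_instance" "web" {
  ami           = "ami-xxxxxxxx" # placeholder, substitute a real AMI id
  instance_type = "t2.micro"

  # Provisioner: runs once, after the resource has been created.
  # local-exec runs the command on the machine where Terraform itself runs.
  provisioner "local-exec" {
    command = "echo ${self.public_ip} >> created_hosts.txt"
  }
}
```

Note that only one provider block is needed here, however many aws_ resources the configuration goes on to declare.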

You describe your infrastructure using a simple configuration language in one or more files, all with the .tf extension. The configuration model of Terraform is declarative, and it merges all the .tf files in its working directory at runtime. It resolves the dependencies between various resources by itself to create the correct final dependency graph, and brings up independent resources in parallel. Terraform can use JSON as well for its configuration (in files with the .tf.json extension), but that works better when the configurations are generated by automated tools. The native Terraform format is more human-readable and supports comments, so you could mix and match .tf and .tf.json configuration files in case some things are human coded and others are tool generated. Terraform also provides variables, and functions working on those variables, to store, assign and transform various things at runtime.

The general workflow of Terraform consists of two stages: plan and apply. The plan stage evaluates the merged (or overridden) configs, and presents the operator with a plan showing which resources are going to be created, modified and deleted. So the changes required to create your desired infrastructure are clear at the plan stage itself and there are no surprises at runtime. Once you are satisfied with the plan generated, the apply stage initiates the sequence to create the resources required to build your declared infrastructure. Terraform keeps a record of the created infra in a state file (by default, terraform.tfstate) and on every further plan-and-apply cycle, it compares the current state of the infra at runtime with the cached state. After the comparison, it only shows or applies the difference required to bring the infrastructure to the desired state as per its configuration. In this way, it creates/maintains the whole infra in an idempotent manner at every apply stage. You could manually mark resources to be destroyed and recreated in the next apply phase using the taint operation. You could also clean up the infra created, partially or fully, with the destroy operation.

Working examples and usage

Our first example is to clarify the syntax for various sections in Terraform configuration files. Download the code example1.tf from https://www.opensourceforu.com/article_source_code/sept17/terraform.zip. The code is a template to bring up multiple instances of AWS EC2 VMs with Ubuntu 14.04 LTS and an encrypted EBS data volume, in a specified VPC subnet, etc. The template also does remote provisioning on the instance(s) brought up by transferring a provisioning script and doing some remote execution.

Now, let’s dissect this example, line by line, in order to practically explore the Terraform concepts. The lines starting with the keyword variable begin the blocks of input variables, which store values. A variable block can assign an initial value to be used as the default, or no value at all. If there is no default value, Terraform will prompt for the value at runtime, unless it is set using the option -var '<variable>=<value>'. So, in our example, sensitive data like the AWS access/secret keys are not put in the template, as it is advisable to supply these at runtime, manually, through the command options or through environment variables. The environment variables should be of the form TF_VAR_<variable name> for Terraform to read them. The variables could hold string, list and map types of values, e.g., a map of different AMIs and subnets for different AWS regions, as demonstrated in our example. A string value is contained in double quotes, a list in square brackets and a map in curly braces. The variables are referenced, and their values extracted, through interpolation at different places using the syntax ${var.<variable name>}. You could explore everything about Terraform variables on the official variables help page.
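As a rough illustration of the variable types described above, the following sketch declares a variable without a default, a string, a list and a map, and then reads them back through interpolation. All the names and values (aws_region, inst_type, sg_ids, amis, the placeholder ids) are hypothetical and are not copied from example1.tf. The types are left to be inferred from the defaults, which keeps the snippet independent of how a given Terraform release spells type constraints.

```hcl
# No default: Terraform prompts for a value unless it is passed with
# -var 'aws_region=...' or exported as TF_VAR_aws_region.
variable "aws_region" {}

# String with a default value.
variable "inst_type" {
  default = "t2.micro"
}

# List, written in square brackets (placeholder security group ids).
variable "sg_ids" {
  default = ["sg-xxxxxxxx", "sg-yyyyyyyy"]
}

# Map, written in curly braces: one AMI id per region (placeholder ids).
variable "amis" {
  default = {
    "us-east-1" = "ami-xxxxxxxx"
    "us-west-2" = "ami-yyyyyyyy"
  }
}

provider "aws" {
  region = "${var.aws_region}"
}

resource "aws_instance" "demo" {
  # Pick the map entry that matches the chosen region.
  ami                    = "${var.amis[var.aws_region]}"
  instance_type          = "${var.inst_type}"
  vpc_security_group_ids = "${var.sg_ids}"
}
```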

It’s easy to guess that the block starting with the keyword provider declares the service/back-end and supplies its arguments. Different providers take different arguments based upon the service/back-end being used, and you could explore those in detail on the official providers page. The resource keyword introduces the main meat of any Terraform configuration. We are using two AWS building blocks in our example: aws_instance to bring up instances and aws_route53_record to create CNAME records for the instances created. Every resource block takes some arguments to customise the resource(s) it creates, and exposes some attributes of the resource(s) created. Each resource block starts with resource <resource type> <resource id>, and the important thing is that the <resource type> <resource id> combination should be unique within the same Terraform configuration scope. The prefix of each resource type is linked to its provider, e.g., all the resources prefixed with aws_ require the AWS provider. The simple form for accessing an attribute of a resource is <resource type>.<id>.<attribute>. Our example shows the public_ip and public_dns attributes of the created instances being accessed in the route53 and output blocks.
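The attribute access form <resource type>.<id>.<attribute> could look like the sketch below, in which a DNS record reads the public_dns attribute of an instance. The resource id app, the record name app.example.com, the TTL and the zone_id variable are assumptions for illustration. An AWS provider block is assumed to be declared elsewhere in the same directory.

```hcl
variable "zone_id" {} # hypothetical Route 53 hosted zone id

resource "aws_instance" "app" {
  ami           = "ami-xxxxxxxx" # placeholder AMI id
  instance_type = "t2.micro"
}

# The same id ("app") is fine here, because only the combination of
# resource type and resource id has to be unique.
resource "aws_route53_record" "app" {
  zone_id = "${var.zone_id}"
  name    = "app.example.com"
  type    = "CNAME"
  ttl     = "300"

  # Reading an attribute of aws_instance.app creates an implicit dependency,
  # so the instance is created before this record.
  records = ["${aws_instance.app.public_dns}"]
}
```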

Some of the resources require a few post-creation actions like connecting and running local and/or remote commands, scripts, etc, on AWS instance(s). The connection block declares how to connect to the resource, e.g., by creating an ssh connection to the created instances in our example. The provisioner blocks are the mechanisms that use the connection to upload files and directories to the created resource(s). Other provisioners run local or remote commands, or run Chef on the resource. You could explore those aspects in detail on the official provisioners help page. Our example uploads a provisioning script and kicks it off remotely over ssh to provision the created instances out-of-the-box. Terraform provides some meta-parameters that are available to all the resources, like the count argument in our example. The count.index keeps track of the current resource being created so that it can be referenced now or later, e.g., we are creating a unique name tag for each instance created, in our example. Terraform deduces the proper dependencies, as we are referencing an attribute of aws_instance in aws_route53_record; so it creates the instances before creating their CNAME records. You could use the meta-parameter depends_on in cases where there is no implicit dependency between resources and you want to declare one explicitly. The above-mentioned variables help page provides detailed information about the meta-parameters too.
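The pieces described above (count, count.index, a connection block and the file and remote-exec provisioners) might be combined roughly as follows. This is only a sketch under stated assumptions: the resource id node, the ubuntu login user, the key_path variable and the scripts/provision.sh path are hypothetical and do not come from example1.tf. The explicit host argument is included because newer Terraform releases require it in a connection block.

```hcl
variable "num_nds" {
  default = "2"
}

variable "key_name" {} # name of an existing EC2 key pair (assumed)
variable "key_path" {} # local path to the matching private key (assumed)

resource "aws_instance" "node" {
  count         = "${var.num_nds}"
  ami           = "ami-xxxxxxxx" # placeholder AMI id
  instance_type = "t2.micro"
  key_name      = "${var.key_name}"

  # count.index starts at 0, so this yields node-01, node-02, ...
  tags = {
    Name = "${format("node-%02d", count.index + 1)}"
  }

  # How the provisioners below reach each instance.
  connection {
    type        = "ssh"
    host        = "${self.public_ip}"
    user        = "ubuntu"
    private_key = "${file(var.key_path)}"
  }

  # Upload the provisioning script ...
  provisioner "file" {
    source      = "scripts/provision.sh"
    destination = "/tmp/provision.sh"
  }

  # ... and run it remotely over ssh.
  provisioner "remote-exec" {
    inline = [
      "chmod +x /tmp/provision.sh",
      "sudo /tmp/provision.sh",
    ]
  }
}
```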

The last block declared in our example configuration is the output block. As is evident from the name itself, the output could dump the raw or transformed attributes of the resources created, on demand, at any time. You can also see the usage of various functions like format and element in the example configuration. These functions transform the variables into other useful forms, e.g., the element function retrieves the correct public_ip based upon the current index of the instances created. The official interpolation help page provides detailed information about the various functions provided by Terraform.
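A few output blocks using the element, join, format and length functions could look like the sketch below. The output names and the fixed count of 3 are assumptions for illustration; the resource id test merely mirrors the name that appears in the plan output of this article. The values can be printed at any time after an apply with the terraform output command.

```hcl
resource "aws_instance" "test" {
  count         = 3
  ami           = "ami-xxxxxxxx" # placeholder AMI id
  instance_type = "t2.micro"
}

# element() picks one item from a list by index.
output "first_ip" {
  value = "${element(aws_instance.test.*.public_ip, 0)}"
}

# join() flattens the list of all public IPs into one string.
output "all_ips" {
  value = "${join(",", aws_instance.test.*.public_ip)}"
}

# format() builds a string from a template; length() counts list items.
output "summary" {
  value = "${format("%d instance(s) created", length(aws_instance.test.*.id))}"
}
```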

Now let’s look at how to decipher the output being dumped when we invoke different phases of the Terraform workflow. We’ll observe the following kind of output if we execute the command terraform plan -var 'num_nds="3"' after exporting TF_VAR_aws_access_key and TF_VAR_aws_secret_key, in the working directory where the first example config was created:

+ aws_instance.test.0
...
+ aws_instance.test.1
...
+ aws_instance.test.2
...
+ aws_route53_record.test.0
...
+ aws_route53_record.test.1
...
+ aws_route53_record.test.2

Plan: 6 to add, 0 to change, 0 to destroy.

If there is an error in the configuration, it will come up in the plan phase itself, and Terraform dumps the parsing errors. You can explicitly verify the configuration for any issue using the terraform validate command. If all is good, then the plan phase dumps the resources it’s going to create (indicated by the + sign before the resources’ names, in green) to converge to the declared model of the infrastructure. Similarly, the terraform plan output shows the resources it’s going to delete in red (indicated by the - sign) and the resources it will update in yellow (indicated by the ~ sign). Once you are satisfied with the plan, you can run terraform apply to apply it and actually start creating the infrastructure.

Our second example is meant to get you more comfortable with Terraform, and uses its advanced features to create and orchestrate some non-trivial scenarios. The code example2.tf can be downloaded from https://www.opensourceforu.com/article_source_code/sept17/terraform.zip. It automates the task of bringing up a working cluster out-of-the-box. It brings up a configurable number of multi-disk instances from the cluster payload AMI, and then runs remote provisioners in a specific order using null_resource: some provisioners on all the nodes and some only on a specific one.

In the example2.tf template, multiple null_resource blocks are triggered in response to the creation of the resources on which they depend. In this way, you can see how easily we can orchestrate some not-so-trivial scenarios. You can also see the usage of the depends_on meta-parameter to ensure a dependency sequence between various resources. Similarly, you can destroy the resources created by Terraform with the terraform destroy command, or mark resources to be created afresh with terraform taint. The easy way to get quick information about the Terraform commands and their options/arguments is by typing terraform and terraform <command name> -h.
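The ordering idea can be sketched with two null_resource blocks: one that runs on every node, and one that runs on a single node only after the first step has finished everywhere. This is not the content of example2.tf; the resource ids, the echo commands, the ubuntu user and the key variables are hypothetical stand-ins. The quoted depends_on entry follows the pre-0.12 style of this article, while newer releases prefer the bare reference null_resource.all_nodes.

```hcl
variable "key_name" {} # name of an existing EC2 key pair (assumed)
variable "key_path" {} # local path to the matching private key (assumed)

resource "aws_instance" "node" {
  count         = 3
  ami           = "ami-xxxxxxxx" # placeholder AMI id
  instance_type = "t2.micro"
  key_name      = "${var.key_name}"
}

# Step 1: runs once per node, and re-runs if the set of instance ids changes.
resource "null_resource" "all_nodes" {
  count = 3

  triggers = {
    node_ids = "${join(",", aws_instance.node.*.id)}"
  }

  connection {
    host        = "${element(aws_instance.node.*.public_ip, count.index)}"
    user        = "ubuntu"
    private_key = "${file(var.key_path)}"
  }

  provisioner "remote-exec" {
    inline = ["echo node ${count.index} is ready"]
  }
}

# Step 2: runs on the first node only, after step 1 is complete on all nodes.
resource "null_resource" "first_node" {
  depends_on = ["null_resource.all_nodes"]

  connection {
    host        = "${element(aws_instance.node.*.public_ip, 0)}"
    user        = "ubuntu"
    private_key = "${file(var.key_path)}"
  }

  provisioner "remote-exec" {
    inline = ["echo cluster bootstrap would start here"]
  }
}
```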

The recent versions of Terraform have started to provide data sources, which gather dynamic information from the various providers. The information gathered through the data sources is used in the Terraform configurations, most commonly through interpolation. A simple example of a data source is to gather the AMI id for the latest version of an AMI and use that in the instance provisioning configurations, as shown below:

data "aws_ami" "myami" {
  most_recent = true
  filter {
    name   = "name"
    values = ["MyBaseImage"]
  }
}

resource "aws_instance" "myvm" {
  ami = "${data.aws_ami.myami.id}"
  ...
}

Code organisation and reusability

Although our examples show the entire declarative configuration in a single file, we should break it into more than one file. You could break your whole config into separate configs based upon the functionality they provide. So our first example could be broken into variables.tf that keeps all the variable blocks, aws.tf that declares our provider, instances.tf that declares the layout of the AWS VMs, route53.tf that declares the AWS Route 53 functionality, and output.tf for our outputs. To keep things simple to use and maintain, keep everything related to a whole task being solved by Terraform in a single directory, along with sub-directories named files, scripts, keys, etc. Terraform doesn’t enforce any hierarchy of code organisation, but keeping each high-level functionality in its dedicated directory will save you from unexpected Terraform actions caused by unrelated configuration changes. Remember, in the software world, “A little copying is better than a little dependency,” as things get fragile and complicated easily with each added functionality.

Terraform provides the functionality of creating modules to reuse the configs created. The cluster creation template shown above is actually put in a module to use the same code to provision test and/or production clusters. The usage of the module is simply supplying the required variables to it in the manner shown below (after running terraform get to create the necessary link for the module code):

module "myvms" {
  source     = "../modules/awsvms"
  ami_id     = "${var.ami_id}"
  inst_type  = "${var.inst_type}"
  key_name   = "${var.key_name}"
  subnet_id  = "${var.subnet_id}"
  sg_id      = "${var.sg_id}"
  num_nds    = "${var.num_nds}"
  hst_env    = "${var.hst_env}"
  apps_pckd  = "${var.apps_pckd}"
  hst_rle    = "${var.hst_rle}"
  root_size  = "${var.root_size}"
  swap_size  = "${var.swap_size}"
  vol_size   = "${var.vol_size}"
  zone_id    = "${var.zone_id}"
  prov_scrpt = "${var.prov_scrpt}"
  sub_dmn    = "${var.sub_dmn}"
}

You also need to create a variables.tf in the location of your module source, declaring the same variables that you pass to the module. Here is the module’s variables.tf, which receives the variables supplied by the caller of the module:

variable "ami_id" {}
variable "inst_type" {}
variable "key_name" {}
variable "subnet_id" {}
variable "sg_id" {}
variable "num_nds" {}
variable "hst_env" {}
variable "apps_pckd" {}
variable "hst_rle" {}
variable "root_size" {}
variable "swap_size" {}
variable "vol_size" {}
variable "zone_id" {}
variable "prov_scrpt" {}
variable "sub_dmn" {}

The official Terraform documentation has a few detailed sections on module usage and creation, which should give you more information on everything related to modules.
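Variables carry values into a module; outputs carry values back out to the caller. The sketch below shows that return path across two files. It assumes the awsvms module declares a counted resource named aws_instance.node, and the output names public_ips and cluster_ips are hypothetical, since the article does not show the module's internals.

```hcl
# File inside the module, e.g. ../modules/awsvms/outputs.tf
# (assumes the module contains a counted resource aws_instance.node)
output "public_ips" {
  value = "${join(",", aws_instance.node.*.public_ip)}"
}

# File in the calling configuration, next to the module "myvms" block.
# A module output is read as module.<module name>.<output name>.
output "cluster_ips" {
  value = "${module.myvms.public_ips}"
}
```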

Importing existing resources

As we have seen earlier, Terraform caches the properties of the resources it creates in a state file, and by default doesn’t know about resources not created through it. But recent versions of Terraform have introduced a feature to import existing resources, not created through Terraform, into its state file. Currently, the import feature only updates the state file, so the user still needs to create the configuration for the imported resources. Otherwise, Terraform will show the imported resources as having no configuration and mark them for destruction.

Let’s make this clear by importing an AWS instance, which wasn’t brought up through Terraform, into some Terraform-created infrastructure. You need to run the command terraform import aws_instance.<Terraform Resource Name> <id of the instance> in the directory where the Terraform state file is located. After a successful import, Terraform gathers information about the instance and adds a corresponding section to the state file. If you run terraform plan now, it’ll show something like what follows:

- aws_instance.<Terraform Resource Name>

So it means that now you need to create a corresponding configuration in an existing or new .tf file. In our example, the following Terraform section should be enough to not let Terraform destroy the imported resource.

resource "aws_instance" "<Terraform Resource Name>" {
  ami           = "<AMI>"
  instance_type = "<Sizing info>"
  tags = {
    ...
  }
}

Please note that you only need to mention the resource attributes that are marked as required in the Terraform documentation. Now, if you run terraform plan again, the destruction plan shown earlier goes away for the imported resource. You could use the following command to extract the attributes of the imported resource in order to create its configuration:

sed -n '/aws_instance.<Terraform Resource Name>/,/}/p' terraform.tfstate | \
    grep -E 'ami|instance_type|tags' | grep -v '%' | sed 's/^ *//' | sed 's/:/ =/'

Please pay attention when you import a resource into your current Terraform state and then decide not to use it going forward. In that case, don’t forget to rename your terraform.tfstate.backup file as terraform.tfstate to roll back to the previous state. As an alternative, you could delete that resource block from your state file, but it’s not a recommended approach. If you do neither, Terraform will try to delete the imported but undesired resource, and that could be catastrophic in some cases.

The official Terraform documentation provides clear examples to import the various resources into an existing Terraform infrastructure. But if you are looking to include the existing AWS resources in the AWS infra created by Terraform in a more automated way, then take a look at the Terraforming tool link in the References section.

Note: Terraform providers are no longer distributed as part of the main Terraform distribution. Instead, they are installed automatically as part of running terraform init. The import command also requires that imported resources be specified in the configuration file. Please see the Terraform changelog at https://github.com/hashicorp/terraform/blob/v0.10.0/CHANGELOG.md for details.

Missing bytes

You should now be feeling comfortable about starting to automate the provisioning of your cloud infrastructure. To be frank, Terraform is so feature-rich now that it can’t be fully covered in a single or multiple articles and deserves a dedicated book (which has already shaped up in the form of an ebook, ‘Terraform Up & Running’). So you could further take a look at the examples provided in its official Git repo. Also, the References section offers a few pointers to some excellent reads to make you more comfortable and confident with this excellent cloud provisioning tool.

Creating on-demand and scalable infrastructure in the cloud is not very difficult if some simple basic principles are adopted and implemented using feature-rich but no-fuss, easy-to-use standalone tools. Terraform is an indispensable tool for creating and managing cloud infrastructure in an idempotent way across a number of cloud providers. It could further be glued together with some other management pieces to create an immutable infrastructure workflow that can tame any kind of modern cloud infrastructure. ‘Terraform Up & Running’ is also available as a print book.
