HACKER Q&A
📣 impoppy

Why was Terraform created?


I get the idea and general use cases of Terraform - to have a unified interface with a single language for all kinds of cloud infrastructures where the machine you are deploying to can be virtually anything.

But in the majority of cases, developers are very much aware of the environments their code runs on: they know their containers are stored in ECR and run in ECS, and their data is stored in S3 and RDS.

It is trivial to build a container, upload it to ECR, and then deploy it to ECS from a shell script. And it is a lot more readable and comprehensible to a person not familiar with the tool.

Maybe I am the problem and I just don't get the declarative style, where you only describe the wanted state, not the steps you take to achieve that state?

If we assumed by default that our cloud infrastructure provider is AWS, wouldn't it be simpler to write a shell script that calls `aws-cli` a few times?

I came to this question when discussing a problem my friend, a DevOps engineer, was having: he wasn't able to get the Azure Resource Group name from his `foobar.tf` files, and he ended up with something like the following:

    cat << EOF
      {
        "aksvnet": "$(az resource list --resource-group $1 | grep -i \"aks-vnet- | cut -d ":" -f2 | tr -d '", ')"
      }
    EOF
And what is it? It is a shell script inside JSON that was created in the shell! What are these layers of abstraction for? Why does he have to wrap the Resource Group name in JSON? Why couldn't it just be piped in plain text, as all the tools that try to be POSIX-compatible do?

UPD: This question is for environments where all the engineers are (to some extent) familiar with %CloudProviderSDK% and bash. And, in my opinion, it's a lot easier to pick up bash and %CloudProviderSDK%, as those are imperative and therefore closer to engineers' daily routine, as opposed to Terraform's declarative style. Shell scripts are just more intuitive by default.


  👤 awsanswers Accepted Answer ✓
Go down the path of rolling your own shell scripts and, if you do a world-class job, you'll end up reimplementing the functionality and architecture of Terraform. Also, because Terraform is a commercial open-source project, you get the benefit of an entire community using and extending the tool for their own use cases. Read through the Terraform release notes and you will see how much work and experimentation goes into writing and maintaining a provider.

👤 morsecodist
> to have a unified interface with a single language for all kinds of cloud infrastructures where the machine you are deploying to can be virtually anything

I don't think having a unified interface is the motivation behind Terraform. You still need to understand the underlying resources you are dealing with, Terraform doesn't abstract that at all. The big idea behind Terraform is procedural vs declarative. You can write scripts to bring up all of your infrastructure but what if one of your scripts fails in the middle? What parts of it actually went into effect and which didn't? Can you just re-run it or will the first part now fail because the infrastructure already exists? What if you have several engineers working on the same environment applying scripts that may interfere with one another? What if there was an incident and you made some manual changes and now production is out of sync with what is represented in the script? What if you made some complicated infrastructure changes and you broke something and want to bring everything back to exactly how it was before?

Declarative infrastructure answers all of those questions. It lets you keep track of what the current state of your infrastructure is, and what you want it to be. It automatically identifies areas where the two don't match up and serves as a forcing function for documenting changes to your infrastructure. Declarative infrastructure is more complicated than procedural because bringing up infrastructure is a procedural process so you need a tool to make it into a declarative one and that is not always easy. But if your team's needs get complex enough the tradeoff is well worth it. I honestly can't even imagine life without it.

As a bonus it makes it easy to ship complete infrastructure solutions as re-usable modules that you can compose.
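To make the declarative idea concrete, here is a minimal sketch (provider and names are illustrative, not from the thread): you describe the bucket you want, and Terraform decides whether to create it, update it, or do nothing.

```hcl
# Illustrative example: you declare the end state, not the API calls.
# On each apply, Terraform diffs this against what actually exists.
resource "aws_s3_bucket" "artifacts" {
  bucket = "example-artifacts-bucket" # placeholder name
}
```

If the bucket already exists in state, `terraform apply` is a no-op; if only its settings drifted, only those are changed.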


👤 jasonhansel
It's probably overkill for your use case. If you're on AWS you can stick to CloudFormation; with something this simple, you can indeed just (as you suggest) use a bash script.

But a lot of applications have infrastructure far, far more complex than a single service running in a container and S3/RDS. It may involve a large number of lambdas, networks, API gateways, firewalls, proxies, certificates, etc.

Past a certain point, you need a way of managing all that complexity, keeping things consistent across environments/regions, ensuring all infrastructure changes are tracked and audited, and making it easier to update lots of resources at once, among other things. That's where Terraform helps.


👤 awithrow
> But in the majority of cases, developers are very much aware of the environments their code runs on: they know their containers are stored in ECR and run in ECS, and their data is stored in S3 and RDS.

This has absolutely not been my experience. I've worked with a few devs who might be curious to know how everything worked. Most devs I've worked with focus solely on the code they write.

I've also inherited many systems over the years and I'd take the ones managed with tf over bash every single time.

A non-exhaustive list of what tf helps with:

1. Being able to know what has changed and what needs to change before you run

2. Managing infra outside of the large cloud providers and being able to combine the two

3. Quickly being able to add a new environment or region to an existing cluster

4. Some requirement has changed and some new policy/tool needs to be stitched in across all your environments


👤 cybrexalpha
The complexity of cloud deployments tends to grow wildly over time. What starts as an ECR repo with a single ECS deployment turns into Route 53 zones, S3 buckets, ELBs, multiple deployments of ECS, security policies, the odd EC2 instance (there's always one somewhere), etc, etc.

Terraform gives you a common language to make sense of it all that can grow as your cloud infra does.

When combined with git and CI/CD it's also an amazing self-service experience. For example, you can put the Terraform code that describes your environment in a git repo, allow any employee to open pull requests, require IT approval to merge, and deploy changes automatically on merge. Now any engineer can self-service request access to a prod environment (by modifying IAM in Terraform), or configure a production deployment, without ever needing actual access to prod. IT gets an audit log, they get a control gate (the code review), and engineers get to self-service changes, which reduces the load on IT.


👤 cs_tiger4
One killer feature of Infrastructure as Code, be it Terraform or any other tool, is idempotence.

You have to be careful not to run your bash script twice or you get another instance/vpc/loadbalancer or whatever.

You run "terraform apply" twice and it does nothing on the second run.

If you start implementing that in your shell scripts, you end up reimplementing Terraform in bash.
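A minimal sketch of that idempotence (resource and CIDR are illustrative): applying this twice creates exactly one VPC, whereas a naive `aws ec2 create-vpc` call in a script creates a second one on the second run.

```hcl
# Illustrative: Terraform records this VPC in its state file, so a second
# `terraform apply` compares desired vs. actual state and changes nothing.
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}
```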


👤 arrmn
To get data out of Terraform you have outputs, which let you display or export values.
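As a hedged sketch of what that looks like (the referenced resource is an assumption, not from the thread):

```hcl
# Illustrative: an output makes a value queryable after apply,
# e.g. via `terraform output vnet_id`. Assumes a resource
# "azurerm_virtual_network" "example" is defined elsewhere.
output "vnet_id" {
  value = azurerm_virtual_network.example.id
}
```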

Terraform helps you to have a unified way to manage your resources. Sure, the bash scripts work for you, but what happens if you leave the company? Somebody else has to maintain your shell script.

What happens if somebody else changes the infrastructure and they're not familiar with your shell script? They need time to dig in, figure things out, update it, and in the best case test it.

And you need to keep your scripts up to date, you need to build in fault tolerance, you need to think how you're going to deploy new resources. How are you going to handle destroying resources?

And on top of that you also need to learn the cloud provider's CLI tools or API to know what kind of calls to execute.

It just provides a standardised way to manage your infra.


👤 ggeorgovassilis
I can't answer "why terraform was created" because I wasn't there. Also, I only recently started working with it, so I'm by far no expert on the matter. People who don't know terraform often praise it for its platform-agnostic language, but there's an important caveat: the language is platform-agnostic, not the templates one writes, as those are riddled with platform-specific nouns. You can't take an Azure template and deploy it on AWS.

Having said that, I (so far) like terraform for the same reason you noted: it's more readable and there is great tooling around it. I like state management and the ability to invoke lower-level components (like the shell breakout in your example) when you really have to.

edit: > the declarative style, where you only describe the wanted state, not the steps you take to achieve that state

That is a good and useful thing. It's called "desired configuration management"; Ansible works the same way. When the underlying tool works well, it decides on its own how to implement what you want. If you ever watch terraform deploy a complex (10+ dissimilar resources) infrastructure, it comes close to magic how it discovers what has already been done, what still needs to be done, and in which order.


👤 Someone
One thing that terraform does (see below) is that it diffs the desired state with current state (again: see below), and then makes the necessary changes.

So, if you have terraformed a load balancer balancing load between 2 machines, and change your terraform to declare a load balancer balancing load between 3 machines, it won’t destroy two machines, destroy the load balancer, create a new load balancer, and then create 3 machines.

Instead, it will create a new machine and change the load balancer to know about it, so that your service is uninterrupted.
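A sketch of that kind of change (names and AMI are placeholders, not from the thread): bumping `count` from 2 to 3 produces a plan that adds one instance, rather than recreating everything.

```hcl
# Illustrative: changing count from 2 to 3 and re-running `terraform plan`
# shows "1 to add", not a destroy-and-recreate of the whole stack.
resource "aws_instance" "web" {
  count         = 3                # was 2
  ami           = "ami-12345678"   # placeholder AMI
  instance_type = "t3.micro"
}
```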

Problem is that the above isn’t quite true.

Firstly, comparing with the current state is slow, so terraform has a cache of what it thinks the current state is. If they get out of sync, things can get interesting.

Secondly, all changes are done by plugins of varying quality. Your cloud provider may, for example, support reconfiguration of a load balancer, but if the plug-in doesn’t, terraform will destroy and create a new one.


👤 hussainbilal
Not an expert nor a user of terraform, but I'm currently learning about the tool, and I like to learn by reading from books and docs, rather than diving straight in and doing.

Based on what I've read, while Hashicorp tools may look like their only contribution is platform agnostic tooling, a deep dive into the docs reveals a focus on dynamically changing architecture and providing tooling to scale dynamically changing architectures in short time scales to any number of resources (i.e. not just machines / VM / containers / compute resources, but resources like users, user-generated resources, user-generated secrets etc.)

My impression thus far is that HashiCorp is aware of the variety of alternative tools; that's why their certifications / training / professional services are only available for the tools truly core to supporting dynamic architectures: Terraform, Consul and Vault.

https://www.hashicorp.com/customer-success/professional-serv...

https://www.hashicorp.com/customer-success/enterprise-academ...

https://www.hashicorp.com/certification


👤 dimitar
The example you gave is the wrong way to write Terraform; you should be able to query the cloud API using data sources. In your case (getting a VNet) that would be this: https://registry.terraform.io/providers/hashicorp/azurerm/la...

In case you need to get the metadata of a resource group you can use this: https://registry.terraform.io/providers/hashicorp/azurerm/la...
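A hedged sketch of what that data-source approach might look like (the VNet name and variable are placeholders; check the registry docs linked above for exact arguments):

```hcl
# Illustrative: read an existing VNet via the azurerm provider's data
# source instead of shelling out to `az` and grepping JSON.
variable "resource_group" {
  type = string
}

data "azurerm_virtual_network" "aks" {
  name                = "aks-vnet-12345" # placeholder VNet name
  resource_group_name = var.resource_group
}

output "aksvnet" {
  value = data.azurerm_virtual_network.aks.name
}
```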

I am a very happy Terraform user, here are the benefits for me:

* Very simple workflow that helps prevent unintended consequences - first you write your code, generate a plan, inspect it carefully and only then apply. It is easy to work in a team setting where you can have one person write modules and others supply variables to them.

* I personally don't want to burden myself with Azure Resource Manager, CloudFormation or any other vendor specific IAC tool.

* I don't like other people's bash; there are tools like ShellCheck, but usually a larger infra codebase becomes an awful ad-hoc mess of ENV variables and clever hacks. And infrastructure code is nasty to test and refactor.

Try to keep it as simple as possible; any time you are fighting Terraform it usually means there is a much simpler way to do it. And if there is inherent complexity, it could be the wrong thing to do.

In case you need very dynamic behaviour (basically a part of an application) I advise the following: put in terraform the things that are not likely to change often or where the cost of breakage is higher - your virtual networks, DNS configuration, load balancers, VPNs, autoscaling groups, important alerts, etc. Manage more ephemeral workloads in a more general-purpose language if there is no straightforward way to do it in the official APIs. I am also a very happy user of the AWS CLI in some cases, plus the Cognitect AWS libraries for Clojure. However, if you need to do something very dynamic, that is also likely a sign something is wrong.


👤 rgoulter
Bash is ~okay for small and simple enough tasks. But, above a certain complexity, I think probably anything else is going to be easier to maintain & have fewer footguns.

👤 thomasmcfarlane
It's much more about standardising the approach within a team and not having to track your target infrastructure state by hand. Especially at the scale-up stage, you'll get a lot of differing views on where to write your scripts; Terraform provides tooling for a specific use case so you don't need to reinvent everything.

At Nimbus[1], we have been trialling Terraform for template definitions, as users are mostly familiar with it and it allows them to integrate more easily with existing CI/CD processes. They can easily add a new Nimbus Workspace to their Terraform and have it spin up a new development environment when their CI requires it.

[1]: https://usenimbus.com - Easy remote development infra for teams


👤 Bleloto
Just because you clicked something in some web UI doesn't mean it's safe.

You still need to write how to do it, how to bootstrap it and why you do things.

We have a basic tf layer, which does make it well documented, easily extendable and repeatable.

Yes, we do destroy the whole setup and recreate it. Not often, but still.

After the tf layer, there is only k8s which is also 100% IaC.

Also, sorry to say, but we are experts; learning something like tf should not be a big hurdle.

What I saw in old sysadmin setups: tons of snowflake VMs no one knows why they exist, random setups with different security patch levels on them.

If you don't have any tool to automate things you will not do it.

Feel free to create a small infra setup manually if you prefer, I prefer to codify it once and can recreate it instead of documenting it in some word doc.

Tf is not perfect btw.


👤 wereallterrrist
"Random shell script is more readable than terraform". Yeah, no. Not if you mean bash, and certainly frickin' not if there's a modicum of reliability/retry/recreate logic in them. I don't think you have a good view of the landscape.

👤 warrenm
Terraform was made for exactly the reason you stated ("single language for all kinds of cloud infrastructures")

It was also made for non-developers to be able to deploy what someone else built "anywhere"



👤 ukoki
Terraform isn't for deploying infrastructure, it's for _converging_ infrastructure onto a desired state. Good luck writing a bash script that can deploy hundreds of different, dependent IaaS resources and deal with any or all of the resources initially being misconfigured or missing.

👤 typedef_struct
Terraform creates a virtual infrastructure against which diffs can be performed.

👤 deathanatos
> [why not write shell scripts to do what we need]

The shell script needs to determine, for each resource, whether it exists; if it does exist, what changes to make and how to translate those into API calls, or if it doesn't exist, how to create it, and to clean up any resources no longer in the desired state.

Attributes of some resource that might exist only after creation need to be fed into other resources…

For even a single resource, over the lifetime of the many changes and adjustments to the resource, that is extremely complicated to do correctly in shell alone.

The declarative "desired state" style is more useful since the steps required to be undertaken often depend on the state of the infrastructure that exists, or doesn't exist.

(Additionally, you'll also need to record state about what infra exists and what doesn't, store that somewhere, and share that state with coworkers … and TF handles that, too. While "it's obvious" for some infra — i.e., the resource has a natural key — not all resources do, and often you have to deal with unmanaged resources and not delete them simply because they're not part of your desired state.)

Lastly, you have to handle bugs and design flaws in the APIs. I've worked with a number of platforms where two, valid calls to the API in a shell script are a race condition because the API doesn't support read-your-writes.

All this reinvents the wheel that is TF.

There's also "why does this infra exist?": I can comment TF, I get a commit history and rationales for why infra exists. Shell scripts really push people towards "I'll just #yolo this small change to the infra" … and now, I don't know why the infra is the way it is. Often, I find dev/prd have drifted, or two prod instances of the "same" thing are really different. Comments cut down on this, TF modules really cut down on it, etc.

> And what is it? It is a shell script inside JSON that was created in the shell! What are these layers of abstraction for? Why does he have to wrap the Resource Group name in JSON? Why couldn't it just be piped in plain text, as all the tools that try to be POSIX-compatible do?

JSON is a text format. Your shell scripter has piped that into what amounts to a buggy, broken, 5% reimplementation of a JSON parser. Pipe that to `jq`, instead. (You can also use --query on az to reduce the output to something that will be more easily handled by `jq`, but anything --query can do, jq can too, pretty much, and it might be better to have all the code in one language.)
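For example (the sample JSON below is an illustrative stand-in for real `az resource list` output, not taken from the thread), the grep/cut/tr pipeline collapses to one `jq` filter:

```shell
# Illustrative: parse az-style JSON with jq instead of grep/cut/tr.
# This sample stands in for `az resource list --resource-group "$1"`.
az_output='[{"name": "aks-vnet-12345", "type": "Microsoft.Network/virtualNetworks"}]'

# Select the resource whose name starts with "aks-vnet-" and emit plain text.
vnet=$(printf '%s' "$az_output" | jq -r '.[] | select(.name | startswith("aks-vnet-")) | .name')
echo "$vnet" # prints: aks-vnet-12345
```

Unlike the grep version, this keeps working if `az` changes its key ordering or whitespace.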

Or just request that data from terraform, by accessing the appropriate attribute of that resource.


👤 jjice
Another part of Terraform's power is the fact it's all text. We can store it in version control and see changes over time, as well as have it go through CI/CD.

While there are platform-specific alternatives like CloudFormation, learning a new system for each platform would be a pain, and frankly, things like CloudFormation just aren't as nice to work with compared to Terraform.