Infrastructure As Pseudocode
There are too many special snowflake systems in DevOps. One way we see this is in the proliferation of DevOps data stores: databases and search engines intended specifically for metrics and logs. Another is the way in which “infrastructure as code” always ends up as stuff that’s not code.
Much business and software functionality is exposed through libraries. This includes payments, presentation, graphics, data analysis and even cloud computing. When a vendor wants to deliver functionality, they put together an API and a library for that API, offering it in all the programming languages — PHP, Java, Ruby, Python, Swift — from which they imagine anyone would want to use it. The idea being: you are going to use this from your code, so of course you need a library that you can import into your code.
With infrastructure software, something else happens. When we get new tooling — Chef, Puppet, Terraform — we also get a new way to program. This might be something close to an existing language — as with Chef — but usually it’s a distinct language embedded in YAML with its own approach to control flow, functions, modules and string escaping. There is no config management system delivered with the intent of being an importable library — require “chef” — to be used from your code like any other library.
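To make the contrast concrete, here is a sketch of what an importable config-management library might look like. Everything in it — `Resource`, `Package`, `File`, `converge` — is invented for illustration; no real tool exposes this API.

```python
class Resource:
    """Base class for things we want to exist on a machine."""
    def apply(self):
        raise NotImplementedError

class Package(Resource):
    def __init__(self, name):
        self.name = name
    def apply(self):
        return f"install package {self.name}"

class File(Resource):
    def __init__(self, path, content):
        self.path = path
        self.content = content
    def apply(self):
        return f"write {len(self.content)} bytes to {self.path}"

def converge(resources):
    # Plain control flow: iterate, call, collect. No DSL, no YAML,
    # no bespoke string-escaping rules -- just the host language.
    return [r.apply() for r in resources]
```

You would compose and drive these resources with ordinary functions, loops and conditionals, rather than learning a new embedded language per tool.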
This is done in the name of being “declarative”; but that is a specious promise. Deployments and other operations tasks are part of workflows. Having a declarative API doesn’t mean we need a declarative language. A common workflow — one I have implemented many a time — is:
- User wants to deploy a specific version of the code.
- Check the build server to see if this version passes tests.
- If not, inform the user.
- If so, deploy to the beta environment.
- Check the error rates on the beta environment, to ensure they remain below threshold. (API call to cloud provider’s load balancer.)
- If not, roll back the beta environment. If so, go to the developer channel in IRC and inform developers that a deploy is going to production. (This involves going to a database to find out which channel that is.)
- Deploy to the production environment tied to that beta environment. (This again involves going to a database to find out the relevant environment.)
- Check production for elevated error rates. (API call to cloud provider’s load balancer.)
- If things look bad, roll back.
- Inform developers in IRC, either way.
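The workflow above reads naturally as imperative code. Here is a minimal sketch of it in Python; every name (`ci`, `cloud`, `chat`, `env_db` and their methods) is a made-up stand-in for your build server, cloud provider, chat, and database clients, not any real API.

```python
ERROR_THRESHOLD = 0.01  # max acceptable error rate after a deploy (assumed)

def run_deploy(sha, ci, cloud, chat, env_db, beta="beta"):
    """Drive one deploy of `sha` through beta and into production."""
    # 1. Check the build server.
    if not ci.tests_passed(sha):
        chat.tell_user(f"{sha} has not passed tests")
        return "tests-failed"

    # 2. Deploy to beta, then check error rates (load balancer API call).
    cloud.deploy(beta, sha)
    if cloud.error_rate(beta) > ERROR_THRESHOLD:
        cloud.rollback(beta)
        return "beta-rolled-back"

    # 3. Look up the developer channel in the database and announce.
    channel = env_db.dev_channel(beta)
    chat.announce(channel, f"deploying {sha} to production")

    # 4. Look up the production environment tied to this beta; deploy there.
    prod = env_db.production_for(beta)
    cloud.deploy(prod, sha)

    # 5. Check production error rates; roll back if they are elevated.
    if cloud.error_rate(prod) > ERROR_THRESHOLD:
        cloud.rollback(prod)
        chat.announce(channel, f"{sha} rolled back in {prod}")
        return "prod-rolled-back"

    chat.announce(channel, f"{sha} is live in {prod}")
    return "deployed"
```

Note how little of this is declarative: two `deploy` calls carry the declarative payload (where, which SHA), and everything else is conditionals, lookups and notifications.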
Which parts of this are declarative? The specification of the deployments — where to deploy, which SHA — undeniably are; but the rest of it is clearly imperative in nature. (This is not to say that we cannot model imperative programs in declarative languages; but since that is what we are doing, we should not pretend we had a declarative task to begin with.)
In practice, teams implement these workflows by synthesizing configurations for the infrastructure automation system — templating them out — and driving the workflow with a script. Declarative configuration management leads to an awful lot of shell script.
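The templating half of that pattern looks something like the sketch below. The Terraform-style resource block and the field values are invented for illustration; only the pattern — punch variables out of a declarative config, fill them in per run — is the point.

```python
from string import Template

# The "declarative" part: a config template with the variable bits removed.
CONFIG = Template("""\
resource "aws_instance" "app" {
  ami           = "$ami"
  instance_type = "$instance_type"
}
""")

def render(ami, instance_type):
    """Synthesize a concrete config for this run of the workflow."""
    return CONFIG.substitute(ami=ami, instance_type=instance_type)

# The imperative part usually lives in a shell script wrapped around the
# tool; in Python it would write the file and shell out, e.g.:
#   pathlib.Path("main.tf").write_text(render(...))
#   subprocess.run(["terraform", "apply", "-auto-approve"], check=True)
```

The workflow logic ends up split between the template, the script that renders it, and the script that invokes the tool — which is exactly the opacity the next paragraph complains about.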
This undermines the promise of infrastructure as code, which one would hope means: code that one’s entire team can work on and understand. Once you’ve got a mini-language driven by shell scripts, you’ve got specialization, and not in the good sense of domain knowledge, but in the bad sense of opacity.