The Rise of Infrastructure as Data

Simplifying IT automation

IT infrastructure should be simpler to automate. A new method of describing IT configurations and policy as data formats can help us get there. To understand this conclusion, it helps to understand how the existing tool chains of automation software came to be.

In the early days of IT infrastructure, administrators seeking to avoid redundant typing wrote scripts to help them manage their growing computer hordes. The development of these in-house automation systems was not without cost; each organization built its own redundant tools. As scripting gurus left an organization, new employees often found these scripts very difficult to maintain.

As the huge number of books written on the topic suggests, software development takes a large investment of time to do right. Systems management software is especially complex, due to all the possible variables and corner cases to be managed. These in-house scripting systems often grew to be fragile.

Noticing this problem, large companies stepped up to fill the gap, backed by legions of paid developers, product managers, and sales folks. Somehow, though, each product, deeply steeped in initially healthy capitalism, grew large and bloated. Buoyed by the need to have strong sales and a slick trade show demo, these tools tended toward graphical impressiveness, but seldom could cope with the wide variety and rate of change in today’s IT architectures.

In frustration with the expensive large enterprise solutions, and inspired by the community success of Linux, many groups of skilled systems administrators rose to the challenge, writing their own applications in an attempt to unseat the large corporate software with solutions that worked better for them. Communities rose up and rallied around their favorite tools.

These efforts, including a project of my own, Cobbler, eventually became associated with the growing “DevOps” movement—the idea that walls between developers and operations should be relaxed. While many will argue “DevOps is a culture of communication,” one of the main early benefits was that it inspired systems administrators to learn to write code and coders to learn infrastructure. We all got smarter as a result.

Many frameworks that grew up to manage this infrastructure allowed writing “code” to describe infrastructure configurations, whether directly in popular programming languages or in custom domain-specific languages. This idea became known as “Infrastructure as Code” (IAC). IAC was more reliable than in-house scripting, but a lot of it got supremely complicated, just like before.

Both the enterprise and IAC approaches have major gaps. The enterprise solutions are very good at a few limited use cases, particularly in providing click-driven solutions for less technical users, but are not flexible enough for complex automation challenges. Worse still, they are typically priced out of range of the smaller startup.

On the other hand, “Infrastructure as Code” solutions are flexible, but they often require larger automation teams to build content, often require custom glue scripting, and have a very steep learning curve. The result is that projects take longer to deliver and organizations become locked into their automation tooling. Whole teams of people rise up around generating and testing content for the tool. So while IAC is cheaper (often free) in purchase cost, the labor cost makes it deceptively expensive. The opportunity cost from losing use of strategic employees to constant IAC development overhead is rarely even measured.

A middle ground comes from realizing that infrastructure is best modeled not as code, nor in a GUI, but as text-based, data-driven policy. A list of software packages to install is just that: a list of packages to install. Infrastructure processes and configurations can be described in terms of what they look like, and can still be edited in basic text representations without reaching the levels of complexity found in software development.

I call this “Infrastructure as Data”: describing what your systems look like in simple machine-readable data formats. Have programs execute those data formats and ensure your infrastructure matches. The result is that configurations can be flexible, and also easy to prototype, easy to audit, and easy to maintain.
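To make the idea concrete, here is a minimal sketch in Python. It is not how any particular tool works; the policy layout (`packages`, `services`), the `plan` function, and the hard-coded observed state are all hypothetical names invented for illustration. The point is only that the desired state lives in a plain data document, and a small generic program compares it against what the system currently looks like.

```python
import json

# Hypothetical policy document: the desired state, expressed purely as data.
# The keys "packages" and "services" are assumptions made for this sketch.
POLICY_JSON = """
{
  "packages": ["nginx", "git", "ntp"],
  "services": {"nginx": "running", "ntp": "running"}
}
"""


def plan(policy, current):
    """Compare desired state with observed state and list the actions needed."""
    actions = []
    installed = set(current.get("packages", []))
    for pkg in policy.get("packages", []):
        if pkg not in installed:
            actions.append(("install", pkg))
    observed = current.get("services", {})
    for svc, wanted in sorted(policy.get("services", {}).items()):
        if observed.get(svc) != wanted:
            actions.append(("set_service", svc, wanted))
    return actions


policy = json.loads(POLICY_JSON)

# A real program would inspect the machine; this state is hard-coded
# purely for illustration.
current = {"packages": ["git"], "services": {"nginx": "stopped"}}

for action in plan(policy, current):
    print(action)
```

Because the policy is just data, it can be diffed, audited, or generated by other programs without anyone having to read or execute code, and running the planner against a system that already matches the policy yields no actions at all.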

IT directors should think carefully about spending budget on expensive enterprise management software, but should be similarly wary of devoting large numbers of employees to forever maintaining an “Infrastructure as Code” project.

“Infrastructure as Data” can help find a reasonable middle ground.

topic: Web Perf/Ops
  • mpdehaan

    Author here :) … For those interested in tooling based on these ideas, you may be interested in exploring http://ansibleworks.com/. Discussion welcome!

  • Josh

    Do you mean things like KVM config files? Or kickstart/preseeds? Or cfengine configurations? All of these things are mature, standard data entities that define infrastructure very nicely.

    • mpdehaan

      None of these are generically machine parseable though, and many require programming.

      In particular, Kickstarts/preseeds basically become bash scripts, and preseeds are far more limited.