The Case for Monitoring as Code


You build it; you own it! It’s a simple mantra that has driven software development for years. The days of writing software and throwing it over the wall to operations teams are over. Instead, development teams take ownership of what they build and operate their own software in production.

There is just one problem: monitoring tools have not yet adopted the developer workflow. For developers, the repository is the center of that workflow and the single source of truth. Every change requires a commit to the repository before a new version gets built and eventually deployed to production; nothing gets deployed without first being committed. That workflow is crucial for us, and it comes with massive benefits.

First and foremost, version control enables teamwork. All of our tools should follow that paradigm, no matter which phase of the pipeline they are made for. Consequently, now that engineering teams own monitoring, monitoring tools need to follow it too.

Let's take a step back and look at how development teams provision their cloud infrastructure. They do that with infrastructure as code (IaC) tools, such as Terraform, CDK, Puppet, or Pulumi. You write code describing the desired state of your infrastructure, and the tools then set the infrastructure up for you. The IaC movement was born out of frustration with manual provisioning, a slow, error-prone process that also reduced transparency and hindered collaboration across teams. Having your infrastructure version controlled close to your application code changed the game!

End-to-end testing has followed this workflow for almost two decades. Since Selenium's initial release in 2004, we have stored our tests in a repository and run them in our pipeline before shipping code to production. No developer today would suggest storing tests outside of a repository.

"Checkly integrated with Terraform enables us to quickly create, modify, and deploy API and browser checks for a broad and diverse audience of internal customers. The codified workflow ensures full transparency, thanks to built-in auditing and documentation!" - Andreas Lehr, Team Lead at Schwarz Group IT

Now that monitoring is supposed to be developer-owned, setting it up comes with similar requirements. However, monitoring tools do not yet support the repository as the source of truth. This needs to change. We need to be able to write code that describes the desired state of our monitoring and have the monitoring tools take care of setting it up for us. This is where Monitoring as Code (MaC) comes in. I believe MaC is the future workflow for monitoring, but what is it?

What is Monitoring as Code?

Monitoring as Code allows us, similarly to IaC, to describe the desired monitoring infrastructure as code. Hundreds of our customers at Checkly use our Terraform provider, our Pulumi provider, a K8s operator, or custom-made approaches via our API or SDK to create, manage and run their checks. To provision new checks, they define them in HCL, TypeScript, or JavaScript and commit them together with the application code. And that is all! Monitoring is now part of your code base, and this comes with solid advantages.

[Figure: API check definition with Terraform]
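
To make the idea of a desired state more concrete, here is a minimal sketch in TypeScript. It is not Checkly's API or schema: the `CheckDefinition` shape, the `planChanges` function, and the example.com URLs are hypothetical and chosen purely for illustration. It shows roughly what a provider or operator has to do: compare the checks declared in the repository with the checks that currently exist, then work out what to create, update, or remove.

```typescript
// Hypothetical shape of a check declared in the repository.
// Illustrative only; this is not Checkly's actual schema.
interface CheckDefinition {
  name: string; // unique identifier of the check
  url: string; // endpoint to monitor
  method: "GET" | "POST";
  expectedStatus: number; // HTTP status the check asserts on
  frequencyMinutes: number; // how often the check runs
}

// The changes needed to move from the current state to the desired state.
interface Plan {
  create: CheckDefinition[];
  update: CheckDefinition[];
  remove: string[]; // names of checks that are no longer declared
}

// Field-by-field comparison of two definitions.
function sameCheck(a: CheckDefinition, b: CheckDefinition): boolean {
  return (
    a.name === b.name &&
    a.url === b.url &&
    a.method === b.method &&
    a.expectedStatus === b.expectedStatus &&
    a.frequencyMinutes === b.frequencyMinutes
  );
}

// Compare desired checks (from the repository) with current checks
// (what the monitoring tool is running) and return the difference.
function planChanges(desired: CheckDefinition[], current: CheckDefinition[]): Plan {
  const plan: Plan = { create: [], update: [], remove: [] };

  for (const check of desired) {
    const existing = current.filter((c) => c.name === check.name)[0];
    if (existing === undefined) {
      plan.create.push(check);
    } else if (!sameCheck(existing, check)) {
      plan.update.push(check);
    }
  }

  for (const check of current) {
    const stillDeclared = desired.some((d) => d.name === check.name);
    if (!stillDeclared) {
      plan.remove.push(check.name);
    }
  }

  return plan;
}

// Example: the repository declares two checks, the tool currently runs two.
const desired: CheckDefinition[] = [
  { name: "users-api", url: "https://api.example.com/users", method: "GET", expectedStatus: 200, frequencyMinutes: 5 },
  { name: "orders-api", url: "https://api.example.com/orders", method: "GET", expectedStatus: 200, frequencyMinutes: 5 },
];
const current: CheckDefinition[] = [
  { name: "users-api", url: "https://api.example.com/users", method: "GET", expectedStatus: 200, frequencyMinutes: 10 },
  { name: "legacy-api", url: "https://api.example.com/legacy", method: "GET", expectedStatus: 200, frequencyMinutes: 5 },
];

const plan = planChanges(desired, current);
console.log(JSON.stringify(plan, null, 2));
```

In this example the plan creates `orders-api`, updates `users-api` (its frequency changed), and removes `legacy-api`. Because the desired state lives in the repository, a change to the monitoring goes through a commit and a review like any other change.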

Advantages of Monitoring as Code

  1. Stay in sync
    Synthetic monitoring is meant to monitor the current state of your application and alert you if things go wrong. However, changes to your service often require changes to your monitors. Imagine a new version goes live that changes an existing service. Your monitors would either break immediately or need to be muted before the rollout, then adjusted and unmuted later. It all happens after the fact. In contrast, having your monitors configured close to the application code enables you to make the required updates when changing the application code and stay in sync.
  2. Efficiency through automation
    Monitoring tools are traditionally configured manually via their UI, often by operations teams. This process is slow. Configuring synthetic monitoring as code enables you to spin up new monitors (in Checkly terms, checks) automatically and quickly, for example, when you are working on a new API endpoint. This means that every time a new endpoint is introduced, the corresponding monitor can be created along with it.
  3. Increased transparency and collaboration
    With Monitoring as Code, all of your checks live alongside your application code in the same repository. This makes it much easier to manage and understand what is being monitored, by which check and why. It enables engineering teams to collaborate and individual team members to spin up new monitors when needed.
  4. Version control for your monitoring
    MaC also enables you to take advantage of version control for your monitoring. This means that you can track changes, revert to previous versions and even see who made what change and when.

In short, Monitoring as Code is a necessity for modern development teams. It enables you to take full advantage of automation, transparency and collaboration.
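
The automation described in point 2 can be sketched in a few lines of TypeScript. The example below derives one check definition per API endpoint, so adding an endpoint to the list yields a matching check without any manual setup. The `Endpoint` and `GeneratedCheck` types, the `checksForEndpoints` function, the default expected status of 200, and the example.com URLs are assumptions made for illustration; they are not part of Checkly's API.

```typescript
// Hypothetical description of an API endpoint exposed by the application.
interface Endpoint {
  method: "GET" | "POST" | "PUT" | "DELETE";
  path: string; // assumed to start with "/", e.g. "/users"
}

// Hypothetical check definition generated for an endpoint.
interface GeneratedCheck {
  name: string;
  url: string;
  method: string;
  expectedStatus: number;
}

// Build one check per endpoint. The expected status of 200 is an
// assumption for this sketch; real checks would assert on more.
function checksForEndpoints(baseUrl: string, endpoints: Endpoint[]): GeneratedCheck[] {
  const base = baseUrl.replace(/\/+$/, ""); // strip trailing slashes
  return endpoints.map((endpoint) => ({
    name: endpoint.method + " " + endpoint.path,
    url: base + endpoint.path,
    method: endpoint.method,
    expectedStatus: 200,
  }));
}

// Adding a new entry here is all it takes to get a new check.
const endpoints: Endpoint[] = [
  { method: "GET", path: "/users" },
  { method: "POST", path: "/orders" },
];

const checks = checksForEndpoints("https://api.example.com/", endpoints);
console.log(checks.map((check) => check.name + " -> " + check.url));
```

Since the list of endpoints is committed next to the application code, the pull request that introduces a new endpoint can introduce its check at the same time.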

The Future of Monitoring as Code

There is more. My vision is to make Monitoring as Code delightful, frictionless and easily accessible for every modern developer with an IDE and a git repository. The future of MaC will be built on open source frameworks such as Playwright and Jest, established standards that have been used for testing and automation for years.

Currently, as described above, achieving MaC with Checkly requires extra tools like Terraform or Pulumi, which makes the workflow hard to adopt, especially for teams without prior IaC knowledge. We are going to change that and are working on it today.

MaC will require:

  1. a simple npm install @checkly/cli,
  2. JavaScript or TypeScript to create, manage and run checks and
  3. a repository to store your monitoring as code.

We plan to launch a public beta of the new MaC workflow in the first quarter of next year and will let you know via our blog, socials and our changelog when you can check it out. If you'd like to connect, please follow me on Twitter (@HLENKE).

