CI/CD has become a standard tool in every modern software developer's day-to-day experience. We demand a lot from our CI/CD systems, and our software is all the better for it! But I think a lot of us have a slightly under-principled way of talking and thinking about it. Specifically, I want to talk about how we tend to lump two very different types of tasks under the banner of "tests". Sometimes we don't even talk about one of these types of tasks at all, yet we still have them sitting in our pipelines and workflows and what have you.

Quick disclaimer: this post is somewhat just me ranting about how a conversation with the darling Isabel Roses1 about the purpose of the checks output of nix flakes has given me a new thing to be pedantic and annoyed about. Perhaps reading this will give you that same cognitohazard. That may or may not be my hope; you have been warned :3

Categorising CI/CD tasks

If you know me (or have read some of the posts in this blog before) you are likely not surprised that I love me some terminology! So let's start there; this whole thing is ~33% terminology anyways. The CI/CD world has a myriad of terms for a bunch of confusingly overlapping and conflicting ideas and concepts, so I'll try and codify some neutral ones here.

Generally we have CI/CD "tasks"2. These are all the individual actions we ask our CI/CD systems to perform. Tasks can depend on each other, and through their dependencies and additional structure we codify "workflows"3: which tasks to run, when to run them and in what order. Generally speaking we distinguish three types of tasks:

  • Builds: build the actual software itself

  • Tests: test that the software does what it should if it built

  • Deploys: deploy the built software if all the tests pass

But I want to argue in this post that there is a fourth type, and that we all too often lump it in with tests in our heads when we shouldn't, because the two have very different semantics! These are what I'm going to call

  • Checks: make sure that the code is "up to standard": formatted, linted, has licence headers, etc.

The main pipeline4

I said that checks have very different semantics from tests because they don't participate in the main CI/CD pipeline of Build -> Test -> Deploy. These three have a direct dependency on each other in that order. If we can't build then we can't test, and if our tests fail then we don't want to deploy. Checks, however, fall outside this core flow from code to deploy. Checks can be run without building the code, and often we want to do that so we can work on fixing whatever the checks found while the other CI/CD tasks run5. Checks also don't generally stop or block other CI/CD tasks. We don't want to not deploy working code just because it's indented slightly wrong or because we call .or_else(() -> { null }) instead of or_else_null() in one place. Those are things we want to check6 centrally in CI/CD and fix, but they're not something we want to block the world on. The dependency relation is entirely different: checks depend on very little and very little depends on them. But we all too often treat them as just another kind of test.
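To make that dependency relation a bit more concrete, here is a minimal Python sketch. Everything in it is made up for illustration (the `Task` type, the task names, `can_deploy` and `failed_checks`); it isn't how any particular CI/CD system models things. It just encodes the claim above: deploy is gated on the Build -> Test chain, while checks are reported separately and don't block.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    BUILD = "build"
    TEST = "test"
    DEPLOY = "deploy"
    CHECK = "check"


@dataclass(frozen=True)
class Task:
    name: str
    kind: Kind
    needs: tuple = ()  # names of the tasks this one depends on


# Hypothetical workflow: the main pipeline is a chain, the checks stand alone.
TASKS = [
    Task("build", Kind.BUILD),
    Task("unit-tests", Kind.TEST, needs=("build",)),
    Task("deploy", Kind.DEPLOY, needs=("unit-tests",)),
    Task("format", Kind.CHECK),
    Task("lint", Kind.CHECK),
]
BY_NAME = {task.name: task for task in TASKS}


def succeeded(name, results):
    """A task counts as good if it passed and everything it needs is good too."""
    task = BY_NAME[name]
    return results.get(name, False) and all(succeeded(dep, results) for dep in task.needs)


def can_deploy(results):
    """Deploy is gated only on its own dependency chain, never on checks."""
    return all(succeeded(dep, results) for dep in BY_NAME["deploy"].needs)


def failed_checks(results):
    """Checks are surfaced for fixing, but nothing depends on them."""
    return [t.name for t in TASKS if t.kind is Kind.CHECK and not results.get(t.name, False)]
```

With a run where `format` fails but `build` and `unit-tests` pass, `can_deploy` still says yes and `failed_checks` tells you what to go fix. Flip `unit-tests` to failed and the deploy is off.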

For a concrete example: as mentioned earlier, this post is born out of a discussion about the nix flake checks output and the accompanying nix flake check command. nix flake check does essentially two things: it makes sure that the flake is correctly structured, and it runs all the additional tasks specified by the checks output. This looks very much like ... well, checks! That first part is more or less linting the flake.nix, and the second part is where you'd define your formatting tasks and whatnot. But many a person and project in the nix ecosystem also use the checks output to specify full, proper tests! The nix flake schema doesn't specify any place to put tests, so they end up in checks. Even the nix reference manual says that nix flake check "check[s] whether the flake evaluates and run its tests"7. Not to mention the whole can of worms that is the formatter output and nix fmt. The conversation I had with Isabel amounted to her mentioning that she doesn't like when people do this and that she doesn't think that is what the checks output is (or should be) for. As time goes on I'm agreeing with her more and more.

The overall conversation we were having was about whether checks should be run in a nix CI/CD (and specifically tangled's upcoming nix CI/CD) before or after the package builds defined by the flake. If checks includes tests, well, then it should be run after! But if checks is purely checks, then it should be run first and in parallel with the builds! Just throwing everything in one big dependency graph and running tasks as soon as their dependencies are met works to solve the ordering issue8 (and is what a lot of nix CI/CD systems do, from the older-than-flakes-grandpa hydra to the flakes-first-youngin garnix), but we categorise our tasks for a reason! It's useful to be able to run just the checks without running the builds and the tests. It's useful to categorise them and separate them in our CI/CD UIs. Getting rid of that, or reducing it to just part of the task's name, isn't as attractive an option to me as just properly separating checks and tests in the first place! And I wish that flakes had a separate tests output and a nix test or nix flake test command to go along with it ...
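Here's a rough sketch of the "one big dependency graph" approach, using Python's standard-library graphlib. The task names and the category labels are hypothetical, and this is not how hydra, garnix or tangled actually schedule anything. It's only meant to show that the graph alone gets the ordering right, while the category is what lets you ask for "just the checks".

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: name -> (category, names of the tasks it depends on)
TASKS = {
    "package": ("build", []),
    "integration-test": ("test", ["package"]),
    "formatting": ("check", []),
    "flake-lint": ("check", []),
}


def waves(tasks):
    """Group tasks into waves; a task is in a wave once all its dependencies are done."""
    sorter = TopologicalSorter({name: deps for name, (_, deps) in tasks.items()})
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the output stable
        result.append(ready)
        sorter.done(*ready)
    return result


def only(tasks, category):
    """Pick out one category of tasks, e.g. to run just the checks."""
    return sorted(name for name, (cat, _) in tasks.items() if cat == category)
```

Here `waves(TASKS)` puts the two checks in the first wave right next to the build, and the test in the second, so the ordering question answers itself. But `only(TASKS, "check")` only works because the category was recorded somewhere other than the graph, which is exactly the bit that gets lost when tests and checks share one output.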

Well that's me done yelling at clouds ... for now at least.