This series deals with the implementation of a unit testing process in a team or across multiple teams in an organization. Posts in the series include:
Goals | Outcomes | Leading Indicators I | Leading Indicators II
Part I was HUGE! Now, let’s look at broken builds. We want to see a decrease in their number over time.
This may sound a bit strange. Our CI is supposed to tell us if the build breaks, that’s its job. Isn’t having more of them a good thing?
Unsurprisingly, the answer is “it depends”.
Since we want the earliest possible feedback, the answer is “yes, of course”: the CI system is serving us well. However, if the number of broken builds doesn’t decrease over time, that may mean the CI process is not working effectively, and we should investigate.
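To make “a decrease over time” concrete, here is a minimal sketch of how the indicator could be tracked. It assumes we can export each CI build as a date plus a pass/fail flag; the `Build` record, the function names and the sample values are all made up for illustration and do not come from any particular CI tool.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Build:
    """Hypothetical record of one CI build."""
    day: date     # date the build finished
    passed: bool  # True if the build was green


def broken_rate_by_week(builds):
    """Return {(iso_year, iso_week): fraction of broken builds}, oldest first."""
    totals = defaultdict(int)
    broken = defaultdict(int)
    for build in builds:
        iso = build.day.isocalendar()
        week = (iso[0], iso[1])
        totals[week] += 1
        if not build.passed:
            broken[week] += 1
    return {week: broken[week] / totals[week] for week in sorted(totals)}


def is_decreasing(rates):
    """True if the latest week's rate is lower than the earliest week's."""
    values = list(rates.values())
    return len(values) >= 2 and values[-1] < values[0]


# Made-up sample: two consecutive weeks with four builds each.
sample = [
    Build(date(2024, 1, 1), False), Build(date(2024, 1, 2), False),
    Build(date(2024, 1, 3), True), Build(date(2024, 1, 4), True),
    Build(date(2024, 1, 8), False), Build(date(2024, 1, 9), True),
    Build(date(2024, 1, 10), True), Build(date(2024, 1, 11), True),
]
rates = broken_rate_by_week(sample)
```

Comparing only the first and last week is a deliberately crude trend check. The point is to look at the rate per week rather than the raw count, so a busier week doesn’t look like a worse one.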
Let’s trace the steps leading to failing builds and see if we can improve our process.
Are all the tests passing locally? If not, we’re integrating code that fails tests into the trunk, and the same tests will probably fail in the CI build too. That’s a big no-no. We may even find out that the tests are not run locally at all. Either way, we’d want to improve on these behaviors.
If tests do run and pass locally before the code is committed, there might be another problem: isolation. Tests that depend on resources available in the local environment find them there and pass. At the CI stage, those resources are missing, and the tests fail. More broken builds of this kind indicate the team has not yet learned how to write isolated tests.
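Here is a small sketch of what isolation can look like, assuming the code under test reads a value from a file. `load_greeting`, the file name and the test class are hypothetical, invented only to show the idea: the test creates the resource it needs instead of expecting it to be on the machine already.

```python
import tempfile
import unittest
from pathlib import Path


def load_greeting(config_path):
    """Hypothetical production code: read a greeting from a config file."""
    return Path(config_path).read_text(encoding="utf-8").strip()


class GreetingTest(unittest.TestCase):
    # A non-isolated version would read a fixed path, such as a file in the
    # developer's home directory: green locally, red on a CI agent that
    # doesn't have the file. This version brings its own resource.
    def test_reads_greeting_from_its_own_file(self):
        with tempfile.TemporaryDirectory() as workdir:
            config = Path(workdir) / "greeting.cfg"
            config.write_text("hello\n", encoding="utf-8")
            self.assertEqual(load_greeting(config), "hello")


def run_suite():
    """Run the test case in-process and return the unittest result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(GreetingTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the temporary directory is created and removed by the test itself, the result should be the same on a laptop and on the CI server.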
There might even be a bigger issue lurking.
We want to trust our tests in the CI environment, but since they “work on our machine” and not on the CI, these tests just got a trust downgrade. This can have a weird counter-effect on how we run them.
Since the results we trust come from the CI, and local runs produce confusing results, we may stop running tests locally altogether and run them on the CI server instead, to make sure they run correctly there. When we do that, we make our feedback cycle longer. More importantly, even when the tests fail for the right reason, the broken build holds the rest of the team hostage until it is fixed.
To get the right feedback early, we need to get back to running tests locally.
We want to increase the number of isolated tests, so they can be run locally, and so a failure on the CI server can be trusted. Isolated unit or integration tests that fail before we commit are the first line of defense.
Then, we want to be able to run the non-isolated tests either locally or in as clean an environment as we can manage. The point is to not commit code until we trust it. This may require changing available environments, modifying the tests to ensure cleanliness, pre-commit integration or any combination of those.
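As one possible shape for pre-commit integration, here is a rough sketch of a gate that runs the isolated tests first and the rest second, and stops at the first failure. The `tests/unit` and `tests/integration` directories and the `run_gate` and `main` names are assumptions made for the example; your project layout and test runner will differ.

```python
import subprocess
import sys


def run_gate(commands, run=subprocess.call):
    """Run each command in order and stop at the first failure.

    Returns 0 if every command succeeded, otherwise the failing exit code.
    The `run` parameter exists so the gate itself can be tested with a fake.
    """
    for command in commands:
        exit_code = run(command)
        if exit_code != 0:
            return exit_code
    return 0


# Assumed layout: isolated tests in tests/unit, the others in tests/integration.
GATE_COMMANDS = [
    [sys.executable, "-m", "unittest", "discover", "-s", "tests/unit"],
    [sys.executable, "-m", "unittest", "discover", "-s", "tests/integration"],
]


def main():
    """Entry point for a hook script; returns the exit code to report."""
    return run_gate(GATE_COMMANDS)


# A hook script would end with: sys.exit(main())
```

Saved as an executable `.git/hooks/pre-commit` script that ends with `sys.exit(main())`, a non-zero exit code makes Git abort the commit. Running the fast, isolated tests first keeps the common failure case quick.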
Can you believe all these improvement opportunities come from a single indicator? The deeper we dig and the more questions we ask, the more opportunities we find for improving the process as a whole.
We’re not done yet.