Congestion in Traditional CI Systems
Traditional CI systems monitor the quality level of project branches by periodically performing QA verifications on branch labels in order to detect regressions. In this post we'll look at why such an approach can generate congestion, a significant speed-limiting factor in software development, especially for very large scale projects.
Let's consider, for example, a project using a traditional CI system configured for nightly execution, with a QA verification consisting of building and testing an image. Let's also assume that on average 4% of the incoming changesets cause build failures and 1% of them cause test regressions, which is fairly conservative in the context of very large scale projects.
The CI system would detect these breakages by producing red labels, unusable for some or all development activities. The project is partially or totally blocked until a green label is produced. To keep things simple we'll also assume for now that these breakages don't occur simultaneously in the same label or in consecutive labels, and that fixing them is an easy task, always done immediately upon detection, as the continuous integration methodology recommends.
A feature needs to be developed on this project, requiring, let's say, 100 changesets to be committed.
If a single developer worked on the feature at a rate of one changeset per day, it would take 100 days for the feature to be complete. The CI system would produce 5 red labels and 95 green labels, so the project would only be blocked 5% of the time. Not too bad.
But the feature needs to be delivered faster, so a bigger team is assigned to it, capable of producing, say, 10 changesets per day. That means the 100 changesets would be committed in just 10 days. The CI system would only produce 10 labels, of which 5 would still be red. The project would thus be blocked 50% of the time. Not very good.
By the same reasoning we can even estimate a maximum achievable commit rate: at 20 commits per day only 5 labels would be produced, all of them red. The probability of obtaining a green label becomes negligible. For all practical purposes the project would be completely blocked.
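The back-of-the-envelope figures above can be generalized. If we simplify and assume that each changeset independently has a roughly 5% chance of introducing a breakage (the combined rate from our scenario), the probability of a label coming out green decays geometrically with the number of changesets it contains. A quick sketch:

```python
# Probability that a label is green when each of the k changesets it
# contains independently breaks the build/tests with probability p.
def green_label_probability(p: float, k: int) -> float:
    return (1.0 - p) ** k

for k in (1, 10, 20):
    print(f"{k:2d} changesets/label -> P(green) = {green_label_probability(0.05, k):.2f}")
# -> 0.95 at 1/day, 0.60 at 10/day, 0.36 at 20/day
```

The exact numbers depend on the independence assumption, but the trend matches the reasoning above: green labels become rarer and rarer as more changesets accumulate between consecutive QA executions.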
In reality things are often worse. One assumption we made was that breakages are fixed immediately, so that the following CI label is green. This is definitely possible when the commit rate is one per day: the solution is to simply back out the last commit, clearly identified as the culprit. Some would advocate committing a fix instead, but that requires human analysis and additional work, and can occasionally fail to address the breakage or even cause a new one, further delaying recovery. I wouldn't recommend it.
As the commit rate increases, fixing breakages becomes increasingly difficult, primarily for two reasons:
- identifying the offending changeset is no longer trivial
- backing out the offending changeset no longer guarantees that the next label will be green
A traditional CI does a pretty good job of "containing" a changeset causing a regression at a transition from a green label to a red label: the changeset must be one of those committed between the two labels. Of course, we disregard the possibility of intermittent failures for the sake of simplicity (we'll make the impact of intermittent QA failures on CI performance the subject of another post).
When multiple changesets are present in the container outlined by the two labels, the offending one must first be identified. This typically requires human analysis of the failure and cross-referencing it against the contents of the changesets in the container. Pinpointing changesets causing build failures is usually simple, but not always. For test failures, however, identifying the changesets responsible can be quite difficult, potentially requiring expert knowledge. Both the time and the effort required for culprit identification increase with the number of changesets in the container, which grows with the project's commit rate.
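When the failure is mechanically reproducible, bisection keeps the number of extra QA runs logarithmic in the container size - the idea behind tools like git bisect. A minimal sketch, where is_broken is a hypothetical hook that builds and tests the branch as of a given changeset:

```python
def bisect_culprit(changesets, is_broken):
    """Binary-search for the first changeset whose state is broken.

    `changesets` is the ordered list committed between the last green
    label and the first red one; `is_broken(i)` is a hypothetical hook
    that builds and tests the branch as of changesets[:i+1].
    """
    lo, hi, qa_runs = 0, len(changesets) - 1, 0
    while lo < hi:
        mid = (lo + hi) // 2
        qa_runs += 1
        if is_broken(mid):
            hi = mid        # culprit is at mid or earlier
        else:
            lo = mid + 1    # everything up to mid is fine
    return changesets[lo], qa_runs

# Toy container of 16 changesets where the one at index 11 is the culprit.
commits = [f"c{i:02d}" for i in range(16)]
culprit, qa_runs = bisect_culprit(commits, lambda i: i >= 11)
print(culprit, qa_runs)  # -> c11 4
```

Even so, each probe is a full QA execution, so a container that keeps doubling in size still costs one more execution per culprit - and bisection doesn't help when failures are intermittent or when several culprits coexist in the same container.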
Once identified, the offending changeset would typically be backed out. But the backout isn't guaranteed to produce a green label:
- in some cases plain reverse patching may not be possible because other changesets committed after the offending one touch some of the same files. In such cases the backout would actually produce a new changeset, which isn't guaranteed to produce a green label as it was never verified as such.
- in other cases there may be changesets that were committed after the offending one and have functional dependencies on it. In such cases backing out the offending changeset would actually create new failures by breaking those dependencies, and the next label would still be red.
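The first case is easy to reproduce with git. In this sketch (a throwaway repository with hypothetical file and commit names) a later commit touches the same line as the offending one, so git revert can no longer apply the reverse patch cleanly and stops with a conflict:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email ci@example.com && git config user.name ci

echo "stable code" > app.txt
git add app.txt && git commit -qm "base"

echo "offending change" > app.txt
git commit -qam "offending"
bad=$(git rev-parse HEAD)

echo "offending change, plus later work" > app.txt
git commit -qam "later commit touching the same line"

# The reverse patch of "offending" no longer applies cleanly:
if ! git revert --no-edit "$bad" >/dev/null 2>&1; then
  echo "revert conflicted: a new, never-verified changeset must be crafted by hand"
fi
```

Resolving the conflict produces exactly the kind of brand-new, never-verified changeset described above.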
Apart from the above-mentioned ones, there are other possible cases in which the project's next label may not be green, for example:
- the assumption we made that multiple breakages don't occur in the same label or in consecutive labels may not hold. A build failure, for example, can mask other build failures and/or test failures caused by changesets in the same label. Even if the detectable build failure is fixed, the next label would still be red because the previously masked failures would be uncovered. Or it could be red because of a failure caused by a new changeset present in that very next label; the two cases can't really be told apart.
- if the time required to identify and fix the breakage exceeds the interval between CI executions, the next label will still be red.
If, for whatever reason, subsequent labels remain red, identifying the culprit and fixing the breakages becomes increasingly difficult because the pool of suspects grows: the containment often must be extended to cover all the changesets committed since the last green label, not just those contained in the most recent red label.
To make matters even worse, developers would typically keep using the progressively more outdated green label for their pre-commit QA verifications. The results of these verifications would drift further and further from the truth - from how their changesets would actually behave once committed. This further increases the chance of new regressions piling up on top of those already being investigated - a snowballing effect which can bring the entire project to a screeching halt.
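The compounding effect of breakages, fix delays and commit rate can be illustrated with a small Monte Carlo sketch. The model is deliberately simplistic: breakages are assumed independent with the 5% per-changeset probability from our scenario, and every breakage is assumed to take exactly one extra night to clear. The numbers are illustrative only:

```python
import random

def red_label_fraction(commits_per_day: int, days: int = 10_000,
                       p_break: float = 0.05, fix_days: int = 1,
                       seed: int = 42) -> float:
    """Fraction of nightly labels that come out red when each changeset
    breaks something with probability p_break and every detected
    breakage takes fix_days additional nights to clear."""
    rng = random.Random(seed)
    last_red_day = -1  # most recent day index that will still be red
    red = 0
    for day in range(days):
        broke = any(rng.random() < p_break for _ in range(commits_per_day))
        if broke:
            last_red_day = max(last_red_day, day + fix_days)
        if day <= last_red_day:
            red += 1
    return red / days

for k in (1, 10, 20):
    print(f"{k:2d} commits/day -> ~{red_label_fraction(k):.0%} of labels red")
```

Even with a fixed one-night recovery, the fraction of red labels grows much faster than linearly with the commit rate; letting the recovery time grow with the suspect pool, as it does in practice, only steepens the curve.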
Some may argue that certain traditional CI systems allow several QA executions per day, or even a QA execution for every changeset committed, even if the previous QA execution is not complete. Correct, except it doesn't make much of a difference overall: as long as the culprit is not identified and the breakage of the first red label is not resolved before the next QA execution, the subsequent labels will also be red, whether because of the same changeset or not. And the above reasoning still stands.
A non-blocking CI system like ApartCI completely eliminates project blockages by preventing red labels altogether: the entire activity of identifying the changesets causing QA regressions is performed on uncommitted/candidate changesets, prior to their commit to the project branch, rejecting them before they become a problem.
Projects utilizing such CI systems will never be blocked; progress will always be possible because good changeset candidates can always be verified and committed on top of the latest label. But more on this in another post.