A Story of Release Gates: An Antipattern
Antipattern: A pattern that may be commonly used but is ineffective and/or counterproductive in practice.
A few years ago, I was hired by the CTO of a local product company to help them address strife between the developers and product managers. It’s a common story: product managers want new features faster; developers want time to improve the architecture of the systems, test their code, etc.
Never mind all that, however, because the real problem in the organization was that the VP of Sales had successfully influenced the CEO to enforce release gates. (The strife between developers and product managers was a symptom.)
You see, the developers had a habit of releasing code every week or so. Most code was released successfully, but mistakes would sometimes occur. Some errors would be caught and corrected immediately; others would escape. The escaped defects were often highly visible and resulted in customer complaints. With downward pressure on new sales, the VP of Sales successfully lobbied for a fix: throttle the deployments.
In a knee-jerk reaction, the company decided to release code only after business hours. Developers were expected to work through the night and manually test everything. When that didn’t solve the problem (because the root cause wasn’t addressed), the company enforced stricter release gates: developers were allowed to release code only once per month (after business hours). That decision, naturally, increased the batch size (and therefore the risk) of each deployment. The company went even further and gave the VP of Sales veto power, which he could wield on a whim to stop any deployment “in case an important customer might be impacted”.
(It makes sense from a certain perspective, right? Release gating was a coping mechanism for escaped defects. But such a pattern causes authority to swing to the loudest stakeholders who, without fully understanding the technical options, impose seemingly intuitive but counterproductive remedies.)
With each additional check and balance, with every new rule, the bureaucracy around this newfound veto authority became immense:
- Scheduling a deployment grew more complicated.
- “Release Managers” were anointed and told they were “Accountable” for quality.
- Permission was required before the release of any new code. Vice Presidents who were detached (reasonably so) from the detail and consequence would ceremoniously grant their approval amidst a theatre of risk management.
- Email notifications flew around the office to ensure all asses were covered.
- If regressions were caused, unholy inquisitions were held and “lessons learned” were strictly documented.
I observed a couple of outcomes (predictable, I think). First, the technical staff worked very hard to abide by the new bureaucracy. To their credit, I’d even say their code quality improved (slightly) and their manual testing techniques evolved. But nobody really noticed, because, second, every deployment was now a large batch with more inherent risk, so the chance a defect would escape was nearing 100%.
And while 99% of any given deployment would go smoothly, the problems were highly visible and treated as crises. If the response to every incident is “crisis management”, then every incident becomes a crisis.
I saw the situation differently.
With so much pressure to deliver new features, the developers were never allowed (or they never demanded) the latitude to learn and implement automated testing, deployment, and rollback. So, every release of new code produced unintended regressions.
From my vantage point, it appeared obvious the solution was more frequent deployment, not less. Smaller batches, not larger. The confidence the executives wanted would not be found in bureaucracy, but in routines. It must become the habit of the development teams to frequently deploy small batches of automatically tested code. Deployments should be frequent and utterly boring. (Reference: If it hurts, do it more often…frequency reduces difficulty.)
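To make the batch-size arithmetic concrete, here is a minimal sketch. It assumes each change independently carries a small chance of introducing a regression; the 2% figure is an illustrative assumption, not a measurement from this company.

```python
# Chance a deployment ships at least one defect, assuming each change
# independently has probability p of introducing a regression.
# p = 0.02 is an illustrative assumption, not measured data.
p = 0.02

for batch_size in (1, 5, 20, 100):
    escape = 1 - (1 - p) ** batch_size
    print(f"{batch_size:>3} changes per release -> {escape:.0%} chance of an escaped defect")
```

Under those assumptions, a single-change release escapes a defect about 2% of the time; bundle 100 changes into one monthly release and the odds climb to roughly 87%. Shrinking the batch doesn’t eliminate defects, but it makes each incident smaller, easier to diagnose, and cheaper to roll back.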
The pursuit of this goal (CI/CD) has tremendous fringe benefits:
- Teams take ownership of their code from localhost to production.
- A CI/CD pipeline using any modern toolset includes automated rollback and event logging. (Human error, still prone to occur, is less likely to reach production environments.) A minimal sketch of this idea follows the list.
- Automated testing can be implemented incrementally: perhaps early deployments include automated unit tests; then UI tests are implemented and supported in the pipeline; eventually the pipeline may be improved to support load testing, code linting, security checks, and so on. (All managed by the platform while the developers’ interface changes very little.)
- Deployments take minutes, not hours.
- Recovering from defects is easier and less costly.
- Accountability becomes clearer: app developers are responsible for their app(s); platform engineers are responsible for the platform; the VP of Sales is not responsible for deployment schedules!
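As promised above, here is a minimal sketch of the gate-ship-verify-rollback shape, with every step logged. It is not any particular vendor’s API; the make targets (unit-test, deploy, smoke-test) are hypothetical placeholders for whatever your pipeline actually runs.

```python
import logging
import subprocess
import sys

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("deploy")

# Test stages can grow incrementally: start with unit tests, then add UI,
# load, lint, and security stages as the pipeline matures.
TEST_STAGES = [
    ["make", "unit-test"],
    # ["make", "ui-test"],    # added later
    # ["make", "load-test"],  # later still
]

def run(cmd: list[str]) -> bool:
    """Run one pipeline step and log it. (The make targets are placeholders.)"""
    log.info("running: %s", " ".join(cmd))
    return subprocess.run(cmd).returncode == 0

def deploy(version: str, previous: str) -> bool:
    # 1. Gate the release on the automated test suite.
    for stage in TEST_STAGES:
        if not run(stage):
            log.error("tests failed; %s never leaves the pipeline", version)
            return False
    # 2. Ship the small batch.
    if not run(["make", "deploy", f"VERSION={version}"]):
        log.error("deploy of %s failed", version)
        return False
    # 3. Smoke-test production; roll back automatically on failure.
    if not run(["make", "smoke-test"]):
        log.warning("smoke test failed; rolling back to %s", previous)
        run(["make", "deploy", f"VERSION={previous}"])
        return False
    log.info("%s deployed cleanly", version)
    return True

if __name__ == "__main__":
    ok = deploy(version=sys.argv[1], previous=sys.argv[2])
    sys.exit(0 if ok else 1)
```

The point is the habit, not the tooling: because the machine exercises the rollback path on every bad deploy, a failed release becomes a routine log entry rather than a late-night crisis meeting.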
And so on. The benefits of CI/CD are well known and documented. But among the most important — and relevant to this article — is how it can change the relationship between developers and product managers. Remember the common strife I mentioned earlier:
“…product managers want new features faster; developers want time to improve the architecture of the systems, test their code, etc.”
Techniques like CI/CD (a practice of DevOps, notice) help alleviate the developers’ burden regarding system architecture and manual testing. When that burden is lifted, more mental energy and time are available for product development.