One hundred and nineteen steps.
That is how many manual actions our deploy process required eighteen months ago. I know the exact number because an intern counted them during their first week and asked, with the kind of innocent directness that only an intern can muster, "why?"
It was a fair question. We did not have a good answer.
The 119-Step Deploy
Here is what deploying a new AI model configuration looked like:
- Update the model config in the codebase (1 step)
- Run local tests (3 steps: unit, integration, type check)
- Create a pull request (1 step)
- Wait for CI (1 step of waiting, but 12 CI jobs to monitor)
- Get code review approval (1 step)
- Merge to staging (1 step)
- Update the staging environment variables (4 steps across 3 services)
- Run the staging smoke tests (8 steps)
- Monitor staging for 30 minutes (1 step, but who counts)
- Create a production deploy ticket (6 steps in the ticketing system)
- Get production deploy approval (3 steps across 2 approval chains)
- Run the production deploy (5 steps)
- Update production environment variables (4 steps across 3 services)
- Run production smoke tests (8 steps)
- Monitor production for 60 minutes (1 step)
- Update the status page (2 steps)
- Notify stakeholders (3 steps)
- Close the deploy ticket (2 steps)
Plus the 64 other micro-steps I am not listing because you would stop reading.
Each of these steps existed for a reason. Each was added after an incident where the absence of that step caused a problem. The process was a geological record of every mistake we had ever made.
The problem: a process that captures every lesson eventually becomes too heavy to execute. And when a process is too heavy to execute, people skip steps. And when people skip steps, incidents happen. And when incidents happen, more steps get added.
This is the deploy paradox: the process designed to prevent failures becomes the primary cause of failures.
The Insight
The insight that changed everything came from an unlikely source: our credit system.
We had built an automated credit consumption pipeline that handles thousands of transactions per minute. Each transaction validates the organization's balance, checks rate limits, calculates cost based on the model and token count, deducts credits, and logs an audit trail. It does all of this in under 50 milliseconds with zero human intervention.
If we could automate financial transactions with that level of reliability, why were we manually deploying code?
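To make the comparison concrete, here is a minimal sketch of one credit transaction. Every name here is illustrative (the real pipeline is a distributed system, not an in-memory object), but the shape is the same: validate, price, deduct, audit, no human in the loop.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Org:
    balance: int                       # credits remaining
    requests_this_minute: int = 0
    audit_log: list = field(default_factory=list)

RATE_LIMIT = 600                       # requests per minute (assumed)
COST_PER_1K_TOKENS = {"fast-model": 1, "large-model": 5}  # illustrative pricing

def consume_credits(org: Org, model: str, tokens: int) -> bool:
    """Validate, price, deduct, and audit a single transaction."""
    if org.requests_this_minute >= RATE_LIMIT:
        return False                   # rate limit exceeded
    cost = (tokens * COST_PER_1K_TOKENS[model] + 999) // 1000  # round up
    if org.balance < cost:
        return False                   # insufficient balance
    org.balance -= cost
    org.requests_this_minute += 1
    org.audit_log.append({"ts": time.time(), "model": model,
                          "tokens": tokens, "cost": cost})
    return True
```

The whole decision is a handful of comparisons and one deduction, which is why it fits comfortably inside a 50-millisecond budget.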
The One-Button Architecture
We spent three months rebuilding our deploy pipeline around a single principle: every step that can be automated must be automated, and every step that cannot be automated must be eliminated.
Here is what we built:
Automated Quality Gates
Instead of running tests manually, every push triggers a comprehensive quality pipeline:
- Type checking catches contract violations
- Unit tests catch logic errors
- Integration tests catch system interaction failures
- AI-specific tests validate model responses against golden datasets
- Cost estimation tests ensure no configuration change will unexpectedly blow up the token budget
If any gate fails, the deploy is blocked automatically. No human judgment required.
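The gate logic itself is simple; the value is in running every check, every time. A minimal sketch, with stand-in gate functions and an invented token budget (our real suite is obviously larger):

```python
from typing import Callable

# Stand-in gates: each returns True on pass. Names are illustrative.
def type_check() -> bool: return True
def unit_tests() -> bool: return True
def integration_tests() -> bool: return True
def golden_dataset_eval() -> bool: return True

def cost_within_budget(tokens_per_request: int, budget: int) -> bool:
    """Cost-estimation gate: reject configs that would blow the token budget."""
    return tokens_per_request <= budget

GATES: list[tuple[str, Callable[[], bool]]] = [
    ("type-check", type_check),
    ("unit", unit_tests),
    ("integration", integration_tests),
    ("golden-dataset", golden_dataset_eval),
    ("cost-estimate", lambda: cost_within_budget(1800, 2000)),
]

def run_gates() -> tuple[bool, list[str]]:
    """Run every gate in order; any failure blocks the deploy."""
    failures = [name for name, gate in GATES if not gate()]
    return (len(failures) == 0, failures)
```

The return value is a plain pass/fail plus the list of failed gates, which is all the deploy pipeline needs to block a merge and tell the engineer why.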
Environment Promotion
Instead of manually updating environment variables across services, we built an environment promotion system. A configuration change starts in development, gets promoted to staging when tests pass, and gets promoted to production when staging validation succeeds.
Each promotion is a single database operation. The services read their configuration from a central store and hot-reload when it changes. No restarts. No manual updates.
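As a toy model of that idea, here is a central store where promotion is one operation: copy the source environment's config over the destination's. The class and method names are hypothetical; the real store is a database, not a dict.

```python
import threading

class ConfigStore:
    """Toy central config store: one active config per environment."""
    def __init__(self):
        self._lock = threading.Lock()
        self._envs: dict[str, dict] = {"dev": {}, "staging": {}, "prod": {}}

    def write(self, env: str, config: dict) -> None:
        with self._lock:
            self._envs[env] = dict(config)

    def read(self, env: str) -> dict:
        # Services re-read on change notification, so a new config
        # takes effect without a restart.
        with self._lock:
            return dict(self._envs[env])

    def promote(self, src: str, dst: str) -> None:
        """Promotion is a single store operation, not N service updates."""
        with self._lock:
            self._envs[dst] = dict(self._envs[src])
```

The point of the design is that "update production environment variables across 3 services" collapses into one `promote("staging", "prod")` call, and the services pick up the change on their next read.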
Canary Analysis
Instead of monitoring staging and production manually, we built automated canary analysis. When a new configuration is deployed, it serves 5% of traffic. An automated system compares error rates, latency percentiles, and credit consumption between the canary and the baseline. If the canary is worse on any metric, it rolls back automatically.
This eliminated our longest step: the 60-minute production monitoring window. The canary system makes a rollback decision in under 5 minutes with more statistical rigor than a human staring at a dashboard ever could.
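The core of the canary decision can be sketched in a few lines. The metric names match the ones above; the tolerances here are invented for illustration, where the real system derives them from historical variance:

```python
from dataclasses import dataclass

@dataclass
class Metrics:
    error_rate: float          # fraction of failed requests
    p99_latency_ms: float
    credits_per_request: float

# Illustrative slack per metric; real thresholds come from baseline variance.
TOLERANCE = {"error_rate": 0.002,
             "p99_latency_ms": 25.0,
             "credits_per_request": 0.05}

def should_roll_back(canary: Metrics, baseline: Metrics) -> bool:
    """Roll back if the canary is meaningfully worse on any tracked metric."""
    for name, slack in TOLERANCE.items():
        if getattr(canary, name) > getattr(baseline, name) + slack:
            return True
    return False
```

Worse on any one metric is enough to trigger the rollback; the canary never gets the benefit of the doubt.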
The Button
After all of this automation, the deploy process looks like this:
- Engineer pushes code
- Quality gates run automatically
- Code is promoted through environments automatically
- Canary analysis validates automatically
- Rollback happens automatically if needed
The "one button" is the merge button on the pull request. Everything else is automated.
What We Learned
Automation is not about speed
Our 119-step process took about 4 hours. Our automated pipeline takes about 20 minutes. But the speed improvement is not the main benefit.
The main benefit is consistency. A human executing 119 steps will occasionally skip step 47 because they are tired, or reorder steps 68 and 69 because they think it does not matter, or forget step 112 because they got distracted by Slack.
An automated pipeline executes every step, in order, every time. It does not get tired. It does not get distracted. It does not think it knows better.
Trust requires investment
The reason we had 119 manual steps was that we did not trust automation. And the reason we did not trust automation was that we had never invested in making it trustworthy.
Building the one-button pipeline required investing in:
- Comprehensive test coverage (not just happy-path tests)
- Reliable CI infrastructure (not flaky tests that pass on retry)
- Observable canary analysis (not opaque green/red signals)
- Fast, safe rollback (not "hope it works" rollback)
Each of these investments paid for itself within weeks.
Start with the most painful step
We did not automate all 119 steps at once. We started with the most painful one: the 60-minute production monitoring window. Automating that single step saved 1 hour per deploy and eliminated the most error-prone part of the process.
Then we automated the next most painful step. And the next. Within three months, there were no manual steps left.
The Counter
We keep a counter in our deploy dashboard. Not of deploys—of steps eliminated. It reads 119. Every new engineer who joins asks about it, and we tell them the story.
Then we show them the button.
