engineering · 5 min read

One Button to Deploy: How We Reduced Our AI Release Process to a Single Click

Our deploy process used to take 119 steps across 4 systems. Now it takes one button press. Here is the engineering story behind reducing deployment complexity by two orders of magnitude.

One hundred and nineteen steps.

That is how many manual actions our deploy process required eighteen months ago. I know the exact number because an intern counted them during their first week and asked, with the kind of innocent directness that only an intern can muster, "why?"

It was a fair question. We did not have a good answer.

The 119-Step Deploy

Here is what deploying a new AI model configuration looked like:

  1. Update the model config in the codebase (1 step)
  2. Run local tests (3 steps: unit, integration, type check)
  3. Create a pull request (1 step)
  4. Wait for CI (1 step of waiting, but 12 CI jobs to monitor)
  5. Get code review approval (1 step)
  6. Merge to staging (1 step)
  7. Update the staging environment variables (4 steps across 3 services)
  8. Run the staging smoke tests (8 steps)
  9. Monitor staging for 30 minutes (1 step, but who counts)
  10. Create a production deploy ticket (6 steps in the ticketing system)
  11. Get production deploy approval (3 steps across 2 approval chains)
  12. Run the production deploy (5 steps)
  13. Update production environment variables (4 steps across 3 services)
  14. Run production smoke tests (8 steps)
  15. Monitor production for 60 minutes (1 step)
  16. Update the status page (2 steps)
  17. Notify stakeholders (3 steps)
  18. Close the deploy ticket (2 steps)

Plus the 64 other micro-steps I am not listing because you would stop reading.

Each of these steps existed for a reason. Each was added after an incident where the absence of that step caused a problem. The process was a geological record of every mistake we had ever made.

The problem: a process that captures every lesson eventually becomes too heavy to execute. And when a process is too heavy to execute, people skip steps. And when people skip steps, incidents happen. And when incidents happen, more steps get added.

This is the deploy paradox: the process designed to prevent failures becomes the primary cause of failures.

The Insight

The insight that changed everything came from an unlikely source: our credit system.

We had built an automated credit consumption pipeline that handles thousands of transactions per minute. Each transaction validates the organization's balance, checks rate limits, calculates cost based on the model and token count, deducts credits, and logs an audit trail. It does all of this in under 50 milliseconds with zero human intervention.
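To make that flow concrete, here is a minimal sketch of a transaction like the ones described above. All names here (`Org`, `charge`, the pricing table) are illustrative stand-ins, not our actual API:

```python
from dataclasses import dataclass

# Illustrative credits-per-1k-tokens pricing; real prices differ.
PRICE_PER_1K_TOKENS = {"model-a": 5, "model-b": 1}

audit_log = []  # append-only audit trail

@dataclass
class Org:
    id: str
    balance: int            # remaining credits
    requests_this_minute: int
    rate_limit: int

def charge(org: Org, model: str, tokens: int) -> int:
    """Validate limits and balance, price the call, deduct, and audit."""
    if org.requests_this_minute >= org.rate_limit:
        raise RuntimeError("rate limit exceeded")
    cost = -(-tokens * PRICE_PER_1K_TOKENS[model] // 1000)  # ceiling division
    if org.balance < cost:
        raise RuntimeError("insufficient credits")
    org.balance -= cost
    org.requests_this_minute += 1
    audit_log.append({"org": org.id, "model": model, "tokens": tokens, "cost": cost})
    return cost
```

Every step is a plain, deterministic check, which is exactly what made the pipeline easy to trust without a human in the loop.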

If we could automate financial transactions with that level of reliability, why were we manually deploying code?

The One-Button Architecture

We spent three months rebuilding our deploy pipeline around a single principle: every step that can be automated must be automated, and every step that cannot be automated must be eliminated.

Here is what we built:

Automated Quality Gates

Instead of running tests manually, every push triggers a comprehensive quality pipeline:

  • Type checking catches contract violations
  • Unit tests catch logic errors
  • Integration tests catch system interaction failures
  • AI-specific tests validate model responses against golden datasets
  • Cost estimation tests ensure no configuration change will unexpectedly blow up the token budget

If any gate fails, the deploy is blocked automatically. No human judgment required.
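The gate logic itself is deliberately boring. A sketch, with lambdas standing in for the real checks:

```python
# All-or-nothing gate runner: the first failing gate blocks the deploy.
# The gate callables below are placeholders for the real tooling.

def run_gates(gates):
    """Run every gate in order; return 'promote' only if all pass."""
    for name, gate in gates:
        if not gate():
            return f"blocked: {name} failed"
    return "promote"

gates = [
    ("type-check", lambda: True),
    ("unit-tests", lambda: True),
    ("integration-tests", lambda: True),
    ("golden-dataset-eval", lambda: True),
    ("cost-estimate", lambda: False),  # e.g. config would exceed the token budget
]
```

There is no override flag and no "deploy anyway" path; removing the escape hatch is what removes the human judgment.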

Environment Promotion

Instead of manually updating environment variables across services, we built an environment promotion system. A configuration change starts in development, gets promoted to staging when tests pass, and gets promoted to production when staging validation succeeds.

Each promotion is a single database operation. The services read their configuration from a central store and hot-reload when it changes. No restarts. No manual updates.
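The shape of that system can be sketched in a few lines, using an in-memory dict as a stand-in for the central store and a version counter to drive hot-reloads:

```python
# Sketch of environment promotion: one write to a central config store,
# and services hot-reload by polling a version counter. The dict stands
# in for a real database; values are illustrative.

store = {
    "dev":     {"config": {"model": "model-a", "temp": 0.2}, "version": 7},
    "staging": {"config": {"model": "model-a", "temp": 0.5}, "version": 4},
    "prod":    {"config": {"model": "model-a", "temp": 0.3}, "version": 4},
}

def promote(src: str, dst: str) -> None:
    """Copy the source environment's config to the destination: one operation."""
    store[dst] = {"config": dict(store[src]["config"]),
                  "version": store[dst]["version"] + 1}

class Service:
    """A service that re-reads its config whenever the store version changes."""
    def __init__(self, env: str):
        self.env = env
        self.version = -1
        self.config = {}

    def poll(self) -> bool:
        entry = store[self.env]
        if entry["version"] != self.version:  # hot-reload, no restart
            self.version = entry["version"]
            self.config = dict(entry["config"])
            return True
        return False
```

Because the promotion is a single write, it is atomic from the services' point of view: they either see the old config or the new one, never a half-updated mix across three systems.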

Canary Analysis

Instead of monitoring staging and production manually, we built automated canary analysis. When a new configuration is deployed, it serves 5% of traffic. An automated system compares error rates, latency percentiles, and credit consumption between the canary and the baseline. If the canary is worse on any metric, it rolls back automatically.

This eliminated our longest step: the 60-minute production monitoring window. The canary system makes a rollback decision in under 5 minutes with more statistical rigor than a human staring at a dashboard ever could.
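The decision rule reduces to a per-metric comparison. This sketch uses a simple tolerance threshold where the real system would apply a proper statistical test; the metric names and the 5% tolerance are illustrative:

```python
# Canary decision sketch: compare canary vs. baseline on metrics where
# lower is better, and roll back if the canary is meaningfully worse
# on any of them.

TOLERANCE = 1.05  # allow up to a 5% regression before rolling back

def canary_decision(baseline: dict, canary: dict) -> str:
    """Return 'promote' or a rollback reason (all metrics: lower is better)."""
    for metric, base_value in baseline.items():
        if canary[metric] > base_value * TOLERANCE:
            return f"rollback: {metric} regressed"
    return "promote"

baseline = {"error_rate": 0.010, "p99_latency_ms": 420, "credits_per_req": 1.8}
canary   = {"error_rate": 0.011, "p99_latency_ms": 610, "credits_per_req": 1.7}
```

Note that credit consumption sits alongside error rate and latency as a first-class rollback trigger: a config change that silently doubles token spend is treated as a failure, not a billing surprise.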

The Button

After all of this automation, the deploy process looks like this:

  1. Engineer pushes code
  2. Quality gates run automatically
  3. Code is promoted through environments automatically
  4. Canary analysis validates automatically
  5. Rollback happens automatically if needed

The "one button" is the merge button on the pull request. Everything else is automated.
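The whole post-merge chain can be expressed as one short function. The stage callables here are stand-ins for the pieces sketched above:

```python
# Orchestration sketch: everything after the merge is a single automated
# chain with no human decision points.

def deploy_on_merge(run_gates, promote_to_prod, canary_ok, rollback):
    """The entire post-merge pipeline, start to finish."""
    if not run_gates():
        return "blocked by quality gates"
    promote_to_prod()
    if not canary_ok():
        rollback()
        return "rolled back by canary analysis"
    return "deployed"
```

Every outcome, including failure, ends in a safe state without anyone watching a dashboard.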

What We Learned

Automation is not about speed

Our 119-step process took about 4 hours. Our automated pipeline takes about 20 minutes. But the speed improvement is not the main benefit.

The main benefit is consistency. A human executing 119 steps will occasionally skip step 47 because they are tired, or reorder steps 68 and 69 because they think it does not matter, or forget step 112 because they got distracted by Slack.

An automated pipeline executes every step, in order, every time. It does not get tired. It does not get distracted. It does not think it knows better.

Trust requires investment

The reason we had 119 manual steps was that we did not trust automation. And the reason we did not trust automation was that we had never invested in making it trustworthy.

Building the one-button pipeline required investing in:

  • Comprehensive test coverage (not just happy-path tests)
  • Reliable CI infrastructure (not flaky tests that pass on retry)
  • Observable canary analysis (not opaque green/red signals)
  • Fast, safe rollback (not "hope it works" rollback)

Each of these investments paid for itself within weeks.

Start with the most painful step

We did not automate all 119 steps at once. We started with the most painful one: the 60-minute production monitoring window. Automating that single step saved 1 hour per deploy and eliminated the most error-prone part of the process.

Then we automated the next most painful step. And the next. Within three months, there were no manual steps left.

The Counter

We keep a counter in our deploy dashboard. Not of deploys—of steps eliminated. It reads 119. Every new engineer who joins asks about it, and we tell them the story.

Then we show them the button.
