Narrative

Kubernetes CI/CD Deployment Engine

2×/month20×/monthdeploy frequency

The team was doing 2 deployments per month because every deploy was a manual, anxiety-inducing event. Rollbacks required redeploying old images and took 30+ minutes. Main branch broke 8 times a month.

DevOpsKubernetesCI/CD

What Was Broken

  • 2 deployments per month — fear-driven infrequency
  • 30+ minute rollbacks requiring manual kubectl commands
  • Main branch breaking 8+ times per month from unreviewed pushes
  • No audit trail — impossible to know what was deployed when
// required fix
  • Branch protection and PR-gated CI checks on every commit
  • Optimized Docker builds with layer caching
  • Parallel test suite under 5 minutes total
  • ArgoCD-based GitOps with automated rollback on health check failure
  • Zero-downtime Blue/Green deployments on EKS

How It Was Built

Built end-to-end: branch protection → optimized Docker builds → parallel tests → ArgoCD GitOps → Argo Rollouts Blue/Green with Prometheus-gated promotion.

Branch Protection → Build → Test → Deploy
  • Full pipeline: protected main, multi-stage Docker build (920MB → 118MB), parallel test matrix (22min → 4.
  • 📄 .github/workflows/ci-cd.yml

Branch Protection → Build → Test → Deploy

Full pipeline: protected main, multi-stage Docker build (920MB → 118MB), parallel test matrix (22min → 4.5min), ArgoCD auto-sync, Blue/Green rollout with Prometheus analysis gates.

.github/workflows/ci-cd.yml
yaml
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: docker/build-push-action@v5
        with:
          tags: ghcr.io/${{ github.repository }}:${{ github.sha }}
          cache-from: type=registry,ref=ghcr.io/${{ github.repository }}:cache
  update-manifest:
    needs: build
    run: |
      yq e '.spec.template.spec.containers[0].image = "$IMAGE"' \
        -i k8s-config/apps/production/deployment.yaml
      git push  # ArgoCD picks this up automatically

What Changed

Deploy frequency: 2×/month → 20×/month. Rollback time: 30 minutes → 30 seconds. Zero 503s during any deployment since implementation.

Deploy frequency
2×/month
0
10× increase
Rollback time
30 min (manual)
0
60× faster
Build time
14 min
0
8× faster
Test wall time
22 min
0
5× faster
"Deployment became a non-event. The team ships 10× more frequently without scheduling maintenance windows or fearing rollbacks."

Common Questions

GitOps pull-based deployments are far more secure and reliable. ArgoCD runs inside the cluster and pulls changes, meaning we don't have to give our CI server admin credentials to the Kubernetes cluster.
We used Argo Rollouts integrated with Prometheus metrics. If error rates spiked during the 'Green' phase, the rollout would automatically abort and revert traffic to the stable 'Blue' version.
Cultural resistance. Developers were used to long-lived feature branches and manual testing. We had to prove that shipping smaller, more frequent updates with automated safety nets was actually much safer.