Downtime during deployment is a solved problem, yet many organisations still schedule maintenance windows to release new versions of their applications. The techniques for zero-downtime deployment are well-established and, when implemented correctly, make deployments less risky rather than more. Our standard CI/CD blueprint achieves continuous deployment with zero downtime for every client project.
The foundation of our approach is blue-green deployment. Two identical environments, blue and green, sit behind a load balancer. At any given time, one environment serves production traffic while the other is idle. A deployment targets the idle environment, runs health checks, and then switches the load balancer to direct traffic to the newly deployed version. If anything goes wrong, switching back to the previous environment takes seconds.
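The switching logic can be modelled in a few lines. The sketch below is a hypothetical in-memory model, not our pipeline code: `BlueGreen`, `deploy`, `roll_back` and the `health_check` callback are illustrative names, and in a real setup the `live` pointer would be a load balancer setting rather than a Python attribute.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class BlueGreen:
    """Hypothetical model: two environments behind one router.

    `live` names the environment receiving production traffic; the other
    one is idle and is the only place new versions are deployed to.
    """
    versions: Dict[str, Optional[str]] = field(
        default_factory=lambda: {"blue": None, "green": None})
    live: str = "blue"

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def deploy(self, version: str, health_check: Callable[[str], bool]) -> bool:
        target = self.idle
        self.versions[target] = version  # production traffic is unaffected
        if not health_check(target):
            return False  # never switched, so there is nothing to undo
        self.live = target  # the switch itself is a single pointer change
        return True

    def roll_back(self) -> None:
        # The previous version is still deployed on the now-idle environment,
        # so rolling back is just switching again.
        if self.versions[self.idle] is None:
            raise RuntimeError("no previous version to roll back to")
        self.live = self.idle


env = BlueGreen(versions={"blue": "v1", "green": None})
env.deploy("v2", health_check=lambda name: True)
print(env.live, env.versions[env.live])  # green v2
env.roll_back()
print(env.live, env.versions[env.live])  # blue v1
```

Because a failed health check returns before the pointer moves, a bad release never reaches users, and rollback costs no more than the original switch.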
Zero-downtime deployment is not a feature you add at the end. It is an architectural constraint you design for from the start.
On AWS, we implement this using Application Load Balancers with target group switching. The deployment pipeline builds a new container image, pushes it to ECR, updates the ECS service for the idle target group, waits for the new tasks to pass health checks, and then modifies the ALB listener rule to route traffic to the new target group. The entire process is automated through GitHub Actions and takes approximately four minutes from commit to production.
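A minimal sketch of the switch step, written in Python against the boto3 ECS and Elastic Load Balancing v2 client methods `update_service`, `get_waiter("services_stable")`, `describe_target_health` and `modify_rule`. It is not our GitHub Actions workflow; it assumes one ECS service per target group and a task definition already registered for the new image, and the function and parameter names are hypothetical. The clients are passed in rather than created, so the code can be read and tested without AWS credentials.

```python
import time
from typing import Any

# `ecs` and `elbv2` are expected to be boto3 clients
# (boto3.client("ecs"), boto3.client("elbv2")) or test fakes.


def wait_for_healthy_targets(elbv2: Any, target_group_arn: str,
                             timeout_s: float = 600.0, poll_s: float = 15.0) -> bool:
    """Poll the target group until every non-draining target is healthy."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        resp = elbv2.describe_target_health(TargetGroupArn=target_group_arn)
        # Old tasks being deregistered report "draining"; ignore them.
        states = [d["TargetHealth"]["State"]
                  for d in resp["TargetHealthDescriptions"]
                  if d["TargetHealth"]["State"] != "draining"]
        if states and all(state == "healthy" for state in states):
            return True
        time.sleep(poll_s)
    return False


def release_to_idle(ecs: Any, elbv2: Any, *, cluster: str, idle_service: str,
                    task_definition: str, idle_target_group_arn: str,
                    listener_rule_arn: str, timeout_s: float = 600.0,
                    poll_s: float = 15.0) -> bool:
    """Deploy to the idle service, wait for health, then switch traffic."""
    # 1. Point the idle ECS service at the task definition for the new image.
    ecs.update_service(cluster=cluster, service=idle_service,
                       taskDefinition=task_definition)
    # 2. Wait until ECS has replaced the old tasks, so the health check below
    #    is not satisfied by tasks still running the previous image.
    #    (The boto3 waiter raises WaiterError if this times out.)
    ecs.get_waiter("services_stable").wait(cluster=cluster,
                                           services=[idle_service])
    # 3. Confirm the new tasks pass the ALB health checks.
    if not wait_for_healthy_targets(elbv2, idle_target_group_arn,
                                    timeout_s, poll_s):
        return False  # traffic never moved; the live environment is untouched
    # 4. Switch: forward the listener rule's traffic to the idle target group.
    elbv2.modify_rule(RuleArn=listener_rule_arn,
                      Actions=[{"Type": "forward",
                                "TargetGroupArn": idle_target_group_arn}])
    return True
```

Injecting the clients also makes this step testable with simple fakes, which is one practical way to test the deployment pipeline itself rather than only the application code.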
Database Migrations Without Downtime
The most challenging aspect of zero-downtime deployment is database migrations. Schema changes that are incompatible with the current application version will cause errors during the switchover period, when both versions briefly run side by side. We solve this by splitting breaking schema changes into backward-compatible steps. Adding a nullable column is safe, because the running version simply ignores it. Renaming a column takes three deployments: add the new column and deploy code that writes to both columns; backfill existing rows and deploy code that reads from the new column; then, once no running version references the old column, remove it.
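To make the rename sequence concrete, here is a sketch using Python's built-in `sqlite3` module. The `users` table, the rename from `fullname` to `display_name`, and the `insert_v1`, `insert_v2` and `read_v3` functions (standing in for successive application versions) are all hypothetical; the point is that each schema step keeps working with the code version running next to it.

```python
import sqlite3

# Hypothetical rename of users.fullname to users.display_name, split into
# backward-compatible steps (often called "expand and contract").

# Deployment 1: add the new column as nullable, so INSERTs from the
# still-running old version, which never mention it, keep working.
MIGRATION_1 = "ALTER TABLE users ADD COLUMN display_name TEXT"

# Deployment 2: backfill rows the old version wrote, then ship code that
# reads the new column.
MIGRATION_2 = "UPDATE users SET display_name = fullname WHERE display_name IS NULL"

# Deployment 3: drop the old column once no running version uses it.
# (DROP COLUMN requires SQLite 3.35+; it is not executed in demo() below.)
MIGRATION_3 = "ALTER TABLE users DROP COLUMN fullname"


def insert_v1(conn: sqlite3.Connection, name: str) -> None:
    # Old application version: knows only the old column.
    conn.execute("INSERT INTO users (fullname) VALUES (?)", (name,))


def insert_v2(conn: sqlite3.Connection, name: str) -> None:
    # Version shipped with deployment 1: writes both columns.
    conn.execute("INSERT INTO users (fullname, display_name) VALUES (?, ?)",
                 (name, name))


def read_v3(conn: sqlite3.Connection, user_id: int):
    # Version shipped with deployment 2: reads the new column.
    row = conn.execute("SELECT display_name FROM users WHERE id = ?",
                       (user_id,)).fetchone()
    return row[0] if row else None


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)")
    insert_v1(conn, "Ada")      # written before any migration
    conn.execute(MIGRATION_1)
    insert_v1(conn, "Grace")    # old version still serving during switchover
    insert_v2(conn, "Linus")    # new version running alongside it
    conn.execute(MIGRATION_2)
    return [read_v3(conn, i) for i in (1, 2, 3)]


print(demo())  # ['Ada', 'Grace', 'Linus']
```

Without the backfill in the second step, rows written by the old version would read back as `None` in the new one, which is exactly the kind of switchover error the split is meant to avoid.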
- Implement blue-green or canary deployment as the default strategy
- Split breaking database migrations into backward-compatible steps
- Automate rollback based on error rate thresholds post-deployment
- Run health checks against the new version before switching traffic
- Keep the previous version available for at least 24 hours after deployment
- Test the deployment pipeline itself, not just the application code
Our pipeline includes automated rollback triggered by error rate monitoring. If the error rate exceeds a threshold within the first five minutes after deployment, the system automatically switches back to the previous version and alerts the team. This has been triggered twice in the past year, both times catching issues that would have required manual intervention under a traditional deployment model.
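The rollback trigger reduces to a small decision function. In this sketch the five-minute window comes from the description above, while the `threshold` value, the `min_requests` floor (which avoids rolling back on a handful of early requests), the `Sample` type and the `switch_back`/`alert` callbacks are hypothetical; in practice the samples would come from whatever metrics backend the pipeline reads.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Sample:
    """Request and error counts for one metrics interval after the switch."""
    seconds_since_switch: float
    requests: int
    errors: int


def breaches_threshold(samples: Iterable[Sample], threshold: float,
                       window_s: float = 300.0, min_requests: int = 100) -> bool:
    """True if the cumulative error rate inside the window exceeds threshold."""
    total_requests = total_errors = 0
    for sample in sorted(samples, key=lambda x: x.seconds_since_switch):
        if sample.seconds_since_switch > window_s:
            break  # outside the watch window: no automatic rollback
        total_requests += sample.requests
        total_errors += sample.errors
        if (total_requests >= min_requests
                and total_errors / total_requests > threshold):
            return True
    return False


def guard_deployment(samples: Iterable[Sample], threshold: float,
                     switch_back: Callable[[], None],
                     alert: Callable[[str], None]) -> bool:
    """Switch traffic back and notify the team if the threshold is breached."""
    if breaches_threshold(samples, threshold):
        switch_back()  # e.g. point the listener rule at the previous target group
        alert(f"error rate above {threshold:.1%} after deploy; rolled back")
        return True
    return False
```

Using a cumulative rate with a minimum request count is one design choice among several; a per-interval rate reacts faster but is noisier on low traffic.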
