Zero-downtime deployments aren't a feature you add at the end. They're an architectural constraint you design for from day one.
After deploying dozens of services to production Kubernetes clusters, I've developed a playbook that reliably achieves zero-downtime releases — even for stateful workloads.
The fundamental requirement: graceful shutdown
Every container in your deployment must handle SIGTERM correctly. This is step zero, and most guides skip it.
// main.go — proper graceful shutdown
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	// router is the application's http.Handler, defined elsewhere.
	server := &http.Server{Addr: ":8080", Handler: router}

	// Start the server in a goroutine so main can block on signals.
	go func() {
		if err := server.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatal(err)
		}
	}()

	// Wait for an interrupt or termination signal.
	quit := make(chan os.Signal, 1)
	signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
	<-quit

	// Give in-flight requests time to complete. This timeout must fit
	// inside terminationGracePeriodSeconds (30s) minus the preStop
	// sleep (5s), or the kubelet will SIGKILL the process mid-drain.
	ctx, cancel := context.WithTimeout(context.Background(), 25*time.Second)
	defer cancel()

	if err := server.Shutdown(ctx); err != nil {
		log.Fatal("Server forced shutdown: ", err)
	}
}
Rolling update configuration
For most workloads, a properly tuned rolling update is all you need:
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # Never kill old pods until new ones are ready
      maxSurge: 1         # Allow one extra pod during transition
  template:
    spec:
      terminationGracePeriodSeconds: 30
      containers:
        - name: api
          readinessProbe:
            httpGet:
              path: /healthz/ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 3
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 5"]  # Drain in-flight before SIGTERM
The preStop sleep is critical: endpoint removal propagates asynchronously, so the pod keeps receiving traffic for a few seconds after termination begins. Sleeping before SIGTERM is delivered gives load balancers and kube-proxy time to deregister the pod before the process starts shutting down.
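A detail that is easy to get wrong: the kubelet starts the terminationGracePeriodSeconds clock when it invokes the preStop hook, so the application's shutdown timeout only gets whatever time remains after the sleep. A quick sanity check of the budget (shutdownBudget is a hypothetical helper, just arithmetic):

```go
package main

import "fmt"

// shutdownBudget returns the seconds server.Shutdown can use before the
// kubelet sends SIGKILL: the grace-period clock starts when the preStop
// hook runs, so the in-process timeout gets whatever is left.
func shutdownBudget(gracePeriodSeconds, preStopSleepSeconds int) int {
	return gracePeriodSeconds - preStopSleepSeconds
}

func main() {
	// With the manifest above: 30s grace period, 5s preStop sleep.
	fmt.Println(shutdownBudget(30, 5)) // 25 seconds for server.Shutdown
}
```

If the in-process timeout exceeds that remainder, slow requests get killed mid-flight despite all the graceful-shutdown plumbing.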
Automated rollback with GitHub Actions
- name: Deploy and verify
  run: |
    kubectl set image deployment/api api=$IMAGE_TAG
    kubectl rollout status deployment/api --timeout=5m || {
      echo "Deployment failed — rolling back"
      kubectl rollout undo deployment/api
      exit 1
    }
- name: Run smoke tests
  run: |
    ./scripts/smoke-test.sh $DEPLOYMENT_URL || {
      echo "Smoke tests failed — rolling back"
      kubectl rollout undo deployment/api
      exit 1
    }
Conclusion
Zero-downtime deployments require discipline across the entire stack: application code, container configuration, Kubernetes settings, and your CI/CD pipeline. Get any layer wrong and you'll have incidents.
Build the habit of gating every deploy on kubectl rollout status and a passing smoke test, and you'll build the confidence to deploy 20 times a day without hesitation.