Zero-Downtime Migration from Ingress NGINX to Envoy Gateway
Why Migrate at All?
Ingress NGINX has been the default answer to “how do I expose services in Kubernetes” for years. It’s battle-tested, well-documented, and installed in hundreds of thousands of clusters. So why move?
The Kubernetes community has been clear: the Ingress API is frozen. It remains supported, but new features land only in the Gateway API, its successor, which is richer, more expressive, and maintained by SIG Network. Envoy Gateway, part of the CNCF-hosted Envoy project, is a Gateway API implementation that pairs Envoy Proxy’s performance with Gateway API’s clean resource model.
The practical benefits are real:
- Route traffic without annotation soup. Gateway API uses proper typed resources (HTTPRoute, GRPCRoute, TCPRoute) instead of controller-specific annotations that differ between ingress implementations.
- Multi-team, multi-namespace routing. Gateway API separates infrastructure concerns (who owns the Gateway) from application concerns (who owns the Routes), which maps cleanly to platform and product team boundaries.
- Better observability out of the box. Envoy’s built-in stats, tracing integration, and access log format are significantly richer than what NGINX exposes natively.
The migration itself is where most teams hesitate. Let’s remove that obstacle.
The Core Strategy: Parallel Running, Weighted Cutover
The key insight for zero-downtime migration is simple: run both controllers at the same time and shift traffic gradually, never retiring the old path before the new one has been load-tested.
You’ll use your cloud load balancer (ALB, NLB, GCP LB, etc.) as the traffic arbiter. At no point do you change DNS records with a short TTL and hope for the best.
Phase 1: Install Envoy Gateway Alongside ingress-nginx
First, install Envoy Gateway without touching any existing routes.
helm install eg oci://docker.io/envoyproxy/gateway-helm \
  --version v1.3.0 \
  --namespace envoy-gateway-system \
  --create-namespace
Verify the controller pod is running:
kubectl get pods -n envoy-gateway-system
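Envoy Gateway needs a GatewayClass that points at its controller. The Helm chart installs the controller but does not create one by default, so create a GatewayClass named eg if it does not already exist, then confirm it’s accepted: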
kubectl get gatewayclass
# NAME CONTROLLER ACCEPTED
# eg gateway.envoyproxy.io/gatewaycon True
At this stage nothing in production has changed. Your ingress-nginx controller is still handling 100% of traffic.
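The GatewayClass step can be scripted so it is repeatable across clusters. The following is a minimal sketch, assuming the class is named eg and that the controller name is gateway.envoyproxy.io/gatewayclass-controller (the value Envoy Gateway uses by default); the function names are hypothetical helpers, not part of any tool.

```shell
# Print a GatewayClass manifest bound to the Envoy Gateway controller.
# The class name defaults to "eg" to match the rest of this guide.
gatewayclass_manifest() {
  local name="${1:-eg}"
  cat <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: ${name}
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
EOF
}

# Apply the manifest, then block until the controller reports Accepted=True.
ensure_gatewayclass() {
  local name="${1:-eg}"
  gatewayclass_manifest "$name" | kubectl apply -f - &&
    kubectl wait --for=condition=Accepted "gatewayclass/${name}" --timeout=120s
}
```

Calling ensure_gatewayclass eg is safe to repeat, because kubectl apply leaves an existing, identical GatewayClass unchanged.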
Phase 2: Mirror Your Ingress Rules as HTTPRoutes
Do not delete your Ingress resources yet. Instead, create equivalent Gateway and HTTPRoute resources that mirror them.
Given an existing Ingress like:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /v1
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 8080
Create the Gateway API equivalent:
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: prod-gateway
  namespace: envoy-gateway-system
spec:
  gatewayClassName: eg
  listeners:
  - name: https
    port: 443
    protocol: HTTPS
    hostname: "api.example.com"
    tls:
      mode: Terminate
      certificateRefs:
      - name: api-tls-secret   # must exist in the Gateway's namespace
    # By default only routes in the Gateway's own namespace may attach.
    # Widen this so the HTTPRoute in "default" can bind.
    allowedRoutes:
      namespaces:
        from: All
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-route
  namespace: default
spec:
  parentRefs:
  - name: prod-gateway
    namespace: envoy-gateway-system
  hostnames:
  - "api.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /v1
    backendRefs:
    - name: api-service
      port: 8080
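The example Ingress carries a rewrite-target: / annotation that a plain HTTPRoute does not reproduce. Below is a hedged sketch of the route with a Gateway API URLRewrite filter that strips the /v1 prefix. It assumes prefix stripping is the intent. As far as the ingress-nginx documentation describes it, a rewrite-target without capture groups replaces the whole path, in which case type: ReplaceFullPath with replaceFullPath: / would be the closer match. The function name render_rewrite_route is a hypothetical helper.

```shell
# Print an HTTPRoute that rewrites /v1/<rest> to /<rest> before forwarding.
# Resource names (api-route, prod-gateway, api-service) follow this guide's example.
render_rewrite_route() {
  cat <<'EOF'
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-route
  namespace: default
spec:
  parentRefs:
  - name: prod-gateway
    namespace: envoy-gateway-system
  hostnames:
  - "api.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /v1
    filters:
    - type: URLRewrite
      urlRewrite:
        path:
          type: ReplacePrefixMatch
          replacePrefixMatch: /
    backendRefs:
    - name: api-service
      port: 8080
EOF
}
```

Piping the output through kubectl apply --dry-run=server -f - validates the manifest against the installed CRDs without changing anything in the cluster.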
Watch for annotation translation
This is where teams hit their first gotchas. Most ingress-nginx annotations have no 1:1 mapping in core Gateway API. Some map to standard HTTPRoute filters; for the rest, Envoy Gateway provides policy custom resources such as ClientTrafficPolicy, BackendTrafficPolicy, and SecurityPolicy. Common translations:
| NGINX annotation | Envoy Gateway equivalent |
|---|---|
| nginx.ingress.kubernetes.io/proxy-body-size | ClientTrafficPolicy with http.maxReceivedMessageSize |
| nginx.ingress.kubernetes.io/limit-rps (and limit-rpm) | BackendTrafficPolicy with rateLimit |
| nginx.ingress.kubernetes.io/ssl-redirect | RequestRedirect filter on an HTTPRoute attached to the HTTP listener |
| nginx.ingress.kubernetes.io/enable-cors | SecurityPolicy with cors |
Take the time to audit your Ingress annotations before the cutover. Undiscovered annotations are the most common source of post-migration incidents.
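A quick way to start that audit is to list every distinct ingress-nginx annotation key in use across the cluster. This is a minimal sketch that needs only kubectl, grep, and sort; the function name is a hypothetical helper, and it reports keys only, not which Ingress uses them.

```shell
# List each distinct nginx.ingress.kubernetes.io/* annotation key once.
# Keys are de-duplicated because the same key can also appear inside the
# kubectl.kubernetes.io/last-applied-configuration annotation.
audit_nginx_annotations() {
  kubectl get ingress --all-namespaces -o yaml |
    grep -oE 'nginx\.ingress\.kubernetes\.io/[A-Za-z0-9-]+' |
    sort -u
}
```

Every key in the output needs either a Gateway API equivalent or an explicit decision that it is no longer required.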
Phase 3: Validate Envoy Gateway in Staging
Before shifting any production traffic, get Envoy Gateway’s load balancer IP/hostname and run your full test suite against it directly—bypassing DNS entirely.
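# Get Envoy Gateway's external address (hostname on AWS; use .ip where the LB exposes an IP)
EG_LB=$(kubectl get svc -n envoy-gateway-system \
  -l gateway.envoyproxy.io/owning-gateway-name=prod-gateway \
  -o jsonpath='{.items[0].status.loadBalancer.ingress[0].hostname}')
# --resolve needs an IP address, so resolve the load balancer hostname first
EG_IP=$(dig +short "$EG_LB" | grep -E '^[0-9.]+$' | head -n1)
# Request the real hostname so SNI and the Host header match the listener
curl --resolve "api.example.com:443:$EG_IP" https://api.example.com/v1/health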
Run your smoke tests and load tests against $EG_LB. Fix any issues before touching the production traffic path.
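A smoke test against the new load balancer can be as small as a loop over known-good paths. The sketch below assumes each listed path should return HTTP 200 and that api.example.com is the hostname under test; smoke_test is a hypothetical helper name, and the path list is yours to supply.

```shell
# Request each path through the given load balancer IP and report non-200s.
# Usage: smoke_test <lb-ip> <path> [<path> ...]
smoke_test() {
  local ip="$1"; shift
  local failed=0 path code
  for path in "$@"; do
    # --resolve pins api.example.com to the new LB without touching DNS.
    code=$(curl -s -o /dev/null -w '%{http_code}' \
      --resolve "api.example.com:443:${ip}" "https://api.example.com${path}")
    if [ "$code" = "200" ]; then
      echo "ok   ${path}"
    else
      echo "FAIL ${path} -> ${code}"
      failed=1
    fi
  done
  return "$failed"
}
```

The function returns non-zero if any path fails, so it can gate a CI job or the first traffic shift.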
Phase 4: Weighted Traffic Shift at the Load Balancer
This is the zero-downtime trick. Most cloud load balancers support weighted target groups or backend services. Use that capability to shift traffic incrementally rather than flipping a DNS record.
On AWS (ALB with weighted target groups)
# Start with 5% to Envoy Gateway
# (the JSON is double-quoted so the shell expands the ARN variables)
aws elbv2 modify-rule \
  --rule-arn "$LISTENER_RULE_ARN" \
  --actions "[{
    \"Type\": \"forward\",
    \"ForwardConfig\": {
      \"TargetGroups\": [
        {\"TargetGroupArn\": \"$NGINX_TG_ARN\", \"Weight\": 95},
        {\"TargetGroupArn\": \"$EG_TG_ARN\", \"Weight\": 5}
      ]
    }
  }]"
Monitor error rates, latency p99, and upstream health for at least 30 minutes at each step. Then progress: 5% → 20% → 50% → 100%.
If anything looks wrong, roll back instantly:
# Emergency rollback: 100% back to NGINX
# (double quotes so $NGINX_TG_ARN is expanded by the shell)
aws elbv2 modify-rule \
  --rule-arn "$LISTENER_RULE_ARN" \
  --actions "[{\"Type\": \"forward\", \"ForwardConfig\": {\"TargetGroups\": [{\"TargetGroupArn\": \"$NGINX_TG_ARN\", \"Weight\": 100}]}}]"
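The stepwise shift and the rollback are the same call with different weights, so both can go through one helper. This is a sketch under the assumption that LISTENER_RULE_ARN, NGINX_TG_ARN, and EG_TG_ARN are already exported; build_forward_action and shift_traffic are hypothetical names.

```shell
# Build the ALB forward action for a given Envoy Gateway weight (0-100).
# NGINX receives the remainder, so the two weights always sum to 100.
build_forward_action() {
  local eg_weight="$1"
  local nginx_weight=$((100 - eg_weight))
  printf '[{"Type":"forward","ForwardConfig":{"TargetGroups":[{"TargetGroupArn":"%s","Weight":%d},{"TargetGroupArn":"%s","Weight":%d}]}}]' \
    "$NGINX_TG_ARN" "$nginx_weight" "$EG_TG_ARN" "$eg_weight"
}

# Apply the weights to the listener rule.
shift_traffic() {
  local eg_weight="$1"
  aws elbv2 modify-rule \
    --rule-arn "$LISTENER_RULE_ARN" \
    --actions "$(build_forward_action "$eg_weight")"
}

# Example progression; pause and review dashboards between steps:
#   shift_traffic 5; shift_traffic 20; shift_traffic 50; shift_traffic 100
# Emergency rollback:
#   shift_traffic 0
```

Keeping both target groups in the rule at every step, even at weight 0, means a rollback only changes numbers and never has to re-add a target group.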
On GCP (Backend Service weights)
Use URL Map weightedBackendServices to split traffic between two backend services, one backed by the NGINX NEG and one by the Envoy Gateway NEG, following the same 5% → 20% → 50% → 100% pattern.
Phase 5: Full Cutover and Cleanup
Once Envoy Gateway is at 100% and has been stable for at least 24 hours (covering at least one diurnal traffic peak), you can clean up:
- Remove the NGINX target group from your load balancer rule.
- Delete Ingress resources once you’ve confirmed nothing references them.
- Uninstall ingress-nginx:
helm uninstall ingress-nginx -n ingress-nginx
kubectl delete namespace ingress-nginx
- Remove the old load balancer (if ingress-nginx provisioned its own).
Operational Checklist Before You Start
- [ ] Audit all Ingress annotations and translate each one to a Gateway API equivalent
- [ ] Inventory TLS certificates and confirm they’re accessible as Kubernetes Secrets
- [ ] Set up dashboards for both controllers during the transition period
- [ ] Confirm your load balancer supports weighted routing
- [ ] Run a chaos drill in staging: kill Envoy Gateway pods, confirm traffic falls back cleanly
- [ ] Communicate a rollback window to stakeholders (even if you never use it)
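The TLS inventory item is easy to script. A Gateway listener resolves certificateRefs in the Gateway's own namespace unless a ReferenceGrant allows otherwise, so the check below looks for the Secret there. It is a minimal sketch; check_tls_secret is a hypothetical helper, and the Secret name and namespace are the ones used in this guide's example.

```shell
# Verify that a Secret exists in the given namespace and has type kubernetes.io/tls.
# Usage: check_tls_secret <secret-name> <namespace>
check_tls_secret() {
  local name="$1" namespace="$2" type
  type=$(kubectl get secret "$name" -n "$namespace" -o jsonpath='{.type}' 2>/dev/null)
  if [ "$type" = "kubernetes.io/tls" ]; then
    echo "ok: ${namespace}/${name}"
  else
    echo "missing or wrong type: ${namespace}/${name} (got '${type}')"
    return 1
  fi
}

# Example: check_tls_secret api-tls-secret envoy-gateway-system
```

If the certificate currently lives next to the Ingress in another namespace, copy it or add a ReferenceGrant before the Gateway will report the listener as ready.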
What to Watch During Cutover
Keep these metrics visible on a shared screen during the cutover window:
- HTTP 5xx rate — per backend, split by controller
- Latency p95/p99 — Envoy Gateway’s Envoy stats give you per-route histograms out of the box
- Active connections — a sudden drop signals something is wrong with keepalive handling
- TLS handshake errors — certificate or cipher mismatches surface immediately under load
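One way to turn the 5xx metric into a go/no-go decision is a small threshold check agreed on before the cutover. The function name, the basis-point unit, and the 0.5% figure in the example are assumptions for illustration, not values from this guide; the request and error counts would come from whatever metrics backend you already use.

```shell
# Succeed (exit 0) when errors/total is strictly above the threshold.
# The threshold is in basis points (50 = 0.5%) so integer math is enough.
# Usage: error_rate_exceeds <5xx-count> <total-requests> <threshold-bps>
error_rate_exceeds() {
  local errors="$1" total="$2" threshold_bps="$3"
  # With no traffic there is nothing to judge, so report "not exceeded".
  [ "$total" -gt 0 ] || return 1
  [ $((errors * 10000)) -gt $((threshold_bps * total)) ]
}

# Example: roll back if more than 0.5% of requests returned 5xx.
#   error_rate_exceeds "$ERRORS" "$TOTAL" 50 && echo "roll back"
```

Writing the threshold down in advance keeps the rollback decision mechanical rather than a judgment call made under pressure.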
The Payoff
Once you’re fully on Envoy Gateway, you gain access to capabilities that would have required significant NGINX annotation gymnastics before: header-based routing, traffic mirroring, gRPC transcoding, and JWT authentication policies—all expressed as typed Kubernetes resources with proper schema validation.
Preparation takes about a day and the traffic shift itself a few hours, followed by the 24-hour soak before cleanup. The operational simplicity you gain compounds over months.