Zero-Downtime Migration from Ingress NGINX to Envoy Gateway

Why Migrate at All?

Ingress NGINX has been the default answer to “how do I expose services in Kubernetes” for years. It’s battle-tested, well-documented, and installed in hundreds of thousands of clusters. So why move?

The Kubernetes community has been clear: the Ingress API is frozen. It remains supported, but new features land in the Gateway API, its successor, which is richer, more expressive, and developed under SIG Network. Envoy Gateway, part of the CNCF-hosted Envoy project, is a conformant implementation that pairs Envoy Proxy’s performance with Gateway API’s clean resource model.

The practical benefits are real:

Route traffic without annotation soup. Gateway API uses proper typed resources (HTTPRoute, GRPCRoute, TCPRoute) instead of controller-specific annotations that differ between ingress implementations.
Multi-team, multi-namespace routing. Gateway API separates infrastructure concerns (who owns the Gateway) from application concerns (who owns the Routes), which maps cleanly to platform and product team boundaries.
Better observability out of the box. Envoy’s built-in stats, tracing integration, and access log format are significantly richer than what NGINX exposes natively.

The migration itself is where most teams hesitate. Let’s remove that obstacle.

The Core Strategy: Parallel Running, Weighted Cutover

The key insight for zero-downtime migration is simple: run both controllers at the same time and shift traffic gradually, never retiring the old path before the new one has been load-tested.

You’ll use your cloud load balancer (ALB, NLB, GCP LB, etc.) as the traffic arbiter. At no point do you change DNS records with a short TTL and hope for the best.

Phase 1: Install Envoy Gateway Alongside ingress-nginx

First, install Envoy Gateway without touching any existing routes.

helm install eg oci://docker.io/envoyproxy/gateway-helm \
  --version v1.3.0 \
  --namespace envoy-gateway-system \
  --create-namespace

Verify the controller pod is running:

kubectl get pods -n envoy-gateway-system

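The GatewayClass is usually created separately from the Helm release (the Envoy Gateway quickstart manifest, for example, defines one named eg). Once it exists, confirm it’s accepted:

kubectl get gatewayclass
# NAME   CONTROLLER                                      ACCEPTED
# eg     gateway.envoyproxy.io/gatewayclass-controller   True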

At this stage nothing in production has changed. Your ingress-nginx controller is still handling 100% of traffic.
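
If `kubectl get gatewayclass` comes back empty, the class probably still needs to be applied. The following is a minimal sketch, assuming the class name `eg` used throughout this article; the helper name `gatewayclass_manifest` is made up for illustration, and the manifest should be checked against the release you installed.

```shell
# Print a GatewayClass manifest bound to Envoy Gateway's controller.
# The class name defaults to "eg" to match the rest of this walkthrough.
gatewayclass_manifest() {
  class_name="${1:-eg}"
  cat <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: ${class_name}
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
EOF
}

# Usage (needs cluster access):
#   gatewayclass_manifest | kubectl apply -f -
```

Printing the manifest rather than applying it directly keeps the helper easy to review or commit before it touches the cluster.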

Phase 2: Mirror Your Ingress Rules as HTTPRoutes

Do not delete your Ingress resources yet. Instead, create equivalent Gateway and HTTPRoute resources that mirror them.

Given an existing Ingress like:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /v1
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 8080

Create the Gateway API equivalent:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: prod-gateway
  namespace: envoy-gateway-system
spec:
  gatewayClassName: eg
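  listeners:
  - name: https
    port: 443
    protocol: HTTPS
    hostname: "api.example.com"
    tls:
      mode: Terminate
      certificateRefs:
      - name: api-tls-secret   # must live in this namespace, or be permitted by a ReferenceGrant
    allowedRoutes:
      namespaces:
        from: All              # the default is Same, which would reject api-route in "default"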
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-route
  namespace: default
spec:
  parentRefs:
  - name: prod-gateway
    namespace: envoy-gateway-system
  hostnames:
  - "api.example.com"
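  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /v1
    filters:
    # Mirrors rewrite-target: / (every matched request goes upstream as "/").
    # To strip only the /v1 prefix, use type: ReplacePrefixMatch with replacePrefixMatch: /
    - type: URLRewrite
      urlRewrite:
        path:
          type: ReplaceFullPath
          replaceFullPath: /
    backendRefs:
    - name: api-service
      port: 8080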
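
Before relying on the new route, it is worth confirming that the Gateway actually accepted it, since a namespace or hostname mismatch leaves the route detached without any error on apply. A small sketch, assuming the names from the manifests above; the helper names `route_accepted` and `require_route_accepted` are hypothetical, and only the first parent in the route status is inspected.

```shell
# Print the status of the "Accepted" condition for an HTTPRoute
# ("True" when the route is attached to its first parent).
# Arguments: namespace, route name.
route_accepted() {
  kubectl get httproute "$2" -n "$1" \
    -o jsonpath='{.status.parents[0].conditions[?(@.type=="Accepted")].status}'
}

# Succeeds only when the route reports Accepted=True.
require_route_accepted() {
  [ "$(route_accepted "$1" "$2")" = "True" ]
}

# Usage (needs cluster access):
#   require_route_accepted default api-route && echo "api-route is attached"
```

A route can be accepted and still fail to resolve its backends, so the `ResolvedRefs` condition deserves the same check.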

Watch for annotation translation

This is where teams hit their first gotcha. Most ingress-nginx annotations have no 1:1 mapping in Gateway API. Some behaviors become standard HTTPRoute filters; others move to Envoy Gateway policy resources such as ClientTrafficPolicy, BackendTrafficPolicy, and SecurityPolicy. Common translations:

NGINX annotation	Envoy Gateway equivalent
`nginx.ingress.kubernetes.io/proxy-body-size`	`ClientTrafficPolicy` or `BackendTrafficPolicy` buffer limits (field names vary by release; check the API reference for your version)
`nginx.ingress.kubernetes.io/limit-rps`, `nginx.ingress.kubernetes.io/limit-rpm`	`BackendTrafficPolicy` with `rateLimit`
`nginx.ingress.kubernetes.io/ssl-redirect`	`RequestRedirect` filter (`scheme: https`) on an `HTTPRoute` attached to the HTTP listener
`nginx.ingress.kubernetes.io/enable-cors`	`SecurityPolicy` with `cors`

Take the time to audit your Ingress annotations before the cutover. Undiscovered annotations are the most common source of post-migration incidents.
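
One way to start that audit is to list every distinct ingress-nginx annotation key in use across the cluster. This is a minimal sketch; the function name `list_nginx_annotations` is hypothetical, and it only finds keys that appear literally in the manifests.

```shell
# Read Ingress manifests (YAML or JSON) on stdin and print each distinct
# ingress-nginx annotation key once, sorted.
list_nginx_annotations() {
  grep -oE 'nginx\.ingress\.kubernetes\.io/[A-Za-z0-9-]+' | sort -u
}

# Usage (needs cluster access):
#   kubectl get ingress --all-namespaces -o yaml | list_nginx_annotations
```

The output is a worklist: each key needs either a Gateway API equivalent or an explicit decision to drop it. Settings that live in the controller’s ConfigMap rather than in annotations will not show up here and need a separate pass.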

Phase 3: Validate Envoy Gateway in Staging

Before shifting any production traffic, get Envoy Gateway’s load balancer IP/hostname and run your full test suite against it directly—bypassing DNS entirely.

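# Get Envoy Gateway's external address
# (AWS reports a hostname; on providers that report an IP, read .ip instead of .hostname)
EG_LB=$(kubectl get svc -n envoy-gateway-system \
  -l gateway.envoyproxy.io/owning-gateway-name=prod-gateway \
  -o jsonpath='{.items[0].status.loadBalancer.ingress[0].hostname}')

# Send the request to the Envoy Gateway load balancer while keeping the real
# hostname for SNI and the Host header. --connect-to accepts a hostname,
# whereas --resolve requires an IP address.
curl -k --connect-to "api.example.com:443:$EG_LB:443" https://api.example.com/v1/health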

Run your smoke tests and load tests against $EG_LB. Fix any issues before touching the production traffic path.

Phase 4: Weighted Traffic Shift at the Load Balancer

This is the zero-downtime trick. Most cloud load balancers support weighted target groups or backend services. Use that capability to shift traffic incrementally rather than flipping a DNS record.

On AWS (ALB with weighted target groups)

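# Start with 5% to Envoy Gateway.
# The JSON is built in a heredoc so the shell expands the ARN variables
# (inside single quotes they would be sent literally).
# modify-rule applies to non-default rules; the default rule is changed with modify-listener.
ACTIONS=$(cat <<EOF
[{
  "Type": "forward",
  "ForwardConfig": {
    "TargetGroups": [
      {"TargetGroupArn": "$NGINX_TG_ARN", "Weight": 95},
      {"TargetGroupArn": "$EG_TG_ARN", "Weight": 5}
    ]
  }
}]
EOF
)
aws elbv2 modify-rule \
  --rule-arn "$LISTENER_RULE_ARN" \
  --actions "$ACTIONS"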

Monitor error rates, latency p99, and upstream health for at least 30 minutes at each step. Then progress: 5% → 20% → 50% → 100%.
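
The stepped progression can be parameterized so that each step differs only in the weight. A rough sketch, assuming the same `LISTENER_RULE_ARN`, `NGINX_TG_ARN`, and `EG_TG_ARN` variables as above; the function names `forward_actions` and `shift_traffic` are made up for illustration.

```shell
# Build the ALB forward action for a given Envoy Gateway weight (0-100);
# ingress-nginx receives the remainder. Reads NGINX_TG_ARN and EG_TG_ARN.
forward_actions() {
  eg_weight="$1"
  nginx_weight=$((100 - eg_weight))
  printf '[{"Type":"forward","ForwardConfig":{"TargetGroups":[{"TargetGroupArn":"%s","Weight":%d},{"TargetGroupArn":"%s","Weight":%d}]}}]\n' \
    "$NGINX_TG_ARN" "$nginx_weight" "$EG_TG_ARN" "$eg_weight"
}

# Apply one step of the rollout to the listener rule.
shift_traffic() {
  aws elbv2 modify-rule \
    --rule-arn "$LISTENER_RULE_ARN" \
    --actions "$(forward_actions "$1")"
}

# Usage (needs AWS credentials; 1800 seconds is the 30-minute soak):
#   for weight in 5 20 50 100; do shift_traffic "$weight"; sleep 1800; done
```

The unattended loop is only a starting point. In practice each step should be gated on the dashboards, with a person or an automated check deciding whether to continue.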

If anything looks wrong, roll back instantly:

# Emergency rollback: 100% back to NGINX
# (double quotes so the shell expands $NGINX_TG_ARN)
aws elbv2 modify-rule \
  --rule-arn "$LISTENER_RULE_ARN" \
  --actions "[{\"Type\": \"forward\", \"ForwardConfig\": {\"TargetGroups\": [{\"TargetGroupArn\": \"$NGINX_TG_ARN\", \"Weight\": 100}]}}]"

On GCP (Backend Service weights)

Use the URL map’s weightedBackendServices to split traffic between the backend services that front the NGINX and Envoy Gateway NEGs, following the same pattern.
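
As a rough illustration of that pattern, the sketch below prints a minimal URL map with a weighted default route. Everything named here is an assumption: the function `weighted_url_map`, the URL map name, and the backend service paths are placeholders, and the load balancer type must support advanced traffic management.

```shell
# Print a minimal URL map that splits traffic by weight between two backend
# services. Arguments: URL map name, ingress-nginx backend service,
# Envoy Gateway backend service, Envoy Gateway weight (0-100).
weighted_url_map() {
  eg_weight="$4"
  nginx_weight=$((100 - eg_weight))
  cat <<EOF
name: $1
defaultRouteAction:
  weightedBackendServices:
  - backendService: $2
    weight: ${nginx_weight}
  - backendService: $3
    weight: ${eg_weight}
EOF
}

# Usage (needs gcloud and project access; all names are placeholders):
#   weighted_url_map prod-url-map \
#     projects/my-project/global/backendServices/nginx-backend \
#     projects/my-project/global/backendServices/eg-backend 5 > url-map.yaml
#   gcloud compute url-maps import prod-url-map --source=url-map.yaml --global
```

Importing replaces the whole URL map, so for a map that already has host rules or path matchers it is safer to export the current definition and edit the weights in place.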

Phase 5: Full Cutover and Cleanup

Once Envoy Gateway is at 100% and has been stable for at least 24 hours (covering at least one diurnal traffic peak), you can clean up:

Remove the NGINX target group from your load balancer rule.
Delete Ingress resources once you’ve confirmed nothing references them.
Uninstall ingress-nginx:

helm uninstall ingress-nginx -n ingress-nginx
kubectl delete namespace ingress-nginx

Remove the old load balancer (if ingress-nginx provisioned its own).
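
Before running the uninstall, a quick check that no Ingress still points at the old class can prevent an unpleasant surprise. A minimal sketch, assuming the class is named `nginx`; the function name `ingresses_using_class` is hypothetical.

```shell
# List Ingresses whose spec.ingressClassName equals the given class
# (default: nginx), printed as "namespace/name". Prints nothing when none remain.
ingresses_using_class() {
  want="${1:-nginx}"
  kubectl get ingress --all-namespaces --no-headers \
    -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,CLASS:.spec.ingressClassName' \
    | awk -v want="$want" '$3 == want { print $1 "/" $2 }'
}

# Usage (needs cluster access):
#   ingresses_using_class nginx
```

Older manifests may select the controller with the `kubernetes.io/ingress.class` annotation instead of `spec.ingressClassName`; this sketch does not detect those, so they need a separate search.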

Operational Checklist Before You Start

[ ] Audit all Ingress annotations—translate each one to a Gateway API equivalent
[ ] Inventory TLS certificates and confirm they’re accessible as Kubernetes Secrets
[ ] Set up dashboards for both controllers during the transition period
[ ] Confirm your load balancer supports weighted routing
[ ] Run a chaos drill in staging: kill Envoy proxy pods and confirm the remaining replicas and load balancer health checks keep traffic flowing
[ ] Communicate a rollback window to stakeholders (even if you never use it)

What to Watch During Cutover

Keep these metrics visible on a shared screen during the cutover window:

HTTP 5xx rate — per backend, split by controller
Latency p95/p99 — Envoy Gateway’s Envoy stats give you per-route histograms out of the box
Active connections — a sudden drop signals something is wrong with keepalive handling
TLS handshake errors — certificate or cipher mismatches surface immediately under load
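
For a quick spot check outside the dashboards, the 5xx rate can be computed from a stream of status codes. This is a sketch under the assumption that the status codes have already been extracted, one per line, from each controller’s access log; the function name `five_xx_rate` is made up for illustration.

```shell
# Read one HTTP status code per line on stdin and print the share of
# 5xx responses as a percentage with two decimals. Lines that are not
# three-digit codes are ignored; empty input yields 0.00.
five_xx_rate() {
  awk '/^[0-9][0-9][0-9]$/ { total++; if ($1 >= 500) errors++ }
       END {
         if (total == 0) print "0.00"
         else printf "%.2f\n", 100 * errors / total
       }'
}

# Usage (how the codes are extracted depends on your access log format):
#   cut -d' ' -f9 access.log | five_xx_rate
```

Running it separately against each controller’s logs gives the per-controller split, which is the comparison that matters during a weighted rollout.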

The Payoff

Once you’re fully on Envoy Gateway, you gain access to capabilities that would have required significant NGINX annotation gymnastics before: header-based routing, traffic mirroring, native gRPC routing, and JWT authentication policies, all expressed as typed Kubernetes resources with proper schema validation.

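For a cluster with a modest number of Ingress resources, expect roughly a day of preparation and a few hours of active traffic shifting, followed by the 24-hour soak before cleanup. The operational simplicity you gain compounds over months.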