Zero-Downtime Migration from Ingress NGINX to Envoy Gateway


Why Migrate at All?

Ingress NGINX has been the default answer to “how do I expose services in Kubernetes” for years. It’s battle-tested, well-documented, and installed in hundreds of thousands of clusters. So why move?

The Kubernetes community has been clear: the Ingress API is frozen. It still works, but new routing features go into the Gateway API, its successor, which is richer, more expressive, and developed by SIG Network. Envoy Gateway, part of the CNCF-graduated Envoy project, is a conformant Gateway API implementation that pairs Envoy Proxy’s performance with Gateway API’s clean resource model.

The practical benefits are real:

  • Route traffic without annotation soup. Gateway API uses proper typed resources (HTTPRoute, GRPCRoute, TCPRoute) instead of controller-specific annotations that differ between ingress implementations.
  • Multi-team, multi-namespace routing. Gateway API separates infrastructure concerns (who owns the Gateway) from application concerns (who owns the Routes), which maps cleanly to platform and product team boundaries.
  • Better observability out of the box. Envoy’s built-in stats, tracing integration, and access log format are significantly richer than what NGINX exposes natively.

The migration itself is where most teams hesitate. Let’s remove that obstacle.


The Core Strategy: Parallel Running, Weighted Cutover

The key insight for zero-downtime migration is simple: run both controllers at the same time and shift traffic gradually, never cutting the old rope before the new one has been load-tested.

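You’ll use your cloud load balancer (ALB, NLB, GCP LB, etc.) as the traffic arbiter. This assumes a single external load balancer, whose DNS name stays fixed, sits in front of the cluster and can reach both controllers. At no point do you change DNS records with a short TTL and hope for the best.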


Phase 1: Install Envoy Gateway Alongside ingress-nginx

First, install Envoy Gateway without touching any existing routes.

helm install eg oci://docker.io/envoyproxy/gateway-helm \
  --version v1.3.0 \
  --namespace envoy-gateway-system \
  --create-namespace

Verify the controller pod is running:

kubectl get pods -n envoy-gateway-system

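Envoy Gateway’s Helm chart installs the controller but does not create a GatewayClass. Create one named eg that points at the controller name gateway.envoyproxy.io/gatewayclass-controller (the Envoy Gateway quickstart manifest does exactly this), then confirm it’s accepted: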

kubectl get gatewayclass
# NAME            CONTROLLER                      ACCEPTED
# eg              gateway.envoyproxy.io/gatewaycon  True
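If kubectl get gatewayclass returns nothing, create the class first. A minimal manifest is sketched below; the name eg is the convention this guide relies on (the Gateway in Phase 2 sets gatewayClassName: eg), and gatewayclass.yaml is just an illustrative filename.

```yaml
# Minimal GatewayClass handing Gateways of this class to Envoy Gateway.
# Apply with: kubectl apply -f gatewayclass.yaml   (filename is illustrative)
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
```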

At this stage nothing in production has changed. Your ingress-nginx controller is still handling 100% of traffic.


Phase 2: Mirror Your Ingress Rules as HTTPRoutes

Do not delete your Ingress resources yet. Instead, create equivalent Gateway and HTTPRoute resources that mirror them.

Given an existing Ingress like:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /v1
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 8080

Create the Gateway API equivalent:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: prod-gateway
  namespace: envoy-gateway-system
spec:
  gatewayClassName: eg
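  listeners:
  - name: https
    port: 443
    protocol: HTTPS
    hostname: "api.example.com"
    # api-route lives in another namespace (default). Without this, the listener
    # only accepts routes from its own namespace. Use a Selector to restrict it.
    allowedRoutes:
      namespaces:
        from: All
    tls:
      mode: Terminate
      certificateRefs:
      # The Secret must live in the Gateway's namespace (envoy-gateway-system),
      # or be shared from another namespace through a ReferenceGrant.
      - name: api-tls-secret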
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-route
  namespace: default
spec:
  parentRefs:
  - name: prod-gateway
    namespace: envoy-gateway-system
  hostnames:
  - "api.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /v1
    backendRefs:
    - name: api-service
      port: 8080
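One thing this mirror does not carry over is the rewrite-target annotation. A sketch of the same route with a Gateway API URLRewrite filter follows, assuming the intent is to strip the /v1 prefix before the request reaches api-service. Be aware that in ingress-nginx 0.22 and later, a rewrite-target with no capture group replaces the entire matched URI, so this Ingress most likely forwards every /v1/... request as plain /. If your backend really depends on that, use type: ReplaceFullPath with replaceFullPath: / instead.

```yaml
# Variant of api-route that strips the /v1 prefix before forwarding,
# e.g. /v1/health -> /health and /v1 -> /.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-route
  namespace: default
spec:
  parentRefs:
  - name: prod-gateway
    namespace: envoy-gateway-system
  hostnames:
  - "api.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /v1
    filters:
    - type: URLRewrite
      urlRewrite:
        path:
          type: ReplacePrefixMatch   # replaces only the matched /v1 prefix
          replacePrefixMatch: /
    backendRefs:
    - name: api-service
      port: 8080
```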

Watch for annotation translation

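This is where teams hit their first gotchas. ingress-nginx annotations have no 1:1 mapping in Gateway API. Their behavior is covered instead by Gateway API filters on HTTPRoute rules and by Envoy Gateway’s policy custom resources: ClientTrafficPolicy, BackendTrafficPolicy, and SecurityPolicy. Common translations: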

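| NGINX annotation | Envoy Gateway equivalent |
|---|---|
| nginx.ingress.kubernetes.io/rewrite-target | URLRewrite filter on the HTTPRoute rule |
| nginx.ingress.kubernetes.io/proxy-body-size | ClientTrafficPolicy connection buffer limits (semantics differ from NGINX’s body-size limit; check the API reference for your Envoy Gateway version) |
| nginx.ingress.kubernetes.io/limit-rps, limit-rpm | BackendTrafficPolicy with rateLimit |
| nginx.ingress.kubernetes.io/ssl-redirect | RequestRedirect filter (scheme: https) on an HTTPRoute attached to a port-80 HTTP listener |
| nginx.ingress.kubernetes.io/enable-cors | SecurityPolicy with cors |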
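The ssl-redirect translation is the least obvious, because the Gateway defined earlier only has an HTTPS listener. A sketch, assuming you add a second listener named http on port 80 to prod-gateway (shown as a comment) and attach a redirect-only route to it; the route name api-https-redirect is hypothetical. Once that listener exists, also add sectionName: https to api-route’s parentRef, otherwise api-route attaches to both listeners and keeps serving plaintext.

```yaml
# Assumes prod-gateway gains a second listener (hypothetical name "http"):
#   - name: http
#     port: 80
#     protocol: HTTP
#     hostname: "api.example.com"
#     allowedRoutes:
#       namespaces:
#         from: All
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-https-redirect      # hypothetical name
  namespace: default
spec:
  parentRefs:
  - name: prod-gateway
    namespace: envoy-gateway-system
    sectionName: http           # attach only to the port-80 listener
  hostnames:
  - "api.example.com"
  rules:
  # No matches, so every request on the HTTP listener is redirected.
  - filters:
    - type: RequestRedirect
      requestRedirect:
        scheme: https           # port defaults to 443 when the scheme changes
        statusCode: 301
```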

Take the time to audit your Ingress annotations before the cutover. Undiscovered annotations are the most common source of post-migration incidents.
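For rate limiting and CORS, a minimal sketch of the two policies attached to api-route from Phase 2, assuming Envoy Gateway v1.3 (the version installed in Phase 1). The limit, the allowed origin, and the policy names are placeholders. Keep one semantic difference in mind: ingress-nginx’s limit-rps and limit-rpm count requests per client IP, while a Local rule with no client selectors caps all traffic on the route, counted separately by each Envoy replica.

```yaml
# Rate limiting: hypothetical 10 requests/second for everything matching api-route.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: api-ratelimit           # hypothetical name
  namespace: default            # policies must sit in the target route's namespace
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: api-route
  rateLimit:
    type: Local                 # enforced by each Envoy replica independently
    local:
      rules:
      - limit:
          requests: 10
          unit: Second
---
# CORS: hypothetical origin. ingress-nginx's enable-cors allows "*" unless
# cors-allow-origin is set, so list your real origins explicitly here.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: SecurityPolicy
metadata:
  name: api-cors                # hypothetical name
  namespace: default
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: api-route
  cors:
    allowOrigins:
    - "https://app.example.com"
    allowMethods:
    - GET
    - POST
    allowHeaders:
    - Authorization
    - Content-Type
    maxAge: 1h
```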


Phase 3: Validate Envoy Gateway Before It Takes Traffic

Before shifting any production traffic, get Envoy Gateway’s load balancer IP/hostname and run your full test suite against it directly—bypassing DNS entirely.

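# Get Envoy Gateway's external address.
# AWS load balancers report a hostname; on GCP and most other clouds use .ip instead.
EG_LB=$(kubectl get svc -n envoy-gateway-system \
  -l gateway.envoyproxy.io/owning-gateway-name=prod-gateway \
  -o jsonpath='{.items[0].status.loadBalancer.ingress[0].hostname}')

# Connect to Envoy Gateway but keep api.example.com as both the TLS SNI and the
# Host header, so the HTTPS listener and its certificate match and -k isn't needed.
# (--connect-to accepts a hostname target; --resolve only accepts an IP address.)
curl --connect-to "api.example.com:443:$EG_LB:443" https://api.example.com/v1/health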

Run your smoke tests and load tests against $EG_LB. Fix any issues before touching the production traffic path.


Phase 4: Weighted Traffic Shift at the Load Balancer

This is the zero-downtime trick. Most cloud load balancers support weighted target groups or backend services. Use that capability to shift traffic incrementally rather than flipping a DNS record.

On AWS (ALB with weighted target groups)

# Start with 5% to Envoy Gateway.
# The JSON is double-quoted (not single-quoted) so the shell expands the ARN variables.
aws elbv2 modify-rule \
  --rule-arn "$LISTENER_RULE_ARN" \
  --actions "[{
    \"Type\": \"forward\",
    \"ForwardConfig\": {
      \"TargetGroups\": [
        {\"TargetGroupArn\": \"$NGINX_TG_ARN\", \"Weight\": 95},
        {\"TargetGroupArn\": \"$EG_TG_ARN\", \"Weight\": 5}
      ]
    }
  }]"

Monitor error rates, latency p99, and upstream health for at least 30 minutes at each step. Then progress: 5% → 20% → 50% → 100%.

If anything looks wrong, roll back instantly:

# Emergency rollback: 100% back to NGINX
aws elbv2 modify-rule \
  --rule-arn "$LISTENER_RULE_ARN" \
  --actions "[{\"Type\": \"forward\", \"ForwardConfig\": {\"TargetGroups\": [{\"TargetGroupArn\": \"$NGINX_TG_ARN\", \"Weight\": 100}]}}]"

On GCP (Backend Service weights)

Use the URL map’s weightedBackendServices to split traffic between two backend services, one backed by the ingress-nginx NEG and one by the Envoy Gateway NEG, stepping the weights the same way.
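A sketch of such a URL map, in the YAML format accepted by gcloud compute url-maps import, assuming a global external Application Load Balancer (the classic Application Load Balancer does not support weighted backend services). The map name web-map, the backend service names nginx-backend and eg-backend, and PROJECT_ID are placeholders.

```yaml
# Hypothetical names: web-map, nginx-backend, eg-backend, PROJECT_ID.
# Each backend service is assumed to be backed by one controller's NEG.
name: web-map
defaultService: https://www.googleapis.com/compute/v1/projects/PROJECT_ID/global/backendServices/nginx-backend
hostRules:
- hosts:
  - api.example.com
  pathMatcher: api-matcher
pathMatchers:
- name: api-matcher
  # No defaultService here: a path matcher sets either defaultService or
  # defaultRouteAction.weightedBackendServices, not both.
  defaultRouteAction:
    weightedBackendServices:
    # Start at 95/5, then step through 80/20, 50/50, and 0/100.
    - backendService: https://www.googleapis.com/compute/v1/projects/PROJECT_ID/global/backendServices/nginx-backend
      weight: 95
    - backendService: https://www.googleapis.com/compute/v1/projects/PROJECT_ID/global/backendServices/eg-backend
      weight: 5
```

Apply each step with gcloud compute url-maps import web-map --source=web-map.yaml --global, editing the two weights between runs; rolling back is the same import with the weights set to 100 and 0.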


Phase 5: Full Cutover and Cleanup

Once Envoy Gateway is at 100% and has been stable for at least 24 hours (covering at least one diurnal traffic peak), you can clean up:

  1. Remove the NGINX target group from your load balancer rule.
  2. Delete Ingress resources once you’ve confirmed nothing references them.
  3. Uninstall ingress-nginx:
     helm uninstall ingress-nginx -n ingress-nginx
     kubectl delete namespace ingress-nginx
  4. Remove the old load balancer (if ingress-nginx provisioned its own).

Operational Checklist Before You Start

  • [ ] Audit all Ingress annotations—translate each one to a Gateway API equivalent
  • [ ] Inventory TLS certificates and confirm they’re accessible as Kubernetes Secrets
  • [ ] Set up dashboards for both controllers during the transition period
  • [ ] Confirm your load balancer supports weighted routing
  • [ ] Run a chaos drill in staging: kill the Envoy proxy pods (the data plane, not only the envoy-gateway controller) and confirm traffic falls back cleanly
  • [ ] Communicate a rollback window to stakeholders (even if you never use it)
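On the TLS item: ingress-nginx reads each certificate from its Ingress’s own namespace, but the Gateway in Phase 2 lives in envoy-gateway-system and, by default, can only reference Secrets there. If you’d rather leave a certificate where it is than copy it, a ReferenceGrant in the Secret’s namespace can allow the reference. The sketch assumes api-tls-secret lives in default and uses a hypothetical grant name; the Gateway’s certificateRefs entry would then also need namespace: default.

```yaml
# Lets Gateways in envoy-gateway-system reference one Secret in default.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-prod-gateway-certs   # hypothetical name
  namespace: default               # the namespace that owns the Secret
spec:
  from:
  - group: gateway.networking.k8s.io
    kind: Gateway
    namespace: envoy-gateway-system
  to:
  - group: ""                      # core API group, where Secret lives
    kind: Secret
    name: api-tls-secret           # omit name to allow every Secret in default
```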

What to Watch During Cutover

Keep these metrics visible on a shared screen during the cutover window:

  • HTTP 5xx rate — per backend, split by controller
  • Latency p95/p99 — Envoy Gateway’s Envoy stats give you per-route histograms out of the box
  • Active connections — a sudden drop signals something is wrong with keepalive handling
  • TLS handshake errors — certificate or cipher mismatches surface immediately under load

The Payoff

Once you’re fully on Envoy Gateway, you gain access to capabilities that would have required significant NGINX annotation gymnastics before: header-based routing, traffic mirroring, native gRPC routing with GRPCRoute, and JWT authentication policies, all expressed as typed Kubernetes resources with proper schema validation.

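For a modest number of Ingresses, expect roughly a day of preparation (mostly the annotation audit) and a few hours of weighted cutover, followed by the 24-hour soak before cleanup. The operational simplicity you gain compounds over months.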
