Locality-aware routing

Reduce cross-zone traffic costs and latency by routing requests to nearby endpoints, with automatic failover to other localities when local endpoints are unavailable.

About

Locality-aware routing (also called topology-aware routing) sends requests to backend endpoints that share locality with the gateway proxy, such as endpoints in the same zone, region, or node. Agentgateway groups endpoints into priority buckets based on their locality relative to the gateway, then selects the best bucket on each request.

Locality applies to all backend services, not just LLM providers. The same priority-group selection that powers LLM failover handles general HTTP routing as well.

How locality bucketing works

When you enable locality-aware routing for a Service, agentgateway ranks each endpoint against the gateway’s own locality. The ranking forms ordered priority buckets, with closer matches in higher-priority buckets.

  1. Same zone as the gateway (highest priority).
  2. Same region as the gateway but a different zone (second priority).
  3. Any other region (the fallback).

In failover mode (the default when you set trafficDistribution on a Service), the gateway sends requests to the highest-priority bucket that has at least one healthy endpoint. If all endpoints in that bucket are unhealthy or removed, traffic spills over to the next bucket. This way, you get locality preference without sacrificing availability.
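
For example, assume a gateway in the hypothetical zone us-east-1a of region us-east-1. Endpoints would be bucketed as follows.

    Endpoint locality        Bucket
    us-east-1/us-east-1a     1 (same zone)
    us-east-1/us-east-1b     2 (same region, different zone)
    us-west-2/us-west-2a     3 (different region)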

Failover vs. strict locality

Two enforcement levels are available.

  • Failover (default): Prefer local endpoints, but fail over to other localities when no local endpoints are available. Use failover for cost and latency optimization without sacrificing availability.
  • Strict: Only deliver to endpoints that match the configured locality. If no matching endpoints exist, requests return 503 Service Unavailable instead of spilling over. Use strict mode when locality is a hard requirement, such as data residency or same-node co-location.

You configure both modes through standard Kubernetes Service fields, not through agentgateway-specific resources.

Behavior                     Service field                Value
Failover, prefer same zone   spec.trafficDistribution     PreferClose
Strict, same node only       spec.internalTrafficPolicy   Local
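
For example, a minimal failover-mode Service needs only the extra trafficDistribution field. The name and selector in this sketch are placeholders.

    apiVersion: v1
    kind: Service
    metadata:
      name: my-svc
    spec:
      selector:
        app: my-app
      ports:
        - port: 80
      trafficDistribution: PreferClose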

How the gateway determines its own locality

For locality-aware routing to work, the gateway proxy must know its own locality. Agentgateway resolves this in the following order.

  1. The LOCALITY environment variable on the proxy pod (region/zone/subzone format), if set.
  2. The topology.kubernetes.io/region and topology.kubernetes.io/zone labels on the node where the proxy pod runs.

If neither source provides locality information, locality preferences on Services are silently ignored. Every endpoint falls into the highest-priority bucket, and traffic is distributed without locality awareness.
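
If you cannot rely on node labels, one option is to set the LOCALITY variable on the proxy Deployment directly. The following sketch uses a placeholder Deployment name; adjust it to match your install.

    # Set the gateway proxy's locality explicitly (placeholder names).
    kubectl set env deployment/<gateway-proxy-deployment> -n <namespace> \
      LOCALITY=region/zone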

Before you begin

  1. Follow the Get started guide to install agentgateway.

  2. Follow the Sample app guide to create a gateway proxy with an HTTP listener and deploy the httpbin sample app.

  3. Get the external address of the gateway and save it in an environment variable.

    export INGRESS_GW_ADDRESS=$(kubectl get svc -n agentgateway-system http -o jsonpath="{.status.loadBalancer.ingress[0]['hostname','ip']}")
    echo $INGRESS_GW_ADDRESS  

  4. Install the Istio CRDs that agentgateway consumes for workload and locality discovery. Use the manifest from a recent Istio release.

    kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.27/manifests/charts/base/files/crd-all.gen.yaml
  5. Verify that the nodes in your cluster carry locality labels. Cloud-provider Kubernetes distributions add these labels automatically, but local clusters such as kind do not.

    kubectl get nodes --label-columns=topology.kubernetes.io/region,topology.kubernetes.io/zone

    If the REGION and ZONE columns are empty, label your nodes manually. The values that you choose determine which endpoints count as “same zone” or “same region” as the gateway. For a single-node test cluster, run the following command.

    kubectl label node <node-name> topology.kubernetes.io/region=region topology.kubernetes.io/zone=zone --overwrite

    Restart the agentgateway controller so it picks up the updated node labels.

    kubectl rollout restart deployment/agentgateway -n agentgateway-system

Set up failover across localities

Deploy three backend instances that represent three localities, and then enable PreferClose on the Service so that the gateway prefers same-zone endpoints and falls back to other zones or regions only when needed.

The example uses Istio WorkloadEntry resources to override locality on each backend. WorkloadEntries are required for single-node clusters such as kind, where every pod runs on the same node and shares one locality. In a real multi-zone cluster, you do not need WorkloadEntries, because each pod inherits locality from the node where it runs, and a Service selector that matches pod labels works as usual.
  1. Create a namespace and a Gateway.

    kubectl apply -f- <<EOF
    apiVersion: v1
    kind: Namespace
    metadata:
      name: agentgateway-locality
    ---
    apiVersion: gateway.networking.k8s.io/v1
    kind: Gateway
    metadata:
      name: gateway
      namespace: agentgateway-locality
    spec:
      gatewayClassName: agentgateway
      listeners:
        - name: http
          protocol: HTTP
          port: 80
          allowedRoutes:
            namespaces:
              from: Same
    EOF
  2. Deploy three backend instances. Each instance returns its own pod hostname so you can identify which backend served a request.

    kubectl apply -f- <<EOF
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: backend-zone-a
      namespace: agentgateway-locality
    spec:
      replicas: 1
      selector:
        matchLabels:
          app.kubernetes.io/name: backend-zone-a
      template:
        metadata:
          labels:
            app: backend-zone-a
            app.kubernetes.io/name: backend-zone-a
        spec:
          containers:
            - name: agnhost
              image: registry.k8s.io/e2e-test-images/agnhost:2.45
              args: ["netexec", "--http-port=80"]
              ports:
                - name: http
                  containerPort: 80
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: backend-zone-b
      namespace: agentgateway-locality
    spec:
      replicas: 1
      selector:
        matchLabels:
          app.kubernetes.io/name: backend-zone-b
      template:
        metadata:
          labels:
            app: backend-zone-b
            app.kubernetes.io/name: backend-zone-b
        spec:
          containers:
            - name: agnhost
              image: registry.k8s.io/e2e-test-images/agnhost:2.45
              args: ["netexec", "--http-port=80"]
              ports:
                - name: http
                  containerPort: 80
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: backend-region-b
      namespace: agentgateway-locality
    spec:
      replicas: 1
      selector:
        matchLabels:
          app.kubernetes.io/name: backend-region-b
      template:
        metadata:
          labels:
            app: backend-region-b
            app.kubernetes.io/name: backend-region-b
        spec:
          containers:
            - name: agnhost
              image: registry.k8s.io/e2e-test-images/agnhost:2.45
              args: ["netexec", "--http-port=80"]
              ports:
                - name: http
                  containerPort: 80
    EOF
  3. Create a Service and an HTTPRoute. The Service selector matches a label that the WorkloadEntries in the next step carry, not the pod labels.

    kubectl apply -f- <<EOF
    apiVersion: v1
    kind: Service
    metadata:
      name: locality-svc
      namespace: agentgateway-locality
    spec:
      selector:
        app: locality-svc-workloadentry
      ports:
        - name: http
          port: 80
          targetPort: 80
          protocol: TCP
    ---
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: locality-route
      namespace: agentgateway-locality
    spec:
      parentRefs:
        - name: gateway
      hostnames:
        - locality.test
      rules:
        - backendRefs:
            - name: locality-svc
              port: 80
    EOF
  4. Capture each backend pod’s IP address and create a WorkloadEntry that overrides its locality. The labels on each WorkloadEntry match the Service selector, so agentgateway treats them as endpoints of locality-svc.

    ZONE_A_IP=$(kubectl get pod -n agentgateway-locality -l app=backend-zone-a -o jsonpath='{.items[0].status.podIP}')
    ZONE_B_IP=$(kubectl get pod -n agentgateway-locality -l app=backend-zone-b -o jsonpath='{.items[0].status.podIP}')
    REGION_B_IP=$(kubectl get pod -n agentgateway-locality -l app=backend-region-b -o jsonpath='{.items[0].status.podIP}')
    
    kubectl apply -f- <<EOF
    apiVersion: networking.istio.io/v1
    kind: WorkloadEntry
    metadata:
      name: we-zone-a
      namespace: agentgateway-locality
      labels:
        app: locality-svc-workloadentry
    spec:
      address: ${ZONE_A_IP}
      locality: "region/zone"
      ports:
        http: 80
    ---
    apiVersion: networking.istio.io/v1
    kind: WorkloadEntry
    metadata:
      name: we-zone-b
      namespace: agentgateway-locality
      labels:
        app: locality-svc-workloadentry
    spec:
      address: ${ZONE_B_IP}
      locality: "region/other-zone"
      ports:
        http: 80
    ---
    apiVersion: networking.istio.io/v1
    kind: WorkloadEntry
    metadata:
      name: we-region-b
      namespace: agentgateway-locality
      labels:
        app: locality-svc-workloadentry
    spec:
      address: ${REGION_B_IP}
      locality: "other-region/zone"
      ports:
        http: 80
    EOF
  5. Get the gateway address.

    export INGRESS_GW_ADDRESS=$(kubectl get gateway gateway -n agentgateway-locality -o jsonpath='{.status.addresses[0].value}')
    echo $INGRESS_GW_ADDRESS
  6. Send a few baseline requests. Without trafficDistribution set, traffic spreads across all three backends.

    for i in $(seq 1 10); do
      curl -s -H "host: locality.test" "http://${INGRESS_GW_ADDRESS}/hostname"
      echo
    done

    Example output:

    backend-zone-b-6bddfdcd85-ht8qn
    backend-region-b-5d46cfc8b5-xmfnc
    backend-zone-a-868fdff56f-w9jsn
    backend-region-b-5d46cfc8b5-xmfnc
    backend-region-b-5d46cfc8b5-xmfnc
    backend-region-b-5d46cfc8b5-xmfnc
    backend-zone-a-868fdff56f-w9jsn
    backend-region-b-5d46cfc8b5-xmfnc
    backend-region-b-5d46cfc8b5-xmfnc
    backend-zone-a-868fdff56f-w9jsn
  7. Enable locality-aware failover by setting trafficDistribution: PreferClose on the Service.

    kubectl apply -f- <<EOF
    apiVersion: v1
    kind: Service
    metadata:
      name: locality-svc
      namespace: agentgateway-locality
    spec:
      selector:
        app: locality-svc-workloadentry
      ports:
        - name: http
          port: 80
          targetPort: 80
          protocol: TCP
      trafficDistribution: PreferClose
    EOF
  8. Send requests again. All requests now go to backend-zone-a, the only backend in the same zone as the gateway.

    for i in $(seq 1 20); do
      curl -s -H "host: locality.test" "http://${INGRESS_GW_ADDRESS}/hostname"
      echo
    done | sort | uniq -c

    Example output:

    20 backend-zone-a-868fdff56f-w9jsn
  9. Simulate a same-zone outage by deleting the same-zone WorkloadEntry. Traffic spills over to the next bucket, which is the same region but a different zone.

    kubectl delete workloadentry we-zone-a -n agentgateway-locality
    sleep 2
    
    for i in $(seq 1 20); do
      curl -s --max-time 5 -H "host: locality.test" "http://${INGRESS_GW_ADDRESS}/hostname"
      echo
    done | sort | uniq -c

    Example output:

    20 backend-zone-b-6bddfdcd85-ht8qn
  10. Delete the same-region WorkloadEntry. Traffic spills over to the cross-region backend.

    kubectl delete workloadentry we-zone-b -n agentgateway-locality
    sleep 2
    
    for i in $(seq 1 20); do
      curl -s --max-time 5 -H "host: locality.test" "http://${INGRESS_GW_ADDRESS}/hostname"
      echo
    done | sort | uniq -c

    Example output:

    20 backend-region-b-5d46cfc8b5-xmfnc

Set up strict same-node routing

Use internalTrafficPolicy: Local to require that requests reach an endpoint on the same node as the gateway. Unlike trafficDistribution, strict locality does not spill over. When no local endpoints exist, requests return 503 Service Unavailable.

  1. Restore the same-zone and same-region WorkloadEntries that you deleted in the previous task.

    ZONE_A_IP=$(kubectl get pod -n agentgateway-locality -l app=backend-zone-a -o jsonpath='{.items[0].status.podIP}')
    ZONE_B_IP=$(kubectl get pod -n agentgateway-locality -l app=backend-zone-b -o jsonpath='{.items[0].status.podIP}')
    REGION_B_IP=$(kubectl get pod -n agentgateway-locality -l app=backend-region-b -o jsonpath='{.items[0].status.podIP}')
    
    kubectl apply -f- <<EOF
    apiVersion: networking.istio.io/v1
    kind: WorkloadEntry
    metadata:
      name: we-zone-a
      namespace: agentgateway-locality
      labels:
        app: locality-svc-workloadentry
    spec:
      address: ${ZONE_A_IP}
      locality: "region/zone"
      ports:
        http: 80
    ---
    apiVersion: networking.istio.io/v1
    kind: WorkloadEntry
    metadata:
      name: we-zone-b
      namespace: agentgateway-locality
      labels:
        app: locality-svc-workloadentry
    spec:
      address: ${ZONE_B_IP}
      locality: "region/other-zone"
      ports:
        http: 80
    ---
    apiVersion: networking.istio.io/v1
    kind: WorkloadEntry
    metadata:
      name: we-region-b
      namespace: agentgateway-locality
      labels:
        app: locality-svc-workloadentry
    spec:
      address: ${REGION_B_IP}
      locality: "other-region/zone"
      ports:
        http: 80
    EOF
  2. Switch the Service from trafficDistribution to internalTrafficPolicy: Local. The example uses WorkloadEntries with no node association, so no endpoints are eligible for local-only delivery.

    kubectl apply -f- <<EOF
    apiVersion: v1
    kind: Service
    metadata:
      name: locality-svc
      namespace: agentgateway-locality
    spec:
      selector:
        app: locality-svc-workloadentry
      ports:
        - name: http
          port: 80
          targetPort: 80
          protocol: TCP
      internalTrafficPolicy: Local
    EOF
  3. Send requests and observe that every request returns 503.

    for i in $(seq 1 10); do
      curl -s -o /dev/null -w "%{http_code}\n" -H "host: locality.test" "http://${INGRESS_GW_ADDRESS}/hostname"
    done | sort | uniq -c

    Example output:

      10 503

    In a multi-node cluster, replace the WorkloadEntries with pod-backed endpoints on the same node as the gateway to see successful responses.
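
    For reference, a pod-backed variant is sketched below. It assumes the backend pods labeled app: backend-zone-a run on the same node as the gateway proxy.

    kubectl apply -f- <<EOF
    apiVersion: v1
    kind: Service
    metadata:
      name: locality-svc
      namespace: agentgateway-locality
    spec:
      selector:
        app: backend-zone-a
      ports:
        - name: http
          port: 80
          targetPort: 80
          protocol: TCP
      internalTrafficPolicy: Local
    EOF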

Cleanup

You can remove the resources that you created in this guide.

    kubectl delete namespace agentgateway-locality

Next steps

  • Combine locality-aware routing with traffic splitting to weight traffic across backends within each locality bucket, as sketched after this list.
  • For LLM provider routing, see Failover across LLM providers, which uses the same priority-bucket model with a CEL-based health policy.
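
For example, a weighted split across two locality-aware Services might look like the following sketch; the Service names are hypothetical.

    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: split-route
      namespace: agentgateway-locality
    spec:
      parentRefs:
        - name: gateway
      hostnames:
        - locality.test
      rules:
        - backendRefs:
            # 90/10 split; each Service applies its own locality preference.
            - name: svc-stable
              port: 80
              weight: 90
            - name: svc-canary
              port: 80
              weight: 10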