# Locality-aware routing
Reduce cross-zone traffic costs and latency by routing requests to nearby endpoints, with automatic failover to other localities when local endpoints are unavailable.
## About
Locality-aware routing (also called topology-aware routing) sends requests to backend endpoints that share locality with the gateway proxy, such as endpoints in the same zone, region, or node. Agentgateway groups endpoints into priority buckets based on their locality relative to the gateway, then selects the best bucket on each request.
Locality applies to all backend services, not just LLM providers. The same priority-group selection that powers LLM failover handles general HTTP routing as well.
## How locality bucketing works
When you enable locality-aware routing for a Service, agentgateway ranks each endpoint against the gateway’s own locality. The ranking forms ordered priority buckets, with closer matches in higher-priority buckets.
- Same zone as the gateway: highest priority.
- Same region, different zone: second priority.
- Different region: fallback.
In failover mode (the default when you set trafficDistribution on a Service), the gateway sends requests to the highest-priority bucket that has at least one healthy endpoint. If all endpoints in that bucket are unhealthy or removed, traffic spills over to the next bucket. This way, you get locality preference without sacrificing availability.
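For example, the following Service manifest opts in to failover mode. This is a minimal sketch: the name, namespace, selector, and ports are placeholders, and only the `trafficDistribution` field matters for locality-aware routing.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-backend           # placeholder name
  namespace: my-namespace    # placeholder namespace
spec:
  selector:
    app: my-backend          # placeholder selector
  ports:
  - name: http
    port: 80
    targetPort: 8080
  # Prefer endpoints in the gateway's zone, spill over to the same
  # region, and fall back to other regions only when no closer
  # healthy endpoint exists.
  trafficDistribution: PreferClose
```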
## Failover vs. strict locality
Two enforcement levels are available.
- Failover (default): Prefer local endpoints, but fail over to other localities when no local endpoints are available. Use failover for cost and latency optimization without sacrificing availability.
- Strict: Only deliver to endpoints that match the configured locality. If no matching endpoints exist, requests return `503 Service Unavailable` instead of spilling over. Use strict mode when locality is a hard requirement, such as data residency or same-node co-location.
You configure both modes through standard Kubernetes Service fields, not through agentgateway-specific resources.
| Behavior | Service field | Value |
|---|---|---|
| Failover, prefer same zone | `spec.trafficDistribution` | `PreferClose` |
| Strict, same node only | `spec.internalTrafficPolicy` | `Local` |
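As a sketch of the strict variant, the following Service uses `internalTrafficPolicy: Local`. The name, namespace, and selector are again placeholders; the failover variant is shown earlier in this page.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-backend           # placeholder name
  namespace: my-namespace    # placeholder namespace
spec:
  selector:
    app: my-backend          # placeholder selector
  ports:
  - name: http
    port: 80
    targetPort: 8080
  # Deliver only to endpoints on the same node as the gateway proxy.
  # Requests fail with 503 when no same-node endpoint exists.
  internalTrafficPolicy: Local
```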
## How the gateway determines its own locality
For locality-aware routing to work, the gateway proxy must know its own locality. Agentgateway resolves this in the following order.
- The `LOCALITY` environment variable on the proxy pod (`region/zone/subzone` format), if set.
- The `topology.kubernetes.io/region` and `topology.kubernetes.io/zone` labels on the node where the proxy pod runs.
If neither source provides locality information, locality preferences on Services are silently ignored. Every endpoint falls into the highest-priority bucket, and traffic is distributed without locality awareness.
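If the proxy runs on unlabeled nodes, you can supply the locality yourself. The following commands are a sketch: the proxy Deployment name, namespace, node name, and locality values are placeholders that you must adapt to your installation.

```sh
# Option 1: set the proxy's locality directly through the LOCALITY
# environment variable (region/zone/subzone format).
kubectl set env deployment/<gateway-proxy-deployment> -n <gateway-namespace> \
  LOCALITY=region-a/zone-1

# Option 2: label the node that runs the proxy pod so the proxy can
# derive its locality from the standard topology labels.
kubectl label node <node-name> \
  topology.kubernetes.io/region=region-a \
  topology.kubernetes.io/zone=zone-1 --overwrite
```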
## Before you begin
1. Follow the Get started guide to install agentgateway.
2. Follow the Sample app guide to create a gateway proxy with an HTTP listener and deploy the httpbin sample app.
3. Get the external address of the gateway and save it in an environment variable.

   ```sh
   export INGRESS_GW_ADDRESS=$(kubectl get svc -n agentgateway-system http -o jsonpath="{.status.loadBalancer.ingress[0]['hostname','ip']}")
   echo $INGRESS_GW_ADDRESS
   ```

4. Install the Istio CRDs that agentgateway consumes for workload and locality discovery. Use the manifest from a recent Istio release.

   ```sh
   kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.27/manifests/charts/base/files/crd-all.gen.yaml
   ```

5. Verify that the nodes in your cluster carry locality labels. Cloud-provider Kubernetes distributions add these labels automatically, but local clusters such as kind do not.

   ```sh
   kubectl get nodes --label-columns=topology.kubernetes.io/region,topology.kubernetes.io/zone
   ```

   If the `REGION` and `ZONE` columns are empty, label your nodes manually. The values that you choose determine which endpoints count as “same zone” or “same region” as the gateway. For a single-node test cluster, run the following command.

   ```sh
   kubectl label node <node-name> topology.kubernetes.io/region=region topology.kubernetes.io/zone=zone --overwrite
   ```

6. Restart the agentgateway controller so it picks up the updated node labels.

   ```sh
   kubectl rollout restart deployment/agentgateway -n agentgateway-system
   ```
## Set up failover across localities
Deploy three backend instances that represent three localities, and then enable `PreferClose` on the Service so that the gateway prefers same-zone endpoints and falls back to other zones or regions only when needed.
The example uses Istio `WorkloadEntry` resources to override locality on each backend. WorkloadEntries are required for single-node clusters such as kind, where every pod runs on the same node and shares one locality. In a real multi-zone cluster, you do not need WorkloadEntries, because each pod inherits locality from the node where it runs, and a Service selector that matches pod labels works as usual.

1. Create a namespace and a Gateway.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: v1
   kind: Namespace
   metadata:
     name: agentgateway-locality
   ---
   apiVersion: gateway.networking.k8s.io/v1
   kind: Gateway
   metadata:
     name: gateway
     namespace: agentgateway-locality
   spec:
     gatewayClassName: agentgateway
     listeners:
     - name: http
       protocol: HTTP
       port: 80
       allowedRoutes:
         namespaces:
           from: Same
   EOF
   ```

2. Deploy three backend instances. Each instance returns its own pod hostname so you can identify which backend served a request.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: backend-zone-a
     namespace: agentgateway-locality
   spec:
     replicas: 1
     selector:
       matchLabels:
         app.kubernetes.io/name: backend-zone-a
     template:
       metadata:
         labels:
           app: backend-zone-a
           app.kubernetes.io/name: backend-zone-a
       spec:
         containers:
         - name: agnhost
           image: registry.k8s.io/e2e-test-images/agnhost:2.45
           args: ["netexec", "--http-port=80"]
           ports:
           - name: http
             containerPort: 80
   ---
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: backend-zone-b
     namespace: agentgateway-locality
   spec:
     replicas: 1
     selector:
       matchLabels:
         app.kubernetes.io/name: backend-zone-b
     template:
       metadata:
         labels:
           app: backend-zone-b
           app.kubernetes.io/name: backend-zone-b
       spec:
         containers:
         - name: agnhost
           image: registry.k8s.io/e2e-test-images/agnhost:2.45
           args: ["netexec", "--http-port=80"]
           ports:
           - name: http
             containerPort: 80
   ---
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: backend-region-b
     namespace: agentgateway-locality
   spec:
     replicas: 1
     selector:
       matchLabels:
         app.kubernetes.io/name: backend-region-b
     template:
       metadata:
         labels:
           app: backend-region-b
           app.kubernetes.io/name: backend-region-b
       spec:
         containers:
         - name: agnhost
           image: registry.k8s.io/e2e-test-images/agnhost:2.45
           args: ["netexec", "--http-port=80"]
           ports:
           - name: http
             containerPort: 80
   EOF
   ```

3. Create a Service and an HTTPRoute. The Service selector matches a label that the WorkloadEntries in the next step carry, not the pod labels.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: v1
   kind: Service
   metadata:
     name: locality-svc
     namespace: agentgateway-locality
   spec:
     selector:
       app: locality-svc-workloadentry
     ports:
     - name: http
       port: 80
       targetPort: 80
       protocol: TCP
   ---
   apiVersion: gateway.networking.k8s.io/v1
   kind: HTTPRoute
   metadata:
     name: locality-route
     namespace: agentgateway-locality
   spec:
     parentRefs:
     - name: gateway
     hostnames:
     - locality.test
     rules:
     - backendRefs:
       - name: locality-svc
         port: 80
   EOF
   ```

4. Capture each backend pod’s IP address and create a WorkloadEntry that overrides its locality. The labels on each WorkloadEntry match the Service selector, so agentgateway treats them as endpoints of `locality-svc`.

   ```sh
   ZONE_A_IP=$(kubectl get pod -n agentgateway-locality -l app=backend-zone-a -o jsonpath='{.items[0].status.podIP}')
   ZONE_B_IP=$(kubectl get pod -n agentgateway-locality -l app=backend-zone-b -o jsonpath='{.items[0].status.podIP}')
   REGION_B_IP=$(kubectl get pod -n agentgateway-locality -l app=backend-region-b -o jsonpath='{.items[0].status.podIP}')

   kubectl apply -f- <<EOF
   apiVersion: networking.istio.io/v1
   kind: WorkloadEntry
   metadata:
     name: we-zone-a
     namespace: agentgateway-locality
     labels:
       app: locality-svc-workloadentry
   spec:
     address: ${ZONE_A_IP}
     locality: "region/zone"
     ports:
       http: 80
   ---
   apiVersion: networking.istio.io/v1
   kind: WorkloadEntry
   metadata:
     name: we-zone-b
     namespace: agentgateway-locality
     labels:
       app: locality-svc-workloadentry
   spec:
     address: ${ZONE_B_IP}
     locality: "region/other-zone"
     ports:
       http: 80
   ---
   apiVersion: networking.istio.io/v1
   kind: WorkloadEntry
   metadata:
     name: we-region-b
     namespace: agentgateway-locality
     labels:
       app: locality-svc-workloadentry
   spec:
     address: ${REGION_B_IP}
     locality: "other-region/zone"
     ports:
       http: 80
   EOF
   ```

5. Get the gateway address.

   ```sh
   export INGRESS_GW_ADDRESS=$(kubectl get gateway gateway -n agentgateway-locality -o jsonpath='{.status.addresses[0].value}')
   echo $INGRESS_GW_ADDRESS
   ```

6. Send a few baseline requests. Without `trafficDistribution` set, traffic spreads across all three backends.

   ```sh
   for i in $(seq 1 10); do
     curl -s -H "host: locality.test" "http://${INGRESS_GW_ADDRESS}/hostname"
     echo
   done
   ```

   Example output:

   ```
   backend-zone-b-6bddfdcd85-ht8qn
   backend-region-b-5d46cfc8b5-xmfnc
   backend-zone-a-868fdff56f-w9jsn
   backend-region-b-5d46cfc8b5-xmfnc
   backend-region-b-5d46cfc8b5-xmfnc
   backend-region-b-5d46cfc8b5-xmfnc
   backend-zone-a-868fdff56f-w9jsn
   backend-region-b-5d46cfc8b5-xmfnc
   backend-region-b-5d46cfc8b5-xmfnc
   backend-zone-a-868fdff56f-w9jsn
   ```

7. Enable locality-aware failover by setting `trafficDistribution: PreferClose` on the Service.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: v1
   kind: Service
   metadata:
     name: locality-svc
     namespace: agentgateway-locality
   spec:
     selector:
       app: locality-svc-workloadentry
     ports:
     - name: http
       port: 80
       targetPort: 80
       protocol: TCP
     trafficDistribution: PreferClose
   EOF
   ```

8. Send requests again. All requests now go to `backend-zone-a`, the only backend in the same zone as the gateway.

   ```sh
   for i in $(seq 1 20); do
     curl -s -H "host: locality.test" "http://${INGRESS_GW_ADDRESS}/hostname"
     echo
   done | sort | uniq -c
   ```

   Example output:

   ```
   20 backend-zone-a-868fdff56f-w9jsn
   ```

9. Simulate a same-zone outage by deleting the same-zone WorkloadEntry. Traffic spills over to the next bucket, which is the same region but a different zone.

   ```sh
   kubectl delete workloadentry we-zone-a -n agentgateway-locality
   sleep 2
   for i in $(seq 1 20); do
     curl -s --max-time 5 -H "host: locality.test" "http://${INGRESS_GW_ADDRESS}/hostname"
     echo
   done | sort | uniq -c
   ```

   Example output:

   ```
   20 backend-zone-b-6bddfdcd85-ht8qn
   ```

10. Delete the same-region WorkloadEntry. Traffic spills over to the cross-region backend.

    ```sh
    kubectl delete workloadentry we-zone-b -n agentgateway-locality
    sleep 2
    for i in $(seq 1 20); do
      curl -s --max-time 5 -H "host: locality.test" "http://${INGRESS_GW_ADDRESS}/hostname"
      echo
    done | sort | uniq -c
    ```

    Example output:

    ```
    20 backend-region-b-5d46cfc8b5-xmfnc
    ```
## Set up strict same-node routing
Use `internalTrafficPolicy: Local` to require that requests reach an endpoint on the same node as the gateway. Unlike `trafficDistribution`, strict locality does not spill over. When no local endpoints exist, requests return `503 Service Unavailable`.
1. Restore the same-zone and same-region WorkloadEntries that you deleted in the previous task.

   ```sh
   ZONE_A_IP=$(kubectl get pod -n agentgateway-locality -l app=backend-zone-a -o jsonpath='{.items[0].status.podIP}')
   ZONE_B_IP=$(kubectl get pod -n agentgateway-locality -l app=backend-zone-b -o jsonpath='{.items[0].status.podIP}')
   REGION_B_IP=$(kubectl get pod -n agentgateway-locality -l app=backend-region-b -o jsonpath='{.items[0].status.podIP}')

   kubectl apply -f- <<EOF
   apiVersion: networking.istio.io/v1
   kind: WorkloadEntry
   metadata:
     name: we-zone-a
     namespace: agentgateway-locality
     labels:
       app: locality-svc-workloadentry
   spec:
     address: ${ZONE_A_IP}
     locality: "region/zone"
     ports:
       http: 80
   ---
   apiVersion: networking.istio.io/v1
   kind: WorkloadEntry
   metadata:
     name: we-zone-b
     namespace: agentgateway-locality
     labels:
       app: locality-svc-workloadentry
   spec:
     address: ${ZONE_B_IP}
     locality: "region/other-zone"
     ports:
       http: 80
   ---
   apiVersion: networking.istio.io/v1
   kind: WorkloadEntry
   metadata:
     name: we-region-b
     namespace: agentgateway-locality
     labels:
       app: locality-svc-workloadentry
   spec:
     address: ${REGION_B_IP}
     locality: "other-region/zone"
     ports:
       http: 80
   EOF
   ```

2. Switch the Service from `trafficDistribution` to `internalTrafficPolicy: Local`. The example uses WorkloadEntries with no node association, so no endpoints are eligible for local-only delivery.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: v1
   kind: Service
   metadata:
     name: locality-svc
     namespace: agentgateway-locality
   spec:
     selector:
       app: locality-svc-workloadentry
     ports:
     - name: http
       port: 80
       targetPort: 80
       protocol: TCP
     internalTrafficPolicy: Local
   EOF
   ```

3. Send requests and observe that every request returns `503`.

   ```sh
   for i in $(seq 1 10); do
     curl -s -o /dev/null -w "%{http_code}\n" -H "host: locality.test" "http://${INGRESS_GW_ADDRESS}/hostname"
   done | sort | uniq -c
   ```

   Example output:

   ```
   10 503
   ```

In a multi-node cluster, replace the WorkloadEntries with pod-backed endpoints on the same node as the gateway to see successful responses.
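For example, a Service that selects backend pods directly (instead of the WorkloadEntries) picks up each pod's node and locality. The following manifest is a sketch for such a cluster, reusing the `backend-zone-a` pod label from this guide; it is not part of the guided steps.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: locality-svc
  namespace: agentgateway-locality
spec:
  # Select the backend pods directly so that each endpoint carries the
  # node and locality of the node it runs on.
  selector:
    app: backend-zone-a
  ports:
  - name: http
    port: 80
    targetPort: 80
    protocol: TCP
  # Deliver only to endpoints on the same node as the gateway proxy.
  internalTrafficPolicy: Local
```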
## Cleanup
You can remove the resources that you created in this guide.

```sh
kubectl delete namespace agentgateway-locality
```

## Next steps
- Combine locality-aware routing with traffic splitting to weight traffic across backends within each locality bucket.
- For LLM provider routing, see Failover across LLM providers, which uses the same priority-bucket model with a CEL-based health policy.