# Locality-aware routing
Reduce cross-zone traffic costs and latency by routing requests to nearby endpoints, with automatic failover to other localities when local endpoints are unavailable.
## About
Locality-aware routing (also called topology-aware routing) sends requests to backend endpoints that share locality with the gateway proxy, such as endpoints in the same zone, region, or node. Agentgateway groups endpoints into priority buckets based on their locality relative to the gateway, then selects the best bucket on each request.
Locality applies to all backend services, not just LLM providers. The same priority-group selection that powers LLM failover handles general HTTP routing as well.
## How locality bucketing works
When you enable locality-aware routing for a Service, agentgateway ranks each endpoint against the gateway’s own locality. The ranking forms ordered priority buckets, with closer matches in higher-priority buckets.
- Same zone as the gateway: highest priority.
- Same region, different zone: second priority.
- Different region: fallback.
In failover mode (the default when you set trafficDistribution on a Service), the gateway sends requests to the highest-priority bucket that has at least one healthy endpoint. If all endpoints in that bucket are unhealthy or removed, traffic spills over to the next bucket. This way, you get locality preference without sacrificing availability.
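For example, the following Service manifest opts in to failover mode. This is a minimal sketch: the name, namespace, selector, and ports are placeholders, and only the `trafficDistribution` field matters for locality-aware routing.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-backend           # placeholder name
  namespace: my-namespace    # placeholder namespace
spec:
  selector:
    app: my-backend          # placeholder selector
  ports:
  - name: http
    port: 80
    targetPort: 8080
  # Prefer endpoints in the gateway's zone, spill over to the same
  # region, and fall back to other regions only when no closer
  # healthy endpoint exists.
  trafficDistribution: PreferClose
```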
## Failover vs. strict locality
Two enforcement levels are available.
- Failover (default): Prefer local endpoints, but fail over to other localities when no local endpoints are available. Use failover for cost and latency optimization without sacrificing availability.
- Strict: Only deliver to endpoints that match the configured locality. If no matching endpoints exist, requests return `503 Service Unavailable` instead of spilling over. Use strict mode when locality is a hard requirement, such as data residency or same-node co-location.
You configure both modes through standard Kubernetes Service fields, not through agentgateway-specific resources.
| Behavior | Service field | Value |
|---|---|---|
| Failover, prefer same zone | `spec.trafficDistribution` | `PreferClose` |
| Strict, same node only | `spec.internalTrafficPolicy` | `Local` |
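As a sketch of the strict variant, the following Service uses `internalTrafficPolicy: Local`. The name, namespace, and selector are again placeholders; the failover variant is shown earlier in this page.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-backend           # placeholder name
  namespace: my-namespace    # placeholder namespace
spec:
  selector:
    app: my-backend          # placeholder selector
  ports:
  - name: http
    port: 80
    targetPort: 8080
  # Deliver only to endpoints on the same node as the gateway proxy.
  # Requests fail with 503 when no same-node endpoint exists.
  internalTrafficPolicy: Local
```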
## How the gateway determines its own locality
For locality-aware routing to work, the gateway proxy must know its own locality. Agentgateway resolves this in the following order.
- The `LOCALITY` environment variable on the proxy pod (`region/zone/subzone` format), if set.
- The `topology.kubernetes.io/region` and `topology.kubernetes.io/zone` labels on the node where the proxy pod runs.
If neither source provides locality information, locality preferences on Services are silently ignored. Every endpoint falls into the highest-priority bucket, and traffic is distributed without locality awareness.
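If the proxy runs on unlabeled nodes, you can supply the locality yourself. The following commands are a sketch: the proxy Deployment name, namespace, node name, and locality values are placeholders that you must adapt to your installation.

```sh
# Option 1: set the proxy's locality directly through the LOCALITY
# environment variable (region/zone/subzone format).
kubectl set env deployment/<gateway-proxy-deployment> -n <gateway-namespace> \
  LOCALITY=region-a/zone-1

# Option 2: label the node that runs the proxy pod so the proxy can
# derive its locality from the standard topology labels.
kubectl label node <node-name> \
  topology.kubernetes.io/region=region-a \
  topology.kubernetes.io/zone=zone-1 --overwrite
```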
## Before you begin
1. Follow the Get started guide to install agentgateway.
2. Follow the Sample app guide to create a gateway proxy with an HTTP listener and deploy the httpbin sample app.
3. Get the external address of the gateway and save it in an environment variable.

   ```sh
   export INGRESS_GW_ADDRESS=$(kubectl get svc -n agentgateway-system http -o jsonpath="{.status.loadBalancer.ingress[0]['hostname','ip']}")
   echo $INGRESS_GW_ADDRESS
   ```

4. Install the Istio CRDs that agentgateway consumes for workload and locality discovery. Use the manifest from a recent Istio release.

   ```sh
   kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.27/manifests/charts/base/files/crd-all.gen.yaml
   ```

5. Verify that the nodes in your cluster carry locality labels. Cloud-provider Kubernetes distributions add these labels automatically, but local clusters such as kind do not.

   ```sh
   kubectl get nodes --label-columns=topology.kubernetes.io/region,topology.kubernetes.io/zone
   ```

   If the `REGION` and `ZONE` columns are empty, label your nodes manually. The values that you choose determine which endpoints count as “same zone” or “same region” as the gateway. For a single-node test cluster, run the following command.

   ```sh
   kubectl label node <node-name> topology.kubernetes.io/region=region topology.kubernetes.io/zone=zone --overwrite
   ```

6. Restart the agentgateway controller so it picks up the updated node labels.

   ```sh
   kubectl rollout restart deployment/agentgateway -n agentgateway-system
   ```
## Set up failover across localities
Deploy three backend instances that represent three localities, and then enable `PreferClose` on the Service so that the gateway prefers same-zone endpoints and falls back to other zones or regions only when needed.
The example uses Istio `WorkloadEntry` resources to override locality on each backend. WorkloadEntries are required for single-node clusters such as kind, where every pod runs on the same node and shares one locality. In a real multi-zone cluster, you do not need WorkloadEntries, because each pod inherits locality from the node where it runs, and a Service selector that matches pod labels works as usual.

1. Create a namespace and a Gateway.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: v1
   kind: Namespace
   metadata:
     name: agentgateway-locality
   ---
   apiVersion: gateway.networking.k8s.io/v1
   kind: Gateway
   metadata:
     name: gateway
     namespace: agentgateway-locality
   spec:
     gatewayClassName: agentgateway
     listeners:
     - name: http
       protocol: HTTP
       port: 80
       allowedRoutes:
         namespaces:
           from: Same
   EOF
   ```

2. Deploy three backend instances. Each instance returns its own pod hostname so you can identify which backend served a request.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: backend-zone-a
     namespace: agentgateway-locality
   spec:
     replicas: 1
     selector:
       matchLabels:
         app.kubernetes.io/name: backend-zone-a
     template:
       metadata:
         labels:
           app: backend-zone-a
           app.kubernetes.io/name: backend-zone-a
       spec:
         containers:
         - name: agnhost
           image: registry.k8s.io/e2e-test-images/agnhost:2.45
           args: ["netexec", "--http-port=80"]
           ports:
           - name: http
             containerPort: 80
   ---
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: backend-zone-b
     namespace: agentgateway-locality
   spec:
     replicas: 1
     selector:
       matchLabels:
         app.kubernetes.io/name: backend-zone-b
     template:
       metadata:
         labels:
           app: backend-zone-b
           app.kubernetes.io/name: backend-zone-b
       spec:
         containers:
         - name: agnhost
           image: registry.k8s.io/e2e-test-images/agnhost:2.45
           args: ["netexec", "--http-port=80"]
           ports:
           - name: http
             containerPort: 80
   ---
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: backend-region-b
     namespace: agentgateway-locality
   spec:
     replicas: 1
     selector:
       matchLabels:
         app.kubernetes.io/name: backend-region-b
     template:
       metadata:
         labels:
           app: backend-region-b
           app.kubernetes.io/name: backend-region-b
       spec:
         containers:
         - name: agnhost
           image: registry.k8s.io/e2e-test-images/agnhost:2.45
           args: ["netexec", "--http-port=80"]
           ports:
           - name: http
             containerPort: 80
   EOF
   ```

3. Create a Service and an HTTPRoute. The Service selector matches a label that the WorkloadEntries in the next step carry, not the pod labels.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: v1
   kind: Service
   metadata:
     name: locality-svc
     namespace: agentgateway-locality
   spec:
     selector:
       app: locality-svc-workloadentry
     ports:
     - name: http
       port: 80
       targetPort: 80
       protocol: TCP
   ---
   apiVersion: gateway.networking.k8s.io/v1
   kind: HTTPRoute
   metadata:
     name: locality-route
     namespace: agentgateway-locality
   spec:
     parentRefs:
     - name: gateway
     hostnames:
     - locality.test
     rules:
     - backendRefs:
       - name: locality-svc
         port: 80
   EOF
   ```

4. Capture each backend pod’s IP address and create a WorkloadEntry that overrides its locality. The labels on each WorkloadEntry match the Service selector, so agentgateway treats them as endpoints of `locality-svc`.

   ```sh
   ZONE_A_IP=$(kubectl get pod -n agentgateway-locality -l app=backend-zone-a -o jsonpath='{.items[0].status.podIP}')
   ZONE_B_IP=$(kubectl get pod -n agentgateway-locality -l app=backend-zone-b -o jsonpath='{.items[0].status.podIP}')
   REGION_B_IP=$(kubectl get pod -n agentgateway-locality -l app=backend-region-b -o jsonpath='{.items[0].status.podIP}')

   kubectl apply -f- <<EOF
   apiVersion: networking.istio.io/v1
   kind: WorkloadEntry
   metadata:
     name: we-zone-a
     namespace: agentgateway-locality
     labels:
       app: locality-svc-workloadentry
   spec:
     address: ${ZONE_A_IP}
     locality: "region/zone"
     ports:
       http: 80
   ---
   apiVersion: networking.istio.io/v1
   kind: WorkloadEntry
   metadata:
     name: we-zone-b
     namespace: agentgateway-locality
     labels:
       app: locality-svc-workloadentry
   spec:
     address: ${ZONE_B_IP}
     locality: "region/other-zone"
     ports:
       http: 80
   ---
   apiVersion: networking.istio.io/v1
   kind: WorkloadEntry
   metadata:
     name: we-region-b
     namespace: agentgateway-locality
     labels:
       app: locality-svc-workloadentry
   spec:
     address: ${REGION_B_IP}
     locality: "other-region/zone"
     ports:
       http: 80
   EOF
   ```

5. Get the gateway address.

   ```sh
   export INGRESS_GW_ADDRESS=$(kubectl get gateway gateway -n agentgateway-locality -o jsonpath='{.status.addresses[0].value}')
   echo $INGRESS_GW_ADDRESS
   ```

6. Send a few baseline requests. Without `trafficDistribution` set, traffic spreads across all three backends.

   ```sh
   for i in $(seq 1 10); do
     curl -s -H "host: locality.test" "http://${INGRESS_GW_ADDRESS}/hostname"
     echo
   done
   ```

   Example output:

   ```
   backend-zone-b-6bddfdcd85-ht8qn
   backend-region-b-5d46cfc8b5-xmfnc
   backend-zone-a-868fdff56f-w9jsn
   backend-region-b-5d46cfc8b5-xmfnc
   backend-region-b-5d46cfc8b5-xmfnc
   backend-region-b-5d46cfc8b5-xmfnc
   backend-zone-a-868fdff56f-w9jsn
   backend-region-b-5d46cfc8b5-xmfnc
   backend-region-b-5d46cfc8b5-xmfnc
   backend-zone-a-868fdff56f-w9jsn
   ```

7. Enable locality-aware failover by setting `trafficDistribution: PreferClose` on the Service.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: v1
   kind: Service
   metadata:
     name: locality-svc
     namespace: agentgateway-locality
   spec:
     selector:
       app: locality-svc-workloadentry
     ports:
     - name: http
       port: 80
       targetPort: 80
       protocol: TCP
     trafficDistribution: PreferClose
   EOF
   ```

8. Send requests again. All requests now go to `backend-zone-a`, the only backend in the same zone as the gateway.

   ```sh
   for i in $(seq 1 20); do
     curl -s -H "host: locality.test" "http://${INGRESS_GW_ADDRESS}/hostname"
     echo
   done | sort | uniq -c
   ```

   Example output:

   ```
   20 backend-zone-a-868fdff56f-w9jsn
   ```

9. Simulate a same-zone outage by deleting the same-zone WorkloadEntry. Traffic spills over to the next bucket, which is the same region but a different zone.

   ```sh
   kubectl delete workloadentry we-zone-a -n agentgateway-locality
   sleep 2
   for i in $(seq 1 20); do
     curl -s --max-time 5 -H "host: locality.test" "http://${INGRESS_GW_ADDRESS}/hostname"
     echo
   done | sort | uniq -c
   ```

   Example output:

   ```
   20 backend-zone-b-6bddfdcd85-ht8qn
   ```

10. Delete the same-region WorkloadEntry. Traffic spills over to the cross-region backend.

    ```sh
    kubectl delete workloadentry we-zone-b -n agentgateway-locality
    sleep 2
    for i in $(seq 1 20); do
      curl -s --max-time 5 -H "host: locality.test" "http://${INGRESS_GW_ADDRESS}/hostname"
      echo
    done | sort | uniq -c
    ```

    Example output:

    ```
    20 backend-region-b-5d46cfc8b5-xmfnc
    ```
## Set up strict same-node routing
Use `internalTrafficPolicy: Local` to require that requests reach an endpoint on the same node as the gateway. Unlike `trafficDistribution`, strict locality does not spill over. When no local endpoints exist, requests return `503 Service Unavailable`.
1. Restore the same-zone and same-region WorkloadEntries that you deleted in the previous task.

   ```sh
   ZONE_A_IP=$(kubectl get pod -n agentgateway-locality -l app=backend-zone-a -o jsonpath='{.items[0].status.podIP}')
   ZONE_B_IP=$(kubectl get pod -n agentgateway-locality -l app=backend-zone-b -o jsonpath='{.items[0].status.podIP}')
   REGION_B_IP=$(kubectl get pod -n agentgateway-locality -l app=backend-region-b -o jsonpath='{.items[0].status.podIP}')

   kubectl apply -f- <<EOF
   apiVersion: networking.istio.io/v1
   kind: WorkloadEntry
   metadata:
     name: we-zone-a
     namespace: agentgateway-locality
     labels:
       app: locality-svc-workloadentry
   spec:
     address: ${ZONE_A_IP}
     locality: "region/zone"
     ports:
       http: 80
   ---
   apiVersion: networking.istio.io/v1
   kind: WorkloadEntry
   metadata:
     name: we-zone-b
     namespace: agentgateway-locality
     labels:
       app: locality-svc-workloadentry
   spec:
     address: ${ZONE_B_IP}
     locality: "region/other-zone"
     ports:
       http: 80
   ---
   apiVersion: networking.istio.io/v1
   kind: WorkloadEntry
   metadata:
     name: we-region-b
     namespace: agentgateway-locality
     labels:
       app: locality-svc-workloadentry
   spec:
     address: ${REGION_B_IP}
     locality: "other-region/zone"
     ports:
       http: 80
   EOF
   ```

2. Switch the Service from `trafficDistribution` to `internalTrafficPolicy: Local`. The example uses WorkloadEntries with no node association, so no endpoints are eligible for local-only delivery.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: v1
   kind: Service
   metadata:
     name: locality-svc
     namespace: agentgateway-locality
   spec:
     selector:
       app: locality-svc-workloadentry
     ports:
     - name: http
       port: 80
       targetPort: 80
       protocol: TCP
     internalTrafficPolicy: Local
   EOF
   ```

3. Send requests and observe that every request returns `503`.

   ```sh
   for i in $(seq 1 10); do
     curl -s -o /dev/null -w "%{http_code}\n" -H "host: locality.test" "http://${INGRESS_GW_ADDRESS}/hostname"
   done | sort | uniq -c
   ```

   Example output:

   ```
   10 503
   ```

In a multi-node cluster, replace the WorkloadEntries with pod-backed endpoints on the same node as the gateway to see successful responses.
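For example, a Service that selects backend pods directly (instead of the WorkloadEntries) picks up each pod's node and locality. The following manifest is a sketch for such a cluster, reusing the `backend-zone-a` pod label from this guide; it is not part of the guided steps.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: locality-svc
  namespace: agentgateway-locality
spec:
  # Select the backend pods directly so that each endpoint carries the
  # node and locality of the node it runs on.
  selector:
    app: backend-zone-a
  ports:
  - name: http
    port: 80
    targetPort: 80
    protocol: TCP
  # Deliver only to endpoints on the same node as the gateway proxy.
  internalTrafficPolicy: Local
```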
## Cleanup
You can remove the resources that you created in this guide.

```sh
kubectl delete namespace agentgateway-locality
```

## Next steps
- Combine locality-aware routing with traffic splitting to weight traffic across backends within each locality bucket.
- For LLM provider routing, see Failover across LLM providers, which uses the same priority-bucket model with a CEL-based health policy.