Backend health
Automatically evict and restore unhealthy backend endpoints with passive health checking.
Agentgateway continuously tracks the health of the endpoints behind a backend and can automatically remove, or evict, endpoints that return errors, then gradually restore them as they recover. This passive health checking (also known as outlier detection) is built into the load balancer, so it applies to any backend, including regular Kubernetes Services, not just LLM providers.
Unlike active health checks that probe endpoints on a schedule, passive health checking observes the responses from real traffic. When an endpoint’s responses match an unhealthy condition that you define, its health score drops. If the score crosses the eviction threshold, the gateway stops sending new requests to that endpoint for a backoff period, then returns it to the pool to see whether it recovered.
Before you begin
- Set up an agentgateway proxy.
- Install the httpbin sample app.
How backend health checking works
You configure backend health checking in the health field of a backend policy. The health field has two parts:
- unhealthyCondition: A CEL expression that is evaluated against each response. When the expression returns true, the response is counted as unhealthy. If you do not set this field, any 5xx response or connection failure is treated as unhealthy, which lowers the endpoint's health score but does not trigger eviction on its own.
- eviction: The settings that control when an unhealthy endpoint is evicted and how it recovers, such as how many consecutive unhealthy responses trigger eviction (consecutiveFailures), how long to evict the endpoint for (duration), and the health score to restore it with (restoreHealth).
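The following minimal Python model illustrates how the two parts interact. It is a sketch, not agentgateway's implementation. The `Response` type, the `EndpointTracker` class, and the rule that a healthy response resets the consecutive-failure streak are assumptions made for illustration. The model covers only the unhealthy classification and the consecutiveFailures rule, and does not model the health score.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Response:
    """Hypothetical view of one upstream result: an HTTP status code,
    or None when the connection itself failed."""
    code: Optional[int]


def default_unhealthy(resp: Response) -> bool:
    # Mirrors the documented default: any 5xx response or a connection failure.
    return resp.code is None or resp.code >= 500


class EndpointTracker:
    """Hypothetical per-endpoint counter of consecutive unhealthy responses."""

    def __init__(self, consecutive_failures: int,
                 unhealthy: Callable[[Response], bool] = default_unhealthy):
        self.consecutive_failures = consecutive_failures
        self.unhealthy = unhealthy
        self.streak = 0
        self.evicted = False

    def observe(self, resp: Response) -> bool:
        """Record one response; return True if this response triggers eviction."""
        if self.unhealthy(resp):
            self.streak += 1
        else:
            self.streak = 0  # assumption: a healthy response breaks the streak
        if not self.evicted and self.streak >= self.consecutive_failures:
            self.evicted = True
            return True
        return False


# unhealthyCondition: 'response.code >= 500' written as a Python predicate
# (connection failures are not modeled by this custom predicate).
tracker = EndpointTracker(3, unhealthy=lambda r: r.code is not None and r.code >= 500)
for code in [503, 200, 503, 503, 503]:
    if tracker.observe(Response(code)):
        print(f"evicted after response {code}")
```

In this model, the single 200 response in the middle of the sequence resets the streak. As a result, only the final run of three 503 responses triggers eviction.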
When every endpoint of a backend is evicted, the load balancer falls back to sending requests to the evicted endpoints rather than failing them outright. As a result, you typically see eviction in action only when a backend has multiple endpoints and at least one of them is healthy.
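The fallback rule can be sketched as a selection function. `Endpoint` and `pick_endpoint` are hypothetical names. The uniform random choice is a simplification that stands in for the real load-balancing algorithm, which also takes each endpoint's health score into account.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Endpoint:
    address: str
    evicted: bool = False


def pick_endpoint(endpoints: List[Endpoint], rng: random.Random) -> Endpoint:
    """Prefer endpoints that are not evicted; if every endpoint is evicted,
    fall back to the full list instead of failing the request."""
    if not endpoints:
        raise ValueError("backend has no endpoints")
    healthy = [e for e in endpoints if not e.evicted]
    candidates = healthy if healthy else endpoints
    return rng.choice(candidates)


rng = random.Random(0)
pool = [Endpoint("10.0.0.1", evicted=True), Endpoint("10.0.0.2")]
print(pick_endpoint(pool, rng).address)    # 10.0.0.2: the only non-evicted endpoint
single = [Endpoint("10.0.0.1", evicted=True)]
print(pick_endpoint(single, rng).address)  # 10.0.0.1: fallback to the evicted endpoint
```

The single-endpoint case explains why a backend with only one endpoint keeps serving traffic even after that endpoint is evicted.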
Configure backend health checking
The following example evicts an httpbin endpoint after it returns three consecutive 5xx responses, keeps it out of the pool for 30 seconds, and then restores it with full health. Restoring full health does not guarantee that the endpoint has recovered. If it keeps failing, it is evicted again, but each subsequent eviction lasts longer because the duration uses a multiplicative backoff. This backoff prevents a tight evict-restore-fail loop from sending a steady stream of traffic to a persistently broken endpoint. To restore the endpoint more cautiously, set restoreHealth below 100 so that it returns with a degraded health score and receives less traffic until it proves healthy.
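The following Python function shows how a multiplicative backoff lengthens repeated evictions. This guide does not state the multiplier or whether durations are capped. The default multiplier of 2 and the optional `max_duration` cap are therefore assumptions, and `eviction_duration` is a hypothetical helper rather than part of agentgateway.

```python
from datetime import timedelta
from typing import Optional


def eviction_duration(base: timedelta, eviction_count: int,
                      multiplier: float = 2.0,
                      max_duration: Optional[timedelta] = None) -> timedelta:
    """Duration of the Nth consecutive eviction (1-based) under a
    multiplicative backoff: base, base*m, base*m**2, ...
    The multiplier and the optional cap are illustrative assumptions."""
    if eviction_count < 1:
        raise ValueError("eviction_count starts at 1")
    duration = base * (multiplier ** (eviction_count - 1))
    if max_duration is not None:
        duration = min(duration, max_duration)
    return duration


base = timedelta(seconds=30)  # duration: 30s from the example policy
for n in range(1, 5):
    print(n, eviction_duration(base, n))
# 1 0:00:30
# 2 0:01:00
# 3 0:02:00
# 4 0:04:00
```

Under these assumptions, an endpoint that keeps failing after each restore spends progressively longer out of the pool. This growth is what breaks the evict-restore-fail loop described above.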
Create an AgentgatewayPolicy that applies backend health settings to the httpbin Service. Because the policy targets a Service, create it in the same namespace as the Service.
kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayPolicy
metadata:
  name: httpbin-health
  namespace: httpbin
spec:
  targetRefs:
  - group: ""
    kind: Service
    name: httpbin
  backend:
    health:
      unhealthyCondition: 'response.code >= 500'
      eviction:
        consecutiveFailures: 3
        duration: 30s
        restoreHealth: 100
EOF
- targetRefs: The backend to apply the health settings to. This example targets the httpbin Kubernetes Service (group: "", kind: Service). You can also target an AgentgatewayBackend.
- backend.health.unhealthyCondition: A CEL expression that is evaluated against each response. When it returns true, the response counts as unhealthy. This example treats any 5xx response as unhealthy.
- backend.health.eviction.consecutiveFailures: The number of consecutive unhealthy responses required before the endpoint is evicted.
- backend.health.eviction.duration: The base amount of time to evict the endpoint for. Subsequent evictions use a multiplicative backoff.
- backend.health.eviction.restoreHealth: The health score, from 0 to 100, to assign the endpoint when it returns from eviction. Set a value below 100 for gradual recovery, or 100 to restore it at full health.
Port-forward the gateway proxy on port 15000.
kubectl port-forward deployment/agentgateway-proxy -n agentgateway-system 15000
Get the config dump and verify that the health policy is applied to the httpbin Service.
Example jq command:
curl -s http://localhost:15000/config_dump | jq '[.policies[] | select(.name.name == "httpbin-health")] | .[0]'
Example output:
{
  "key": "httpbin/httpbin-health:health:httpbin/httpbin.httpbin.svc.cluster.local",
  "name": {
    "kind": "AgentgatewayPolicy",
    "name": "httpbin-health",
    "namespace": "httpbin"
  },
  "target": {
    "backend": {
      "service": {
        "hostname": "httpbin.httpbin.svc.cluster.local",
        "namespace": "httpbin"
      }
    }
  },
  "policy": {
    "backend": {
      "health": {
        "unhealthyExpression": "response.code >= 500",
        "eviction": {
          "duration": "30s",
          "restoreHealth": 1,
          "consecutiveFailures": 3
        }
      }
    }
  }
}
Note that the gateway reports your unhealthyCondition as unhealthyExpression and normalizes the restoreHealth value of 100 to its internal value of 1, which represents 100%.
Send requests to the httpbin app to confirm that healthy traffic still flows. The /headers endpoint returns a 200 response code, and the /status/503 endpoint simulates an unhealthy backend response that matches your unhealthyCondition.
curl -i "http://${INGRESS_GW_ADDRESS}:80/headers" -H "host: www.example.com"
curl -i "http://${INGRESS_GW_ADDRESS}:80/status/503" -H "host: www.example.com"
The /headers request returns a 200 response, and the /status/503 request returns a 503. Because httpbin has a single endpoint, requests keep being served even after three consecutive /status/503 responses evict that endpoint: the gateway falls back to the evicted endpoint instead of failing requests. To observe eviction shifting traffic away from an unhealthy endpoint, scale the backend to multiple endpoints.
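If you prefer to script the config dump check from the verification steps in Python rather than jq, the following sketch shows one approach. It mirrors the jq filter and reads only fields that appear in the example output. `find_policy`, `summarize_health`, and `SAMPLE_DUMP` are hypothetical names. The sample wraps a trimmed copy of the example output in a `policies` list, which is the shape the jq filter implies.

```python
import json
from typing import Any, Dict, Optional


def find_policy(dump: Dict[str, Any], name: str) -> Optional[Dict[str, Any]]:
    """Python equivalent of the jq filter
    [.policies[] | select(.name.name == NAME)] | .[0]"""
    for policy in dump.get("policies", []):
        if policy.get("name", {}).get("name") == name:
            return policy
    return None


def summarize_health(policy: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the health settings out of one policy entry."""
    health = policy["policy"]["backend"]["health"]
    eviction = health["eviction"]
    return {
        "unhealthyExpression": health["unhealthyExpression"],
        "consecutiveFailures": eviction["consecutiveFailures"],
        "duration": eviction["duration"],
        # The dump stores restoreHealth as a fraction (1 == 100%).
        "restoreHealthPercent": eviction["restoreHealth"] * 100,
    }


# Trimmed copy of the example output, wrapped in a "policies" list.
SAMPLE_DUMP = json.loads("""
{"policies": [{
  "name": {"kind": "AgentgatewayPolicy", "name": "httpbin-health", "namespace": "httpbin"},
  "policy": {"backend": {"health": {
    "unhealthyExpression": "response.code >= 500",
    "eviction": {"duration": "30s", "restoreHealth": 1, "consecutiveFailures": 3}
  }}}
}]}
""")

policy = find_policy(SAMPLE_DUMP, "httpbin-health")
print(summarize_health(policy))
```

Against a live proxy with the port-forward running, you could load the real dump with `json.load(urllib.request.urlopen("http://localhost:15000/config_dump"))` after importing `urllib.request`, and pass the result to `find_policy`.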
Cleanup
You can remove the resources that you created in this guide.
kubectl delete AgentgatewayPolicy httpbin-health -n httpbin