Rate limiting for HTTP
Apply local and global rate limits to HTTP traffic to protect your backend services from overload.
About
Rate limiting in agentgateway protects your services from being overwhelmed by excessive traffic. A runaway automation script, a misconfigured retry loop, or a deliberate flood can exhaust your upstream’s capacity in seconds. Rate limiting gives you precise control over how much traffic reaches any route or the entire gateway — without any changes to the backend.
Rate limiting in agentgateway is expressed through AgentgatewayPolicy resources. A policy attaches to a Gateway or HTTPRoute target and defines limits in the spec.traffic.rateLimit field. Gateway-level policies act as a hard ceiling on total traffic, while route-level policies provide finer-grained control.
Additionally, you can set up local or global rate limiting, depending on whether you want limits shared across Gateway instances.
| Mode | Where limits are enforced | Use case |
|---|---|---|
| Local | In-process, per proxy replica | Simple per-route or gateway-wide limits |
| Global | External rate limit service | Shared limits across multiple proxy replicas |
For AI-specific use cases, see:
Gateway-level DoS protection
Target your Gateway resource to apply a limit across all routes. This acts as a hard ceiling on total gateway throughput regardless of which route is hit.
Example gateway policy
```sh
kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayPolicy
metadata:
  name: gateway-rate-limit
  namespace: agentgateway-system
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: agentgateway-proxy
  traffic:
    rateLimit:
      local:
      - requests: 5000
        unit: Minutes
        burst: 1000
EOF
```

Route-level rate limit
Route-level policies take precedence over gateway-level ones for their specific traffic.
Example route policy
```sh
kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayPolicy
metadata:
  name: httpbin-rate-limit
  namespace: httpbin
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: httpbin
  traffic:
    rateLimit:
      local:
      - requests: 3
        unit: Seconds
        burst: 3
EOF
```

Inheritance
Policies apply at their attachment point. Precedence runs from least to most specific:

Gateway → Listener → Route → Route Rule → Backend

More specific policies win. A route-level limit overrides a gateway-level limit for traffic on that route.
With both policies in place, traffic to www.example.com is subject to the route limit (3 req/s), while all other routes are bounded only by the gateway limit (5000 req/min).
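The precedence rule can be modeled as "walk the levels from least to most specific and keep the last limit found." The following shell sketch does that for a single request path. The `effective_limit` function, the `LEVEL=LIMIT` argument format, and the `RouteRule` spelling are hypothetical illustrations of the rule as this page states it; agentgateway resolves precedence internally and does not expose such a function.

```sh
# Hypothetical model of the precedence rule on this page (not agentgateway
# code). Levels are listed from least to most specific; "RouteRule" stands
# for the Route Rule level.
LEVELS="Gateway Listener Route RouteRule Backend"

# Usage: effective_limit LEVEL=LIMIT ...
# Prints the limit attached at the most specific level, or "none".
effective_limit() {
  local result level arg
  result="none"
  for level in $LEVELS; do
    for arg in "$@"; do
      if [ "${arg%%=*}" = "$level" ]; then
        result="${arg#*=}"  # a more specific level overwrites earlier ones
      fi
    done
  done
  echo "$result"
}

effective_limit "Gateway=5000/Minutes" "Route=3/Seconds"  # prints 3/Seconds
effective_limit "Gateway=5000/Minutes"                    # prints 5000/Minutes
```

The two calls mirror the scenario above: the httpbin route resolves to its own limit, while a route with no policy of its own falls back to the gateway limit.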
Response headers
When rate limiting is enabled, the following headers are added to responses. These headers help clients understand their current rate limit status and adapt their behavior accordingly.
Note: The x-envoy-ratelimited header is only present when using global rate limiting with an Envoy-compatible rate limit service. It is added by the rate limit service itself, not by agentgateway. As such, this header does not appear with local rate limiting.
| Header | Description | Added by | Example |
|---|---|---|---|
| `x-ratelimit-limit` | The rate limit ceiling for the given request. For local rate limiting, this is the base limit plus burst. For global rate limiting with time windows, this might include window information. | agentgateway | `6` (local); `10` or `10;w=60` (global with a 60-second window) |
| `x-ratelimit-remaining` | The number of requests (or tokens for LLM rate limiting) remaining in the current time window. | agentgateway | `5` |
| `x-ratelimit-reset` | The time in seconds until the rate limit window resets. | agentgateway | `30` |
| `x-envoy-ratelimited` | Present when the request is rate limited. Only appears in 429 responses when using global rate limiting. | External rate limit service | (header present) |
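Clients can use these headers to back off instead of retrying blindly. The following shell sketch reads a response header block, such as the output of `curl -si`, and prints how many seconds to wait: 0 when `x-ratelimit-remaining` is missing or above zero, otherwise the `x-ratelimit-reset` value. The `ratelimit_wait` name and the 1-second minimum wait (used when the reset value is missing or 0) are assumptions for illustration, not agentgateway behavior.

```sh
# Hypothetical client-side helper (not part of agentgateway).
# Reads an HTTP response header block on stdin, for example the output of
# `curl -si ...`, and prints how many seconds to wait before retrying.
ratelimit_wait() {
  local headers remaining reset
  headers=$(tr -d '\r')  # HTTP header lines end in CRLF; drop the CR
  remaining=$(printf '%s\n' "$headers" |
    awk -F': *' 'tolower($1) == "x-ratelimit-remaining" { print $2; exit }')
  reset=$(printf '%s\n' "$headers" |
    awk -F': *' 'tolower($1) == "x-ratelimit-reset" { print $2; exit }')
  if [ -z "$remaining" ] || [ "$remaining" -gt 0 ]; then
    echo 0  # no rate limit headers, or quota left: send now
  elif [ -z "$reset" ] || [ "$reset" -lt 1 ]; then
    echo 1  # assumption: wait at least 1 second once the quota is exhausted
  else
    echo "$reset"
  fi
}

# The 429 response in the local rate limiting example reports a reset of 0.
printf 'HTTP/1.1 429 Too Many Requests\r\nx-ratelimit-remaining: 0\r\nx-ratelimit-reset: 0\r\n' |
  ratelimit_wait  # prints 1
```

Header names are compared case-insensitively because HTTP header names are case-insensitive.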
Before you begin
Important: Install the experimental channel of the Kubernetes Gateway API to use this feature.
```sh
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.4.0/experimental-install.yaml --server-side
```

Upgrade or install agentgateway with the `KGW_ENABLE_GATEWAY_API_EXPERIMENTAL_FEATURES` environment variable. This setting defaults to `false` and must be explicitly enabled to use Gateway API experimental features. Example command:

```sh
helm upgrade -i agentgateway oci://ghcr.io/kgateway-dev/charts/agentgateway \
  --namespace agentgateway-system \
  --version v2.2.1 \
  --set controller.image.pullPolicy=Always \
  --set controller.extraEnv.KGW_ENABLE_GATEWAY_API_EXPERIMENTAL_FEATURES=true
```

Follow the Sample app guide to deploy the httpbin sample app.
Get the external address of the gateway and save it in an environment variable.
```sh
export INGRESS_GW_ADDRESS=$(kubectl get svc -n agentgateway-system agentgateway-proxy -o jsonpath="{.status.loadBalancer.ingress[0]['hostname','ip']}")
echo $INGRESS_GW_ADDRESS
```

For local testing without a LoadBalancer, port-forward the proxy instead:

```sh
kubectl port-forward deployment/agentgateway-proxy -n agentgateway-system 8080:80
```
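Later commands in this guide come in two variants, one for the LoadBalancer address and one for the port-forwarded `localhost:8080`. If you prefer a single command, a small helper can choose the base URL for you. The `gw_url` function below is a hypothetical convenience, not part of agentgateway, and it assumes that an unset or empty `INGRESS_GW_ADDRESS` means you are port-forwarding.

```sh
# Hypothetical convenience helper (not part of agentgateway): prints the base
# URL for test traffic. It uses the LoadBalancer address when
# INGRESS_GW_ADDRESS is set and non-empty, and otherwise assumes the
# `kubectl port-forward ... 8080:80` command is running.
gw_url() {
  if [ -n "${INGRESS_GW_ADDRESS:-}" ]; then
    echo "http://${INGRESS_GW_ADDRESS}:80"
  else
    echo "http://localhost:8080"
  fi
}

# Example use with the httpbin route:
#   curl -s "$(gw_url)/headers" -H "host: www.example.com"
```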
Local rate limiting
Local rate limiting runs entirely inside the agentgateway proxy — no external service needed. The following steps show how to apply request-based limits to your HTTP traffic.
Apply a rate limit to the httpbin HTTPRoute.
```sh
kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayPolicy
metadata:
  name: httpbin-rate-limit
  namespace: httpbin
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: httpbin
  traffic:
    rateLimit:
      local:
      - requests: 3
        unit: Seconds
        burst: 3
EOF
```

Review the following table to understand this configuration.

| Field | Required | Description |
|---|---|---|
| `requests` | Yes | Number of requests allowed per `unit`. |
| `unit` | Yes | `Seconds`, `Minutes`, or `Hours`. |
| `burst` | No | Extra requests allowed above the base rate in a short burst. This setting only works with `requests`, not with `token` rate limits. |

The `burst` field implements a token bucket on top of the base rate. With `requests: 3` and `burst: 3`, you get up to 6 requests in one burst (3 base + 3 burst capacity), then the bucket refills at 3 per second. This absorbs short traffic spikes without rejecting requests.

Verify that the policy is attached.
```sh
kubectl get AgentgatewayPolicy httpbin-rate-limit -n httpbin \
  -o jsonpath='{.status.ancestors[0].conditions}' | jq .
```

A healthy policy reports both `Accepted` and `Attached` as `True`:

```json
[
  {
    "type": "Accepted",
    "status": "True",
    "message": "Policy accepted"
  },
  {
    "type": "Attached",
    "status": "True",
    "message": "Attached to all targets"
  }
]
```

If `Attached` is `False`, the policy's `targetRefs` points to a resource that doesn't exist. Check the `message` field for the exact resource name that's missing.

Fire 10 rapid requests to test the rate limit.
With a LoadBalancer address:

```sh
for i in $(seq 1 10); do
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
    http://$INGRESS_GW_ADDRESS:80/headers -H "host: www.example.com")
  echo "Request $i: HTTP $STATUS"
done
```

With port-forwarding:

```sh
for i in $(seq 1 10); do
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
    localhost:8080/headers -H "host: www.example.com")
  echo "Request $i: HTTP $STATUS"
done
```

Example output:
```
Request 1: HTTP 200
Request 2: HTTP 200
Request 3: HTTP 200
Request 4: HTTP 200
Request 5: HTTP 200
Request 6: HTTP 200
Request 7: HTTP 429
Request 8: HTTP 429
Request 9: HTTP 429
Request 10: HTTP 429
```

The first 6 succeed (3 base + 3 burst), then requests are rejected until the bucket refills. To see the rate limit headers, inspect a 429 response, for example with `curl -i`:
```
HTTP/1.1 429 Too Many Requests
x-ratelimit-limit: 6
x-ratelimit-remaining: 0
x-ratelimit-reset: 0
content-type: text/plain
content-length: 19

rate limit exceeded
```

After 1 second, the bucket refills and requests succeed again.
```sh
sleep 1 && curl -o /dev/null -w "%{http_code}\n" \
  localhost:8080/headers -H "host: www.example.com"
# 200
```
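The burst behavior can be reproduced offline. The following shell sketch models the token bucket as this page describes it: the bucket holds `requests + burst` tokens, each request spends one, and tokens refill at `requests` per `unit`. It replays request timestamps in milliseconds and prints `200` or `429` for each. The `simulate_bucket` name, a full bucket at start, and continuous refill are assumptions; the proxy's actual refill granularity may differ.

```sh
# Offline model of the token bucket described in this section; a sketch,
# not agentgateway's implementation.
# Usage: simulate_bucket REQUESTS UNIT BURST T1 [T2 ...]
#   UNIT is Seconds, Minutes, or Hours; T1, T2, ... are request times in ms
#   (non-decreasing). Prints 200 (allowed) or 429 (rejected) per request.
simulate_bucket() {
  local requests=$1 unit=$2 burst=$3 unit_ms capacity tokens last t
  shift 3
  case "$unit" in
    Seconds) unit_ms=1000 ;;
    Minutes) unit_ms=60000 ;;
    Hours) unit_ms=3600000 ;;
    *) echo "unknown unit: $unit" >&2; return 1 ;;
  esac
  # Token counts are scaled by unit_ms so all arithmetic stays in integers:
  # one request costs unit_ms, and each elapsed ms refills `requests`.
  capacity=$(( (requests + burst) * unit_ms ))
  tokens=$capacity  # assumption: the bucket starts full
  last=${1:-0}
  for t in "$@"; do
    tokens=$(( tokens + (t - last) * requests ))
    if [ "$tokens" -gt "$capacity" ]; then tokens=$capacity; fi
    last=$t
    if [ "$tokens" -ge "$unit_ms" ]; then
      tokens=$(( tokens - unit_ms ))
      echo 200
    else
      echo 429
    fi
  done
}

# Ten requests at t=0 (the curl loop), then one more after `sleep 1`.
simulate_bucket 3 Seconds 3 0 0 0 0 0 0 0 0 0 0 1000 | paste -s -d ' ' -
# prints: 200 200 200 200 200 200 429 429 429 429 200
```

With these inputs the sketch matches the example output: six successes, four rejections, and a success once one second has refilled 3 tokens.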
Global rate limiting
Local rate limiting runs independently on each proxy replica. If you run multiple agentgateway replicas and need a shared quota across the fleet, use global rate limiting backed by an external service such as Envoy’s rate limit service.
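As a rough illustration, assume N replicas behind a load balancer that spreads requests evenly, each enforcing the same local limit with its own bucket. Together they can admit up to N times the per-replica limit. The `fleet_ceiling` shell function below is a hypothetical helper that computes that upper bound; real load balancing is rarely perfectly even, so treat the result as a ceiling rather than a typical value.

```sh
# Hypothetical helper: upper bound on what N replicas admit together when each
# enforces the same local limit on its own and traffic is spread evenly.
# Usage: fleet_ceiling REQUESTS BURST REPLICAS
fleet_ceiling() {
  local requests=$1 burst=$2 replicas=$3
  echo "max_in_one_burst=$(( (requests + burst) * replicas ))" \
       "sustained_per_unit=$(( requests * replicas ))"
}

# The httpbin route limit (3 per second, burst 3) on 3 replicas:
fleet_ceiling 3 3 3  # prints max_in_one_burst=18 sustained_per_unit=9
```

A global limit avoids this multiplication because every replica checks the same shared counter in the external service.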
For detailed instructions on setting up global rate limiting with descriptors and an external rate limit service, see the Global rate limiting guide.
Cleanup
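You can remove the resources that you created in this guide.

```sh
kubectl delete AgentgatewayPolicy httpbin-rate-limit -n httpbin
# If you also applied the gateway-level example policy:
kubectl delete AgentgatewayPolicy gateway-rate-limit -n agentgateway-system --ignore-not-found
```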