Local rate limiting

Apply local and global rate limits to HTTP traffic to protect your backend services from overload.

About

Rate limiting in agentgateway protects your services from being overwhelmed by excessive traffic. A runaway automation script, a misconfigured retry loop, or a deliberate flood can exhaust your upstream’s capacity in seconds. Rate limiting gives you precise control over how much traffic reaches any route or the entire gateway — without any changes to the backend.

Rate limiting in agentgateway is expressed through AgentgatewayPolicy resources. A policy attaches to a Gateway or HTTPRoute target, and defines limits in the spec.traffic.rateLimit field. Gateway-level policies act as a hard ceiling on total traffic, while route-level policies provide finer-grained control.

Additionally, you can set up local or global rate limiting, depending on whether limits must be shared across proxy replicas.

| Mode | Where limits are enforced | Use case |
|------|---------------------------|----------|
| Local | In-process, per proxy replica | Simple per-route or gateway-wide limits |
| Global | External rate limit service | Shared limits across multiple proxy replicas |

For AI-specific use cases, see:

Gateway-level global DoS protection

Target your Gateway resource to apply a limit across all routes. This acts as a hard ceiling on total gateway throughput regardless of which route is hit.

Example gateway policy
kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayPolicy
metadata:
  name: gateway-rate-limit
  namespace: agentgateway-system
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: agentgateway-proxy
  traffic:
    rateLimit:
      local:
      - requests: 5000
        unit: Minutes
        burst: 1000
EOF

Route-level rate limit

Route-level policies take precedence over gateway-level ones for their specific traffic.

Example route policy
kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayPolicy
metadata:
  name: httpbin-rate-limit
  namespace: httpbin
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: httpbin
  traffic:
    rateLimit:
      local:
      - requests: 3
        unit: Seconds
        burst: 3
EOF

Inheritance

Policies apply at the attachment point with a clear precedence order:

Gateway → Listener → Route → Route Rule → Backend

More specific policies win. A route-level limit overrides a gateway-level limit for traffic on that route.

With both policies in place, traffic to www.example.com is subject to the route limit (3 req/s), while all other routes are bounded only by the gateway limit (5000 req/min).
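The precedence rule can be illustrated with a small sketch. This is a hand-written illustration of the lookup behavior, not agentgateway's implementation; the limits mirror the two example policies above.

```shell
# Illustration of policy precedence: the most specific attached policy wins.
# Limits mirror the example policies (3/s on the httpbin route, 5000/min gateway-wide).
effective_limit() {
  case "$1" in
    httpbin) echo "3/s (route policy)" ;;        # route-level policy overrides
    *)       echo "5000/min (gateway policy)" ;; # gateway-level ceiling applies
  esac
}

effective_limit httpbin
effective_limit other
```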

Response headers

When rate limiting is enabled, the following headers are added to responses. These headers help clients understand their current rate limit status and adapt their behavior accordingly.

Note: The x-envoy-ratelimited header is only present when using global rate limiting with an Envoy-compatible rate limit service. It is added by the rate limit service itself, not by agentgateway. As such, this header does not appear with local rate limiting.

| Header | Description | Added by | Example |
|--------|-------------|----------|---------|
| x-ratelimit-limit | The rate limit ceiling for the given request. For local rate limiting, this is the base limit plus burst. For global rate limiting with time windows, this might include window information. | Agentgateway | 6 (local); 10, 10;w=60 (global with a 60-second window) |
| x-ratelimit-remaining | The number of requests (or tokens for LLM rate limiting) remaining in the current time window. | Agentgateway | 5 |
| x-ratelimit-reset | The time in seconds until the rate limit window resets. | Agentgateway | 30 |
| x-envoy-ratelimited | Present when the request is rate limited. Only appears in 429 responses when using global rate limiting. | External rate limit service | (header present) |
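For example, a client can read these headers to back off instead of retrying immediately after a 429. The following sketch parses the documented example values from a saved response; the file path and header values are illustrative, not live output.

```shell
# Hypothetical client-side backoff sketch: parse rate limit headers from a
# captured 429 response and decide how long to wait before retrying.
# The header values below are the documented examples, not live output.
cat > /tmp/ratelimit_headers <<'EOF'
x-ratelimit-limit: 6
x-ratelimit-remaining: 0
x-ratelimit-reset: 1
EOF

# Extract the values (header names are matched case-insensitively).
remaining=$(awk 'tolower($1)=="x-ratelimit-remaining:" {print $2}' /tmp/ratelimit_headers)
reset=$(awk 'tolower($1)=="x-ratelimit-reset:" {print $2}' /tmp/ratelimit_headers)

if [ "$remaining" -eq 0 ]; then
  echo "quota exhausted; retry in ${reset}s"
fi
```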

Before you begin

  1. Important: Install the experimental channel of the Kubernetes Gateway API to use this feature.

    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.5.0/experimental-install.yaml --server-side
  2. Upgrade or install agentgateway with the KGW_ENABLE_GATEWAY_API_EXPERIMENTAL_FEATURES environment variable. This setting defaults to false and must be explicitly enabled to use Gateway API experimental features.

    Example command:

    helm upgrade -i agentgateway oci://cr.agentgateway.dev/charts/agentgateway  \
      --namespace agentgateway-system \
      --version v1.0.0-alpha.4 \
      --set controller.image.pullPolicy=Always \
      --set controller.extraEnv.KGW_ENABLE_GATEWAY_API_EXPERIMENTAL_FEATURES=true
  3. Set up an agentgateway proxy.

  4. Follow the Sample app guide to deploy the httpbin sample app.

  5. Get the external address of the gateway and save it in an environment variable.

    export INGRESS_GW_ADDRESS=$(kubectl get svc -n agentgateway-system agentgateway-proxy -o jsonpath="{.status.loadBalancer.ingress[0]['hostname','ip']}")
    echo $INGRESS_GW_ADDRESS

    If your cluster does not assign an external load balancer address, port-forward the proxy instead.

    kubectl port-forward deployment/agentgateway-proxy -n agentgateway-system 8080:80

Local rate limiting

Local rate limiting runs entirely inside the agentgateway proxy — no external service needed. The following steps show how to apply request-based limits to your HTTP traffic.

  1. Apply a rate limit to the httpbin HTTPRoute.

    kubectl apply -f- <<EOF
    apiVersion: agentgateway.dev/v1alpha1
    kind: AgentgatewayPolicy
    metadata:
      name: httpbin-rate-limit
      namespace: httpbin
    spec:
      targetRefs:
      - group: gateway.networking.k8s.io
        kind: HTTPRoute
        name: httpbin
      traffic:
        rateLimit:
          local:
          - requests: 3
            unit: Seconds
            burst: 3
    EOF
    Review the following table to understand this configuration.
    | Field | Required | Description |
    |-------|----------|-------------|
    | requests | Yes | Number of requests allowed per unit. |
    | unit | Yes | Seconds, Minutes, or Hours. |
    | burst | No | Extra requests allowed above the base rate in a short burst. The burst field implements a token bucket on top of the base rate. With requests: 3 and burst: 3, you get up to 6 requests in one burst (3 base + 3 burst capacity), then the bucket refills at 3 per second. This absorbs short traffic spikes without rejecting requests. This setting only works with requests, not with token rate limits. |
  2. Verify that the policy is attached.

    kubectl get AgentgatewayPolicy httpbin-rate-limit -n httpbin \
      -o jsonpath='{.status.ancestors[0].conditions}' | jq .

    A healthy policy reports both Accepted and Attached as True:

    [
      {
        "type": "Accepted",
        "status": "True",
        "message": "Policy accepted"
      },
      {
        "type": "Attached",
        "status": "True",
        "message": "Attached to all targets"
      }
    ]

    If Attached is False, the policy’s targetRef points to a resource that doesn’t exist. Check the message field for the exact resource name that’s missing.

  3. Fire 10 rapid requests to test the rate limit. If your gateway has an external address, run:

    for i in $(seq 1 10); do
      STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
        http://$INGRESS_GW_ADDRESS:80/headers -H "host: www.example.com")
      echo "Request $i: HTTP $STATUS"
    done

    If you are using the port-forward instead, run:

    for i in $(seq 1 10); do
      STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
        localhost:8080/headers -H "host: www.example.com")
      echo "Request $i: HTTP $STATUS"
    done

    Example output:

    Request 1: HTTP 200
    Request 2: HTTP 200
    Request 3: HTTP 200
    Request 4: HTTP 200
    Request 5: HTTP 200
    Request 6: HTTP 200
    Request 7: HTTP 429
    Request 8: HTTP 429
    Request 9: HTTP 429
    Request 10: HTTP 429

    The first 6 succeed (3 base + 3 burst), then requests are rejected until the bucket refills. Inspect a 429 response to see the rate limit headers:

    HTTP/1.1 429 Too Many Requests
    x-ratelimit-limit: 6
    x-ratelimit-remaining: 0
    x-ratelimit-reset: 0
    content-type: text/plain
    content-length: 19
    
    rate limit exceeded
  4. After 1 second the bucket refills and requests succeed again.

    sleep 1 && curl -o /dev/null -w "%{http_code}\n" \
      localhost:8080/headers -H "host: www.example.com"
    # 200
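The 200/429 pattern above follows token-bucket semantics. The following simulation (an illustration of the semantics only, not agentgateway code) replays 10 back-to-back requests against a bucket sized for requests: 3 with burst: 3:

```shell
# Token bucket simulation of requests: 3, burst: 3 (capacity 3 + 3 = 6).
# Ten back-to-back requests drain the bucket before any refill happens,
# so the first 6 pass and the remaining 4 are rejected, matching the
# output in step 3. Illustration only, not agentgateway code.
capacity=6
tokens=$capacity
allowed=0
rejected=0
for i in $(seq 1 10); do
  if [ "$tokens" -gt 0 ]; then
    tokens=$((tokens - 1))    # consume one token per admitted request
    allowed=$((allowed + 1))
  else
    rejected=$((rejected + 1))  # empty bucket: request gets a 429
  fi
done
echo "allowed=$allowed rejected=$rejected"   # allowed=6 rejected=4
```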

Global rate limiting

Local rate limiting runs independently on each proxy replica. If you run multiple agentgateway replicas and need a shared quota across the fleet, use global rate limiting backed by an external service such as Envoy’s rate limit service.

For detailed instructions on setting up global rate limiting with descriptors and an external rate limit service, see the Global rate limiting guide.

Cleanup

You can remove the resources that you created in this guide.
kubectl delete AgentgatewayPolicy httpbin-rate-limit -n httpbin