# Inference routing
Agentgateway supports the Kubernetes Gateway API Inference Extension in two deployment modes.
## Kubernetes Gateway API mode
In Kubernetes Gateway API mode, agentgateway runs as the gateway data plane for
Gateway API resources. You install the Inference Extension CRDs, create an
`InferencePool`, and route to that pool from an `HTTPRoute`. The Endpoint Picker
Extension (EPP) acts as an extension service that selects the best model server
endpoint for each inference request.
Use this mode when you want Gateway API integration, InferencePool resources,
traffic splitting, route matching, and other Kubernetes networking features.
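For reference, a minimal sketch of the two resources involved. The pool name, selector labels, EPP service name, and gateway name are illustrative, and the Inference Extension CRD version may differ in your installation:

```yaml
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: my-pool
spec:
  targetPortNumber: 8000
  selector:
    app: my-model-server        # labels of the model server pods
  extensionRef:
    name: my-epp                # Service fronting the Endpoint Picker Extension
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-route
spec:
  parentRefs:
  - name: my-gateway
  rules:
  - backendRefs:
    - group: inference.networking.x-k8s.io
      kind: InferencePool
      name: my-pool
```

The `HTTPRoute` targets the `InferencePool` as a backend instead of a regular `Service`, which is what hands endpoint selection off to the EPP.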
## Standalone request scheduler mode
In standalone request scheduler mode, agentgateway runs as a sidecar proxy
alongside the EPP. The proxy and EPP communicate over localhost, and
agentgateway uses the `inferenceRouting` policy in its standalone local
configuration to route requests to a synthetic service, consulting the EPP for
endpoint selection.
Use this mode for single-tenant or job-scoped workloads where deploying a full
Gateway API stack would add unnecessary operational overhead. In this mode, the
upstream standalone Helm chart can deploy agentgateway as the sidecar proxy with
`proxyType: agentgateway`.
Standalone request scheduler mode does not support `InferencePool`. The
standalone configuration must define a top-level synthetic service, such as a
`services` entry, and the route backend must reference that service. When the
EPP owns endpoint discovery, set `destinationMode: passthrough` so that
EPP-selected destinations can be forwarded to directly, without matching
against local workload endpoint data.
For example, the following standalone agentgateway configuration defines the
synthetic service in `services`, and the route backend references it as
`default/my-model`:
```yaml
services:
- name: my-model
  namespace: default
  hostname: my-model
  vips: []
  ports:
    8000: 8000
binds:
- port: 8081
  listeners:
  - routes:
    - backends:
      - service:
          name: default/my-model
          port: 8000
      policies:
        inferenceRouting:
          endpointPicker:
            host: 127.0.0.1:9002
          destinationMode: passthrough
```

For more examples, see the standalone EPP example.
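With a configuration like the one above in place, a client sends requests to the bind on port 8081 and the proxy handles endpoint selection via the EPP. A minimal client sketch, assuming the backing model servers expose an OpenAI-style `/v1/completions` endpoint; the endpoint path and request body shape are assumptions about the model server, not part of the agentgateway configuration:

```python
import json
import urllib.request

def build_completion_request(
    prompt: str, base_url: str = "http://localhost:8081"
) -> urllib.request.Request:
    """Build an OpenAI-style completion request aimed at the gateway bind.

    The gateway forwards the request to the EPP-selected model server
    endpoint behind the synthetic service.
    """
    body = json.dumps({"model": "my-model", "prompt": prompt}).encode()
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

def send_completion(prompt: str) -> dict:
    """Send the request through the gateway and decode the JSON response."""
    with urllib.request.urlopen(build_completion_request(prompt)) as resp:
        return json.load(resp)
```

The client never addresses a model server directly; it only sees the local bind, which keeps endpoint discovery entirely inside the proxy and EPP.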