Inference routing

Agentgateway supports the Kubernetes Gateway API Inference Extension in two deployment modes.

Kubernetes Gateway API mode

In Kubernetes Gateway API mode, agentgateway runs as the gateway data plane for Gateway API resources. You install the Inference Extension CRDs, create an InferencePool, and route to that pool from an HTTPRoute. The Endpoint Picker Extension (EPP) acts as an extension service that selects the best model server endpoint for each inference request.

Use this mode when you want Gateway API integration, InferencePool resources, traffic splitting, route matching, and other Kubernetes networking features.
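As a rough illustration of routing to a pool from an HTTPRoute, the following is a minimal sketch, not a complete setup. The route name, Gateway name, and pool name are hypothetical placeholders. The backend group shown assumes the v1 Inference Extension CRDs are installed; earlier CRD releases use a different API group, so check the version installed in your cluster.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route                 # hypothetical route name
spec:
  parentRefs:
  - name: agentgateway            # hypothetical Gateway served by agentgateway
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    # Reference the InferencePool instead of a Service; the EPP then
    # selects the model server endpoint for each request.
    - group: inference.networking.k8s.io   # assumes v1 CRDs
      kind: InferencePool
      name: my-model-pool         # hypothetical InferencePool name
```

Because the pool is an ordinary backendRef, it can sit alongside the route matching and traffic splitting rules that the Gateway API already provides.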

To get started, see Set up Kubernetes inference routing.

Standalone request scheduler mode

In standalone request scheduler mode, agentgateway runs as a sidecar proxy alongside the EPP, and the two communicate over localhost. Agentgateway uses the inferenceRouting policy in its local standalone configuration to route each request to a synthetic service, then consults the EPP to select the endpoint.

Use this mode for single-tenant or job-scoped workloads where deploying a full Gateway API stack would add unnecessary operational overhead. In this mode, the upstream standalone Helm chart can deploy agentgateway as the sidecar proxy with proxyType: agentgateway.

Standalone request scheduler mode does not support InferencePool. Instead, the standalone configuration must define a top-level synthetic service, such as an entry under services, and the route backend must reference that service. When EPP owns endpoint discovery, set destinationMode: passthrough so that agentgateway forwards requests directly to the destinations that EPP selects, without matching them against local workload endpoint data.

For example, the following standalone agentgateway configuration defines the synthetic service under services, and the route backend references it in namespace/name form as default/my-model:

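# Synthetic service that the route backend points at.
services:
- name: my-model
  namespace: default
  hostname: my-model
  vips: []
  ports:
    8000: 8000

binds:
- port: 8081
  listeners:
  - routes:
    - backends:
      - service:
          # References the synthetic service above as namespace/name.
          name: default/my-model
          port: 8000
        policies:
          inferenceRouting:
            # The EPP sidecar, reached over localhost.
            endpointPicker:
              host: 127.0.0.1:9002
            # Forward directly to EPP-selected destinations.
            destinationMode: passthrough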

For more examples, see the standalone EPP example.

To get started, see Deploy a standalone request scheduler.