For the complete documentation index, see llms.txt. Markdown versions of all docs pages are available by appending .md to any docs URL.
Custom providers
Configure self-hosted and non-managed LLM providers with explicit API formats, paths, and backend targets.
Use custom providers for self-hosted, OpenAI-compatible, or non-managed LLM providers when you want to declare the provider target and supported API formats explicitly.
Custom providers are useful when:
- The provider supports only a subset of OpenAI APIs, such as chat completions but not responses.
- The provider supports multiple API shapes, such as OpenAI chat completions and Anthropic messages.
- The provider uses non-default paths for one or more API formats.
- You want to use LLM features, such as token counting, rate limiting, guardrails, transformations, and observability, with an AgentgatewayBackend that routes to a Kubernetes Service or InferencePool.
For managed providers such as OpenAI, Anthropic, Gemini, Vertex AI, Azure, and Bedrock, use the managed provider type unless you need explicit format or backend target control.
Supported targets
A custom provider must specify exactly one upstream target.
| Target | When to use |
|---|---|
host and port | Route to a DNS name or external endpoint. |
backendRef to a Service | Route to a namespace-local Kubernetes Service. |
backendRef to an InferencePool | Use Gateway API Inference Extension endpoint selection and agentgateway LLM features together. |
The backendRef must be namespace-local and can target only a Service or an
InferencePool. Service references require a port. InferencePool references do
not.
Supported formats
Set custom.formats to declare the provider-native formats that the upstream
provider supports. You can also set formats[].path when the provider uses a
non-default path for that format.
| Format | Default upstream path |
|---|---|
Completions | /v1/chat/completions |
Messages | /v1/messages |
Responses | /v1/responses |
Embeddings | /v1/embeddings |
AnthropicTokenCount | /v1/messages/count_tokens |
Realtime | /v1/realtime |
Agentgateway chooses from the provider-native formats that you declare. For
example, if a custom provider supports OpenAI chat completions but not OpenAI
responses, declare only Completions. If the provider exposes multiple API
shapes, declare each supported format and optionally set a per-format path.
| Client request format | Preferred custom provider format |
|---|---|
| OpenAI chat completions | Completions |
| Anthropic messages | Messages |
| OpenAI responses | Responses, then Completions |
| OpenAI embeddings | Embeddings |
| Anthropic token count | AnthropicTokenCount |
| OpenAI realtime | Realtime |
If no declared provider format can serve the client request format, agentgateway rejects the request.
Route to a host and port
Use host and port when the LLM provider is reachable by DNS name or IP
address. The following example declares that the provider supports both OpenAI
chat completions and Anthropic messages.
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
name: ollama-custom
namespace: agentgateway-system
spec:
ai:
provider:
custom:
model: llama3.2
formats:
- type: Completions
path: /v1/chat/completions
- type: Messages
path: /v1/messages
host: ollama.agentgateway-system.svc.cluster.local
port: 11434Route to a Service
Use a Service backendRef when the LLM provider runs behind a Kubernetes
Service in the same namespace as the AgentgatewayBackend.
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
name: local-llm
namespace: agentgateway-system
spec:
ai:
provider:
custom:
backendRef:
name: llm-service
port: 8080
model: llama3
formats:
- type: CompletionsRoute to an InferencePool
Use an InferencePool backendRef when you want the Endpoint Picker Extension
(EPP) to select a model server, but you also want agentgateway to run the LLM
request and response pipeline.
With this flow, the route points to the AgentgatewayBackend, and the custom provider points to the InferencePool.
graph LR
Client --> Gateway
Gateway --> HTTPRoute
HTTPRoute --> AgentgatewayBackend
AgentgatewayBackend --> InferencePool
InferencePool --> ModelServer["model server"]
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
name: qwen-inferencepool
namespace: agentgateway-system
spec:
ai:
provider:
custom:
backendRef:
group: inference.networking.k8s.io
kind: InferencePool
name: vllm-qwen25-15b-instruct
model: Qwen/Qwen2.5-1.5B-Instruct
formats:
- type: Completions
path: /v1/chat/completions
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
name: qwen
namespace: agentgateway-system
spec:
parentRefs:
- name: agentgateway-proxy
namespace: agentgateway-system
rules:
- matches:
- path:
type: PathPrefix
value: /v1/chat/completions
backendRefs:
- group: agentgateway.dev
kind: AgentgatewayBackend
name: qwen-inferencepool/v1/chat/completions. If clients send a
different request format, configure the llm-d Router EPP parser, such as
router.epp.parser, for that client-facing format. For parser options, see the
llm-d Router parser docs.Limitations
- Custom providers cannot target another AgentgatewayBackend.
- Custom provider
backendRefcan target only namespace-local Services and InferencePools. - Custom providers do not add arbitrary gRPC provider support.
- Do not combine provider-level
pathorpathPrefixwithformats[].path. Use one path configuration style per provider. - The
DetectandPassthroughroute modes are not custom provider formats. Use provider routes when you need those modes for a request path.