Skip to content
✨ agentgateway has joined the Agentic AI Foundation (AAIF) — Learn more

For the complete documentation index, see llms.txt. Markdown versions of all docs pages are available by appending .md to any docs URL.

Page as Markdown

Custom providers

Configure self-hosted and non-managed LLM providers with explicit API formats, paths, and backend targets.

Use custom providers for self-hosted, OpenAI-compatible, or non-managed LLM providers when you want to declare the provider target and supported API formats explicitly.

Custom providers are useful when:

  • The provider supports only a subset of OpenAI APIs, such as chat completions but not responses.
  • The provider supports multiple API shapes, such as OpenAI chat completions and Anthropic messages.
  • The provider uses non-default paths for one or more API formats.
  • You want to use LLM features, such as token counting, rate limiting, guardrails, transformations, and observability, with an AgentgatewayBackend that routes to a Kubernetes Service or InferencePool.

For managed providers such as OpenAI, Anthropic, Gemini, Vertex AI, Azure, and Bedrock, use the managed provider type unless you need explicit format or backend target control.

Supported targets

A custom provider must specify exactly one upstream target.

TargetWhen to use
host and portRoute to a DNS name or external endpoint.
backendRef to a ServiceRoute to a namespace-local Kubernetes Service.
backendRef to an InferencePoolUse Gateway API Inference Extension endpoint selection and agentgateway LLM features together.

The backendRef must be namespace-local and can target only a Service or an InferencePool. Service references require a port. InferencePool references do not.

Supported formats

Set custom.formats to declare the provider-native formats that the upstream provider supports. You can also set formats[].path when the provider uses a non-default path for that format.

FormatDefault upstream path
Completions/v1/chat/completions
Messages/v1/messages
Responses/v1/responses
Embeddings/v1/embeddings
AnthropicTokenCount/v1/messages/count_tokens
Realtime/v1/realtime

Agentgateway chooses from the provider-native formats that you declare. For example, if a custom provider supports OpenAI chat completions but not OpenAI responses, declare only Completions. If the provider exposes multiple API shapes, declare each supported format and optionally set a per-format path.

Client request formatPreferred custom provider format
OpenAI chat completionsCompletions
Anthropic messagesMessages
OpenAI responsesResponses, then Completions
OpenAI embeddingsEmbeddings
Anthropic token countAnthropicTokenCount
OpenAI realtimeRealtime

If no declared provider format can serve the client request format, agentgateway rejects the request.

Route to a host and port

Use host and port when the LLM provider is reachable by DNS name or IP address. The following example declares that the provider supports both OpenAI chat completions and Anthropic messages.

apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: ollama-custom
  namespace: agentgateway-system
spec:
  ai:
    provider:
      custom:
        model: llama3.2
        formats:
        - type: Completions
          path: /v1/chat/completions
        - type: Messages
          path: /v1/messages
      host: ollama.agentgateway-system.svc.cluster.local
      port: 11434

Route to a Service

Use a Service backendRef when the LLM provider runs behind a Kubernetes Service in the same namespace as the AgentgatewayBackend.

apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: local-llm
  namespace: agentgateway-system
spec:
  ai:
    provider:
      custom:
        backendRef:
          name: llm-service
          port: 8080
        model: llama3
        formats:
        - type: Completions

Route to an InferencePool

Use an InferencePool backendRef when you want the Endpoint Picker Extension (EPP) to select a model server, but you also want agentgateway to run the LLM request and response pipeline.

With this flow, the route points to the AgentgatewayBackend, and the custom provider points to the InferencePool.

    graph LR
    Client --> Gateway
    Gateway --> HTTPRoute
    HTTPRoute --> AgentgatewayBackend
    AgentgatewayBackend --> InferencePool
    InferencePool --> ModelServer["model server"]
  
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: qwen-inferencepool
  namespace: agentgateway-system
spec:
  ai:
    provider:
      custom:
        backendRef:
          group: inference.networking.k8s.io
          kind: InferencePool
          name: vllm-qwen25-15b-instruct
        model: Qwen/Qwen2.5-1.5B-Instruct
        formats:
        - type: Completions
          path: /v1/chat/completions
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: qwen
  namespace: agentgateway-system
spec:
  parentRefs:
  - name: agentgateway-proxy
    namespace: agentgateway-system
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /v1/chat/completions
    backendRefs:
    - group: agentgateway.dev
      kind: AgentgatewayBackend
      name: qwen-inferencepool
Most users can keep the default llm-d Router OpenAI parser and send OpenAI-compatible requests, such as /v1/chat/completions. If clients send a different request format, configure the llm-d Router EPP parser, such as router.epp.parser, for that client-facing format. For parser options, see the llm-d Router parser docs.

Limitations

  • Custom providers cannot target another AgentgatewayBackend.
  • Custom provider backendRef can target only namespace-local Services and InferencePools.
  • Custom providers do not add arbitrary gRPC provider support.
  • Do not combine provider-level path or pathPrefix with formats[].path. Use one path configuration style per provider.
  • The Detect and Passthrough route modes are not custom provider formats. Use provider routes when you need those modes for a request path.
Was this page helpful?
Agentgateway assistant

Ask me anything about agentgateway configuration, features, or usage.

Note: AI-generated content might contain errors; please verify and test all returned information.

Tip: one topic per conversation gives the best results. Use the + button in the chat header to start a new conversation.

Switching topics? Starting a new conversation improves accuracy.
↑↓ navigate select esc dismiss

What could be improved?

Your feedback helps us improve assistant answers and identify docs gaps we should fix.

Need more help? Join us on Discord: https://discord.gg/y9efgEmppm

Want to use your own agent? Add the Solo MCP server to query our docs directly. Get started here: https://search.solo.io/.