API types

Supported LLM API endpoint types and route configurations

Agentgateway natively supports multiple LLM API endpoint types. These are automatically exposed on the gateway, and translated as appropriate based on the provider.

The following API types have dedicated guides:

Chat completions: The OpenAI /v1/chat/completions endpoint. This is the most widely used API type for text generation and chat applications.
Responses: The OpenAI /v1/responses endpoint for stateful, multi-step model interactions.
Messages: The Anthropic /v1/messages endpoint for Claude models.
Embeddings: The OpenAI-compatible /v1/embeddings endpoint for creating vector representations of text.
Realtime: The OpenAI Realtime API for low-latency, streaming voice and text interactions over WebSockets.
Rerank: The Cohere-compatible /v2/rerank endpoint for ranking documents by relevance to a query.
Models: The OpenAI-compatible /v1/models endpoint for listing available models.
Token count: The Anthropic /v1/messages/count_tokens endpoint for estimating input tokens.
Passthrough: Forwards requests directly to the backend provider without transformation.

Chat completions

Send chat completion requests through agentgateway using the OpenAI Chat Completions API.

Responses

Send requests through agentgateway using the OpenAI Responses API.

Messages

Send requests through agentgateway using the Anthropic Messages API.

Embeddings

Send embedding requests through agentgateway using the OpenAI-compatible Embeddings API.

OpenAI Realtime

Proxy OpenAI Realtime API WebSocket traffic and track token usage.

Rerank

Send rerank requests through agentgateway using the Cohere-compatible Rerank API.

Passthrough

Forward requests to the upstream provider without transformation.

Models

List available models through agentgateway using the OpenAI-compatible Models API.

Token count

Count tokens through agentgateway using the Anthropic Messages token-count API.

About Providers

Was this page helpful?

API types

What could be improved?