LLM consumption
Consume services from LLM providers through agentgateway.
About
Learn the basics of consuming LLM provider services through agentgateway.
Providers
Review the LLM providers that you can connect to through agentgateway.
Model aliasing
Expose alias model names that map to underlying provider models.
API keys
Manage API keys for LLM provider authentication.
Virtual keys
Issue API keys with per-key token budgets and cost tracking (also known as virtual keys).
Load balancing
Distribute requests across multiple LLM providers automatically (Power of Two Choices, P2C).
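Power of Two Choices is straightforward to picture: for each request, sample two backends at random and send the request to the one with fewer in-flight requests. A minimal sketch in Go, assuming a hypothetical endpoint type with an in-flight counter (not agentgateway's internals):

```go
package main

import (
	"fmt"
	"math/rand"
	"sync/atomic"
)

// endpoint is a hypothetical LLM backend with an in-flight request counter.
type endpoint struct {
	name     string
	inFlight atomic.Int64
}

// pickP2C samples two endpoints at random (possibly the same one) and
// returns the one with fewer outstanding requests.
func pickP2C(endpoints []*endpoint) *endpoint {
	a := endpoints[rand.Intn(len(endpoints))]
	b := endpoints[rand.Intn(len(endpoints))]
	if b.inFlight.Load() < a.inFlight.Load() {
		return b
	}
	return a
}

func main() {
	backends := []*endpoint{{name: "openai"}, {name: "anthropic"}, {name: "gemini"}}
	chosen := pickP2C(backends)
	chosen.inFlight.Add(1) // increment on dispatch, decrement on completion
	fmt.Println("routing request to", chosen.name)
}
```

Sampling two and keeping the shorter queue avoids the herd effect of always picking the globally least-loaded backend, while distributing load far more evenly than uniform random selection.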
Model failover
Priority-based failover across LLM providers, with automatic fallback when models fail or are unavailable.
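As a mental model, the failover loop walks providers in priority order and returns the first success. A hedged Go sketch, with callProvider as a made-up stand-in for a real model call:

```go
package main

import (
	"errors"
	"fmt"
)

// callProvider is a hypothetical stand-in for a real model call.
func callProvider(name, prompt string) (string, error) {
	if name == "primary" {
		return "", errors.New("primary unavailable") // simulate a failure
	}
	return "response from " + name, nil
}

// completeWithFailover tries providers in priority order and returns the
// first successful response, falling back on any error.
func completeWithFailover(providers []string, prompt string) (string, error) {
	var lastErr error
	for _, p := range providers {
		resp, err := callProvider(p, prompt)
		if err == nil {
			return resp, nil
		}
		lastErr = err
	}
	return "", fmt.Errorf("all providers failed: %w", lastErr)
}

func main() {
	resp, err := completeWithFailover([]string{"primary", "secondary"}, "hello")
	fmt.Println(resp, err)
}
```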
Content-based routing
Route requests to different LLM backends based on request body content, such as the requested model name.
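For intuition, the core of content-based routing is to parse the request body and select a backend from a field such as model. A simplified Go sketch; the prefix-to-backend mapping is purely illustrative:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// pickBackend reads the "model" field from an OpenAI-style request body
// and selects a backend by model name prefix.
func pickBackend(body []byte) (string, error) {
	var req struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return "", err
	}
	switch {
	case strings.HasPrefix(req.Model, "gpt-"):
		return "openai", nil
	case strings.HasPrefix(req.Model, "claude-"):
		return "anthropic", nil
	default:
		return "default", nil
	}
}

func main() {
	backend, _ := pickBackend([]byte(`{"model": "gpt-4o", "messages": []}`))
	fmt.Println(backend) // openai
}
```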
Streaming
Stream responses from the LLM to the end user through agentgateway.
OpenAI Realtime
Proxy OpenAI Realtime API WebSocket traffic and track token usage.
Function calling
Proxy LLM function calling (tool use) requests and responses through agentgateway.
Guardrails
Protect LLM interactions with prompt guards that evaluate and filter requests and responses for harmful or sensitive content.
Prompt enrichment
Enrich LLM requests by appending or prepending system and user prompts.
Prompt templates
Use static and dynamic prompt templates to customize LLM requests.
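As a rough illustration of how a static template with dynamic, per-request fields fits together, here is a sketch using Go's standard text/template package; the template text and field names are examples, not a fixed schema:

```go
package main

import (
	"os"
	"text/template"
)

func main() {
	// A static prompt template with per-request fields.
	const prompt = "You are a {{.Role}} assistant. Answer in {{.Language}}.\nUser: {{.Question}}"
	t := template.Must(template.New("prompt").Parse(prompt))
	// Fill the dynamic fields for one request.
	if err := t.Execute(os.Stdout, map[string]string{
		"Role":     "billing",
		"Language": "English",
		"Question": "Why was I charged twice?",
	}); err != nil {
		panic(err)
	}
}
```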
Request transformations
Dynamically compute and set LLM request fields using CEL expressions.
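For a sense of what such an expression can compute, the sketch below evaluates a CEL expression with the cel-go library; the variable name and expression are illustrative, not agentgateway configuration:

```go
package main

import (
	"fmt"

	"github.com/google/cel-go/cel"
)

func main() {
	// Declare the variables the expression may reference.
	env, err := cel.NewEnv(cel.Variable("model", cel.StringType))
	if err != nil {
		panic(err)
	}
	// Derive a request field from the incoming model name.
	ast, iss := env.Compile(`model.startsWith("gpt-") ? "openai" : "anthropic"`)
	if iss != nil && iss.Err() != nil {
		panic(iss.Err())
	}
	prg, err := env.Program(ast)
	if err != nil {
		panic(err)
	}
	out, _, err := prg.Eval(map[string]any{"model": "gpt-4o-mini"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // openai
}
```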
Budget and spend limits
Control LLM spending by enforcing token budget limits per API key or user.
Rate limiting for LLMs
Control LLM costs with token-based and request-based rate limits.
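Conceptually, a token-based limit charges each request's token usage against a per-key allowance that resets every window. A minimal in-memory Go sketch, with made-up names throughout (a real gateway would use shared, persistent counters):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// tokenLimiter enforces a per-key token allowance per time window.
type tokenLimiter struct {
	mu     sync.Mutex
	limit  int64
	window time.Duration
	used   map[string]int64
	reset  map[string]time.Time
}

func newTokenLimiter(limit int64, window time.Duration) *tokenLimiter {
	return &tokenLimiter{
		limit:  limit,
		window: window,
		used:   map[string]int64{},
		reset:  map[string]time.Time{},
	}
}

// Allow records a request's token usage for the given API key and reports
// whether the key is still within its allowance for the current window.
func (l *tokenLimiter) Allow(key string, tokens int64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	now := time.Now()
	if now.After(l.reset[key]) {
		l.used[key] = 0
		l.reset[key] = now.Add(l.window)
	}
	l.used[key] += tokens
	return l.used[key] <= l.limit
}

func main() {
	lim := newTokenLimiter(1000, time.Minute) // 1,000 tokens per key per minute
	fmt.Println(lim.Allow("key-a", 400))      // true
	fmt.Println(lim.Allow("key-a", 700))      // false: 1,100 exceeds the allowance
}
```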
LLM cost tracking
Track LLM costs per request by using token usage metrics.
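The arithmetic behind per-request cost tracking is a multiply-and-add over the token usage block that OpenAI-style responses include. A small Go sketch with placeholder prices:

```go
package main

import "fmt"

// usage mirrors the token usage block of an OpenAI-style response.
type usage struct {
	PromptTokens     int64
	CompletionTokens int64
}

// cost converts token counts to dollars using per-million-token prices.
// The prices passed in below are placeholders, not real provider pricing.
func cost(u usage, inPerMillion, outPerMillion float64) float64 {
	return float64(u.PromptTokens)/1e6*inPerMillion +
		float64(u.CompletionTokens)/1e6*outPerMillion
}

func main() {
	u := usage{PromptTokens: 1200, CompletionTokens: 300}
	fmt.Printf("$%.6f\n", cost(u, 2.50, 10.00)) // $0.006000
}
```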
CEL-based RBAC
Control access to LLM routes with role-based access control policies written as CEL expressions.
Metrics and logs
Monitor LLM traffic with metrics and access logs.