LLM consumption
Consume services from LLM providers through agentgateway.
About
Learn the basics of consuming LLM provider services through agentgateway.
Providers
Review the LLM providers that you can connect to through agentgateway.
Model aliasing
Expose alias model names that map to underlying provider models.
API keys
Manage API keys for LLM provider authentication.
Virtual keys
Issue API keys with per-key token budgets and cost tracking (also known as virtual keys).
Load balancing
Distribute requests across multiple LLM providers automatically (Power of Two Choices, P2C).
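Power of Two Choices is straightforward to picture: for each request, sample two backends at random and send the request to the one with fewer in-flight requests. A minimal sketch in Go, assuming a hypothetical endpoint type with an in-flight counter (not agentgateway's internals):

```go
package main

import (
	"fmt"
	"math/rand"
	"sync/atomic"
)

// endpoint is a hypothetical LLM backend with an in-flight request counter.
type endpoint struct {
	name     string
	inFlight atomic.Int64
}

// pickP2C samples two endpoints at random (possibly the same one) and
// returns the one with fewer outstanding requests.
func pickP2C(endpoints []*endpoint) *endpoint {
	a := endpoints[rand.Intn(len(endpoints))]
	b := endpoints[rand.Intn(len(endpoints))]
	if b.inFlight.Load() < a.inFlight.Load() {
		return b
	}
	return a
}

func main() {
	backends := []*endpoint{{name: "openai"}, {name: "anthropic"}, {name: "gemini"}}
	chosen := pickP2C(backends)
	chosen.inFlight.Add(1) // increment on dispatch, decrement on completion
	fmt.Println("routing request to", chosen.name)
}
```

Sampling two and keeping the shorter queue avoids the herd effect of always picking the globally least-loaded backend, while distributing load far more evenly than uniform random selection.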
Model failover
Priority-based failover across LLM providers, with automatic fallback when models fail or are unavailable.
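As a mental model, the failover loop walks providers in priority order and returns the first success. A hedged Go sketch, with callProvider as a made-up stand-in for a real model call:

```go
package main

import (
	"errors"
	"fmt"
)

// callProvider is a hypothetical stand-in for a real model call.
func callProvider(name, prompt string) (string, error) {
	if name == "primary" {
		return "", errors.New("primary unavailable") // simulate a failure
	}
	return "response from " + name, nil
}

// completeWithFailover tries providers in priority order and returns the
// first successful response, falling back on any error.
func completeWithFailover(providers []string, prompt string) (string, error) {
	var lastErr error
	for _, p := range providers {
		resp, err := callProvider(p, prompt)
		if err == nil {
			return resp, nil
		}
		lastErr = err
	}
	return "", fmt.Errorf("all providers failed: %w", lastErr)
}

func main() {
	resp, err := completeWithFailover([]string{"primary", "secondary"}, "hello")
	fmt.Println(resp, err)
}
```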
Content-based routing
Route requests to different LLM backends based on request body content, such as the requested model name.
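For intuition, the core of content-based routing is to parse the request body and select a backend from a field such as model. A simplified Go sketch; the prefix-to-backend mapping is purely illustrative:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// pickBackend reads the "model" field from an OpenAI-style request body
// and selects a backend by model name prefix.
func pickBackend(body []byte) (string, error) {
	var req struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return "", err
	}
	switch {
	case strings.HasPrefix(req.Model, "gpt-"):
		return "openai", nil
	case strings.HasPrefix(req.Model, "claude-"):
		return "anthropic", nil
	default:
		return "default", nil
	}
}

func main() {
	backend, _ := pickBackend([]byte(`{"model": "gpt-4o", "messages": []}`))
	fmt.Println(backend) // openai
}
```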
Streaming
Stream responses from the LLM to the end user through agentgateway.
OpenAI Realtime
Proxy OpenAI Realtime API WebSocket traffic and track token usage.
Function calling
Proxy LLM function calling (tool use) requests and responses through agentgateway.
Guardrails
Protect LLM interactions with prompt guards that evaluate and filter requests and responses for harmful or sensitive content.
Prompt enrichment
Enrich LLM requests by appending or prepending system and user prompts.
Prompt templates
Use static and dynamic prompt templates to customize LLM requests.
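As a rough illustration of how a static template with dynamic, per-request fields fits together, here is a sketch using Go's standard text/template package; the template text and field names are examples, not a fixed schema:

```go
package main

import (
	"os"
	"text/template"
)

func main() {
	// A static prompt template with per-request fields.
	const prompt = "You are a {{.Role}} assistant. Answer in {{.Language}}.\nUser: {{.Question}}"
	t := template.Must(template.New("prompt").Parse(prompt))
	// Fill the dynamic fields for one request.
	if err := t.Execute(os.Stdout, map[string]string{
		"Role":     "billing",
		"Language": "English",
		"Question": "Why was I charged twice?",
	}); err != nil {
		panic(err)
	}
}
```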
Request transformations
Dynamically compute and set LLM request fields using CEL expressions.
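For a sense of what such an expression can compute, the sketch below evaluates a CEL expression with the cel-go library; the variable name and expression are illustrative, not agentgateway configuration:

```go
package main

import (
	"fmt"

	"github.com/google/cel-go/cel"
)

func main() {
	// Declare the variables the expression may reference.
	env, err := cel.NewEnv(cel.Variable("model", cel.StringType))
	if err != nil {
		panic(err)
	}
	// Derive a request field from the incoming model name.
	ast, iss := env.Compile(`model.startsWith("gpt-") ? "openai" : "anthropic"`)
	if iss != nil && iss.Err() != nil {
		panic(iss.Err())
	}
	prg, err := env.Program(ast)
	if err != nil {
		panic(err)
	}
	out, _, err := prg.Eval(map[string]any{"model": "gpt-4o-mini"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // openai
}
```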
Budget and spend limits
Control LLM spending by enforcing token budget limits per API key or user.
Rate limiting for LLMs
Control LLM costs with token-based and request-based rate limits.
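Conceptually, a token-based limit charges each request's token usage against a per-key allowance that resets every window. A minimal in-memory Go sketch, with made-up names throughout (a real gateway would use shared, persistent counters):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// tokenLimiter enforces a per-key token allowance per time window.
type tokenLimiter struct {
	mu     sync.Mutex
	limit  int64
	window time.Duration
	used   map[string]int64
	reset  map[string]time.Time
}

func newTokenLimiter(limit int64, window time.Duration) *tokenLimiter {
	return &tokenLimiter{
		limit:  limit,
		window: window,
		used:   map[string]int64{},
		reset:  map[string]time.Time{},
	}
}

// Allow records a request's token usage for the given API key and reports
// whether the key is still within its allowance for the current window.
func (l *tokenLimiter) Allow(key string, tokens int64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	now := time.Now()
	if now.After(l.reset[key]) {
		l.used[key] = 0
		l.reset[key] = now.Add(l.window)
	}
	l.used[key] += tokens
	return l.used[key] <= l.limit
}

func main() {
	lim := newTokenLimiter(1000, time.Minute) // 1,000 tokens per key per minute
	fmt.Println(lim.Allow("key-a", 400))      // true
	fmt.Println(lim.Allow("key-a", 700))      // false: 1,100 exceeds the allowance
}
```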
LLM cost tracking
Track LLM costs per request by using token usage metrics.
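The arithmetic behind per-request cost tracking is a multiply-and-add over the token usage block that OpenAI-style responses include. A small Go sketch with placeholder prices:

```go
package main

import "fmt"

// usage mirrors the token usage block of an OpenAI-style response.
type usage struct {
	PromptTokens     int64
	CompletionTokens int64
}

// cost converts token counts to dollars using per-million-token prices.
// The prices passed in below are placeholders, not real provider pricing.
func cost(u usage, inPerMillion, outPerMillion float64) float64 {
	return float64(u.PromptTokens)/1e6*inPerMillion +
		float64(u.CompletionTokens)/1e6*outPerMillion
}

func main() {
	u := usage{PromptTokens: 1200, CompletionTokens: 300}
	fmt.Printf("$%.6f\n", cost(u, 2.50, 10.00)) // $0.006000
}
```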
CEL-based RBAC
Control access to LLM routes with role-based access control policies written as CEL expressions.
Metrics and logs
Monitor LLM traffic with metrics and access logs.