# Helicone
Helicone is an LLM observability platform with built-in caching, rate limiting, and cost tracking.
## Features
- Request logging - Log all LLM requests and responses
- Caching - Cache responses to reduce costs
- Rate limiting - Control request rates per user
- Cost tracking - Monitor spending across models
- User analytics - Track usage by user or session
- Prompt templates - Manage and version prompts
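
Most of these features are controlled per request through `Helicone-*` HTTP headers once traffic flows through the Helicone proxy. The headers below follow Helicone's naming conventions, but treat the exact names and values as placeholders to verify against Helicone's documentation:

```yaml
# Illustrative Helicone feature headers (placeholder values)
Helicone-Cache-Enabled: "true"         # serve cached responses for identical requests
Helicone-User-Id: "user-123"           # attribute cost and usage to a specific user
Helicone-Property-Environment: "prod"  # custom property for filtering analytics
```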
## Configuration
Helicone works as a proxy in front of the LLM provider. Configure Agent Gateway to rewrite OpenAI traffic to Helicone's gateway host (`oai.helicone.ai`) and attach your Helicone API key:
```yaml
binds:
- port: 3000
  listeners:
  - routes:
    - policies:
        # Rewrite the request authority to Helicone's OpenAI gateway
        urlRewrite:
          authority:
            full: oai.helicone.ai
        # Originate TLS to Helicone
        backendTLS: {}
        # Authenticate to Helicone
        requestHeaderModifier:
          add:
            Helicone-Auth: "Bearer $HELICONE_API_KEY"
        # Authenticate to OpenAI (forwarded through Helicone)
        backendAuth:
          key: "$OPENAI_API_KEY"
      backends:
      - ai:
          name: openai
          hostOverride: oai.helicone.ai:443
          provider:
            openAI:
              model: gpt-4o-mini
```
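
Because the gateway already adds the `Helicone-Auth` header, the same `requestHeaderModifier` policy can carry additional Helicone feature headers so that, for example, caching and user attribution apply to every proxied request. A minimal sketch extending the policy above (header names follow Helicone's conventions; verify them before relying on this):

```yaml
# Sketch: extend the requestHeaderModifier policy from the example above
requestHeaderModifier:
  add:
    Helicone-Auth: "Bearer $HELICONE_API_KEY"
    Helicone-Cache-Enabled: "true"     # let Helicone serve cached responses
    Helicone-User-Id: "agent-gateway"  # or forward a per-user value from the client
```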
## Benefits with Agent Gateway
Using Agent Gateway with Helicone provides:
| Feature | Agent Gateway | Helicone | Combined |
|---|---|---|---|
| Request routing | ✅ | ❌ | Route to multiple LLMs via Helicone (see the sketch below) |
| Caching | ❌ | ✅ | Helicone caches responses |
| Rate limiting | ✅ | ✅ | Layered rate limiting |
| Cost tracking | Basic | ✅ | Detailed cost analytics |
| MCP support | ✅ | ❌ | MCP with LLM monitoring |
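
For the request-routing row, Helicone exposes provider-specific gateway hosts, so the pattern above can be repeated per provider and Agent Gateway can route between them. The sketch below adds a second route for Anthropic; the `anthropic` provider key, the `anthropic.helicone.ai` host, and the model name are assumptions to confirm against the Agent Gateway and Helicone docs:

```yaml
# Sketch: an additional route that proxies Anthropic traffic through Helicone
- policies:
    urlRewrite:
      authority:
        full: anthropic.helicone.ai
    backendTLS: {}
    requestHeaderModifier:
      add:
        Helicone-Auth: "Bearer $HELICONE_API_KEY"
    backendAuth:
      key: "$ANTHROPIC_API_KEY"
  backends:
  - ai:
      name: anthropic
      hostOverride: anthropic.helicone.ai:443
      provider:
        anthropic:
          model: claude-3-5-sonnet-latest
```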