Helicone

Helicone is an LLM observability platform with built-in caching, rate limiting, and cost tracking.

Features

  • Request logging - Log all LLM requests and responses
  • Caching - Cache responses to reduce costs
  • Rate limiting - Control request rates per user
  • Cost tracking - Monitor spending across models
  • User analytics - Track usage by user or session
  • Prompt templates - Manage and version prompts
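
Several of these features are toggled per request with Helicone-* headers, such as Helicone-Cache-Enabled and Helicone-User-Id (both documented by Helicone). A minimal sketch of setting them centrally with the same requestHeaderModifier policy used in the configuration below; the header values are illustrative:

requestHeaderModifier:
  add:
    Helicone-Cache-Enabled: "true"   # serve repeated requests from Helicone's cache
    Helicone-User-Id: "user-123"     # attribute usage to a user in Helicone analytics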

Configuration

Helicone works as a proxy in front of the LLM provider: requests go to Helicone's OpenAI-compatible endpoint (oai.helicone.ai), are logged there, and are forwarded on to OpenAI. Configure Agent Gateway to route through it by rewriting the request authority and attaching the Helicone-Auth header:

binds:
- port: 3000
  listeners:
  - routes:
    - policies:
        # Rewrite the request authority so traffic goes to Helicone's
        # OpenAI-compatible proxy instead of directly to OpenAI.
        urlRewrite:
          authority:
            full: oai.helicone.ai
        # Helicone's endpoint requires TLS.
        backendTLS: {}
        # Authenticate to Helicone on every request.
        requestHeaderModifier:
          add:
            Helicone-Auth: "Bearer $HELICONE_API_KEY"
        # Authenticate to OpenAI; Helicone forwards the request upstream.
        backendAuth:
          key: "$OPENAI_API_KEY"
      backends:
      - ai:
          name: openai
          hostOverride: oai.helicone.ai:443
          provider:
            openAI:
              model: gpt-4o-mini
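
Once the gateway is running, clients call it exactly as they would the OpenAI API. A minimal sketch using the official OpenAI Python SDK, assuming the gateway from the configuration above is listening on localhost:3000 and serves the OpenAI-compatible /v1 path; the placeholder API key is only there to satisfy the SDK, since the backendAuth policy supplies the real one:

from openai import OpenAI

# Point the SDK at Agent Gateway instead of api.openai.com. The gateway
# rewrites the authority to oai.helicone.ai, adds Helicone-Auth, and
# injects the real OpenAI key via backendAuth before forwarding.
client = OpenAI(
    base_url="http://localhost:3000/v1",  # assumed gateway address from the bind above
    api_key="placeholder",                # not used; backendAuth supplies the real key
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello through Agent Gateway!"}],
)
print(response.choices[0].message.content)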

Benefits with Agent Gateway

Using Agent Gateway with Helicone provides:

Feature          Agent Gateway   Helicone   Combined
Request routing  ✓               –          Route to multiple LLMs via Helicone
Caching          –               ✓          Helicone caches responses
Rate limiting    ✓               ✓          Layered rate limiting
Cost tracking    Basic           ✓          Detailed cost analytics
MCP support      ✓               –          MCP with LLM monitoring
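
For the layered rate limiting row above: Agent Gateway can limit traffic at the route, while Helicone enforces per-user quotas through its Helicone-RateLimit-Policy header (format per Helicone's docs). A sketch of the Helicone side, with illustrative quota values:

requestHeaderModifier:
  add:
    # Ask Helicone to allow 100 requests per 60-second window, segmented
    # per user; this stacks on top of any limits the gateway applies itself.
    Helicone-RateLimit-Policy: "100;w=60;s=user"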

Learn more

  • Helicone documentation: https://docs.helicone.ai