Track LLM costs
Track and monitor LLM costs per request using token usage metrics.
About
Cost tracking (also known as spend monitoring or usage tracking) helps you monitor and control expenses from LLM API calls. Agentgateway automatically tracks token consumption for every request and response, allowing you to calculate costs based on your provider’s pricing model.
Token metrics are exposed through Prometheus metrics and OpenTelemetry traces. You can use these metrics to:
- Calculate costs per request, per user, or per model
- Set up budget alerts and spending limits
- Analyze usage patterns to optimize costs
- Generate cost reports for chargeback or showback
This guide shows you how to access token usage data and calculate costs from that data.
Before you begin
Set up observability
To set up advanced observability with Grafana, Prometheus, and OpenTelemetry for cost dashboards and alerts, see the OTel stack guide.
View token usage metrics
Agentgateway exposes the agentgateway_gen_ai_client_token_usage Prometheus metric that tracks input and output tokens for each request.
Port-forward the agentgateway proxy on port 15020.
kubectl port-forward deployment/agentgateway-proxy -n agentgateway-system 15020 & sleep 2Open the agentgateway metrics endpoint.
Look for the
agentgateway_gen_ai_client_token_usagemetric. This histogram includes labels that identify each request:gen_ai_token_type: Whether the metric measuresinputoroutputtokensgen_ai_operation_name: The operation type, such aschatorcompletiongen_ai_system: The LLM provider, such asopenaioranthropicgen_ai_request_model: The model used for the requestgen_ai_response_model: The model that responded (may differ from requested model)
Example metric output:
agentgateway_gen_ai_client_token_usage_bucket{gen_ai_operation_name="chat",gen_ai_request_model="gpt-4",gen_ai_response_model="gpt-4-0613",gen_ai_system="openai",gen_ai_token_type="input",le="100"} 5 agentgateway_gen_ai_client_token_usage_sum{gen_ai_operation_name="chat",gen_ai_request_model="gpt-4",gen_ai_response_model="gpt-4-0613",gen_ai_system="openai",gen_ai_token_type="input"} 342 agentgateway_gen_ai_client_token_usage_count{gen_ai_operation_name="chat",gen_ai_request_model="gpt-4",gen_ai_response_model="gpt-4-0613",gen_ai_system="openai",gen_ai_token_type="input"} 5The
_sumsuffix shows total tokens consumed, and_countshows the number of requests.
For more information about the token usage metric, see the Semantic conventions for generative AI metrics in the OpenTelemetry documentation.
Calculate costs from token metrics
To calculate costs, multiply token counts by your provider’s pricing. Most LLM providers charge separately for input tokens and output tokens.
Common pricing models
Check your LLM provider’s pricing page for current rates. Most providers charge separately for input tokens and output tokens, with prices varying by model and usage. Refer to the LLM provider docs.
Cost calculation formula
Use this formula to calculate the cost per request:
cost = (input_tokens / 1,000,000 × input_price) + (output_tokens / 1,000,000 × output_price)Example calculation for a GPT-4 request with 500 input tokens and 1,000 output tokens:
cost = (500 / 1,000,000 × $30.00) + (1,000 / 1,000,000 × $60.00)
= $0.015 + $0.060
= $0.075Query costs with PromQL
If you have Prometheus set up, you can query aggregate costs using PromQL. This example calculates total cost for GPT-4 requests over the last hour:
# Total input cost for GPT-4 (assuming $30 per 1M input tokens)
(sum(rate(agentgateway_gen_ai_client_token_usage_sum{gen_ai_request_model="gpt-4",gen_ai_token_type="input"}[1h])) / 1000000) * 30
# Total output cost for GPT-4 (assuming $60 per 1M output tokens)
(sum(rate(agentgateway_gen_ai_client_token_usage_sum{gen_ai_request_model="gpt-4",gen_ai_token_type="output"}[1h])) / 1000000) * 60
# Combined total cost per hour
((sum(rate(agentgateway_gen_ai_client_token_usage_sum{gen_ai_request_model="gpt-4",gen_ai_token_type="input"}[1h])) / 1000000) * 30) + ((sum(rate(agentgateway_gen_ai_client_token_usage_sum{gen_ai_request_model="gpt-4",gen_ai_token_type="output"}[1h])) / 1000000) * 60)Track costs per user
To track costs per user, combine token metrics with user identification from API keys or JWT claims. For a complete example that integrates API keys, token budgets, and cost tracking, see the virtual key management guide.
Set up API key authentication to identify users. See the API key management guide for details.
Query metrics filtered by user ID. The
X-User-IDheader value is available in Prometheus labels when you configure rate limiting with user identification.Example PromQL query for per-user costs:
# Cost per user over the last 24 hours sum by (user_id) ( ((rate(agentgateway_gen_ai_client_token_usage_sum{gen_ai_token_type="input"}[24h]) / 1000000) * 30) + ((rate(agentgateway_gen_ai_client_token_usage_sum{gen_ai_token_type="output"}[24h]) / 1000000) * 60) )
Set up cost alerts
Use Prometheus AlertManager to trigger alerts when costs exceed thresholds.
Example alert rule for daily spending over $100:
groups:
- name: llm_cost_alerts
rules:
- alert: HighDailyCost
expr: |
(
(sum(rate(agentgateway_gen_ai_client_token_usage_sum{gen_ai_token_type="input"}[24h]) * 86400) / 1000000 * 30) +
(sum(rate(agentgateway_gen_ai_client_token_usage_sum{gen_ai_token_type="output"}[24h]) * 86400) / 1000000 * 60)
) > 100
for: 5m
labels:
severity: warning
annotations:
summary: "Daily LLM costs exceed $100"
description: "Estimated daily cost is {{ $value | humanize }}. Review usage patterns."View costs in OpenTelemetry traces
OpenTelemetry traces include token usage as span attributes. You can view per-request token counts in your tracing backend (such as Grafana Tempo, Jaeger, or Langfuse).
Set up OpenTelemetry tracing. See the tracing guide for setup instructions.
Search for traces with LLM requests. Each trace includes these attributes:
gen_ai.usage.input_tokens: Number of input tokensgen_ai.usage.output_tokens: Number of output tokensgen_ai.request.model: Model usedgen_ai.response.model: Model that responded
Calculate costs using the same formula as above, using the token counts from trace attributes.
Enforce spending limits
To enforce per-user spending limits, combine cost tracking with rate limiting:
Set up token-based rate limiting with
type: TOKENkeyed byX-User-ID. See the budget and spend limits guide.Configure the daily token limit based on your budget. For example, a $10 daily budget for GPT-4 allows approximately 166,000 input tokens or 166,000 output tokens (assuming mixed usage).
Monitor actual spending with the metrics queries shown above to ensure rate limits align with budget goals.
For more information, see the budget and spend limits guide.
What’s next
- Set up budget and spend limits to enforce per-user token budgets
- Set up API key authentication to track costs per user
- View metrics and logs for general observability
- Set up tracing for detailed request analysis