View metrics and logs

Review LLM-specific metrics and logs.

To calculate costs from token usage metrics, see the cost tracking guide.

For prompt logging (also known as request/response logging or audit trail logging) to external platforms such as Langfuse and LangSmith, see the LLM Observability integrations.

Before you begin

Complete an LLM guide, such as one of the LLM provider-specific guides. These guides send a request to the LLM and receive a response, which you can use to verify the metrics and logs in this guide.

View LLM metrics

You can access the agentgateway metrics endpoint to view LLM-specific metrics, such as the number of tokens used in each request and response.

Port-forward the agentgateway proxy on port 15020.

kubectl port-forward deployment/agentgateway-proxy -n agentgateway-system 15020

Open the agentgateway metrics endpoint.
Look for the agentgateway_gen_ai_client_token_usage metric. This metric is a histogram of token counts, and its labels include important information about the request to and the response from the LLM, such as:
- gen_ai_token_type: Whether the tokens were counted for the request (input) or the response (output).
- gen_ai_operation_name: The name of the operation that was performed.
- gen_ai_system: The LLM provider that was used for the request/response.
- gen_ai_request_model: The model that was used for the request.
- gen_ai_response_model: The model that was used for the response.
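Because agentgateway_gen_ai_client_token_usage is a histogram, the scrape output contains _bucket, _sum, and _count series for each label combination, and dividing _sum by _count gives the mean number of tokens per request. The following shell sketch computes that mean for each gen_ai_token_type. It assumes that the endpoint serves the Prometheus text format at http://localhost:15020/metrics (confirm the path for your version) and that samples carry no timestamps; the function name avg_tokens_by_type is hypothetical.

```shell
# Hypothetical helper: mean tokens per request for each gen_ai_token_type,
# computed as _sum / _count of the agentgateway_gen_ai_client_token_usage histogram.
# Reads Prometheus text-format metrics on stdin; assumes no sample timestamps,
# so the last field ($NF) of each line is the sample value.
# Usage (assumed /metrics path, with the port-forward running):
#   curl -s http://localhost:15020/metrics | avg_tokens_by_type
avg_tokens_by_type() {
  awk '
    /^agentgateway_gen_ai_client_token_usage_(sum|count)[{]/ {
      # Extract the value of the gen_ai_token_type label, for example input.
      # gen_ai_token_type=" is 19 characters; drop it and the closing quote.
      if (!match($0, /gen_ai_token_type="[^"]*"/)) next
      type = substr($0, RSTART + 19, RLENGTH - 20)
      if ($0 ~ /^agentgateway_gen_ai_client_token_usage_sum/) sum[type] += $NF
      else count[type] += $NF
    }
    END {
      # Print "<token type> <mean tokens per request>" for types with requests.
      for (t in sum)
        if (count[t] > 0) printf "%s %.1f\n", t, sum[t] / count[t]
    }
  ' | sort
}
```

Summing across all label combinations before dividing gives a request-weighted mean for each token type, regardless of which model or provider served the request.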

For more information, see the Semantic conventions for generative AI metrics in the OpenTelemetry docs.

View realized costs

When you configure a model cost catalog, agentgateway computes the realized USD cost of each LLM request and exposes it in logs, metrics, and traces:

Logs: each LLM request log line includes agw.ai.usage.cost.total. To add the cost breakdown or the applied rates, use the CEL fields llm.cost and llm.costRates.
Metrics: the agentgateway_cost_catalog_lookups_total counter tracks lookups by status (Exact, Unpriced, Missing, or NoCatalog) and by provider and model, so you can confirm that your catalog prices your traffic.
Traces: cost attributes are attached to the request span.
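To confirm from the command line that your catalog prices your traffic, and to total the realized cost of the requests that are currently in the proxy logs, you could use a shell sketch like the following. It assumes that the lookup status appears as a quoted label value (for example "Unpriced") in the scrape output, and that the log field is written as agw.ai.usage.cost.total=<number> like the other key=value log fields; check both against your own output. The function names non_exact_lookups and total_logged_cost are hypothetical.

```shell
# Hypothetical helper: print cost catalog lookup series whose status is not
# Exact and whose counter is above zero, that is, traffic the catalog did not price.
# Reads Prometheus text-format metrics on stdin (no sample timestamps assumed).
# Usage: curl -s http://localhost:15020/metrics | non_exact_lookups
non_exact_lookups() {
  awk '/^agentgateway_cost_catalog_lookups_total[{]/ && !/"Exact"/ && $NF > 0'
}

# Hypothetical helper: sum the agw.ai.usage.cost.total values (USD) found in
# agentgateway log lines on stdin. Assumes the field is logged as
# agw.ai.usage.cost.total=<number>.
# Usage: kubectl logs deployment/agentgateway-proxy -n agentgateway-system | total_logged_cost
total_logged_cost() {
  awk '
    {
      # "agw.ai.usage.cost.total=" is 24 characters, so the value starts at 25.
      for (i = 1; i <= NF; i++) {
        if (index($i, "agw.ai.usage.cost.total=") == 1) total += substr($i, 25)
      }
    }
    END { printf "%.6f\n", total }
  '
}
```

The log total only covers the entries that kubectl logs returns, so treat it as a spot check rather than a billing figure.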

For catalog configuration and the full list of cost fields, see Model costs.

Track per-user metrics

When you set up API key authentication with per-user rate limiting, you can filter token usage metrics by user ID to track spending and usage patterns for each virtual key.

For a complete virtual key setup guide, see Virtual key management.

Example PromQL query for per-user token usage:

# Total tokens (input and output) consumed by each user
sum by (user_id) (
  agentgateway_gen_ai_client_token_usage_sum{gen_ai_token_type=~"input|output"}
)
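If you want to spot-check per-user totals without a Prometheus server, you can perform the same per-user aggregation on the raw scrape output. The following shell sketch adds up the input and output _sum series per user_id. It assumes that the series carry a user_id label, as the PromQL query implies, and that samples have no timestamps; the function name tokens_per_user is hypothetical. Because _sum series are cumulative, the result is a running total since the proxy started rather than a per-period figure.

```shell
# Hypothetical helper: total tokens (input + output) per user_id, summed from
# the _sum series of agentgateway_gen_ai_client_token_usage.
# Reads Prometheus text-format metrics on stdin; assumes no sample timestamps.
# Usage: curl -s http://localhost:15020/metrics | tokens_per_user
tokens_per_user() {
  awk '
    /^agentgateway_gen_ai_client_token_usage_sum[{]/ &&
    /gen_ai_token_type="(input|output)"/ {
      # Skip series that have no user_id label.
      if (!match($0, /user_id="[^"]*"/)) next
      # user_id=" is 9 characters; drop it and the closing quote.
      user = substr($0, RSTART + 9, RLENGTH - 10)
      total[user] += $NF
    }
    END { for (u in total) print u, total[u] }
  ' | sort
}
```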

View logs

Agentgateway automatically writes a log entry to stdout for each request that it handles. When you run agentgateway on your local machine, these entries appear in your terminal output.

To view the logs of the agentgateway proxy in your cluster:

kubectl logs deployment/agentgateway-proxy -n agentgateway-system

Example log entry for a successful request to OpenAI:

2025-12-12T21:56:02.809082Z	info	request gateway=agentgateway-system/agentgateway-proxy listener=http route=agentgateway-system/openai endpoint=api.openai.com:443 src.addr=127.0.0.1:60862 http.method=POST http.host=localhost http.path=/openai http.version=HTTP/1.1 http.status=200 protocol=llm gen_ai.operation.name=chat gen_ai.provider.name=openai gen_ai.request.model=gpt-3.5-turbo gen_ai.response.model=gpt-3.5-turbo-0125 gen_ai.usage.input_tokens=68 gen_ai.usage.output_tokens=298 duration=2488ms
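Each log entry is a single line of key=value fields, so standard text tools can pull out the LLM-specific values. The following shell sketch prints the response model, input tokens, output tokens, and duration for each entry with protocol=llm. It assumes that field values contain no spaces, as in the example entry; the function name llm_request_summary is hypothetical.

```shell
# Hypothetical helper: summarize agentgateway LLM request log lines from stdin.
# For each line that contains protocol=llm, print:
#   <response model> <input tokens> <output tokens> <duration>
# Fields that are missing from a line are printed as "-".
# Usage: kubectl logs deployment/agentgateway-proxy -n agentgateway-system | llm_request_summary
llm_request_summary() {
  awk '
    {
      llm = 0; model = "-"; in_tok = "-"; out_tok = "-"; dur = "-"
      for (i = 1; i <= NF; i++) {
        # Split each whitespace-separated field at its first "=".
        eq = index($i, "=")
        if (eq == 0) continue
        key = substr($i, 1, eq - 1)
        val = substr($i, eq + 1)
        if (key == "protocol" && val == "llm") llm = 1
        else if (key == "gen_ai.response.model") model = val
        else if (key == "gen_ai.usage.input_tokens") in_tok = val
        else if (key == "gen_ai.usage.output_tokens") out_tok = val
        else if (key == "duration") dur = val
      }
      if (llm) print model, in_tok, out_tok, dur
    }
  '
}
```

Matching whole fields (key and value) rather than substrings keeps entries such as protocol=http out of the summary.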

