Model costs

Price LLM requests with a model cost catalog and expose realized USD costs in logs, traces, metrics, and CEL policies.

Agentgateway can compute the realized USD cost of each LLM request when you provide a model cost catalog. With a catalog in place, agentgateway attributes cost per request in access logs, traces, and metrics, and exposes the values to CEL expressions as llm.cost and llm.costRates.

Agentgateway does not ship a built-in catalog. Costs are computed only when you configure one (for example, a catalog that you generate with agctl costs import).

Before you begin

Install the agentgateway binary.

Step 1: Prepare a catalog

Prepare a catalog by creating your own JSON file or using the agctl costs import command.

Catalog JSON format

A model cost catalog is a JSON document with the following high-level structure. Field names are camelCase, and unknown fields are rejected.

```
{
  "providers": {
    "<provider-id>": {
      "models": {
        "<model-name>": {
          "rates": {
            "input": "0.0",
            "output": "0.0",
            "cacheRead": "0.0",
            "cacheWrite": "0.0",
            "reasoning": "0.0",
            "inputAudio": "0.0",
            "outputAudio": "0.0"
          },
          "tiers": [
            {
              "contextOver": 200000,
              "rates": {
                "input": "0.0",
                "output": "0.0"
              }
            }
          ]
        }
      }
    }
  }
}
```

Key points:

- Lookups are by provider ID (such as openai, anthropic, or gcp.gemini) and model name (such as gpt-4o-mini).
- Rates are strings (exact decimals), in USD per 1,000,000 tokens.
- If a rate is omitted, that token type is not priced for the model.
- tiers[] is optional. Each tier selects alternate rates when the request context length is over the tier's contextOver value. Tiers must be ordered by strictly increasing contextOver.
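The following Python sketch illustrates how string rates and tiers[] might resolve to the effective rates for a single model entry. It is a hypothetical helper (effective_rates is not an agentgateway API) and it assumes that a tier applies when the context length is strictly greater than its contextOver, that the last matching tier wins, and that a selected tier's rates replace the base rates as a whole. How agentgateway measures the request context length is not covered here, so the sketch takes it as a plain integer.

```python
from decimal import Decimal


def effective_rates(model_entry: dict, context_length: int) -> dict[str, Decimal]:
    """Resolve the USD-per-1,000,000-token rates for one catalog model entry.

    Hypothetical sketch. Assumptions: a tier applies when context_length is
    strictly greater than its contextOver, the last matching tier wins (tiers
    are sorted by strictly increasing contextOver), and a selected tier's
    rates replace the base rates as a whole.
    """
    rates = model_entry["rates"]
    for tier in model_entry.get("tiers", []):
        if context_length > tier["contextOver"]:
            rates = tier["rates"]
        else:
            # Later tiers have larger thresholds, so none of them can match.
            break
    # Rates are exact decimal strings; Decimal keeps them exact.
    return {token_type: Decimal(value) for token_type, value in rates.items()}


# The gemini-2.5-pro entry from the minimal example on this page.
gemini_pro = {
    "rates": {"input": "1.25", "output": "10", "cacheRead": "0.125"},
    "tiers": [
        {"contextOver": 200000, "rates": {"input": "2.5", "output": "15", "cacheRead": "0.25"}},
    ],
}

print(effective_rates(gemini_pro, 150_000)["input"])  # 1.25
print(effective_rates(gemini_pro, 250_000)["input"])  # 2.5
```

Under these assumptions, a request of exactly 200,000 tokens still uses the base rates, because it is not over the threshold.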

The following minimal example prices two OpenAI models and one tiered Gemini model:

```
{
  "providers": {
    "openai": {
      "models": {
        "gpt-4o-mini": {
          "rates": { "input": "0.15", "output": "0.6", "cacheRead": "0.075" }
        }
      }
    },
    "gcp.gemini": {
      "models": {
        "gemini-2.5-pro": {
          "rates": { "input": "1.25", "output": "10", "cacheRead": "0.125" },
          "tiers": [
            {
              "contextOver": 200000,
              "rates": { "input": "2.5", "output": "15", "cacheRead": "0.25" }
            }
          ]
        }
      }
    }
  }
}
```
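If you write a catalog by hand, a small pre-flight check can catch mistakes before agentgateway loads the file. The Python sketch below mirrors only the rules stated on this page: known camelCase field names, rates as decimal strings, and strictly increasing contextOver. It is a hypothetical helper, not agentgateway's own parser, which may enforce additional rules.

```python
import json
from decimal import Decimal, InvalidOperation

# Field names taken from the catalog format described on this page.
RATE_FIELDS = {"input", "output", "cacheRead", "cacheWrite", "reasoning", "inputAudio", "outputAudio"}
MODEL_FIELDS = {"rates", "tiers"}
TIER_FIELDS = {"contextOver", "rates"}


def check_rates(rates: dict, where: str, errors: list[str]) -> None:
    """Append an error for each unknown, non-string, or non-decimal rate."""
    for key, value in rates.items():
        if key not in RATE_FIELDS:
            errors.append(f"{where}: unknown rate field {key!r}")
        elif not isinstance(value, str):
            errors.append(f"{where}.{key}: rate must be a string, got {type(value).__name__}")
        else:
            try:
                Decimal(value)
            except InvalidOperation:
                errors.append(f"{where}.{key}: {value!r} is not a decimal")


def validate_catalog(catalog: dict) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    errors: list[str] = []
    for key in catalog:
        if key != "providers":
            errors.append(f"unknown top-level field {key!r}")
    for provider, pdata in catalog.get("providers", {}).items():
        for key in pdata:
            if key != "models":
                errors.append(f"{provider}: unknown field {key!r}")
        for model, mdata in pdata.get("models", {}).items():
            where = f"{provider}/{model}"
            for key in mdata.keys() - MODEL_FIELDS:
                errors.append(f"{where}: unknown field {key!r}")
            check_rates(mdata.get("rates", {}), f"{where}.rates", errors)
            previous = None
            for i, tier in enumerate(mdata.get("tiers", [])):
                for key in tier.keys() - TIER_FIELDS:
                    errors.append(f"{where}.tiers[{i}]: unknown field {key!r}")
                over = tier.get("contextOver")
                if previous is not None and over is not None and over <= previous:
                    errors.append(f"{where}.tiers[{i}]: contextOver must increase strictly")
                previous = over
                check_rates(tier.get("rates", {}), f"{where}.tiers[{i}].rates", errors)
    return errors


# A numeric rate (instead of a string) is reported.
example = json.loads('{"providers": {"openai": {"models": {"gpt-4o-mini": {"rates": {"input": 0.15}}}}}}')
print(validate_catalog(example))
```

To check a file, pass the result of json.load on it to validate_catalog.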

Generate a catalog with agctl

Use agctl costs import to generate a catalog JSON file, then reference that file from config.modelCatalog or MODEL_CATALOG_PATHS.

Generate a catalog from a supported source. By default, agctl costs import imports every provider that the proxy supports from models.dev.
```
agctl costs import --pretty --out ./catalog.json
```
To import only a subset of providers, pass a comma-separated list to --providers.
```
agctl costs import --pretty --providers openai,anthropic --out ./catalog.json
```
Reference the generated file from your configuration with config.modelCatalog[].file or MODEL_CATALOG_PATHS, then run agentgateway.

For all options, see the agctl costs import reference.
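After an import, you might want a quick look at what the file contains, for example to confirm that --providers limited the output as expected. This Python sketch only reads the catalog JSON format described earlier; the summarize helper and the script name in the usage comment are hypothetical and not part of agctl.

```python
import json
import sys


def summarize(catalog: dict) -> dict[str, tuple[int, int]]:
    """Map each provider ID to (number of models, number of models with tiers)."""
    summary = {}
    for provider, pdata in catalog.get("providers", {}).items():
        models = pdata.get("models", {})
        tiered = sum(1 for entry in models.values() if entry.get("tiers"))
        summary[provider] = (len(models), tiered)
    return summary


def main(path: str) -> None:
    with open(path, encoding="utf-8") as f:
        catalog = json.load(f)
    for provider, (count, tiered) in sorted(summarize(catalog).items()):
        print(f"{provider}: {count} models ({tiered} with context tiers)")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_catalog.py ./catalog.json
    main(sys.argv[1])
```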

Step 2: Configure catalog sources

Configure one or more catalog sources for agentgateway with the config.modelCatalog config section. Sources are merged in order, with later sources taking precedence at the model level.
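Model-level precedence can be read as: a later source replaces a whole model entry, while models that appear in only one source are all kept. The Python sketch below shows that reading. It assumes that a later entry's missing rates are not filled in from an earlier source, and the override rates are hypothetical values chosen for illustration; it is not agentgateway's merge code.

```python
def merge_catalogs(*sources: dict) -> dict:
    """Merge catalogs in order; a later source replaces earlier entries per model."""
    merged: dict = {"providers": {}}
    for source in sources:
        for provider, pdata in source.get("providers", {}).items():
            models = merged["providers"].setdefault(provider, {"models": {}})["models"]
            for model, entry in pdata.get("models", {}).items():
                # Assumption: the whole model entry (rates and tiers) is replaced.
                models[model] = entry
    return merged


base = {"providers": {"openai": {"models": {
    "gpt-4o-mini": {"rates": {"input": "0.15", "output": "0.6", "cacheRead": "0.075"}},
}}}}
# Hypothetical override rates, for illustration only.
overrides = {"providers": {"openai": {"models": {
    "gpt-4o-mini": {"rates": {"input": "0.10", "output": "0.5"}},
}}}}

merged = merge_catalogs(base, overrides)
print(merged["providers"]["openai"]["models"]["gpt-4o-mini"]["rates"])
```

Under this reading, cacheRead is no longer priced for gpt-4o-mini after the merge, so an override entry should repeat every rate you still want applied.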

Load a catalog from a file

The file field is a path to a catalog JSON file. Agentgateway watches the file and reloads it when it changes.

```
# yaml-language-server: $schema=https://agentgateway.dev/schema/config
config:
  modelCatalog:
  - file: ./catalog.json
```

Embed a catalog inline

The inline field is a string that contains the catalog JSON.

```
# yaml-language-server: $schema=https://agentgateway.dev/schema/config
config:
  modelCatalog:
  - inline: |
      {
        "providers": {
          "openai": {
            "models": {
              "gpt-4o-mini": {
                "rates": { "input": "0.15", "output": "0.6", "cacheRead": "0.075" }
              }
            }
          }
        }
      }
```

Load catalog files with an environment variable

You can also load one or more catalog files with the MODEL_CATALOG_PATHS environment variable, set to a comma-separated list of file paths. The environment variable is useful for container deployments where you mount a catalog file and enable it without editing the main configuration file.

```
MODEL_CATALOG_PATHS=./catalog.json,./overrides.json agentgateway -f config.yaml
```

When MODEL_CATALOG_PATHS is set, it replaces any config.modelCatalog sources; the two are not merged. Use one mechanism or the other.
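The precedence between the two mechanisms fits in a few lines. This Python sketch is hypothetical: the resolve_sources name is invented, and treating an empty value as unset and ignoring whitespace around each path are assumptions rather than documented agentgateway behavior.

```python
import os
from collections.abc import Mapping


def resolve_sources(config_sources: list[dict], environ: Mapping[str, str] = os.environ) -> list[dict]:
    """Return the catalog sources that apply.

    A set MODEL_CATALOG_PATHS replaces config.modelCatalog instead of merging
    with it. Assumptions: an empty value counts as unset, and whitespace around
    each comma-separated path is ignored.
    """
    raw = environ.get("MODEL_CATALOG_PATHS", "")
    paths = [p.strip() for p in raw.split(",") if p.strip()]
    if paths:
        # Each path becomes a file source, in the order listed.
        return [{"file": p} for p in paths]
    return config_sources


config_sources = [{"file": "./catalog.json"}, {"inline": "{\"providers\": {}}"}]
print(resolve_sources(config_sources, {"MODEL_CATALOG_PATHS": "./catalog.json,./overrides.json"}))
print(resolve_sources(config_sources, {}))
```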

Step 3: Configure cost policies

Use cost data in CEL expressions and in access log, tracing, and metrics policies.

When a request matches an entry in the catalog, agentgateway populates the following CEL fields:

- llm.cost: The realized USD cost of the request. Includes total plus per-token-type components: input, output, cacheRead, cacheWrite, reasoning, inputAudio, and outputAudio. Unset when the model cannot be priced.
- llm.costRates: The effective USD-per-1,000,000-token rates that were applied, after tier selection. Includes the same per-token-type fields when available. Unset when the model cannot be priced.
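Because rates are quoted per 1,000,000 tokens, each llm.cost component is the token count for that type multiplied by its llm.costRates value and divided by 1,000,000. The Python sketch below illustrates that arithmetic. It assumes that total is the sum of the priced components and uses None to stand in for an unset llm.cost when nothing can be priced; how agentgateway splits usage into token types (for example, whether cached tokens are also counted as input) is not specified here, so the sketch prices whatever counts it is given.

```python
from decimal import Decimal
from typing import Optional

PER_MILLION = Decimal(1_000_000)


def realized_cost(tokens: dict[str, int], rates: dict[str, str]) -> Optional[dict[str, Decimal]]:
    """Price token counts against USD-per-1,000,000-token rate strings.

    Hypothetical sketch: token types without a rate are left unpriced, total is
    assumed to be the sum of the priced components, and None stands in for an
    unset llm.cost when nothing could be priced.
    """
    cost: dict[str, Decimal] = {}
    for token_type, count in tokens.items():
        if token_type in rates:
            cost[token_type] = count * Decimal(rates[token_type]) / PER_MILLION
    if not cost:
        return None
    cost["total"] = sum(cost.values(), Decimal(0))
    return cost


# Hypothetical usage: 1,000 input and 200 output tokens at the gpt-4o-mini rates above.
usage = {"input": 1000, "output": 200}
print(realized_cost(usage, {"input": "0.15", "output": "0.6", "cacheRead": "0.075"}))
```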

The request access log always includes agw.ai.usage.cost.total for LLM requests (it is 0 when the model cannot be priced). To add the breakdown or rate fields, reference them with CEL in access logs, traces, or metrics:

```
# yaml-language-server: $schema=https://agentgateway.dev/schema/config
frontendPolicies:
  accessLog:
    add:
      llm.cost.total: 'llm.cost.total'
      llm.cost.input: 'llm.cost.input'
      llm.cost.output: 'llm.cost.output'
      llm.cost.cacheRead: 'llm.cost.cacheRead'
  tracing:
    attributes:
      llm.cost.total: 'llm.cost.total'
      llm.costRates.input: 'llm.costRates.input'
      llm.costRates.output: 'llm.costRates.output'

config:
  metrics:
    fields:
      add:
        llm.cost.total: 'llm.cost.total'
        llm.costRates.input: 'llm.costRates.input'
```

A priced request produces an access log line that includes the cost fields:

```
... protocol=llm gen_ai.provider.name=openai gen_ai.request.model=gpt-4o-mini
gen_ai.usage.input_tokens=14 gen_ai.usage.output_tokens=6 agw.ai.usage.cost.total=0.0000057 ...
```
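The total in this line follows from the gpt-4o-mini rates in the catalog examples above (0.15 USD input, 0.6 USD output, per 1,000,000 tokens), assuming no cached or reasoning tokens were billed:

$$
\text{total} = \frac{14 \times 0.15 + 6 \times 0.6}{1{,}000{,}000} = \frac{2.1 + 3.6}{1{,}000{,}000} = 0.0000057\ \text{USD}
$$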

For more examples, see Observe traffic and the CEL reference.

Step 4: Generate traffic

Generate traffic through agentgateway that matches a model entry from the catalog. For example steps, see the LLM getting started guide.

Step 5: Monitor catalog lookups

Every cost lookup increments the agentgateway_cost_catalog_lookups_total counter, labeled with the lookup status and the request's gen_ai_system (provider), gen_ai_request_model, and gen_ai_response_model. Use this counter to confirm that your catalog prices your traffic.

The status label is one of the following values:

| Status | Meaning |
| --- | --- |
| `Exact` | The provider and model were found in the catalog and priced. |
| `Unpriced` | The model was found, but the token types in the request had no matching rates. |
| `Missing` | The provider or model was not found in the catalog. |
| `NoCatalog` | No catalog is configured. |
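The four statuses can be read as a decision sequence. The Python sketch below is one hypothetical interpretation of the table: it assumes that `Unpriced` means none of the request's nonzero token types has a base rate, that a partly priced request still counts as `Exact`, and it ignores tiers for simplicity. agentgateway's actual rules may differ.

```python
from typing import Optional


def lookup_status(catalog: Optional[dict], provider: str, model: str, tokens: dict[str, int]) -> str:
    """Classify a cost lookup into the statuses of agentgateway_cost_catalog_lookups_total."""
    if catalog is None:
        return "NoCatalog"
    entry = catalog.get("providers", {}).get(provider, {}).get("models", {}).get(model)
    if entry is None:
        return "Missing"
    rates = entry.get("rates", {})
    # Assumption: only token types that were actually used need a rate.
    used = {token_type for token_type, count in tokens.items() if count > 0}
    if not used.intersection(rates):
        return "Unpriced"
    return "Exact"


catalog = {"providers": {"openai": {"models": {"gpt-4o-mini": {"rates": {"input": "0.15", "output": "0.6"}}}}}}
print(lookup_status(catalog, "openai", "gpt-3.5-turbo", {"input": 14}))  # Missing
```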

For example, the metrics endpoint at http://localhost:15020/metrics shows lines such as the following:

```
agentgateway_cost_catalog_lookups_total{status="Exact",gen_ai_system="openai",gen_ai_request_model="gpt-4o-mini",…} 1
agentgateway_cost_catalog_lookups_total{status="Missing",gen_ai_system="openai",gen_ai_request_model="gpt-3.5-turbo",…} 1
```


A rising `Missing` or `Unpriced` count means requests are flowing through models that your catalog does not price. Add the missing providers or models to your catalog and reload.
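To find which models are unpriced without scanning the endpoint by eye, you can filter the metrics text for the non-`Exact` statuses. This Python sketch parses Prometheus text lines with a simple regular expression (it assumes label values contain no escaped quotes) and sums `Missing` and `Unpriced` counts per provider and requested model. The sample lines are modeled on the example above with the elided labels left out; fetching the real text from http://localhost:15020/metrics, for example with urllib.request.urlopen, is left to the caller.

```python
import re
from collections import Counter

METRIC = "agentgateway_cost_catalog_lookups_total"
LINE_RE = re.compile(r"^" + METRIC + r"\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)")
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def unpriced_lookups(metrics_text: str) -> Counter:
    """Sum Missing and Unpriced lookups per (provider, requested model)."""
    counts: Counter = Counter()
    for line in metrics_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # comments (# HELP / # TYPE) and other metrics
        labels = dict(LABEL_RE.findall(match.group("labels")))
        if labels.get("status") in ("Missing", "Unpriced"):
            key = (labels.get("gen_ai_system", ""), labels.get("gen_ai_request_model", ""))
            counts[key] += float(match.group("value"))
    return counts


sample = """\
agentgateway_cost_catalog_lookups_total{status="Exact",gen_ai_system="openai",gen_ai_request_model="gpt-4o-mini"} 1
agentgateway_cost_catalog_lookups_total{status="Missing",gen_ai_system="openai",gen_ai_request_model="gpt-3.5-turbo"} 1
"""
for (provider, model), count in unpriced_lookups(sample).most_common():
    print(f"{provider}/{model}: {count:g} unpriced lookups")
```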


  
Note: In traces, the corresponding cost-resolution status attribute uses lowercase values: `exact`, `unpriced`, `missing`, and `noCatalog`.
