Virtual key management

Issue API keys with per-key token budgets and cost tracking (also known as virtual keys).

About

Virtual key management is a common feature in AI gateway solutions that allows you to issue API keys to users or applications, each with independent token budgets and cost tracking. Competitors like LiteLLM and Portkey offer this as a single “virtual keys” abstraction.

Agentgateway achieves the same outcome by composing three existing capabilities:

API key authentication: Identify incoming requests by API key
Token-based rate limiting: Enforce per-key token budgets
Observability metrics: Track per-key spending and usage

This composable approach gives you more flexibility in how you configure and apply virtual key management policies, while maintaining compatibility with standard Kubernetes patterns.

How virtual keys work

Virtual keys combine authentication, rate limiting, and observability to create isolated token budgets for each API key:

  flowchart TD
  A[Request arrives with API key] --> B[Validate API key]
  B --> C[Extract user ID]
  C --> D[Check user's token budget]
  D --> E{Budget available?}
  E -->|Yes| F[Forward to LLM]
  F --> G[Track token usage]
  G --> H[Deduct from budget]
  E -->|No| I[Reject with 429]
  subgraph refill["Budget refills periodically"]
    H
  end

When a request arrives:

Agentgateway validates the API key
The user ID is extracted from a request header
The request is checked against the user’s token budget
If budget is available, the request proceeds to the LLM
Token usage is tracked and deducted from the user’s budget
If budget is exhausted, the request is rejected with a 429 status code
Budgets refill at the configured interval (daily, hourly, etc.)
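
The steps above can be sketched as a small in-memory model. This is a simplified illustration, not how agentgateway is implemented: the `Budget` class, the `handle` function, and the 401 status for an invalid key are all assumptions made for the example.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical key store; in this guide the keys live in Kubernetes Secrets.
API_KEYS = {"sk-alice-abc123def456", "sk-bob-xyz789uvw012"}


@dataclass
class Budget:
    limit: int            # tokens allowed per window
    window_seconds: int   # refill interval, such as 86400 for a daily budget
    used: int = 0
    window_start: float = field(default_factory=time.time)

    def refill_if_due(self, now: float) -> None:
        # Reset usage once the configured interval has elapsed.
        if now - self.window_start >= self.window_seconds:
            self.used = 0
            self.window_start = now


budgets: Dict[str, Budget] = {}


def handle(api_key: str, user_id: str, llm_tokens: int, now: Optional[float] = None) -> int:
    """Return the HTTP status this simplified gateway would send."""
    now = time.time() if now is None else now
    if api_key not in API_KEYS:
        return 401  # assumed status for an invalid key; no budget is touched
    budget = budgets.setdefault(
        user_id, Budget(limit=100_000, window_seconds=86_400, window_start=now)
    )
    budget.refill_if_due(now)
    if budget.used >= budget.limit:
        return 429  # budget exhausted
    # Forwarding to the LLM is omitted; deduct the tokens the response reported.
    budget.used += llm_tokens
    return 200
```

Each user ID gets its own `Budget`, which is why exhausting one user's budget does not affect another user.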

More considerations

Evaluation order: Rate limiting is evaluated before prompt guards (content safety checks). This means that requests rejected by guardrails (403 Forbidden) still consume quota from the user’s token budget. In contrast, authentication (JWT/OPA) is evaluated before rate limiting, so unauthenticated requests do not consume quota.
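
As a rough illustration of the evaluation order, the following sketch models the three stages as an ordered list and reports whether a rejection at a given stage touches the token budget. The stage names and the `consumes_quota` helper are hypothetical and exist only for this example.

```python
from typing import Optional

# Evaluation order described above: earlier stages run first.
STAGES = ["authentication", "rate_limit", "prompt_guard"]


def consumes_quota(rejected_at: Optional[str]) -> bool:
    """Return True if a request touches the token budget.

    rejected_at is the stage that rejected the request,
    or None if the request reached the LLM.
    """
    if rejected_at is None:
        return True
    # Quota is only affected once the request has passed the rate limit stage.
    return STAGES.index(rejected_at) > STAGES.index("rate_limit")
```

With this model, a request rejected at `authentication` does not consume quota, while a request rejected at `prompt_guard` does.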

Multiple policies: When multiple AgentgatewayPolicy resources target the same Gateway or HTTPRoute with overlapping backend.ai fields, one policy silently overwrites the other based on creation order. Both policies will show ACCEPTED/ATTACHED status. To avoid conflicts, use separate policies for different configuration areas (such as one for authentication, one for rate limiting, one for prompt guards).

Before you begin

Set up an agentgateway proxy.
Set up access to the OpenAI LLM provider.

Set up virtual keys

This example creates two virtual keys (for Alice and Bob) with independent 100,000 token daily budgets.

Create API keys for users

Create API key secrets for each user. Each secret includes a label that references the key group for authentication.

kubectl apply -f- <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: user-alice-key
  namespace: agentgateway-system
  labels:
    api-key-group: llm-users
type: extauth.solo.io/apikey
stringData:
  api-key: sk-alice-abc123def456
---
apiVersion: v1
kind: Secret
metadata:
  name: user-bob-key
  namespace: agentgateway-system
  labels:
    api-key-group: llm-users
type: extauth.solo.io/apikey
stringData:
  api-key: sk-bob-xyz789uvw012
EOF

Review the following table to understand this configuration.

Setting	Description
`type`	Set to `extauth.solo.io/apikey` to create API key secrets.
`labels.api-key-group`	Label to group API keys together for authentication policy selection.
`stringData.api-key`	The API key value that users include in their requests.
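
The key values in this guide are fixed strings for readability. For real keys you might generate random values instead. The following sketch uses Python's standard `secrets` module and renders the same Secret shape as above. The `new_api_key` and `secret_manifest` helpers, and the `sk-<user>-` prefix, are assumptions for illustration only.

```python
import secrets


def new_api_key(user: str) -> str:
    # 32 random bytes, URL-safe encoded. The "sk-<user>-" prefix only
    # mirrors the example keys in this guide.
    return f"sk-{user}-{secrets.token_urlsafe(32)}"


def secret_manifest(user: str, api_key: str, group: str = "llm-users") -> str:
    # Renders the Secret used in this guide as a YAML string.
    return (
        "apiVersion: v1\n"
        "kind: Secret\n"
        "metadata:\n"
        f"  name: user-{user}-key\n"
        "  namespace: agentgateway-system\n"
        "  labels:\n"
        f"    api-key-group: {group}\n"
        "type: extauth.solo.io/apikey\n"
        "stringData:\n"
        f"  api-key: {api_key}\n"
    )


if __name__ == "__main__":
    print(secret_manifest("alice", new_api_key("alice")))
```

You could pipe the printed manifest to `kubectl apply -f -` in place of the hand-written Secret.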

Configure API key authentication

Create an AgentgatewayPolicy that requires API key authentication for all requests to the gateway. The user ID that rate limiting relies on comes from the X-User-ID request header, which the token budget policy in the next section reads.

kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayPolicy
metadata:
  name: api-key-auth
  namespace: agentgateway-system
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: agentgateway-proxy
  traffic:
    apiKeyAuthentication:
      mode: Strict
      secretSelector:
        matchLabels:
          api-key-group: llm-users
EOF

Review the following table to understand this configuration.

Setting	Description
`targetRefs`	Apply the policy to the entire Gateway so all routes require API keys.
`apiKeyAuthentication.mode`	Set to `Strict` to require a valid API key for all requests.
`secretSelector`	Use label selectors to reference all API key secrets with the `api-key-group: llm-users` label.
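
To see which Secrets a `matchLabels` selector picks up, the following sketch applies standard Kubernetes label-selector semantics: every pair in the selector must be present on the object. The Secret inventory is hypothetical, including the assumption that `openai-secret` carries no labels.

```python
from typing import Dict


def matches(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    # matchLabels semantics: all selector pairs must appear in the labels.
    return all(labels.get(key) == value for key, value in selector.items())


# Hypothetical Secrets in the agentgateway-system namespace and their labels.
SECRETS = {
    "user-alice-key": {"api-key-group": "llm-users"},
    "user-bob-key": {"api-key-group": "llm-users"},
    "openai-secret": {},
}

SELECTOR = {"api-key-group": "llm-users"}

selected = sorted(name for name, labels in SECRETS.items() if matches(SELECTOR, labels))
print(selected)
```

Only the Secrets labeled `api-key-group: llm-users` are selected, so the provider credential is not accepted as a user API key.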

Configure per-key token budgets

Create an AgentgatewayPolicy that enforces a daily token budget of 100,000 tokens per user.

kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayPolicy
metadata:
  name: daily-token-budget
  namespace: agentgateway-system
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: agentgateway-proxy
  traffic:
    rateLimit:
      global:
        domain: token-budgets
        backendRef:
          kind: Service
          name: rate-limit-server
          namespace: agentgateway-system
          port: 8081
        descriptors:
          - entries:
              - name: user_id
                expression: 'request.headers["x-user-id"]'
            unit: Tokens
EOF

Review the following table to understand this configuration.

Setting	Description
`rateLimit.global`	Use global rate limiting to enforce limits across all agentgateway instances.
`domain`	A namespace for rate limit configurations. Use `token-budgets` to organize your budget policies.
`backendRef`	References the rate limit server Service. Must include `kind`, `name`, `namespace`, and `port`.
`descriptors[].entries[].name`	The name of the descriptor entry. Set to `user_id` to rate limit per user.
`descriptors[].entries[].expression`	CEL expression to extract the user ID from the `X-User-ID` request header.
`descriptors[].unit`	Set to `Tokens` to enforce token-based limits instead of request-based limits.
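
Conceptually, each descriptor entry turns a piece of the request into a key-value pair that the rate limit server counts against. The sketch below mimics that for the single `user_id` entry, replacing the CEL expression with a plain header lookup. The `build_descriptor` helper and its handling of a missing header are assumptions, not documented agentgateway behavior.

```python
from typing import Dict, List, Optional, Tuple

# (descriptor entry name, request header) pairs mirroring the policy above.
ENTRIES: List[Tuple[str, str]] = [("user_id", "x-user-id")]


def build_descriptor(headers: Dict[str, str]) -> Optional[List[Tuple[str, str]]]:
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    descriptor = []
    for entry_name, header_name in ENTRIES:
        value = lowered.get(header_name)
        if value is None:
            # Assumption: without a value there is nothing to key a budget on.
            return None
        descriptor.append((entry_name, value))
    return descriptor


print(build_descriptor({"X-User-ID": "alice"}))
```

Requests with different `X-User-ID` values produce different descriptors, so each value is counted separately.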

Configure the rate limit server

Deploy a rate limit server and configure it with your budget limits. This guide uses global rate limiting so that per-key token budgets are enforced consistently across multiple gateway instances.

Deploy the rate limit server. For setup instructions, see the global rate limiting section in the LLM rate limiting guide.

Create a ConfigMap with your budget configuration.

kubectl apply -f- <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: rate-limit-config
  namespace: agentgateway-system
data:
  config.yaml: |
    domain: token-budgets
    descriptors:
      - key: user_id
        rate_limit:
          unit: day
          requests_per_unit: 100000
EOF

Review the following table to understand this configuration.

Setting	Description
`domain`	Must match the domain in your AgentgatewayPolicy (`token-budgets`).
`descriptors[].key`	Must match the descriptor key (`user_id`).
`rate_limit.unit`	The time window for the budget. Use `day` for daily budgets. Other options: `second`, `minute`, `hour`.
`rate_limit.requests_per_unit`	The token budget. Set to 100,000 tokens per day. Because `unit: Tokens` is set in the AgentgatewayPolicy, this value counts tokens rather than requests.
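
Because the domain and descriptor key must match between the policy and the rate limit server configuration, a quick consistency check can catch typos before you deploy. The following sketch compares simplified dictionaries that stand in for the two resources. The `mismatches` helper and the dictionary shapes are hypothetical.

```python
# Simplified stand-ins for the AgentgatewayPolicy and the ConfigMap contents.
POLICY = {"domain": "token-budgets", "entries": ["user_id"]}
CONFIG = {
    "domain": "token-budgets",
    "descriptors": [
        {"key": "user_id", "rate_limit": {"unit": "day", "requests_per_unit": 100000}}
    ],
}


def mismatches(policy: dict, config: dict) -> list:
    """Return human-readable problems; an empty list means the two agree."""
    problems = []
    if policy["domain"] != config["domain"]:
        problems.append(
            f"domain mismatch: {policy['domain']!r} vs {config['domain']!r}"
        )
    config_keys = {descriptor["key"] for descriptor in config["descriptors"]}
    for name in policy["entries"]:
        if name not in config_keys:
            problems.append(f"descriptor {name!r} has no matching key in the config")
    return problems


print(mismatches(POLICY, CONFIG))
```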

Set up an LLM backend

Create an AgentgatewayBackend that connects to your LLM provider.

kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: openai
  namespace: agentgateway-system
spec:
  ai:
    provider:
      openai:
        model: gpt-3.5-turbo
  policies:
    auth:
      secretRef:
        name: openai-secret
EOF

For detailed instructions on creating backends and storing provider API keys, see the API keys guide.

Create a route to the backend

Create an HTTPRoute that routes requests to your LLM backend.

kubectl apply -f- <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: openai
  namespace: agentgateway-system
spec:
  parentRefs:
    - name: agentgateway-proxy
      namespace: agentgateway-system
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /openai
    backendRefs:
    - name: openai
      namespace: agentgateway-system
      group: agentgateway.dev
      kind: AgentgatewayBackend
EOF

Test the virtual keys

Note: The following tests verify API key authentication and routing. For full end-to-end testing of per-key token budget enforcement, deploy a rate limit server as described in the global rate limiting section.

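Send a request with Alice's API key and verify that it succeeds. Use the first command if the gateway is exposed at $INGRESS_GW_ADDRESS, or the second if you reach the gateway locally on localhost:8080.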

curl "$INGRESS_GW_ADDRESS/openai" \
  -H "Authorization: Bearer sk-alice-abc123def456" \
  -H "X-User-ID: alice" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

curl "localhost:8080/openai" \
  -H "Authorization: Bearer sk-alice-abc123def456" \
  -H "X-User-ID: alice" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Example successful response:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "gpt-3.5-turbo",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you today?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 9,
    "total_tokens": 19
  }
}
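
To get a sense of how quickly a budget drains, you can read the `usage` block from a response and divide the budget by the tokens consumed. The sketch below does this for a trimmed copy of the sample response. It assumes that the budget counts `total_tokens` and that every request costs the same, which real traffic will not.

```python
import json
import math

# Trimmed copy of the example response above.
SAMPLE_RESPONSE = """
{
  "id": "chatcmpl-abc123",
  "usage": {"prompt_tokens": 10, "completion_tokens": 9, "total_tokens": 19}
}
"""

DAILY_BUDGET = 100_000  # tokens per day, as configured in this guide


def tokens_used(body: str) -> int:
    # Reads the OpenAI-style usage block from a chat completion response.
    return json.loads(body)["usage"]["total_tokens"]


def requests_to_exhaust(budget: int, tokens_per_request: int) -> int:
    # Number of identical requests needed to reach or exceed the budget.
    return math.ceil(budget / tokens_per_request)


if __name__ == "__main__":
    used = tokens_used(SAMPLE_RESPONSE)
    print(used, requests_to_exhaust(DAILY_BUDGET, used))
```

At 19 tokens per request, it takes several thousand requests to exhaust the budget, so consider lowering the limit temporarily when you test enforcement.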

Send multiple requests until Alice's 100,000 token budget is exhausted. Verify that subsequent requests are rejected with a 429 status code.

Example 429 response:
```
HTTP/1.1 429 Too Many Requests
x-ratelimit-limit: 100000
x-ratelimit-remaining: 0
x-ratelimit-reset: 43200

rate limit exceeded
```
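
A client can use the rate limit headers to decide how long to wait before retrying. The following sketch assumes that `x-ratelimit-reset` carries the number of seconds until the budget refills, which matches the example value but is not confirmed by this guide. The `retry_delay_seconds` helper and its 60-second fallback are hypothetical.

```python
from typing import Dict


def retry_delay_seconds(status: int, headers: Dict[str, str], fallback: float = 60.0) -> float:
    """Return how long a client might wait before retrying."""
    if status != 429:
        return 0.0  # not rate limited, no need to wait
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        # Assumption: the header value is seconds until the budget resets.
        return float(lowered["x-ratelimit-reset"])
    except (KeyError, ValueError):
        return fallback


example_headers = {
    "x-ratelimit-limit": "100000",
    "x-ratelimit-remaining": "0",
    "x-ratelimit-reset": "43200",
}
print(retry_delay_seconds(429, example_headers))
```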

Verify that Bob can still send requests against his own budget, independent of Alice's usage. As before, use the command that matches how you reach the gateway.

curl "$INGRESS_GW_ADDRESS/openai" \
  -H "Authorization: Bearer sk-bob-xyz789uvw012" \
  -H "X-User-ID: bob" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

curl "localhost:8080/openai" \
  -H "Authorization: Bearer sk-bob-xyz789uvw012" \
  -H "X-User-ID: bob" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Bob’s requests succeed because he has his own independent budget.

Monitor per-key spending

Track token usage and spending for each virtual key using Prometheus metrics.

Port-forward the agentgateway proxy metrics endpoint.

kubectl port-forward deployment/agentgateway-proxy -n agentgateway-system 15020

Query token usage metrics filtered by user ID.

# Total tokens consumed by user over the last 24 hours.
# A single regex selector is used because adding the "input" and "output"
# series directly would match no samples (their labels differ).
sum by (user_id) (
  increase(agentgateway_gen_ai_client_token_usage_sum{gen_ai_token_type=~"input|output"}[24h])
)

# Percentage of daily budget used
(sum by (user_id) (
  increase(agentgateway_gen_ai_client_token_usage_sum{gen_ai_token_type=~"input|output"}[24h])
) / 100000) * 100

Calculate costs per user by multiplying token counts by your provider’s pricing. For example, with OpenAI GPT-3.5:

# Cost per user over the last 24 hours
# (assuming $0.50 per 1M input tokens, $1.50 per 1M output tokens).
# increase() returns the token total for the window; rate() would return a per-second average.
(sum by (user_id) (increase(agentgateway_gen_ai_client_token_usage_sum{gen_ai_token_type="input"}[24h])) / 1000000) * 0.50
+
(sum by (user_id) (increase(agentgateway_gen_ai_client_token_usage_sum{gen_ai_token_type="output"}[24h])) / 1000000) * 1.50
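
The same cost arithmetic can be checked by hand. The sketch below applies the assumed prices of $0.50 per 1M input tokens and $1.50 per 1M output tokens to a hypothetical user who consumed 60,000 input tokens and 40,000 output tokens in a day.

```python
INPUT_PRICE_PER_MILLION = 0.50   # USD, assumed price from the example above
OUTPUT_PRICE_PER_MILLION = 1.50  # USD, assumed price from the example above


def cost_usd(input_tokens: int, output_tokens: int) -> float:
    # Convert each token count to millions, then multiply by its price.
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


# Hypothetical usage: 60,000 input + 40,000 output tokens in one day.
print(round(cost_usd(60_000, 40_000), 4))
```

Here the input tokens cost $0.03 and the output tokens cost $0.06, for a total of $0.09.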

For more information on cost tracking, see the cost tracking guide.

Advanced configuration

Tiered budgets based on user type

Provide different budget tiers for free, standard, and premium users.

Add a tier label to each API key secret.

apiVersion: v1
kind: Secret
metadata:
  name: user-alice-key
  namespace: agentgateway-system
  labels:
    api-key-group: llm-users
    tier: premium
type: extauth.solo.io/apikey
stringData:
  api-key: sk-alice-abc123def456
---
apiVersion: v1
kind: Secret
metadata:
  name: user-charlie-key
  namespace: agentgateway-system
  labels:
    api-key-group: llm-users
    tier: free
type: extauth.solo.io/apikey
stringData:
  api-key: sk-charlie-ghi345jkl678

Configure rate limiting to use the tier from a header.

traffic:
  rateLimit:
    global:
      domain: token-budgets
      backendRef:
        kind: Service
        name: rate-limit-server
        namespace: agentgateway-system
        port: 8081
      descriptors:
        - entries:
            - name: tier
              expression: 'request.headers["x-user-tier"]'
            - name: user_id
              expression: 'request.headers["x-user-id"]'
          unit: Tokens

Configure the rate limit server with tier-based budgets.

domain: token-budgets
descriptors:
  - key: tier
    value: "free"
    descriptors:
      - key: user_id
        rate_limit:
          unit: day
          requests_per_unit: 10000  # 10K tokens/day for free tier
  - key: tier
    value: "standard"
    descriptors:
      - key: user_id
        rate_limit:
          unit: day
          requests_per_unit: 100000  # 100K tokens/day for standard tier
  - key: tier
    value: "premium"
    descriptors:
      - key: user_id
        rate_limit:
          unit: day
          requests_per_unit: 500000  # 500K tokens/day for premium tier
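
The nested configuration can be read as a tree: the first level selects the tier by value, and the second level applies a per-user limit within that tier. The following sketch walks such a tree for a given descriptor. It is a simplified model of the matching, not the rate limit server's actual algorithm, and the `find_limit` helper is hypothetical.

```python
from typing import List, Optional, Tuple


def tier(name: str, tokens_per_day: int) -> dict:
    # Builds one tier branch with a nested per-user daily limit.
    return {
        "key": "tier",
        "value": name,
        "descriptors": [
            {"key": "user_id", "rate_limit": {"unit": "day", "requests_per_unit": tokens_per_day}}
        ],
    }


CONFIG = [tier("free", 10_000), tier("standard", 100_000), tier("premium", 500_000)]


def find_limit(config: List[dict], entries: List[Tuple[str, str]]) -> Optional[dict]:
    """Walk the descriptor tree; a rule without 'value' matches any value."""
    level, rule = config, None
    for key, value in entries:
        rule = next(
            (d for d in level if d["key"] == key and d.get("value") in (None, value)),
            None,
        )
        if rule is None:
            return None  # no rule for this descriptor
        level = rule.get("descriptors", [])
    return rule.get("rate_limit") if rule else None


print(find_limit(CONFIG, [("tier", "premium"), ("user_id", "alice")]))
```

In this model, a tier value that is not listed in the configuration matches no rule, so it is worth deciding how such requests should be treated.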

Hourly budget limits

Set a smaller budget that refreshes every hour for tighter cost control.

# In rate-limit-config ConfigMap
domain: token-budgets
descriptors:
  - key: user_id
    rate_limit:
      unit: hour
      requests_per_unit: 10000  # 10,000 tokens per hour

Multi-tenant virtual keys

Create virtual keys scoped to both user and tenant for multi-tenant applications.

# In AgentgatewayPolicy
descriptors:
  - entries:
      - name: tenant_id
        expression: 'request.headers["x-tenant-id"]'
      - name: user_id
        expression: 'request.headers["x-user-id"]'
    unit: Tokens

# In rate-limit-config ConfigMap
domain: token-budgets
descriptors:
  - key: tenant_id
    descriptors:
      - key: user_id
        rate_limit:
          unit: day
          requests_per_unit: 50000
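
With two descriptor entries, the counter is effectively keyed by the tenant and user pair, so the same user ID in two tenants draws from two separate budgets. The sketch below illustrates that with an in-memory counter. The `consume` helper and the tenant names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple

LIMIT = 50_000  # tokens per day for each (tenant, user) pair, as configured above

# Tokens used so far, keyed by (tenant_id, user_id).
usage: DefaultDict[Tuple[str, str], int] = defaultdict(int)


def consume(tenant_id: str, user_id: str, tokens: int) -> bool:
    """Return True if the request is allowed, False if the budget is exhausted."""
    key = (tenant_id, user_id)
    if usage[key] >= LIMIT:
        return False
    usage[key] += tokens
    return True


print(consume("acme", "alice", 50_000))   # uses up alice's budget in tenant acme
print(consume("acme", "alice", 1))        # same pair, budget exhausted
print(consume("globex", "alice", 1))      # same user ID, different tenant
```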

For more advanced rate limiting patterns, see the budget and spend limits guide.

Cleanup

You can remove the resources that you created in this guide.

kubectl delete AgentgatewayPolicy api-key-auth daily-token-budget -n agentgateway-system
kubectl delete secret user-alice-key user-bob-key -n agentgateway-system
kubectl delete configmap rate-limit-config -n agentgateway-system
kubectl delete httproute openai -n agentgateway-system
kubectl delete AgentgatewayBackend openai -n agentgateway-system

What’s next

Manage API keys for detailed authentication configuration
Budget and spend limits for advanced rate limiting patterns
Track costs per request for cost calculation and monitoring
Set up observability to view token usage metrics and logs
