Transform requests

Use LLM request transformations to dynamically compute and set fields in LLM requests by using Common Expression Language (CEL) expressions. CEL is a simple expression language that is used throughout agentgateway to enable flexible configuration. CEL expressions can access request context, JWT claims, and other variables to make dynamic decisions. Transformations let you enforce policies, such as capping token usage or conditionally modifying request parameters, without changing client code.

To learn more about CEL, see the following resources:

Before you begin

Set up an agentgateway proxy.
Set up access to the OpenAI LLM provider.

Configure LLM request transformations

Create an AgentgatewayPolicy resource to apply an LLM request transformation. The following example caps max_tokens at 10, regardless of what the client requests.

kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayPolicy
metadata:
  name: cap-max-tokens
  namespace: agentgateway-system
  labels:
    app: agentgateway
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: openai
  backend:
    ai:
      transformations:
      - field: max_tokens
        expression: "min(llmRequest.max_tokens, 10)"
EOF

Setting	Description
`backend.ai.transformations`	A list of LLM request field transformations.
`field`	The name of the LLM request field to set. Maximum 256 characters.
`expression`	A CEL expression that computes the value for the field. Use the `llmRequest` variable to access the original LLM request body. Maximum 16,384 characters.

You can specify up to 64 transformations per policy. Transformations take priority over overrides for the same field. If an expression fails to evaluate, the field is silently removed from the request.
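
The failure behavior has a practical consequence: if a client omits max_tokens, an expression that reads llmRequest.max_tokens may fail to evaluate, and the field would then be dropped rather than capped. The following is a minimal sketch of a guarded variant of the cap-max-tokens policy. It assumes that the standard CEL has() macro and conditional operator are available in agentgateway expressions, and that reading a missing field counts as an evaluation failure. Neither assumption is confirmed by this guide, so verify both against your agentgateway version.

```shell
kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayPolicy
metadata:
  name: cap-max-tokens
  namespace: agentgateway-system
  labels:
    app: agentgateway
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: openai
  backend:
    ai:
      transformations:
      - field: max_tokens
        # Assumption: has() and the ?: operator behave as in standard CEL.
        # If the client sent max_tokens, cap it at 10; otherwise set it to 10.
        expression: "has(llmRequest.max_tokens) ? min(llmRequest.max_tokens, 10) : 10"
EOF
```

With this variant, the cap would apply even to requests that do not set max_tokens at all.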

Thinking budget fields, such as reasoning_effort and thinking_budget_tokens, can also be set or capped by using transformations. This way, operators can enforce reasoning limits centrally without requiring client changes. For example, use the field reasoning_effort with the expression "medium" to set every request to medium reasoning effort, regardless of what the client sends.
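
As a sketch of the reasoning_effort example, a policy could look like the following. The policy name set-reasoning-effort is hypothetical. The expression is written as a CEL string literal in single quotes, which is an assumption about how a constant string value must be expressed; a bare word would be read by CEL as a variable name.

```shell
kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayPolicy
metadata:
  name: set-reasoning-effort   # hypothetical name
  namespace: agentgateway-system
  labels:
    app: agentgateway
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: openai
  backend:
    ai:
      transformations:
      - field: reasoning_effort
        # Assumption: a constant value is written as a CEL string literal.
        expression: "'medium'"
EOF
```

Because the expression is a constant, it replaces whatever value the client sends rather than comparing against it.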

Send a request with max_tokens set to a value greater than 10. The transformation caps the value at 10 before the request reaches the LLM provider. In the response, verify that the completion_tokens value is 10 or fewer and that the finish_reason is length, which indicates that the response was cut off at the cap.

If the gateway is reachable at an external address that is stored in the INGRESS_GW_ADDRESS environment variable:

curl "$INGRESS_GW_ADDRESS/openai" \
-H "content-type: application/json" \
-d '{
  "model": "gpt-3.5-turbo",
  "max_tokens": 5000,
  "messages": [
    {
      "role": "user",
      "content": "Tell me a short story"
    }
  ]
}' | jq

If you access the gateway on localhost:8080, for example during local testing:

curl "localhost:8080/openai" \
-H "content-type: application/json" \
-d '{
  "model": "gpt-3.5-turbo",
  "max_tokens": 5000,
  "messages": [
    {
      "role": "user",
      "content": "Tell me a short story"
    }
  ]
}' | jq

Example output:

{
  "model": "gpt-3.5-turbo-0125",
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 10,
    "total_tokens": 22,
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    },
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0
    }
  },
  "choices": [
    {
      "message": {
        "content": "Once upon a time, in a small village nestled",
        "role": "assistant",
        "refusal": null,
        "annotations": []
      },
      "index": 0,
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  ...
}
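
To check the two relevant fields without reading the whole response, you could pipe it through a jq filter. The following is a minimal sketch. The helper name check_capped and the file name response.json are illustrative and not part of agentgateway; the field paths follow the example output.

```shell
# Hypothetical helper: reads a chat completion response on stdin and
# succeeds only if completion_tokens is at or below the limit and the
# completion ended because it hit the token limit.
check_capped() {
  limit="${1:-10}"
  jq -e --argjson limit "$limit" '
    .usage.completion_tokens <= $limit
    and .choices[0].finish_reason == "length"
  ' > /dev/null
}

# Usage with a saved response (response.json is a placeholder file name):
# check_capped 10 < response.json && echo "capped"
```

The jq -e flag sets the exit status from the filter result, so the function can be used directly in a shell condition. A very short answer that finishes on its own in under 10 tokens would report a different finish_reason and fail this check, even though the cap was applied.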

Cleanup

You can remove the resources that you created in this guide.

kubectl delete AgentgatewayPolicy -n agentgateway-system -l app=agentgateway
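
To confirm that the cleanup worked, you could list the policies that carry the same label. This is a small sketch that reuses the namespace and label selector from the delete command.

```shell
# List any remaining policies with the label used in this guide.
# After a successful cleanup, no policies should be returned.
kubectl get AgentgatewayPolicy -n agentgateway-system -l app=agentgateway
```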
