Transform requests
Use LLM request transformations to dynamically compute and set fields in LLM requests by using Common Expression Language (CEL) expressions. CEL is a simple expression language that is used throughout agentgateway to enable flexible configuration. CEL expressions can access request context, JWT claims, and other variables to make dynamic decisions. Transformations let you enforce policies, such as capping token usage or conditionally modifying request parameters, without changing client code.
To learn more about CEL, see the following resources:
Before you begin
Configure LLM request transformations
Create an AgentgatewayPolicy resource to apply an LLM request transformation. The following example caps max_tokens to 10, regardless of what the client requests.

    kubectl apply -f- <<EOF
    apiVersion: agentgateway.dev/v1alpha1
    kind: AgentgatewayPolicy
    metadata:
      name: cap-max-tokens
      namespace: agentgateway-system
      labels:
        app: agentgateway
    spec:
      targetRefs:
      - group: gateway.networking.k8s.io
        kind: HTTPRoute
        name: openai
      backend:
        ai:
          transformations:
          - field: max_tokens
            expression: "min(llmRequest.max_tokens, 10)"
    EOF

| Setting | Description |
| --- | --- |
| backend.ai.transformations | A list of LLM request field transformations. |
| field | The name of the LLM request field to set. Maximum 256 characters. |
| expression | A CEL expression that computes the value for the field. Use the llmRequest variable to access the original LLM request body. Maximum 16,384 characters. |

ℹ️ You can specify up to 64 transformations per policy. Transformations take priority over overrides for the same field. If an expression fails to evaluate, the field is silently removed from the request.

Thinking budget fields, such as reasoning_effort and thinking_budget_tokens, can also be set or capped by using transformations. This way, operators can enforce reasoning limits centrally without requiring client changes. For example, use "field": "reasoning_effort" with the expression "medium" to set all requests to medium reasoning effort, regardless of what the client sends.

Send a request with max_tokens set to a value greater than 10. The transformation caps it to 10 before the request reaches the LLM provider. Verify that the completion_tokens value in the response is 10 or fewer and that finish_reason is set to length, which indicates that the response was cut off at the token limit.

If the gateway is reachable at the address in $INGRESS_GW_ADDRESS:

    curl "$INGRESS_GW_ADDRESS/openai" \
      -H "content-type: application/json" \
      -d '{
        "model": "gpt-3.5-turbo",
        "max_tokens": 5000,
        "messages": [
          {
            "role": "user",
            "content": "Tell me a short story"
          }
        ]
      }' | jq

If the gateway is reachable locally on port 8080, for example through a port-forward:

    curl "localhost:8080/openai" \
      -H "content-type: application/json" \
      -d '{
        "model": "gpt-3.5-turbo",
        "max_tokens": 5000,
        "messages": [
          {
            "role": "user",
            "content": "Tell me a short story"
          }
        ]
      }' | jq

Example output:

    {
      "model": "gpt-3.5-turbo-0125",
      "usage": {
        "prompt_tokens": 12,
        "completion_tokens": 10,
        "total_tokens": 22,
        "completion_tokens_details": {
          "reasoning_tokens": 0,
          "audio_tokens": 0,
          "accepted_prediction_tokens": 0,
          "rejected_prediction_tokens": 0
        },
        "prompt_tokens_details": {
          "cached_tokens": 0,
          "audio_tokens": 0
        }
      },
      "choices": [
        {
          "message": {
            "content": "Once upon a time, in a small village nestled",
            "role": "assistant",
            "refusal": null,
            "annotations": []
          },
          "index": 0,
          "logprobs": null,
          "finish_reason": "length"
        }
      ],
      ...
    }
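The verification step can also be scripted instead of read by eye. The following is a minimal sketch, assuming that jq is installed (the guide already pipes responses to jq) and that the response body was saved to a file. The function names check_cap and summarize are hypothetical helpers for illustration and are not part of agentgateway.

```shell
#!/usr/bin/env bash

# check_cap FILE LIMIT
# Exits 0 when usage.completion_tokens in FILE is at most LIMIT, and non-zero otherwise.
# jq -e maps a false or null result to a non-zero exit status.
check_cap() {
  local file="$1" limit="$2"
  jq -e --argjson limit "$limit" \
    '.usage.completion_tokens <= $limit' "$file" > /dev/null
}

# summarize FILE
# Prints the completion token count and the finish reason of the first choice.
summarize() {
  jq -r '"\(.usage.completion_tokens) \(.choices[0].finish_reason)"' "$1"
}
```

For instance, the output of either curl command could be redirected to a file such as response.json (a hypothetical name) and then checked with check_cap response.json 10. For the example output shown in this guide, summarize would print the token count followed by the finish reason.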
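For the reasoning effort case that this guide describes in prose, a policy might look like the following sketch. It reuses the structure of the cap-max-tokens example. The policy name set-reasoning-effort is hypothetical, and the inner single quotes assume that the expression must be a CEL string literal in order to produce the text medium. Whether the provider accepts a reasoning_effort field depends on the model that you call.

```yaml
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayPolicy
metadata:
  name: set-reasoning-effort   # hypothetical name for this sketch
  namespace: agentgateway-system
  labels:
    app: agentgateway
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: openai
  backend:
    ai:
      transformations:
      - field: reasoning_effort
        # Assumption: a quoted CEL string literal, so the field is set to "medium"
        # no matter which value the client sends.
        expression: "'medium'"
```

Because a constant expression ignores the client value, this sketch sets the field rather than capping it: a request that asks for a lower effort would also be raised to medium.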
Cleanup
You can remove the resources that you created in this guide.

    kubectl delete AgentgatewayPolicy -n agentgateway-system -l app=agentgateway