Release notes
Review the release notes for agentgateway standalone.
🔥 Breaking changes
New release version pattern
The release version pattern changed to align agentgateway standalone with agentgateway on Kubernetes. Going forward, both projects use the same release version number. If you have existing CI/CD workflows that depend on the old pattern, update them.
CEL 2.0
This release includes a major refactor to the CEL implementation in agentgateway that brings substantial performance improvements and enhanced functionality. Individual CEL expressions are now 5-500x faster, which has improved end-to-end proxy performance by 50%+ in some tests. For more details on the performance improvements, see this blog post on CEL optimization.
The following user-facing changes were introduced:
- Function name changes: For compatibility with the CEL-Go implementation, the base64Encode and base64Decode functions now use dot notation: base64.encode and base64.decode. The old camel case names remain in place for backwards compatibility.
- New string functions: The following string manipulation functions were added to the CEL library: startsWith, endsWith, stripPrefix, and stripSuffix. These functions align with the Google CEL-Go strings extension.
- Null values fail: If a top-level variable returns a null value, the CEL expression now fails. Previously, null values always returned true. For example, the has(jwt) expression was previously successful if the JWT was missing or could not be found. Now, this expression fails.
- Logical operators: Logical || and && operators now handle evaluation errors gracefully instead of propagating them. For example, a || b returns true if a is true even if b errors. Previously, the CEL expression failed.
Make sure to update and verify any existing CEL expressions that you use in your environment.
For more information, see the CEL expression reference.
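To make the logical operator change concrete, the following Python sketch models error-absorbing || and && evaluation. It is an illustration only, not agentgateway's implementation: EvalError, cel_or, cel_and, and missing_claim are hypothetical names, and the symmetric handling (an error on either side is absorbed when the other side decides the result) is an assumption based on how the CEL language specification defines these operators.

```python
from typing import Callable, Union


class EvalError(Exception):
    """Stand-in for a CEL evaluation error (hypothetical name)."""


Operand = Callable[[], bool]


def _evaluate(operand: Operand) -> Union[bool, EvalError]:
    """Run an operand and capture an evaluation error instead of raising it."""
    try:
        return operand()
    except EvalError as err:
        return err


def cel_or(a: Operand, b: Operand) -> bool:
    """a || b: true if either side is true, even if the other side errors."""
    left, right = _evaluate(a), _evaluate(b)
    if left is True or right is True:
        return True
    for result in (left, right):
        if isinstance(result, EvalError):
            raise result  # neither side decided the result, so the error surfaces
    return False


def cel_and(a: Operand, b: Operand) -> bool:
    """a && b: false if either side is false, even if the other side errors."""
    left, right = _evaluate(a), _evaluate(b)
    if left is False or right is False:
        return False
    for result in (left, right):
        if isinstance(result, EvalError):
            raise result
    return True


def missing_claim() -> bool:
    """Example operand that always fails, such as a lookup on an absent field."""
    raise EvalError("no such key")


# The true operand decides the result, so the failing operand is ignored.
print(cel_or(lambda: True, missing_claim))
```

In this model an error still surfaces when no operand settles the outcome, for example cel_or(lambda: False, missing_claim).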
External auth fail-closed
External auth policies now fail closed by default when the auth server is unreachable. This means requests are denied if the external authorization service cannot be reached. You are affected if you have an extAuthz policy configured and the auth service becomes unavailable.
Before: If the external auth service was unreachable, the behavior was undefined and requests could pass through.
After: Requests protected by an external auth policy are rejected with a failure response until the auth service is reachable.
To explicitly allow requests when the auth service is unavailable, set the failureMode to allow:

    extAuthz:
      host: localhost:9000
      failureMode: allow
      protocol:
        grpc: {}

For more information, see External authorization.
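The difference between failing closed and failing open can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the gateway's code: ext_auth_decision and unreachable_server are hypothetical names, and an unreachable auth server is represented by a ConnectionError.

```python
from typing import Callable, Optional


def ext_auth_decision(check: Callable[[], bool], failure_mode: Optional[str] = None) -> bool:
    """Return True if the request may proceed.

    check() returns the auth server's verdict and raises ConnectionError
    when the server cannot be reached (an assumption of this sketch).
    """
    try:
        return check()
    except ConnectionError:
        # Fail closed unless the policy explicitly sets failureMode: allow.
        return failure_mode == "allow"


def unreachable_server() -> bool:
    """Example check that simulates an auth server outage."""
    raise ConnectionError("auth server unreachable")


print(ext_auth_decision(unreachable_server))                        # default: rejected
print(ext_auth_decision(unreachable_server, failure_mode="allow"))  # opt-in: allowed
```

Note that failureMode only matters when the server is unreachable. A reachable server that denies the request is still honored.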
MCP deny-only authorization policies
A critical correctness bug was fixed in MCP authorization. You are affected if you have an MCP authorization policy that uses deny rules without any corresponding allow rules.
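As a rough model of the corrected behavior, the following Python sketch evaluates allow and deny rules for a tool call. The function name is_tool_allowed and the rule representation are hypothetical. The sketch also assumes that deny rules take precedence and that a policy with allow rules rejects anything no allow rule matches; these notes only confirm the deny-only case.

```python
from typing import Callable, Dict, List

# A rule is a predicate over the request context (hypothetical representation).
Rule = Callable[[Dict[str, str]], bool]


def is_tool_allowed(ctx: Dict[str, str], allow_rules: List[Rule], deny_rules: List[Rule]) -> bool:
    """Decide whether a tool call is permitted."""
    if any(rule(ctx) for rule in deny_rules):
        return False  # an explicit deny always wins (assumption)
    if not allow_rules:
        return True   # deny-only policy: everything not denied is allowed
    return any(rule(ctx) for rule in allow_rules)


deny_echo: List[Rule] = [lambda ctx: ctx["tool"] == "echo"]
print(is_tool_allowed({"tool": "echo"}, [], deny_echo))
print(is_tool_allowed({"tool": "add"}, [], deny_echo))
```

The buggy behavior corresponds to dropping the "if not allow_rules" branch, which rejects every tool when no allow rule exists.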
For example, review the following configuration. Previously, this policy denied all tool access, not just access to the echo tool. Starting in 1.0.0, only echo is denied and all other tools are allowed.

    mcpAuthorization:
      rules:
      - deny: 'mcp.tool.name == "echo"'

MCP authentication mode change
MCP authentication now defaults to strict mode instead of permissive mode. Requests to MCP backends without valid credentials are rejected by default. To restore the permissive behavior, set the mode field in your MCP authentication configuration:

    mcpAuthentication:
      mode: permissive
      issuer: http://localhost:9000
      jwks:
        url: http://localhost:9000/.well-known/jwks.json

For more information, see MCP authentication.
🌟 New features
The following features were introduced in 1.0.0.
Simplified LLM configuration
A new top-level llm configuration section provides a simplified way to configure LLM providers. Instead of setting up the full binds, listeners, routes, and backends hierarchy, you can now define models directly in a flat structure. The simplified format defaults to port 4000.
The following example configures an OpenAI provider with a wildcard model match:

    # yaml-language-server: $schema=https://agentgateway.dev/schema/config
    llm:
      models:
      - name: "*"
        provider: openAI
        params:
          apiKey: "$OPENAI_API_KEY"

| Setting | Description |
|---|---|
| name | The model name to match in incoming requests. When a client sends "model": "<name>", the request is routed to this provider. Use * to match any model name. |
| provider | The LLM provider to use, such as openAI, anthropic, bedrock, gemini, or vertex. |
| params.model | The model name sent to the upstream provider. If set, this overrides the model from the request. If not set, the model from the request is passed through. |
| params.apiKey | The API key for authentication. You can reference environment variables using the $VAR_NAME syntax. |
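The matching rules in the table can be sketched in Python. This is a simplified model, not the gateway's routing code: resolve_model is a hypothetical name, and evaluating entries in list order with the first match winning is an assumption that these notes do not confirm.

```python
from typing import Any, Dict, List, Optional, Tuple


def resolve_model(models: List[Dict[str, Any]], requested: str) -> Optional[Tuple[str, str]]:
    """Return (provider, upstream_model) for a requested model name, or None."""
    for entry in models:
        # "*" matches any requested model name; otherwise the name must be equal.
        if entry["name"] in ("*", requested):
            # params.model overrides the requested name; if unset, pass it through.
            upstream = entry.get("params", {}).get("model", requested)
            return entry["provider"], upstream
    return None


models = [{"name": "*", "provider": "openAI", "params": {"apiKey": "$OPENAI_API_KEY"}}]
print(resolve_model(models, "gpt-4o"))
```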
You can also define model aliases to decouple client-facing model names from provider-specific identifiers:

    llm:
      models:
      - name: fast
        provider: openAI
        params:
          model: gpt-4o-mini
          apiKey: "$OPENAI_API_KEY"
      - name: smart
        provider: openAI
        params:
          model: gpt-4o
          apiKey: "$OPENAI_API_KEY"

Policies such as rate limiting and authentication can be set at the llm level to apply to all models:

    llm:
      policies:
        localRateLimit:
        - maxTokens: 10
          tokensPerFill: 1
          fillInterval: 60s
          type: tokens
      models:
      - name: "*"
        provider: openAI
        params:
          apiKey: "$OPENAI_API_KEY"

The traditional route-based configuration (binds/listeners/routes) remains fully supported for advanced use cases that require path-based routing or custom endpoints.
For more information, see the provider setup guides such as OpenAI, Anthropic, and Bedrock.
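The localRateLimit fields in the policy example above (maxTokens, tokensPerFill, fillInterval) read like token-bucket parameters. The following Python sketch shows how such a bucket would behave under that assumption. The TokenBucket class and its methods are hypothetical and are not part of agentgateway, and these notes do not describe the limiter's internals. The clock is passed in explicitly so that the behavior is deterministic.

```python
class TokenBucket:
    """Minimal token bucket that refills in whole intervals."""

    def __init__(self, max_tokens: int, tokens_per_fill: int, fill_interval_s: float, now: float = 0.0):
        self.max_tokens = max_tokens
        self.tokens_per_fill = tokens_per_fill
        self.fill_interval_s = fill_interval_s
        self.tokens = max_tokens  # start with a full bucket (assumption)
        self.last_fill = now

    def _refill(self, now: float) -> None:
        """Add tokens_per_fill for every full interval elapsed, capped at max_tokens."""
        intervals = int((now - self.last_fill) // self.fill_interval_s)
        if intervals > 0:
            self.tokens = min(self.max_tokens, self.tokens + intervals * self.tokens_per_fill)
            self.last_fill += intervals * self.fill_interval_s

    def try_consume(self, cost: int, now: float) -> bool:
        """Spend cost tokens if available; otherwise reject without spending."""
        self._refill(now)
        if cost <= self.tokens:
            self.tokens -= cost
            return True
        return False


# Mirrors maxTokens: 10, tokensPerFill: 1, fillInterval: 60s.
bucket = TokenBucket(max_tokens=10, tokens_per_fill=1, fill_interval_s=60.0)
print(bucket.try_consume(10, now=0.0))   # drains the bucket
print(bucket.try_consume(1, now=30.0))   # no refill has happened yet
print(bucket.try_consume(1, now=60.0))   # one token was added after 60 seconds
```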
LLM request transformations
You can now use CEL expressions to dynamically compute and set fields in LLM requests. This allows you to enforce policies, such as capping token usage, without changing client code.
The following example caps max_tokens to 10 for all requests:

    binds:
    - port: 3000
      listeners:
      - routes:
        - backends:
          - ai:
              name: openai
              provider:
                openAI:
                  model: gpt-3.5-turbo
          policies:
            backendAuth:
              key: "$OPENAI_API_KEY"
            ai:
              transformations:
                max_tokens: "min(llmRequest.max_tokens, 10)"

For more information, see Transform requests.
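The effect of the max_tokens transformation above can be mirrored in Python. This sketch is illustrative only: cap_max_tokens is a hypothetical helper, and it requires the request to carry a max_tokens field because these notes do not state how the gateway treats a request that omits it.

```python
from typing import Any, Dict


def cap_max_tokens(llm_request: Dict[str, Any], limit: int = 10) -> Dict[str, Any]:
    """Apply the equivalent of max_tokens: "min(llmRequest.max_tokens, 10)"."""
    transformed = dict(llm_request)  # copy so the caller's request is not mutated
    transformed["max_tokens"] = min(llm_request["max_tokens"], limit)
    return transformed


request = {"model": "gpt-3.5-turbo", "max_tokens": 256}
print(cap_max_tokens(request)["max_tokens"])
```

A request that already asks for fewer tokens than the cap keeps its own value.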
Extended thinking and structured outputs for Claude providers
Extended thinking and structured outputs are now supported for Anthropic and Amazon Bedrock Claude providers.
Extended thinking lets Claude reason through complex problems before generating a response. Thinking is opt-in. You must provide specific attributes in your request to enable extended thinking.
Structured outputs constrain the model to respond with a specific JSON schema. You define the JSON schema as part of your request.
Remote URL support for OpenAPI schemas
You can now load an OpenAPI schema from a remote URL by setting the url field in the schema section of your OpenAPI target configuration. Agentgateway fetches the schema at startup. Previously, only local file paths and inline schemas were supported.

    openapi:
      schema:
        url: https://example.com/api/openapi.json
      host: example.com

For more information, see Connect to an OpenAPI server.
Remote rate limit failure modes
You can now configure how agentgateway behaves when the remote rate limit service is unavailable using the new failureMode field. The default behavior is failClosed, which denies requests with a 500 status code. Set failureMode to failOpen to allow requests through when the service is unreachable.

    remoteRateLimit:
      host: localhost:9090
      domain: example.com
      failureMode: failOpen
      descriptors:
      - entries:
        - key: organization
          value: 'request.headers["x-organization"]'
        type: requests

For more information, see Failure behavior.
JWT claim validation for MCP auth
You can now customize which JWT claims must be present in a token before it is accepted, using the new jwtValidationOptions.requiredClaims field in your MCP authentication configuration.

    mcpAuthentication:
      issuer: http://localhost:9000
      jwks:
        url: http://localhost:9000/.well-known/jwks.json
      jwtValidationOptions:
        requiredClaims:
        - exp
        - aud
        - sub

For more information, see JWT claim validation.
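Conceptually, a required-claims check verifies that each listed claim is present in the decoded token payload. The following Python sketch shows only that presence check. The function missing_required_claims is hypothetical, and the sketch does not cover signature verification or validation of the claim values, which the gateway handles separately.

```python
from typing import Any, Iterable, List, Mapping


def missing_required_claims(claims: Mapping[str, Any], required: Iterable[str]) -> List[str]:
    """Return the required claim names that are absent from the payload."""
    return [name for name in required if name not in claims]


# Decoded payload of an example token that has no "aud" claim.
payload = {"sub": "user-123", "exp": 1893456000}
missing = missing_required_claims(payload, ["exp", "aud", "sub"])
print(missing)  # a non-empty list means the token would be rejected in this model
```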
🪲 Bug fixes
MCP per-request policy evaluation
MCP policies are now re-evaluated on each request rather than only at session start. If an operator updates an authorization policy, such as by revoking access to a tool or changing JWT claim requirements, the change takes effect immediately on the next request, without requiring the client to tear down and re-establish the MCP session.
Note that this is a behavioral improvement. Existing MCP authorization configuration benefits automatically.
CORS evaluation ordering
CORS evaluation now runs before authentication and before rate limiting. Previously, CORS ran after auth and rate limiting, which caused two problems:
- Browser preflight OPTIONS requests were rejected by auth, making cross-origin requests impossible when auth was enabled
- Rate-limited 429 responses lacked CORS headers, so browsers saw an opaque CORS error instead of a retryable one
Note that this is a behavioral improvement. Existing configurations that combine CORS policies with external auth and rate limiting policies now work correctly.
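The effect of the ordering change can be illustrated with a toy request pipeline in Python. This is a simplified model rather than the gateway's filter chain: respond is a hypothetical function, and the 204 and 401 status codes are assumptions chosen for illustration (only the 429 comes from these notes).

```python
from typing import Tuple


def respond(method: str, authenticated: bool, rate_limited: bool, cors_first: bool) -> Tuple[int, bool]:
    """Return (status, has_cors_headers) for a cross-origin request."""
    if cors_first and method == "OPTIONS":
        return 204, True        # preflight is answered before auth runs
    if not authenticated:
        return 401, cors_first  # CORS headers are present only if CORS ran first
    if rate_limited:
        return 429, cors_first
    if method == "OPTIONS":
        return 204, True
    return 200, True


# Browser preflight requests carry no credentials.
print(respond("OPTIONS", authenticated=False, rate_limited=False, cors_first=False))
print(respond("OPTIONS", authenticated=False, rate_limited=False, cors_first=True))
```

With the old ordering the preflight is rejected by auth and the 429 carries no CORS headers. With the new ordering the preflight succeeds and the browser can read the 429.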