Virtual models

Configure virtual models with weighted, failover, and conditional routing in simplified LLM mode.

Virtual models let you publish one client-facing model name and route requests across one or more internal target models.

Use llm.virtualModels[] to define the virtual entry point and llm.models[] to define the concrete upstream targets.

Public and internal models

Use llm.models[].visibility to control whether a model is directly exposed to clients or kept as an internal target.

  • public: The model can be requested directly by clients and can also be used as a virtual model target.
  • internal: The model is intended as an internal routing target and cannot be requested directly by clients.
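
To make the distinction concrete, the following Python sketch lists the model names a client could request under the rules above. It is an illustration only, not agentgateway code: the Model class and the client_facing_names helper are hypothetical, and the sketch assumes that virtual model names are always client-facing, as the introduction describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """Hypothetical stand-in for one llm.models[] entry."""
    name: str
    visibility: str  # "public" or "internal"


def client_facing_names(models, virtual_model_names):
    """Return the names a client may request: public models plus virtual models."""
    public = {m.name for m in models if m.visibility == "public"}
    return sorted(public | set(virtual_model_names))


models = [
    Model("gpt-4o-public", "public"),
    Model("gpt-4o-primary", "internal"),
    Model("gpt-4o-fallback", "internal"),
]

# "smart" is the virtual model name used in the weighted routing example.
names = client_facing_names(models, ["smart"])
print(names)
```

Internal models such as gpt-4o-primary do not appear in the result; in this sketch they are reachable only through a virtual model that targets them.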

Route selection modes

Each virtual model defines its routing strategy under routing. The routing targets in a virtual model point to concrete llm.models[] entries.

Weighted routing

Use routing.weighted.targets to split traffic between targets in proportion to each target's weight.

llm:
  models:
  - name: gpt-4o-public
    visibility: public
    provider: openAI
    params:
      model: gpt-4o
      apiKey: "$OPENAI_API_KEY"
  - name: gpt-4o-primary
    visibility: internal
    provider: openAI
    params:
      model: gpt-4o
      apiKey: "$OPENAI_API_KEY"
  - name: gpt-4o-fallback
    visibility: internal
    provider: openAI
    params:
      model: gpt-4o-mini
      apiKey: "$OPENAI_API_KEY"

  virtualModels:
  - name: smart
    routing:
      weighted:
        targets:
        - model: gpt-4o-primary
          weight: 90
        - model: gpt-4o-fallback
          weight: 10
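
The effect of the 90/10 split above can be sketched in Python. This is a minimal illustration, not agentgateway's selection algorithm: traffic_shares and pick_weighted are hypothetical helpers, and the sketch assumes weights are relative (a target's share is its weight divided by the sum of all weights) and that each request is picked independently at random.

```python
import random


def traffic_shares(targets):
    """Map each target to its expected share of traffic (weight / total weight)."""
    total = sum(weight for _, weight in targets)
    return {model: weight / total for model, weight in targets}


def pick_weighted(targets, rng):
    """Pick one target at random, with probability proportional to its weight."""
    models = [model for model, _ in targets]
    weights = [weight for _, weight in targets]
    return rng.choices(models, weights=weights, k=1)[0]


targets = [("gpt-4o-primary", 90), ("gpt-4o-fallback", 10)]
shares = traffic_shares(targets)

# Simulate many requests with a seeded generator so the run is repeatable.
rng = random.Random(0)
picks = [pick_weighted(targets, rng) for _ in range(10_000)]
primary_fraction = picks.count("gpt-4o-primary") / len(picks)
print(shares, round(primary_fraction, 2))
```

Over many requests the observed fraction approaches the configured share, but any short window of requests can deviate from it.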

Failover routing

Use routing.failover.targets and priority to define ordered failover targets. Requests are load balanced across targets that share the same priority, based on health and latency.

llm:
  models:
  - name: claude-primary
    visibility: internal
    provider: anthropic
    params:
      model: claude-sonnet-4-0
      apiKey: "$ANTHROPIC_API_KEY"
  - name: claude-backup-a
    visibility: internal
    provider: anthropic
    params:
      model: claude-3-5-haiku-20241022
      apiKey: "$ANTHROPIC_API_KEY"
  - name: claude-backup-b
    visibility: internal
    provider: anthropic
    params:
      model: claude-3-5-haiku-20241022
      apiKey: "$ANTHROPIC_API_KEY"

  virtualModels:
  - name: resilient
    routing:
      failover:
        targets:
        - model: claude-primary
          priority: 1
        - model: claude-backup-a
          priority: 2
        - model: claude-backup-b
          priority: 2
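
One way to read the configuration above is as a list of priority tiers. The Python sketch below groups targets by priority and returns the first tier that still has a healthy member. It rests on assumptions the text does not confirm: that a lower priority number is tried first, and that health can be modeled as a simple set of model names. The failover_tiers and candidates helpers are hypothetical, and latency-based balancing within a tier is not modeled.

```python
from itertools import groupby


def failover_tiers(targets):
    """Group (model, priority) pairs into tiers, lowest priority number first."""
    ordered = sorted(targets, key=lambda t: t[1])  # stable sort keeps listed order
    return [
        [model for model, _ in group]
        for _, group in groupby(ordered, key=lambda t: t[1])
    ]


def candidates(targets, healthy):
    """Return the healthy models of the first tier that has any, else an empty list."""
    for tier in failover_tiers(targets):
        live = [model for model in tier if model in healthy]
        if live:
            return live
    return []


targets = [
    ("claude-primary", 1),
    ("claude-backup-a", 2),
    ("claude-backup-b", 2),
]

tiers = failover_tiers(targets)
all_up = candidates(targets, {"claude-primary", "claude-backup-a", "claude-backup-b"})
primary_down = candidates(targets, {"claude-backup-a", "claude-backup-b"})
print(tiers, all_up, primary_down)
```

In this model, both backups become candidates together once the primary is unavailable, which matches the statement that targets with the same priority share load.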

Conditional routing

Use routing.conditional.targets and when expressions to select targets by request context.

llm:
  models:
  - name: openai-public
    visibility: public
    provider: openAI
    params:
      model: gpt-4o-mini
      apiKey: "$OPENAI_API_KEY"
  - name: openai-fast
    visibility: internal
    provider: openAI
    params:
      model: gpt-4o-mini
      apiKey: "$OPENAI_API_KEY"
  - name: openai-smart
    visibility: internal
    provider: openAI
    params:
      model: gpt-4o
      apiKey: "$OPENAI_API_KEY"

  virtualModels:
  - name: adaptive
    routing:
      conditional:
        targets:
        - model: openai-fast
          when: request.headers["x-tier"] == "free"
        - model: openai-smart
          when: request.headers["x-tier"] == "pro"
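
The matching behavior can be sketched in Python with plain functions standing in for the when expressions. This is an illustration under stated assumptions, not agentgateway's evaluator: the select_target helper is hypothetical, targets are assumed to be checked in the order listed with the first match winning, and the sketch returns None when nothing matches because this page does not specify what happens in that case.

```python
def select_target(rules, headers):
    """Return the first model whose predicate accepts the request headers."""
    for model, predicate in rules:
        if predicate(headers):
            return model
    return None  # no-match behavior is not specified here; None is a placeholder


# Each lambda mirrors one `when` expression from the example above.
rules = [
    ("openai-fast", lambda h: h.get("x-tier") == "free"),
    ("openai-smart", lambda h: h.get("x-tier") == "pro"),
]

free_choice = select_target(rules, {"x-tier": "free"})
pro_choice = select_target(rules, {"x-tier": "pro"})
no_header = select_target(rules, {})
print(free_choice, pro_choice, no_header)
```

The last call shows why it is worth checking how a request without an x-tier header is handled before relying on conditional routing alone.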

For reusable provider defaults in simplified mode, see Multiple LLM providers.