Routing and fallback

Routing is where LunarGate becomes useful. You can keep one stable client integration and decide at the gateway which upstream provider or model should serve a request.

Routing model

A route matches a request and produces a list of targets.

routing:
  default_strategy: weighted
  routes:
    - name: force-provider-openai
      match:
        path: /v1/chat/completions
        headers:
          x-lunargate-provider: openai
      targets:
        - provider: openai
          model: gpt-5-nano
          weight: 100
      fallback:
        - provider: deepseek
          model: deepseek-chat
          weight: 100

What can be matched

Request path
Request headers

A common pattern is to route by team, environment, complexity tier, or a forced provider header.

The request path is also the cleanest way to separate:

chat traffic on /v1/chat/completions
embeddings traffic on /v1/embeddings
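
As a rough sketch of that split, each path can get its own route. This reuses only the keys from the example above. The route names and the embeddings model name are illustrative assumptions, and the sketch assumes a route may match on path alone, without a headers block.

```yaml
routing:
  default_strategy: weighted
  routes:
    # Chat traffic, matched on path only (assumes headers are optional in match).
    - name: chat-default              # hypothetical route name
      match:
        path: /v1/chat/completions
      targets:
        - provider: openai
          model: gpt-5-nano
          weight: 100
    # Embeddings traffic gets its own route and its own model.
    - name: embeddings-default        # hypothetical route name
      match:
        path: /v1/embeddings
      targets:
        - provider: openai
          model: text-embedding-3-small   # illustrative; use a model you have configured
          weight: 100
```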

Load-balancing strategy

The current default strategy is weighted balancing. Each target has a weight, and the gateway selects among the eligible targets according to those weights.
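
A minimal sketch of a weighted split between two targets, using the same keys as the earlier example. The route name is hypothetical, and the 80/20 reading assumes weights are treated as relative proportions, which this page does not spell out.

```yaml
routing:
  default_strategy: weighted
  routes:
    - name: chat-weighted-split       # hypothetical route name
      match:
        path: /v1/chat/completions
      targets:
        # Assumption: weights are relative, so roughly 80% of requests go here...
        - provider: openai
          model: gpt-5-nano
          weight: 80
        # ...and roughly 20% go here.
        - provider: deepseek
          model: deepseek-chat
          weight: 20
```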

Fallback behavior

If the primary target fails, and the failure is either retryable with retries exhausted or terminal for that provider, the gateway can continue through the fallback chain.

Primary target -> retries -> fallback 1 -> fallback 2 -> error
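
The chain above might be expressed as a route with two fallback entries. This is a sketch under assumptions: it presumes fallback entries are tried in the order listed, and `backup-provider` / `backup-model` are placeholder names, not real providers. Retry settings are not shown because this page does not describe where they are configured.

```yaml
routing:
  routes:
    - name: chat-with-fallbacks       # hypothetical route name
      match:
        path: /v1/chat/completions
      targets:
        # Primary target; retried first according to the gateway's retry settings.
        - provider: openai
          model: gpt-5-nano
          weight: 100
      fallback:
        # Fallback 1 (assumed to be tried first because it is listed first).
        - provider: deepseek
          model: deepseek-chat
          weight: 100
        # Fallback 2; placeholder names, replace with a configured provider and model.
        - provider: backup-provider
          model: backup-model
          weight: 100
```

If every entry fails, the request ends in an error, as in the chain shown above.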

Tool-aware routing

When a request includes tools or tool_choice, LunarGate can inject x-lunargate-requires-tools: true. That lets you steer tool-using requests to models that actually support tools.
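
A route that matches on the injected header could look like the sketch below. The route name is hypothetical, and quoting `"true"` assumes header values are compared as strings. Which models support tools depends on your providers, so the target here is only an example.

```yaml
routing:
  routes:
    - name: tools-capable             # hypothetical route name
      match:
        path: /v1/chat/completions
        headers:
          # Quoted so YAML reads it as a string, not a boolean (assumed to be what matching expects).
          x-lunargate-requires-tools: "true"
      targets:
        # Example target only; point this at a model you know supports tools.
        - provider: openai
          model: gpt-5-nano
          weight: 100
```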

Complexity-based routing

LunarGate can be configured to score requests and emit headers such as:

x-lunargate-complexity
x-lunargate-complexity-score
x-lunargate-skill

Those headers can be used as route match inputs, which makes autorouting configurable rather than hardcoded.
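
For illustration, a route keyed on the complexity header might look like this. The header value `high`, the route name, and the placeholder model are all assumptions. This page does not list the values the scorer emits, so check what your configuration actually produces before matching on it.

```yaml
routing:
  routes:
    - name: complex-requests          # hypothetical route name
      match:
        path: /v1/chat/completions
        headers:
          x-lunargate-complexity: high    # assumed value; not documented on this page
      targets:
        # Placeholder model name; replace with a stronger model you have configured.
        - provider: openai
          model: your-larger-model
          weight: 100
```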

Things to keep in mind

Tip

Put the more specific routes first. Header-based force routes and tool-capability routes should appear before general default routes.
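
A sketch of that ordering, assuming routes are evaluated top to bottom and the first match wins, as the tip implies. The second route name is hypothetical.

```yaml
routing:
  default_strategy: weighted
  routes:
    # Specific: only matches when the caller forces a provider via header.
    - name: force-provider-openai
      match:
        path: /v1/chat/completions
        headers:
          x-lunargate-provider: openai
      targets:
        - provider: openai
          model: gpt-5-nano
          weight: 100
    # General: catches the remaining chat traffic, so it comes last.
    - name: chat-default              # hypothetical route name
      match:
        path: /v1/chat/completions
      targets:
        - provider: deepseek
          model: deepseek-chat
          weight: 100
```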

Tip

If chat and embeddings use different upstream models, create separate routes for each path instead of trying to force both through one generic route.

Warning

If you reference a provider or model that is not configured, the route can match but still fail at execution time.