Routing and fallback
Routing is where LunarGate becomes useful. You can keep one stable client integration and decide at the gateway which upstream provider or model should serve a request.
Routing model
A route matches a request and produces a list of targets.
routing:
  default_strategy: weighted
  routes:
    - name: force-provider-openai
      match:
        path: /v1/chat/completions
        headers:
          x-lunargate-provider: openai
      targets:
        - provider: openai
          model: gpt-5-nano
          weight: 100
      fallback:
        - provider: deepseek
          model: deepseek-chat
          weight: 100
What can be matched
- Request path
- Request headers
A common pattern is to route by team, environment, complexity tier, or a forced provider header.
The request path is also the cleanest way to separate:
- chat traffic on /v1/chat/completions
- embeddings traffic on /v1/embeddings
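As a sketch of that separation, the two routes below reuse the keys from the routing model example and differ only in the path they match. The route names and the embeddings model are illustrative assumptions, as is the idea that a route may match on path alone; substitute providers and models you have actually configured.

```yaml
routing:
  routes:
    # Chat requests are served by a chat model.
    - name: chat-default                  # hypothetical route name
      match:
        path: /v1/chat/completions
      targets:
        - provider: openai
          model: gpt-5-nano
          weight: 100
    # Embedding requests are served by an embeddings model.
    - name: embeddings-default            # hypothetical route name
      match:
        path: /v1/embeddings
      targets:
        - provider: openai
          model: text-embedding-3-small   # assumed model; use one you have configured
          weight: 100
```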
Load-balancing strategy
The current default strategy is weighted balancing. Each target gets a weight, and the gateway selects among the eligible targets according to those weights.
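A minimal sketch of a weighted split, assuming weights act as relative shares of traffic. Whether they must sum to 100 is not stated here, so treat the 80/20 numbers and the route name as illustrative only.

```yaml
routing:
  default_strategy: weighted
  routes:
    - name: chat-weighted-split     # hypothetical route name
      match:
        path: /v1/chat/completions
      targets:
        # Assumed to receive the larger share of requests.
        - provider: openai
          model: gpt-5-nano
          weight: 80
        # Assumed to receive the smaller share of requests.
        - provider: deepseek
          model: deepseek-chat
          weight: 20
```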
Fallback behavior
If the primary target fails, and the failure is either retryable or terminal for that provider, the gateway can move on to the next entry in the fallback chain.
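A fallback chain with more than one entry might look like the sketch below. It assumes that fallback entries are tried in the order they are listed, which this page does not confirm; the route name is hypothetical, and deepseek-reasoner stands in for any second model you have configured.

```yaml
routing:
  routes:
    - name: chat-with-fallback-chain    # hypothetical route name
      match:
        path: /v1/chat/completions
      targets:
        - provider: openai
          model: gpt-5-nano
          weight: 100
      fallback:
        # Assumed to be tried first when the primary target fails.
        - provider: deepseek
          model: deepseek-chat
          weight: 100
        # Assumed to be tried next if the first fallback also fails.
        - provider: deepseek
          model: deepseek-reasoner
          weight: 100
```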
Tool-aware routing
When a request includes tools or tool_choice, LunarGate can inject x-lunargate-requires-tools: true. That lets you steer tool-using requests to models that actually support tools.
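A route that picks up the injected header could be sketched as follows. The route name is hypothetical, and the value is quoted on the assumption that header matching compares strings; an unquoted true would be parsed by YAML as a boolean.

```yaml
routing:
  routes:
    - name: tools-capable               # hypothetical route name
      match:
        path: /v1/chat/completions
        headers:
          # Quoted so YAML keeps it as the string "true" rather than a boolean.
          x-lunargate-requires-tools: "true"
      targets:
        # Point this at a model you know supports tools.
        - provider: openai
          model: gpt-5-nano
          weight: 100
```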
Complexity-based routing
The config can score requests and emit headers such as:
- x-lunargate-complexity
- x-lunargate-complexity-score
- x-lunargate-skill
Those headers can be used as route match inputs, which makes autorouting configurable rather than hardcoded.
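The sketch below shows how such a header might drive model choice. The tier values low and high are placeholders: this page does not list the values the scorer emits, so check your scoring config before copying them. The route names are hypothetical as well.

```yaml
routing:
  routes:
    - name: complexity-high             # hypothetical route name
      match:
        path: /v1/chat/completions
        headers:
          x-lunargate-complexity: high  # placeholder value, not a documented tier
      targets:
        - provider: deepseek
          model: deepseek-chat
          weight: 100
    - name: complexity-low              # hypothetical route name
      match:
        path: /v1/chat/completions
        headers:
          x-lunargate-complexity: low   # placeholder value, not a documented tier
      targets:
        - provider: openai
          model: gpt-5-nano
          weight: 100
```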
Things to keep in mind
Tip
Put the more specific routes first. Header-based force routes and tool-capability routes should appear before general default routes.
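An ordering along those lines is sketched below, assuming routes are evaluated top to bottom and the first match wins. The header-based force route comes first and the path-only default comes last; both route names are hypothetical.

```yaml
routing:
  routes:
    # Specific: only matches when the caller forces a provider.
    - name: force-provider-deepseek     # hypothetical route name
      match:
        path: /v1/chat/completions
        headers:
          x-lunargate-provider: deepseek
      targets:
        - provider: deepseek
          model: deepseek-chat
          weight: 100
    # General: catches all remaining chat traffic.
    - name: chat-default                # hypothetical route name
      match:
        path: /v1/chat/completions
      targets:
        - provider: openai
          model: gpt-5-nano
          weight: 100
```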
Tip
If chat and embeddings use different upstream models, create separate routes for each path instead of trying to force both through one generic route.
Warning
If you reference a provider or model that is not configured, the route can match but still fail at execution time.