providers¶
The providers section defines every upstream the gateway can call.
Provider config is hot-reloadable. Updating API keys, base URLs, default models, timeout behavior, or model-discovery settings rebuilds the in-process provider registry without restarting the gateway, as long as the new config still contains at least one valid provider.
Provider IDs vs provider types¶
Each key under providers: is a provider ID that you reference later in routing.
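For instance, a minimal sketch (the key variable is illustrative; the ID matches a built-in type):

```yaml
providers:
  openai:
    api_key: "${OPENAI_API_KEY}"
```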
In that example:
- provider ID = openai
- provider type = inferred as openai
If the provider ID is not one of the built-in IDs, you must set type explicitly.
Supported provider types today¶
| Type | Notes |
|---|---|
| openai | Works for OpenAI and other OpenAI-compatible APIs |
| anthropic | Native Anthropic request/stream translation |
| ollama | Local Ollama upstream |
Built-in IDs that can omit type are: openai, anthropic, and ollama.
For custom IDs like deepseek, abacus, gemini, groq, together, openrouter, or a second OpenAI account, set type: "openai" explicitly.
Fields¶
| Field | Type | Required | Notes |
|---|---|---|---|
| type | string | sometimes | required for non-built-in provider IDs |
| api_key | string | usually | used for hosted providers |
| base_url | string | optional | required for custom OpenAI-compatible or Ollama upstreams |
| default_model | string | optional | used when the routing target omits a model; also exposed in model listing |
| temperature | number | optional | provider-level default sampling temperature (request value still wins) |
| top_p | number | optional | provider-level default nucleus sampling value (request value still wins) |
| top_k | integer | optional | provider-level default top_k (currently used by the Ollama translator) |
| organization | string | optional | sent as OpenAI-Organization for OpenAI-compatible fetches |
| api_version | string | optional | config field exists for provider-specific usage |
| timeout | duration | optional | upstream timeout for this provider; defaults to 120s |
| timeout_mode | string | optional | ttft or total; defaults to ttft |
| compatibility_profile | string | optional | grouped OpenAI-compatible upstream quirks such as deepseek |
| normalize_developer_role | bool | optional | rewrites upstream developer messages to system before marshaling |
| extra | map[string]string | optional | extra provider-specific config bag |
| models.mode | string | optional | translator, static, or fetch |
| models.static | list[string] | optional | explicit model list when mode: static |
| models.fetch.ttl | duration | optional | cache TTL for fetched model lists; defaults to 10m |
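A sketch combining several of these fields for a custom OpenAI-compatible upstream; the provider ID, hostname, organization value, and extra keys are hypothetical:

```yaml
providers:
  corp-openai:                      # custom ID, so type is required
    type: "openai"
    api_key: "${CORP_OPENAI_KEY}"
    base_url: "https://llm.internal.example.com/v1"
    organization: "org-example"     # sent as OpenAI-Organization
    timeout: 120s
    timeout_mode: ttft
    extra:
      region: "eu-west-1"           # provider-specific bag; keys depend on the upstream
```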
Provider-level sampling defaults¶
You can define sampling defaults once per provider and still override them per request.
Precedence, from highest to lowest:
- request payload (temperature, top_p, top_k)
- provider defaults in providers.<id>
- upstream/model native defaults
```yaml
providers:
  ollama:
    type: "ollama"
    base_url: "http://127.0.0.1:11434"
    default_model: "gemma4:26b"
    temperature: 1.0
    top_p: 0.95
    top_k: 64
  openai:
    type: "openai"
    api_key: "${OPENAI_API_KEY}"
    default_model: "gpt-5.2"
    temperature: 1.0
    top_p: 0.95
  anthropic:
    type: "anthropic"
    api_key: "${ANTHROPIC_API_KEY}"
    default_model: "claude-sonnet-4-5"
    temperature: 1.0
    top_p: 0.95
```
Notes:
- temperature and top_p provider defaults are applied by the openai, anthropic, and ollama translators.
- top_k provider defaults are currently applied by the ollama translator.
- These fields are hot-reloadable like other provider settings.
Upstream timeout behavior¶
timeout is configured per provider, not globally.
```yaml
providers:
  ollama:
    base_url: "http://127.0.0.1:11434"
    default_model: "qwen3.5"
    timeout: 10m
    timeout_mode: ttft
```
timeout_mode: ttft (default)¶
ttft means "time to first token", measured as the first byte of the upstream response body.
The gateway gives the provider up to timeout to start responding. Once the first response byte arrives, the timeout is considered satisfied and the rest of the response can continue without being cut off by that provider timeout.
This is the recommended mode for:
- slow local Ollama models
- long-running streaming responses
- upstreams that can take a long time to start but then stream steadily
timeout_mode: total¶
total means the full upstream response, from request start until the last byte of the body, must complete within timeout.
Use this when you want a hard upper bound on total upstream duration, even if the provider has already started streaming.
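For example, a sketch of a hard cap on a hosted provider (the 60s duration is illustrative):

```yaml
providers:
  openai:
    api_key: "${OPENAI_API_KEY}"
    timeout: 60s          # the entire response must finish within 60 seconds
    timeout_mode: total
```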
Notes¶
- If timeout_mode is omitted, LunarGate uses ttft.
- last_byte is accepted as an alias for total.
- The provider timeout and retry are separate concerns: timeout limits one upstream attempt, while retry controls whether another attempt should be made afterward.
Common patterns¶
OpenAI hosted¶
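A minimal hosted entry; because openai is a built-in ID, type can be omitted (the model name mirrors the earlier example):

```yaml
providers:
  openai:
    api_key: "${OPENAI_API_KEY}"
    default_model: "gpt-5.2"
```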
OpenAI-compatible custom upstream¶
```yaml
providers:
  deepseek:
    type: "openai"
    api_key: "${DEEPSEEK_API_KEY}"
    base_url: "https://api.deepseek.com/v1"
    default_model: "deepseek-chat"
    compatibility_profile: "deepseek"
```
OpenAI-compatible compatibility toggles¶
Use compatibility_profile for grouped upstream quirks and normalize_developer_role for the explicit role rewrite toggle.
```yaml
providers:
  deepseek:
    type: "openai"
    api_key: "${DEEPSEEK_API_KEY}"
    base_url: "https://api.deepseek.com/v1"
    default_model: "deepseek-chat"
    compatibility_profile: "deepseek"
    normalize_developer_role: true
```
Notes:
- compatibility_profile: "deepseek" automatically enables normalize_developer_role.
- normalize_developer_role: true rewrites upstream developer messages to system before request marshaling.
- This is useful for OpenAI-compatible upstreams that reject the OpenAI developer role in /chat/completions or /responses.
Gemini via the OpenAI-compatible endpoint¶
```yaml
providers:
  gemini:
    type: "openai"
    api_key: "${GEMINI_API_KEY}"
    base_url: "https://generativelanguage.googleapis.com/v1beta/openai"
    default_model: "gemini-2.5-flash"
```
Groq via the OpenAI-compatible endpoint¶
```yaml
providers:
  groq:
    type: "openai"
    api_key: "${GROQ_API_KEY}"
    base_url: "https://api.groq.com/openai/v1"
    default_model: "llama-3.3-70b-versatile"
```
Together via the OpenAI-compatible endpoint¶
```yaml
providers:
  together:
    type: "openai"
    api_key: "${TOGETHER_API_KEY}"
    base_url: "https://api.together.xyz/v1"
    default_model: "openai/gpt-oss-20b"
```
OpenRouter via the OpenAI-compatible endpoint¶
```yaml
providers:
  openrouter:
    type: "openai"
    api_key: "${OPENROUTER_API_KEY}"
    base_url: "https://openrouter.ai/api/v1"
    default_model: "openai/gpt-4o"
```
Anthropic¶
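A minimal native entry; because anthropic is a built-in ID, type can be omitted (the model name mirrors the earlier example):

```yaml
providers:
  anthropic:
    api_key: "${ANTHROPIC_API_KEY}"
    default_model: "claude-sonnet-4-5"
```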
Ollama¶
```yaml
providers:
  ollama:
    base_url: "http://127.0.0.1:11434"
    default_model: "qwen3.5"
    timeout: 10m
    timeout_mode: ttft
```
Model discovery modes¶
translator (default)¶
The gateway asks the provider translator for its built-in model list and also includes default_model if you set one.
Use this when you want simple config and predictable behavior.
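A sketch of the default mode; no models block is needed at all (the model name is illustrative):

```yaml
providers:
  anthropic:
    api_key: "${ANTHROPIC_API_KEY}"
    default_model: "claude-sonnet-4-5"
    # models.mode defaults to "translator" when omitted
```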
static¶
You define the visible model list yourself.
```yaml
providers:
  openai:
    api_key: "${OPENAI_API_KEY}"
    default_model: "gpt-5.2"
    models:
      mode: "static"
      static:
        - "gpt-5.2"
        - "gpt-5.2-mini"
```
fetch¶
The gateway fetches model IDs from the upstream and caches them.
```yaml
providers:
  openai:
    api_key: "${OPENAI_API_KEY}"
    base_url: "https://api.openai.com/v1"
    models:
      mode: "fetch"
      fetch:
        ttl: 15m
```
Current runtime support for fetch:
- OpenAI-compatible upstreams via GET /models
- Ollama via GET /api/tags
If fetching fails, the gateway falls back to the translator/default-model path.
Practical guidance¶
- Keep provider IDs stable. Routing refers to IDs, not provider types.
- Use type: "openai" for any OpenAI-compatible provider that is not literally named openai.
- If you expose models dynamically via fetch, make sure base_url is correct and reachable from the gateway.
- Prefer timeout_mode: ttft for slow local inference backends such as Ollama unless you explicitly need a hard total-response deadline.
- Embeddings support depends on the upstream, not just the provider type. OpenAI-compatible providers can expose /v1/embeddings, but you still need an embeddings-capable upstream model and route.