providers¶
The providers section defines every upstream the gateway can call.
Provider config is hot-reloadable. Updating API keys, base URLs, default models, timeout behavior, or model-discovery settings rebuilds the in-process provider registry without restarting the gateway, as long as the new config still contains at least one valid provider.
Provider IDs vs provider types¶
Each key under providers: is a provider ID that you reference later in routing.
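For instance, a minimal sketch (the key variable is illustrative; the ID matches a built-in type):

```yaml
providers:
  openai:
    api_key: "${OPENAI_API_KEY}"
```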
In that example:
- provider ID = openai
- provider type = inferred as openai
If the provider ID is not one of the built-in IDs, you must set type explicitly.
Supported provider types today¶
| Type | Notes |
|---|---|
| openai | Works for OpenAI and other OpenAI-compatible APIs |
| anthropic | Native Anthropic request/stream translation |
| ollama | Local Ollama upstream |
Built-in IDs that can omit type are: openai, anthropic, and ollama.
For custom IDs like deepseek, abacus, gemini, groq, together, openrouter, or a second OpenAI account, set type: "openai" explicitly.
Fields¶
| Field | Type | Required | Notes |
|---|---|---|---|
| type | string | sometimes | required for non-built-in provider IDs |
| api_key | string | usually | used for hosted providers |
| base_url | string | optional | required for custom OpenAI-compatible or Ollama upstreams |
| default_model | string | optional | used when the routing target omits a model; also exposed in model listing |
| temperature | number | optional | provider-level default sampling temperature (request value still wins) |
| top_p | number | optional | provider-level default nucleus sampling value (request value still wins) |
| top_k | integer | optional | provider-level default top_k (currently used by the Ollama translator) |
| organization | string | optional | sent as OpenAI-Organization for OpenAI-compatible fetches |
| api_version | string | optional | config field exists for provider-specific usage |
| timeout | duration | optional | upstream timeout for this provider; defaults to 120s |
| timeout_mode | string | optional | ttft or total; defaults to ttft |
| compatibility_profile | string | optional | grouped OpenAI-compatible upstream quirks such as deepseek |
| normalize_developer_role | bool | optional | rewrites upstream developer messages to system before marshaling |
| extra | map[string]string | optional | extra provider-specific config bag |
| models.mode | string | optional | translator, static, or fetch |
| models.static | list[string] | optional | explicit model list when mode: static |
| models.fetch.ttl | duration | optional | cache TTL for fetched model lists; defaults to 10m |
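A sketch combining several of these fields for a custom OpenAI-compatible upstream; the provider ID, hostname, organization value, and extra keys are hypothetical:

```yaml
providers:
  corp-openai:                      # custom ID, so type is required
    type: "openai"
    api_key: "${CORP_OPENAI_KEY}"
    base_url: "https://llm.internal.example.com/v1"
    organization: "org-example"     # sent as OpenAI-Organization
    timeout: 120s
    timeout_mode: ttft
    extra:
      region: "eu-west-1"           # provider-specific bag; keys depend on the upstream
```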
Provider-level sampling defaults¶
You can define sampling defaults once per provider and still override them per request.
Precedence, from highest to lowest:
- request payload (temperature, top_p, top_k)
- provider defaults in providers.<id>
- upstream/model native defaults
```yaml
providers:
  ollama:
    type: "ollama"
    base_url: "http://127.0.0.1:11434"
    default_model: "gemma4:26b"
    temperature: 1.0
    top_p: 0.95
    top_k: 64
  openai:
    type: "openai"
    api_key: "${OPENAI_API_KEY}"
    default_model: "gpt-5.2"
    temperature: 1.0
    top_p: 0.95
  anthropic:
    type: "anthropic"
    api_key: "${ANTHROPIC_API_KEY}"
    default_model: "claude-sonnet-4-5"
    temperature: 1.0
    top_p: 0.95
```
Notes:
- temperature and top_p provider defaults are applied by the openai, anthropic, and ollama translators.
- top_k provider defaults are currently applied by the ollama translator.
- These fields are hot-reloadable like other provider settings.
Upstream timeout behavior¶
timeout is configured per provider, not globally.
```yaml
providers:
  ollama:
    base_url: "http://127.0.0.1:11434"
    default_model: "qwen3.5"
    timeout: 10m
    timeout_mode: ttft
```
timeout_mode: ttft (default)¶
ttft means "time to first token", measured as the first byte of the upstream response body.
The gateway gives the provider up to timeout to start responding. Once the first response byte arrives, the timeout is considered satisfied and the rest of the response can continue without being cut off by that provider timeout.
This is the recommended mode for:
- slow local Ollama models
- long-running streaming responses
- upstreams that can take a long time to start but then stream steadily
timeout_mode: total¶
total means the full upstream response, from request start until the last byte of the body, must complete within timeout.
Use this when you want a hard upper bound on total upstream duration, even if the provider has already started streaming.
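For example, a sketch of a hard cap on a hosted provider (the 60s duration is illustrative):

```yaml
providers:
  openai:
    api_key: "${OPENAI_API_KEY}"
    timeout: 60s          # the entire response must finish within 60 seconds
    timeout_mode: total
```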
Notes¶
- If timeout_mode is omitted, LunarGate uses ttft.
- last_byte is accepted as an alias for total.
- The provider timeout and retry are separate concerns: timeout limits one upstream attempt, while retry controls whether another attempt should be made afterward.
Common patterns¶
OpenAI hosted¶
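A minimal hosted entry; because openai is a built-in ID, type can be omitted (the model name mirrors the earlier example):

```yaml
providers:
  openai:
    api_key: "${OPENAI_API_KEY}"
    default_model: "gpt-5.2"
```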
OpenAI-compatible custom upstream¶
```yaml
providers:
  deepseek:
    type: "openai"
    api_key: "${DEEPSEEK_API_KEY}"
    base_url: "https://api.deepseek.com/v1"
    default_model: "deepseek-chat"
    compatibility_profile: "deepseek"
```
OpenAI-compatible compatibility toggles¶
Use compatibility_profile for grouped upstream quirks and normalize_developer_role for the explicit role rewrite toggle.
```yaml
providers:
  deepseek:
    type: "openai"
    api_key: "${DEEPSEEK_API_KEY}"
    base_url: "https://api.deepseek.com/v1"
    default_model: "deepseek-chat"
    compatibility_profile: "deepseek"
    normalize_developer_role: true
```
Notes:
- compatibility_profile: "deepseek" automatically enables normalize_developer_role.
- normalize_developer_role: true rewrites upstream developer messages to system before request marshaling.
- This is useful for OpenAI-compatible upstreams that reject the OpenAI developer role in /chat/completions or /responses.
Gemini via the OpenAI-compatible endpoint¶
```yaml
providers:
  gemini:
    type: "openai"
    api_key: "${GEMINI_API_KEY}"
    base_url: "https://generativelanguage.googleapis.com/v1beta/openai"
    default_model: "gemini-2.5-flash"
```
Groq via the OpenAI-compatible endpoint¶
```yaml
providers:
  groq:
    type: "openai"
    api_key: "${GROQ_API_KEY}"
    base_url: "https://api.groq.com/openai/v1"
    default_model: "llama-3.3-70b-versatile"
```
Together via the OpenAI-compatible endpoint¶
```yaml
providers:
  together:
    type: "openai"
    api_key: "${TOGETHER_API_KEY}"
    base_url: "https://api.together.xyz/v1"
    default_model: "openai/gpt-oss-20b"
```
OpenRouter via the OpenAI-compatible endpoint¶
```yaml
providers:
  openrouter:
    type: "openai"
    api_key: "${OPENROUTER_API_KEY}"
    base_url: "https://openrouter.ai/api/v1"
    default_model: "openai/gpt-4o"
```
Anthropic¶
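A minimal native entry; because anthropic is a built-in ID, type can be omitted (the model name mirrors the earlier example):

```yaml
providers:
  anthropic:
    api_key: "${ANTHROPIC_API_KEY}"
    default_model: "claude-sonnet-4-5"
```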
Ollama¶
```yaml
providers:
  ollama:
    base_url: "http://127.0.0.1:11434"
    default_model: "qwen3.5"
    timeout: 10m
    timeout_mode: ttft
```
Model discovery modes¶
translator (default)¶
The gateway asks the provider translator for its built-in model list and also includes default_model if you set one.
Use this when you want simple config and predictable behavior.
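A sketch of the default mode; no models block is needed at all (the model name is illustrative):

```yaml
providers:
  anthropic:
    api_key: "${ANTHROPIC_API_KEY}"
    default_model: "claude-sonnet-4-5"
    # models.mode defaults to "translator" when omitted
```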
static¶
You define the visible model list yourself.
```yaml
providers:
  openai:
    api_key: "${OPENAI_API_KEY}"
    default_model: "gpt-5.2"
    models:
      mode: "static"
      static:
        - "gpt-5.2"
        - "gpt-5.2-mini"
```
fetch¶
The gateway fetches model IDs from the upstream and caches them.
```yaml
providers:
  openai:
    api_key: "${OPENAI_API_KEY}"
    base_url: "https://api.openai.com/v1"
    models:
      mode: "fetch"
      fetch:
        ttl: 15m
```
Current runtime support for fetch:
- OpenAI-compatible upstreams via GET /models
- Ollama via GET /api/tags
If fetching fails, the gateway falls back to the translator/default-model path.
Practical guidance¶
- Keep provider IDs stable. Routing refers to IDs, not provider types.
- Use type: "openai" for any OpenAI-compatible provider that is not literally named openai.
- If you expose models dynamically via fetch, make sure base_url is correct and reachable from the gateway.
- Prefer timeout_mode: ttft for slow local inference backends such as Ollama unless you explicitly need a hard total-response deadline.
- Embeddings support depends on the upstream, not just the provider type. OpenAI-compatible providers can expose /v1/embeddings, but you still need an embeddings-capable upstream model and route.