providers

The providers section defines every upstream the gateway can call.

Provider config is hot-reloadable. Updating API keys, base URLs, default models, timeout behavior, or model-discovery settings rebuilds the in-process provider registry without restarting the gateway, as long as the new config still contains at least one valid provider.

Provider IDs vs provider types

Each key under providers: is a provider ID that you reference later in routing.

providers:
  openai:
    api_key: "${OPENAI_API_KEY}"

In that example:

  • provider ID = openai
  • provider type = inferred as openai

If the provider ID is not one of the built-in IDs, you must set type explicitly.

Supported provider types today

Type        Notes
openai      Works for OpenAI and other OpenAI-compatible APIs
anthropic   Native Anthropic request/stream translation
ollama      Local Ollama upstream

Built-in IDs that can omit type are:

  • openai
  • anthropic
  • ollama

For custom IDs like deepseek, abacus, gemini, groq, together, openrouter, or a second OpenAI account, set type: "openai" explicitly.
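
For example, a second OpenAI account under a custom ID must declare its type explicitly (the openai_backup ID and environment variable here are illustrative):

providers:
  openai_backup:
    type: "openai"
    api_key: "${OPENAI_BACKUP_API_KEY}"
    default_model: "gpt-5.2"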

Fields

  • type (string, required: sometimes): required for non-built-in provider IDs
  • api_key (string, required: usually): used for hosted providers
  • base_url (string, optional): required for custom OpenAI-compatible or Ollama upstreams
  • default_model (string, optional): used when the routing target omits a model; also exposed in model listing
  • temperature (number, optional): provider-level default sampling temperature (request value still wins)
  • top_p (number, optional): provider-level default nucleus sampling value (request value still wins)
  • top_k (integer, optional): provider-level default top_k (currently used by the Ollama translator)
  • organization (string, optional): sent as OpenAI-Organization for OpenAI-compatible fetches
  • api_version (string, optional): config field exists for provider-specific usage
  • timeout (duration, optional): upstream timeout for this provider; defaults to 120s
  • timeout_mode (string, optional): ttft or total; defaults to ttft
  • compatibility_profile (string, optional): grouped OpenAI-compatible upstream quirks such as deepseek
  • normalize_developer_role (bool, optional): rewrites upstream developer messages to system before marshaling
  • extra (map[string]string, optional): extra provider-specific config bag
  • models.mode (string, optional): translator, static, or fetch
  • models.static (list[string], optional): explicit model list when mode: static
  • models.fetch.ttl (duration, optional): cache TTL for fetched model lists; defaults to 10m
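
As a sketch of how these fields combine, here is one provider entry using several of them at once (the values are illustrative, not recommendations):

providers:
  deepseek:
    type: "openai"
    api_key: "${DEEPSEEK_API_KEY}"
    base_url: "https://api.deepseek.com/v1"
    default_model: "deepseek-chat"
    timeout: 2m
    timeout_mode: ttft
    models:
      mode: "fetch"
      fetch:
        ttl: 10m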

Provider-level sampling defaults

You can define sampling defaults once per provider and still override them per request.

Precedence is:

  1. request payload (temperature, top_p, top_k)
  2. provider defaults in providers.<id>
  3. upstream/model native defaults

providers:
  ollama:
    type: "ollama"
    base_url: "http://127.0.0.1:11434"
    default_model: "gemma4:26b"
    temperature: 1.0
    top_p: 0.95
    top_k: 64

  openai:
    type: "openai"
    api_key: "${OPENAI_API_KEY}"
    default_model: "gpt-5.2"
    temperature: 1.0
    top_p: 0.95

  anthropic:
    type: "anthropic"
    api_key: "${ANTHROPIC_API_KEY}"
    default_model: "claude-sonnet-4-5"
    temperature: 1.0
    top_p: 0.95

Notes:

  • temperature and top_p provider defaults are applied by the openai, anthropic, and ollama translators.
  • top_k provider defaults are currently applied by the ollama translator.
  • These fields are hot-reloadable like other provider settings.

Upstream timeout behavior

timeout is configured per provider, not globally.

providers:
  ollama:
    base_url: "http://127.0.0.1:11434"
    default_model: "qwen3.5"
    timeout: 10m
    timeout_mode: ttft

timeout_mode: ttft (default)

ttft means "time to first token", measured as the first byte of the upstream response body.

The gateway gives the provider up to timeout to start responding. Once the first response byte arrives, the timeout is considered satisfied and the rest of the response can continue without being cut off by that provider timeout.

This is the recommended mode for:

  • slow local Ollama models
  • long-running streaming responses
  • upstreams that can take a long time to start but then stream steadily

timeout_mode: total

total means the full upstream response, from request start until the last byte of the body, must complete within timeout.

Use this when you want a hard upper bound on total upstream duration, even if the provider has already started streaming.
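
A minimal sketch of a hard total deadline (the 60s value is illustrative):

providers:
  openai:
    api_key: "${OPENAI_API_KEY}"
    default_model: "gpt-5.2"
    timeout: 60s
    timeout_mode: total   # last_byte is an accepted alias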

Notes

  • If timeout_mode is omitted, LunarGate uses ttft.
  • last_byte is accepted as an alias for total.
  • The provider timeout and retry are separate concerns: timeout limits one upstream attempt, while retry controls whether another attempt should be made afterward.

Common patterns

OpenAI hosted

providers:
  openai:
    api_key: "${OPENAI_API_KEY}"
    default_model: "gpt-5.2"

OpenAI-compatible custom upstream

providers:
  deepseek:
    type: "openai"
    api_key: "${DEEPSEEK_API_KEY}"
    base_url: "https://api.deepseek.com/v1"
    default_model: "deepseek-chat"
    compatibility_profile: "deepseek"

OpenAI-compatible compatibility toggles

Use compatibility_profile for grouped upstream quirks and normalize_developer_role for the explicit role rewrite toggle.

providers:
  deepseek:
    type: "openai"
    api_key: "${DEEPSEEK_API_KEY}"
    base_url: "https://api.deepseek.com/v1"
    default_model: "deepseek-chat"
    compatibility_profile: "deepseek"
    normalize_developer_role: true

Notes:

  • compatibility_profile: "deepseek" automatically enables normalize_developer_role.
  • normalize_developer_role: true rewrites upstream developer messages to system before request marshaling.
  • This is useful for OpenAI-compatible upstreams that reject the OpenAI developer role in /chat/completions or /responses.

Gemini via the OpenAI-compatible endpoint

providers:
  gemini:
    type: "openai"
    api_key: "${GEMINI_API_KEY}"
    base_url: "https://generativelanguage.googleapis.com/v1beta/openai"
    default_model: "gemini-2.5-flash"

Groq via the OpenAI-compatible endpoint

providers:
  groq:
    type: "openai"
    api_key: "${GROQ_API_KEY}"
    base_url: "https://api.groq.com/openai/v1"
    default_model: "llama-3.3-70b-versatile"

Together via the OpenAI-compatible endpoint

providers:
  together:
    type: "openai"
    api_key: "${TOGETHER_API_KEY}"
    base_url: "https://api.together.xyz/v1"
    default_model: "openai/gpt-oss-20b"

OpenRouter via the OpenAI-compatible endpoint

providers:
  openrouter:
    type: "openai"
    api_key: "${OPENROUTER_API_KEY}"
    base_url: "https://openrouter.ai/api/v1"
    default_model: "openai/gpt-4o"

Anthropic

providers:
  anthropic:
    api_key: "${ANTHROPIC_API_KEY}"
    default_model: "claude-sonnet-4-5"

Ollama

providers:
  ollama:
    base_url: "http://127.0.0.1:11434"
    default_model: "qwen3.5"
    timeout: 10m
    timeout_mode: ttft

Model discovery modes

translator (default)

The gateway asks the provider translator for its built-in model list and also includes default_model if you set one.

Use this when you want simple config and predictable behavior.
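
Because translator is the default, you rarely need to spell it out, but it can be set explicitly:

providers:
  openai:
    api_key: "${OPENAI_API_KEY}"
    default_model: "gpt-5.2"
    models:
      mode: "translator"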

static

You define the visible model list yourself.

providers:
  openai:
    api_key: "${OPENAI_API_KEY}"
    default_model: "gpt-5.2"
    models:
      mode: "static"
      static:
        - "gpt-5.2"
        - "gpt-5.2-mini"

fetch

The gateway fetches model IDs from the upstream and caches them.

providers:
  openai:
    api_key: "${OPENAI_API_KEY}"
    base_url: "https://api.openai.com/v1"
    models:
      mode: "fetch"
      fetch:
        ttl: 15m

Current runtime support for fetch:

  • OpenAI-compatible upstreams via GET /models
  • Ollama via GET /api/tags
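
For the Ollama case, the same shape applies; a minimal sketch against the local default port (the TTL value is illustrative):

providers:
  ollama:
    base_url: "http://127.0.0.1:11434"
    models:
      mode: "fetch"
      fetch:
        ttl: 10m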

If fetching fails, the gateway falls back to the translator/default-model path.

Practical guidance

  • Keep provider IDs stable. Routing refers to IDs, not provider types.
  • Use type: "openai" for any OpenAI-compatible provider that is not literally named openai.
  • If you expose models dynamically via fetch, make sure base_url is correct and reachable from the gateway.
  • Prefer timeout_mode: ttft for slow local inference backends such as Ollama unless you explicitly need a hard total-response deadline.
  • Embeddings support depends on the upstream, not just the provider type. OpenAI-compatible providers can expose /v1/embeddings, but you still need an embeddings-capable upstream model and route.