Architecture

System overview

LunarGate is a self-hosted AI gateway that keeps your applications on one stable API while routing traffic to multiple providers.

High-level flow

Client apps -> LunarGate Gateway -> OpenAI / Anthropic / Gemini / Groq / Together / OpenRouter / Abacus / DeepSeek / Ollama
                         |
                         +-> Prometheus metrics
                         +-> Optional Dashboard collector for metrics and request logs
                         +-> Optional Dashboard remote-control channel

One-sentence summary: the gateway sits between your app and upstream LLMs, keeps the client protocol stable, and moves policy decisions into config instead of application code.

Core responsibilities

Request normalization

The public API is OpenAI-compatible. The gateway accepts chat-completions style payloads and normalizes known client quirks before handing the request to a provider translator.
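Because the public API is OpenAI-compatible, a client request is just a standard chat-completions payload. The URL, port, and model name below are placeholders for illustration, not values defined by LunarGate:

```python
import json

# Placeholder address; the gateway's actual listener is set in its own config.
GATEWAY_URL = "http://localhost:8080/v1/chat/completions"

# A standard OpenAI-compatible chat-completions payload; the gateway's
# routing rules decide which upstream provider actually serves it.
payload = {
    "model": "gpt-4o-mini",  # example model name, resolved by routing
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "stream": False,
}

body = json.dumps(payload)
print(body)
```

Any HTTP client, or an existing OpenAI SDK pointed at the gateway's base URL, can send this body unchanged.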

Routing

Routing is config-driven. A route can match on request path and headers, then send the request to one or more targets using weighted balancing.
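As a rough sketch, a config-driven route with weighted targets could look like the following. The key names here are illustrative assumptions, not the real schema; see Configuration overview for that:

```yaml
# Illustrative shape only; consult Configuration overview for actual keys.
routes:
  - match:
      path: /v1/chat/completions
      headers:
        x-team: research
    targets:
      - provider: openai
        weight: 80
      - provider: anthropic
        weight: 20
```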

Resilience

Each upstream call can go through retry logic, per-provider circuit breakers, and a fallback chain.
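A hedged sketch of what those resilience knobs might look like in config; again, the key names are illustrative, not the actual schema:

```yaml
# Illustrative shape only; exact keys may differ.
retry:
  max_attempts: 3
  backoff: exponential
circuit_breaker:
  failure_threshold: 5
  cooldown: 30s
fallback:
  - openai
  - anthropic
  - ollama
```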

Observability

The gateway exposes Prometheus metrics locally. It can also send metrics-only or full request-log data to the LunarGate Dashboard on app.lunargate.ai, depending on the data_sharing settings.
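A minimal sketch of the data_sharing choice between metrics-only and full request-log export; the key names are assumptions for illustration, only the dashboard host comes from this page:

```yaml
# Illustrative keys only; the real schema lives in Configuration overview.
data_sharing:
  mode: metrics_only      # or: full_logs (metrics plus request logs), off
  dashboard: app.lunargate.ai
```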

Remote control

The same data_sharing section can also attach the gateway to an outbound control channel for the LunarGate Dashboard on app.lunargate.ai.

Today that channel is mainly used for sandbox features. In the longer term it is meant to support things like automated A/B tests, controlled experiments, and other remote operations against a connected gateway.

Request lifecycle

1. Request arrives
2. Gateway loads current config snapshot
3. Optional inbound auth is validated (`security.provider: api_key`)
4. Rate limiting and cache rules are applied
5. Matching route is selected
6. Target is picked by balancing strategy
7. Provider translator builds the upstream request
8. Gateway executes retries and fallback if needed
9. Response is normalized to the OpenAI-compatible shape
10. Metrics are recorded
11. Optional collector payload is sent to the LunarGate Dashboard on `app.lunargate.ai`
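Step 6's weighted balancing can be sketched as a plain weighted random pick. This is a generic illustration of the strategy, not LunarGate's actual implementation:

```python
import random

def pick_target(targets: list[dict]) -> dict:
    """Pick one target proportionally to its weight.

    `targets` is a list like [{"provider": "openai", "weight": 80}, ...].
    Generic weighted-random selection, not LunarGate's actual code.
    """
    weights = [t["weight"] for t in targets]
    return random.choices(targets, weights=weights, k=1)[0]

targets = [
    {"provider": "openai", "weight": 80},
    {"provider": "anthropic", "weight": 20},
]
chosen = pick_target(targets)
print(chosen["provider"])
```

Over many requests, roughly 80% land on the first target and 20% on the second.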

Hot reload model

Configuration is watched on disk. When the YAML file changes, LunarGate reloads the parsed config and reconciles the running components in memory.
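One step of such a watch-and-reload loop can be sketched as mtime polling; this is a generic illustration, not LunarGate's actual watcher:

```python
import os

def maybe_reload(path: str, last_mtime: float, reload_fn) -> float:
    """One polling step of a hot-reload loop (generic sketch).

    If the config file's mtime changed since `last_mtime`, call `reload_fn`
    (which would re-parse the YAML and reconcile the running components)
    and return the new mtime; otherwise return `last_mtime` unchanged.
    """
    mtime = os.path.getmtime(path)
    if mtime != last_mtime:
        reload_fn(path)
        return mtime
    return last_mtime
```

A real watcher would run this on a timer (or use OS file-change notifications) so that edits to the YAML take effect without touching in-flight requests.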

Today that means you can update all of the following without restarting the process:

  • provider definitions used by translators and model discovery
  • routing rules and load-balancing strategy
  • retry policy
  • cache settings
  • rate limiting
  • inbound auth settings
  • model-selection behavior
  • data-sharing / collector behavior
  • remote-control enablement and identity details

What still needs a restart:

  • listener address and port
  • server read/write/idle timeouts

The practical point is that hot reload now covers real upstream and provider changes, not just route weights and a few lightweight knobs.

Current constraints

  • Inbound client authentication currently supports config-defined API keys only; external auth backends are still a TODO.
  • Cache and rate limiting are in-memory only.
  • Server bind address and HTTP timeout changes still require a process restart.
  • The gateway speaks HTTP and SSE, not gRPC.

Next steps

  1. Read Routing and fallback if you want to understand how this architecture turns into request decisions.
  2. Read Observability and data sharing if you want to understand what stays local and what can be exported.
  3. Open Configuration overview when you are ready to map these concepts into YAML.