Architecture

System overview

LunarGate is a self-hosted AI gateway that keeps your applications on one stable API while routing traffic to multiple providers.

High-level flow

Client apps -> LunarGate Gateway -> OpenAI / Anthropic / Gemini / Groq / Together / OpenRouter / Abacus / DeepSeek / Ollama
                         |
                         +-> Prometheus metrics
                         +-> Optional Dashboard collector for metrics and request logs
                         +-> Optional Dashboard remote-control channel

One-sentence summary: the gateway sits between your app and upstream LLMs, keeps the client protocol stable, and moves policy decisions into config instead of application code.

Core responsibilities

Request normalization

The public API is OpenAI-compatible. The gateway accepts chat-completions style payloads and normalizes known client quirks before handing the request to a provider translator.
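Because the public API is OpenAI-compatible, a client request is just a standard chat-completions payload. The URL, port, and model name below are placeholders for illustration, not values defined by LunarGate:

```python
import json

# Placeholder address; the gateway's actual listener is set in its own config.
GATEWAY_URL = "http://localhost:8080/v1/chat/completions"

# A standard OpenAI-compatible chat-completions payload; the gateway's
# routing rules decide which upstream provider actually serves it.
payload = {
    "model": "gpt-4o-mini",  # example model name, resolved by routing
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "stream": False,
}

body = json.dumps(payload)
print(body)
```

Any HTTP client, or an existing OpenAI SDK pointed at the gateway's base URL, can send this body unchanged.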

Routing

Routing is config-driven. A route can match on request path and headers, then send the request to one or more targets using weighted balancing.
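As a rough sketch, a config-driven route with weighted targets could look like the following. The key names here are illustrative assumptions, not the real schema; see Configuration overview for that:

```yaml
# Illustrative shape only; consult Configuration overview for actual keys.
routes:
  - match:
      path: /v1/chat/completions
      headers:
        x-team: research
    targets:
      - provider: openai
        weight: 80
      - provider: anthropic
        weight: 20
```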

Resilience

Each upstream call can go through retry logic, per-provider circuit breakers, and a fallback chain.
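A hedged sketch of what those resilience knobs might look like in config; again, the key names are illustrative, not the actual schema:

```yaml
# Illustrative shape only; exact keys may differ.
retry:
  max_attempts: 3
  backoff: exponential
circuit_breaker:
  failure_threshold: 5
  cooldown: 30s
fallback:
  - openai
  - anthropic
  - ollama
```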

Observability

The gateway exposes Prometheus metrics locally. It can also send metrics-only or full request-log data to the LunarGate Dashboard on app.lunargate.ai, depending on the data_sharing settings.
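A minimal sketch of the data_sharing choice between metrics-only and full request-log export; the key names are assumptions for illustration, only the dashboard host comes from this page:

```yaml
# Illustrative keys only; the real schema lives in Configuration overview.
data_sharing:
  mode: metrics_only      # or: full_logs (metrics plus request logs), off
  dashboard: app.lunargate.ai
```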

Remote control

The same data_sharing section can also attach the gateway to an outbound control channel for the LunarGate Dashboard on app.lunargate.ai.

Today that channel is mainly used for sandbox features. In the longer term it is meant to support things like automated A/B tests, controlled experiments, and other remote operations against a connected gateway.

Request lifecycle

1. Request arrives
2. Gateway loads current config snapshot
3. Optional inbound auth is validated (`security.provider: api_key`)
4. Rate limiting and cache rules are applied
5. Matching route is selected
6. Target is picked by balancing strategy
7. Provider translator builds the upstream request
8. Gateway executes retries and fallback if needed
9. Response is normalized to the OpenAI-compatible shape
10. Metrics are recorded
11. Optional collector payload is sent to the LunarGate Dashboard on `app.lunargate.ai`
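Step 6's weighted balancing can be sketched as a plain weighted random pick. This is a generic illustration of the strategy, not LunarGate's actual implementation:

```python
import random

def pick_target(targets: list[dict]) -> dict:
    """Pick one target proportionally to its weight.

    `targets` is a list like [{"provider": "openai", "weight": 80}, ...].
    Generic weighted-random selection, not LunarGate's actual code.
    """
    weights = [t["weight"] for t in targets]
    return random.choices(targets, weights=weights, k=1)[0]

targets = [
    {"provider": "openai", "weight": 80},
    {"provider": "anthropic", "weight": 20},
]
chosen = pick_target(targets)
print(chosen["provider"])
```

Over many requests, roughly 80% land on the first target and 20% on the second.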

Hot reload model

Configuration is watched on disk. When the YAML file changes, LunarGate reloads the parsed config and reconciles the running components in memory.
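One step of such a watch-and-reload loop can be sketched as mtime polling; this is a generic illustration, not LunarGate's actual watcher:

```python
import os

def maybe_reload(path: str, last_mtime: float, reload_fn) -> float:
    """One polling step of a hot-reload loop (generic sketch).

    If the config file's mtime changed since `last_mtime`, call `reload_fn`
    (which would re-parse the YAML and reconcile the running components)
    and return the new mtime; otherwise return `last_mtime` unchanged.
    """
    mtime = os.path.getmtime(path)
    if mtime != last_mtime:
        reload_fn(path)
        return mtime
    return last_mtime
```

A real watcher would run this on a timer (or use OS file-change notifications) so that edits to the YAML take effect without touching in-flight requests.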

Today that means you can update all of the following without restarting the process:

  • provider definitions used by translators and model discovery
  • routing rules and load-balancing strategy
  • retry policy
  • cache settings
  • rate limiting
  • inbound auth settings
  • model-selection behavior
  • data-sharing / collector behavior
  • remote-control enablement and identity details

What still needs a restart:

  • listener address and port
  • server read/write/idle timeouts

The practical point is that hot reload now covers real upstream and provider changes, not just route weights and a few lightweight knobs.

Current constraints

  • Inbound client authentication currently supports config-defined API keys only; external auth backends are still a TODO.
  • Cache and rate limiting are in-memory only.
  • Server bind address and HTTP timeout changes still require a process restart.
  • The gateway speaks HTTP and SSE, not gRPC.

Next steps

  1. Read Routing and fallback if you want to understand how this architecture turns into request decisions.
  2. Read Observability and data sharing if you want to understand what stays local and what can be exported.
  3. Open Configuration overview when you are ready to map these concepts into YAML.