
LunarGate Gateway

Self-hosted AI gateway

Run one OpenAI-compatible endpoint in your infrastructure and route requests across multiple LLM providers with fallback, retries, caching, hot-reloadable config, and optional observability export.

Best entry path: the Quickstart, which walks the install -> config -> run -> client flow.

Why teams use it

  • One endpoint for every app

Keep your app code on the OpenAI API shape and swap providers behind the gateway.

  • Resilience built in

Route by headers, retry transient failures, and cascade to fallback targets automatically.

  • Operated from config

Change providers, routing, retry/cache behavior, rate limits, model selection, and Dashboard export settings without rebuilding the binary.

  • Observability without lock-in

By default only metrics are exported; you can opt in to prompt and response sharing for request-level inspection.
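Because the gateway keeps the OpenAI API shape, pointing an app at it is usually just a base-URL (and key) change. The sketch below builds such a request with only the standard library; the gateway address, port, and key value are assumptions for illustration, not documented defaults.

```python
import json
import urllib.request

def build_chat_request(base_url, gateway_key, model, messages):
    """Build an OpenAI-shaped chat completion request aimed at the gateway.

    base_url and gateway_key are deployment-specific; the values used
    below are illustrative assumptions."""
    url = f"{base_url}/v1/chat/completions"
    headers = {
        "Content-Type": "application/json",
        # Inbound gateway API key (optional; see the security note below):
        "Authorization": f"Bearer {gateway_key}",
    }
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

req = build_chat_request(
    "http://localhost:8080",   # hypothetical gateway address
    "my-gateway-key",          # hypothetical inbound API key
    "gpt-4o",
    [{"role": "user", "content": "Hello"}],
)
# Sending it is the same as talking to any OpenAI-compatible server:
# resp = urllib.request.urlopen(req)

print(req.full_url)  # http://localhost:8080/v1/chat/completions
```

Swapping providers then happens entirely in the gateway config; the client payload above never changes.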

The request path

Request -> Optional Inbound Auth (API key) -> Rate Limit -> Cache -> Route Match -> Load Balance
        -> Retry -> Circuit Breaker -> Provider Translation -> LLM Call
        -> Response Translation -> Metrics -> Optional Data Sharing -> Response
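The ordering above matters because outer stages can short-circuit inner ones: a cache hit never reaches retry or the provider. A toy sketch of that wrapping (illustrative only, not LunarGate's actual internals):

```python
# Toy sketch of stage ordering: outer stages run first and can
# short-circuit, e.g. a cache hit skips retry and the provider call.

calls = {"provider": 0}

def provider_call(req):
    # Stand-in for provider translation plus the real LLM call.
    calls["provider"] += 1
    return {"echo": req["prompt"]}

def with_retry(next_stage, attempts=3):
    def stage(req):
        for attempt in range(attempts):
            try:
                return next_stage(req)
            except ConnectionError:
                if attempt == attempts - 1:
                    raise  # out of retries; a fallback target would take over
    return stage

def with_cache(next_stage):
    cache = {}
    def stage(req):
        key = req["prompt"]
        if key not in cache:
            cache[key] = next_stage(req)
        return cache[key]
    return stage

# Compose inside-out so the first stage in the diagram wraps the rest:
handler = with_cache(with_retry(provider_call))

handler({"prompt": "hi"})
handler({"prompt": "hi"})  # cache hit: the provider is not called again
print(calls["provider"])   # 1
```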


What is in scope today

  • OpenAI-compatible POST /v1/chat/completions
  • Model listing via GET /v1/models
  • Health and metrics endpoints
  • Multi-provider routing and fallback
  • In-memory rate limiting and caching
  • Hot-reloadable YAML config
  • Optional Dashboard observability export
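Everything in the list above is driven from the hot-reloadable YAML file. The fragment below is purely illustrative: every field name in it is an assumption, so consult the Configuration overview for the real schema.

```yaml
# Hypothetical shape only -- field names are illustrative assumptions,
# not LunarGate's documented schema.
providers:
  - name: openai
    api_key_env: OPENAI_API_KEY
  - name: anthropic
    api_key_env: ANTHROPIC_API_KEY

routes:
  - match: { model: gpt-4o }
    targets: [openai]
    fallback: [anthropic]

rate_limit:
  requests_per_minute: 600

cache:
  enabled: true
  ttl_seconds: 300
```

Because the config is hot-reloadable, edits like reordering a fallback chain take effect without restarting or rebuilding the gateway.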

Important security note

Warning

The gateway now supports basic inbound API-key authentication, but it is still safest to run it inside a trusted network or behind an auth-enforcing edge such as an API gateway, reverse proxy, or service mesh.

Documentation map

  1. Start with Quickstart for the fastest install -> config -> run -> client flow.
  2. Go to Examples overview for runnable Python, Node, Streamlit, and Docker Compose apps based on gateway-examples/.
  3. Read lunargate/auto and autorouting if you want the gateway to choose model tiers from one stable client model.
  4. Use Routing and fallback for route ordering, fallback chains, and load-balancing strategy.
  5. Keep Configuration overview and the detailed config pages open while editing YAML.
  6. Read Observability and data sharing before enabling prompt or response export.