LunarGate Gateway¶
Run one OpenAI-compatible endpoint in your infrastructure and route requests across multiple LLM providers with fallback, retries, caching, hot-reloadable config, and optional observability export.
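Because the gateway keeps the OpenAI request shape, a client only needs to point at the gateway's address instead of the provider's. A minimal sketch, assuming a gateway listening at `http://localhost:8080` (the port and the `gpt-4o` model name are illustrative, not defaults from this project):

```python
import json

# Hypothetical gateway address; any OpenAI-compatible client can target it.
GATEWAY_URL = "http://localhost:8080/v1/chat/completions"

# Standard OpenAI chat-completions payload; the gateway's routing rules
# decide which upstream provider actually serves the requested model.
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}],
}
body = json.dumps(payload)

# POST `body` to GATEWAY_URL with the HTTP client of your choice;
# the response follows the OpenAI chat-completions shape.
```

Existing OpenAI SDK clients typically only need their base URL changed to the gateway's address.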
Why teams use it¶
- One endpoint for every app
Keep your app code on the OpenAI API shape and swap providers behind the gateway.
- Resilience built in
Route by headers, retry transient failures, and cascade to fallback targets automatically.
- Operated from config
Change routing rules, rate limits, and provider weights without rebuilding the binary.
- Observability without lock-in
Export metrics only by default, or opt into prompt and response sharing for request inspection.
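The retry-then-cascade behavior described above can be sketched as a loop over an ordered list of targets, retrying transient failures on each before falling through to the next. This is an illustrative model, not the gateway's actual implementation; `ProviderError`, `call_with_fallback`, and the stub providers are invented for the example:

```python
import time

class ProviderError(Exception):
    """Transient upstream failure (timeout, 5xx, rate limit)."""

def call_with_fallback(providers, request, max_retries=2, backoff=0.0):
    """Try each target in order; retry transient failures before cascading."""
    last_error = None
    for provider in providers:
        for attempt in range(max_retries + 1):
            try:
                return provider(request)
            except ProviderError as err:
                last_error = err
                time.sleep(backoff * (2 ** attempt))  # exponential backoff
    raise last_error  # every target exhausted

# Stub providers: the primary always fails, so traffic cascades.
def flaky(request):
    raise ProviderError("upstream timeout")

def healthy(request):
    return {"provider": "secondary", "echo": request}

result = call_with_fallback([flaky, healthy], {"prompt": "hi"})
```

In the real gateway these decisions are driven by the YAML config (targets, weights, retry budgets) rather than hard-coded lists.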
The request path¶
Request -> Auth edge -> Rate Limit -> Cache -> Route Match -> Load Balance
-> Retry -> Circuit Breaker -> Provider Translation -> LLM Call
-> Response Translation -> Metrics -> Optional Data Sharing -> Response
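The stages above compose naturally as middleware, each wrapping the next handler. A reduced sketch with only the cache and metrics stages (the function names and cache-key scheme are invented for illustration):

```python
def with_cache(next_handler, cache):
    """Serve identical requests from an in-memory cache."""
    def handler(request):
        key = repr(sorted(request.items()))  # naive cache key for the sketch
        if key not in cache:
            cache[key] = next_handler(request)
        return cache[key]
    return handler

def with_metrics(next_handler, counters):
    """Count every request before passing it along."""
    def handler(request):
        counters["requests"] = counters.get("requests", 0) + 1
        return next_handler(request)
    return handler

def llm_call(request):
    # Stand-in for provider translation plus the upstream LLM call.
    return {"answer": "ok", "model": request["model"]}

cache, counters = {}, {}
pipeline = with_metrics(with_cache(llm_call, cache), counters)

pipeline({"model": "gpt-4o", "prompt": "hi"})
pipeline({"model": "gpt-4o", "prompt": "hi"})  # second call hits the cache
```

Note the ordering: metrics sit outside the cache here, so cached responses are still counted, just as the path above records metrics after the cache decision.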
What is in scope today¶
- OpenAI-compatible POST /v1/chat/completions
- Model listing via GET /v1/models
- Health and metrics endpoints
- Multi-provider routing and fallback
- In-memory rate limiting and caching
- Hot-reloadable YAML config
- Optional SaaS observability export
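In-memory rate limiting is commonly built as a token bucket: tokens refill at a steady rate up to a burst capacity, and each request spends one. A minimal sketch of that idea (the class and its parameters are illustrative, not this gateway's API):

```python
import time

class TokenBucket:
    """Minimal in-memory rate limiter: `rate` tokens/sec, burst `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def allow(self):
        # Refill based on elapsed time, capped at the burst capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Burst of 2: the third back-to-back request is rejected.
bucket = TokenBucket(rate=10, capacity=2)
decisions = [bucket.allow() for _ in range(3)]
```

Because the limiter state lives in process memory, limits apply per gateway instance; running replicas multiplies the effective limit.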
Important security note¶
Warning
The gateway currently does not implement inbound client authentication. Run it inside a trusted network or behind an auth-enforcing edge such as an API gateway, reverse proxy, or service mesh.
Documentation map¶
- Start with Quickstart if you want the fastest path to a running gateway.
- Use Docker Compose if you are working in the full LunarGate local stack.
- Read Routing and fallback to understand how traffic decisions are made.
- Read Observability and data sharing before enabling prompt or response export.
- Keep Configuration open while editing YAML.