# HTTP API

## Endpoints

### POST /v1/chat/completions

OpenAI-compatible chat completions endpoint.

### POST /v1/responses

OpenAI-compatible Responses API endpoint (regular JSON and SSE streaming).

### POST /v1/embeddings

OpenAI-compatible embeddings endpoint.

### GET /v1/responses

OpenAI-compatible WebSocket mode for the Responses API.

### GET /v1/models

Returns the discovered and configured models visible through the gateway.

### GET /v1/models/{model}

Returns details for a single discovered/configured model.

### GET /health

Container and process health endpoint.

### GET /ready

Readiness endpoint for orchestration and probes.

### GET /metrics

Prometheus metrics scrape endpoint.
## Example chat completion request

```shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.2",
    "messages": [
      {"role": "user", "content": "Hello from LunarGate"}
    ]
  }'
```
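The same request can be issued from code. Here is a minimal Python sketch assuming the gateway runs on localhost:8080 as above; `build_chat_request` is an illustrative helper, not part of LunarGate:

```python
import json

GATEWAY_URL = "http://localhost:8080"  # assumed local gateway address

def build_chat_request(model: str, user_text: str):
    """Assemble the URL, headers, and JSON body for a chat completions call."""
    url = f"{GATEWAY_URL}/v1/chat/completions"
    headers = {"Content-Type": "application/json"}
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    })
    return url, headers, body

url, headers, body = build_chat_request("openai/gpt-5.2", "Hello from LunarGate")
```

The resulting tuple can be sent with any HTTP client, for example `urllib.request.urlopen(urllib.request.Request(url, body.encode(), headers))`.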
## Example embeddings request

```shell
curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ollama/nomic-embed-text-v2-moe",
    "input": [
      "LunarGate can proxy embeddings requests.",
      "Embeddings are useful for semantic search."
    ]
  }'
```
## Custom request headers

| Header | Description |
|---|---|
| X-LunarGate-Provider | Force a specific provider |
| X-LunarGate-Model | Override the model |
| X-LunarGate-Route | Force a named route |
| X-LunarGate-SessionID | Session correlation identifier used in request metadata/logs |
| X-LunarGate-No-Cache | Bypass the cache when set to true |
| X-LunarGate-No-Retry | Disable retries when set to true |
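Because these are plain HTTP headers, per-request overrides can be assembled in client code. A small sketch; `override_headers` is an illustrative helper, not a LunarGate client API:

```python
def override_headers(provider=None, model=None, route=None,
                     session_id=None, no_cache=False, no_retry=False):
    """Build a header map with only the requested LunarGate overrides set."""
    headers = {"Content-Type": "application/json"}
    if provider:
        headers["X-LunarGate-Provider"] = provider
    if model:
        headers["X-LunarGate-Model"] = model
    if route:
        headers["X-LunarGate-Route"] = route
    if session_id:
        headers["X-LunarGate-SessionID"] = session_id
    if no_cache:
        headers["X-LunarGate-No-Cache"] = "true"
    if no_retry:
        headers["X-LunarGate-No-Retry"] = "true"
    return headers

# Force the openai provider and skip the cache for one request.
headers = override_headers(provider="openai", no_cache=True)
```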
## Response headers

| Header | Description |
|---|---|
| X-LunarGate-Request-ID | Unique request identifier |
| X-LunarGate-Provider | Provider that served the request |
| X-LunarGate-Model | Model used for the request |
| X-LunarGate-Route | Route that matched |
| X-LunarGate-Cache-Status | HIT or MISS |
| X-LunarGate-Latency-Ms | End-to-end latency in milliseconds |
| X-LunarGate-Overhead-Duration-Ms | Gateway-added processing overhead in milliseconds |
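These headers are useful for client-side logging and debugging. A hedged sketch of extracting them from a response header map, case-insensitively since HTTP header names are not case-sensitive; `gateway_metadata` is an illustrative helper, not part of LunarGate:

```python
def gateway_metadata(response_headers):
    """Extract LunarGate observability headers (case-insensitively) into a dict."""
    wanted = {
        "request_id": "X-LunarGate-Request-ID",
        "provider": "X-LunarGate-Provider",
        "model": "X-LunarGate-Model",
        "route": "X-LunarGate-Route",
        "cache_status": "X-LunarGate-Cache-Status",
        "latency_ms": "X-LunarGate-Latency-Ms",
        "overhead_ms": "X-LunarGate-Overhead-Duration-Ms",
    }
    lowered = {k.lower(): v for k, v in response_headers.items()}
    return {name: lowered.get(header.lower()) for name, header in wanted.items()}

# Works regardless of the casing the HTTP stack reports.
meta = gateway_metadata({
    "x-lunargate-request-id": "req_123",
    "X-LunarGate-Cache-Status": "HIT",
    "X-LunarGate-Latency-Ms": "42",
})
```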
## Streaming

The gateway supports SSE streaming on the chat completions and Responses endpoints.

Example streaming request:

```shell
curl -N http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.2",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Write a short haiku about LunarGate."}
    ]
  }'
```

Responses SSE request:

```shell
curl -N http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.2",
    "stream": true,
    "input": "Write a short haiku about LunarGate."
  }'
```
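On the wire, OpenAI-compatible streaming delivers `data:` SSE lines. A minimal client-side parsing sketch, assuming the OpenAI chat-completions chunk shape and `[DONE]` end-of-stream sentinel; `parse_sse_events` is an illustrative helper:

```python
import json

def parse_sse_events(lines):
    """Yield JSON payloads from OpenAI-style 'data:' SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and event/comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # OpenAI-style end-of-stream sentinel
            return
        yield json.loads(payload)

# Reassemble streamed text from sample chat-completions chunks.
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    'data: [DONE]',
]
text = "".join(ev["choices"][0]["delta"].get("content", "")
               for ev in parse_sse_events(sample))
```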
## Responses WebSocket mode

Use GET /v1/responses with a WebSocket client and send response.create frames.
- The gateway converts each response.create frame into a Responses request and always streams events back.
- Events are sent as JSON WebSocket messages (response.created, response.output_text.delta, response.completed, and error events).
- previous_response_id is validated against response IDs created earlier on the same WebSocket connection.
- One request is processed at a time per connection.
- If x-lunargate-sessionid is missing on the WebSocket handshake, the gateway generates one automatically (wsresp_<uuid>).
- The same session ID is injected into each upstream request created from response.create frames, so collector/request logs can correlate multiple upstream requests from one WS session.
Example with wscat: connect with `wscat -c ws://localhost:8080/v1/responses`, then send response.create frames as JSON text messages.
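The per-connection bookkeeping described above can be modeled in a few lines. A minimal illustration of the documented wsresp_<uuid> fallback and the previous_response_id validation rule; `WsSession` is an illustrative sketch, not a LunarGate API:

```python
import uuid

class WsSession:
    """Illustrative model of per-connection state for Responses WebSocket mode."""

    def __init__(self, session_id=None):
        # Mirror the documented fallback: auto-generate wsresp_<uuid> when
        # x-lunargate-sessionid is absent on the handshake.
        self.session_id = session_id or f"wsresp_{uuid.uuid4()}"
        self.seen_response_ids = set()

    def validate_frame(self, frame):
        """previous_response_id must reference a response created on this connection."""
        prev = frame.get("previous_response_id")
        if prev is not None and prev not in self.seen_response_ids:
            raise ValueError(f"unknown previous_response_id: {prev}")

    def record_response(self, response_id):
        self.seen_response_ids.add(response_id)

session = WsSession()
session.record_response("resp_1")
session.validate_frame({"previous_response_id": "resp_1"})  # accepted
```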
## Compatibility notes
LunarGate normalizes some client payload variants before routing to upstream providers. That helps preserve OpenAI compatibility even when upstream or intermediate clients serialize text content differently.
For embeddings specifically:
- The public endpoint is POST /v1/embeddings.
- A common routing pattern is to match /v1/embeddings separately from /v1/chat/completions.
- Local Ollama is a good smoke-test target for embeddings before building retrieval or RAG flows.