caching¶
The caching section enables a simple in-memory exact-match response cache.
Fields¶
| Field | Type | Default | Notes |
|---|---|---|---|
enabled |
bool | false |
master switch |
ttl |
duration | 1h |
cache entry lifetime |
max_size |
integer | 1000 |
max number of cached entries |
Example¶
What kind of cache this is¶
This is not semantic caching. It is a straightforward in-memory cache keyed from request identity/content.
Use it when:
- prompts repeat often
- responses are deterministic enough for your use case
- a single gateway instance handling repeated traffic is enough
Practical guidance¶
- Keep it off if you are still validating routing behavior and want every request to hit the provider.
- Use
X-LunarGate-No-Cache: truefor request-level cache bypass when needed. - Remember that this cache is process-local and disappears on restart.