caching

The caching section configures a simple in-memory, exact-match response cache. It is disabled by default.

Fields

Field     Type      Default  Notes
enabled   bool      false    master switch
ttl       duration  1h       cache entry lifetime
max_size  integer   1000     maximum number of cached entries

Example

caching:
  enabled: true
  ttl: 30m
  max_size: 5000

What kind of cache this is

This is not semantic caching. It is a straightforward in-memory cache keyed on request identity and content, so only a request that matches a cached one exactly produces a hit.
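
To make the exact-match behavior concrete, here is a minimal sketch of a cache bounded by a lifetime and an entry count, in the spirit of ttl and max_size. It is not LunarGate's implementation: the ExactMatchCache class, the SHA-256 key over canonical JSON, and oldest-first eviction are all assumptions chosen for illustration, since this page does not specify how keys are derived or which entries are evicted.

```python
import hashlib
import json
import time
from collections import OrderedDict


class ExactMatchCache:
    """Illustrative in-memory cache with a TTL and an entry cap (hypothetical)."""

    def __init__(self, ttl_seconds=3600.0, max_size=1000, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.max_size = max_size
        self._clock = clock
        self._entries = OrderedDict()  # key -> (expires_at, response)

    @staticmethod
    def key_for(request):
        # Assumption: the key is a hash of the request serialized as canonical JSON.
        canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get(self, request):
        key = self.key_for(request)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, response = entry
        if self._clock() >= expires_at:
            # The entry outlived its TTL: drop it and report a miss.
            del self._entries[key]
            return None
        return response

    def put(self, request, response):
        key = self.key_for(request)
        self._entries.pop(key, None)  # re-inserting moves the key to the end
        self._entries[key] = (self._clock() + self.ttl_seconds, response)
        while len(self._entries) > self.max_size:
            # Assumption: evict the oldest inserted entry first.
            self._entries.popitem(last=False)
```

In this sketch, a prompt that differs by even one character hashes to a different key and misses, which is the practical difference from semantic caching.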

Use it when:

  • prompts repeat often
  • responses are deterministic enough for your use case
  • a single gateway instance handles the repeated traffic, since the cache is not shared between instances

Practical guidance

  • Keep it off if you are still validating routing behavior and want every request to hit the provider.
  • Send the X-LunarGate-No-Cache: true header to bypass the cache for an individual request.
  • Remember that this cache is process-local and disappears on restart.
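
As a sketch of the request-level bypass, the snippet below builds a request that carries the X-LunarGate-No-Cache header using only the Python standard library. The gateway URL, the payload shape, and the build_request helper are assumptions for illustration; substitute the address and request format your deployment actually uses.

```python
import json
import urllib.request

# Hypothetical gateway address; replace with your own deployment's URL.
GATEWAY_URL = "http://localhost:8080/v1/chat/completions"


def build_request(payload, bypass_cache=False):
    """Build a POST request, optionally asking the gateway to skip its cache."""
    headers = {"Content-Type": "application/json"}
    if bypass_cache:
        # Header name and value as given in the guidance above.
        headers["X-LunarGate-No-Cache"] = "true"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        GATEWAY_URL, data=body, headers=headers, method="POST"
    )


# Placeholder payload; the real schema depends on your gateway setup.
req = build_request({"model": "example-model", "messages": []}, bypass_cache=True)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```

Setting the header per request leaves caching enabled for all other traffic, which is useful when only a few calls need a fresh provider response.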