Cross-provider · cache-aware routing

Cut the cost of running AI.

Multi-turn AI resends the same context every turn. We route it so the cache works in your favour, across every provider you already use.

Talk to us →How it works ↓

No commitment. We model a sample of your traffic and show the saving first.

The outcome

Watch the line item shrink.

The repeated context in a long session is the majority of what you pay for. Route each session across your providers to where the cache is cheapest, and the same workload costs less at equal quality.

Illustrative. Your measured traffic replaces every figure.

Invoice · monthlyfrontier APIs

Multi-turn inference · cache-aware routedwas $148,200

$148,200

/ month, after routing

Saved this month$0

Illustrative: a 30% net saving on the 60% of spend that is multi-turn and cacheable, from published cache pricing.

Why now

The way people use AI changed. The bill changed with it.

Models got good enough to run for minutes and hours at a time.

Agents take many steps. Coding tools loop. Assistants hold long conversations. Every step resends the accumulated context: the system prompt, the history, the tools, the retrieved documents.

The tokens per session climb fast, and most of them are the same context, paid for again and again. The model bill now behaves like cost of goods: large, variable, and growing faster than revenue. It eats the gross margin software usually keeps.

Six months ago this barely registered, when sessions were short and the resent context was small. The repeated context is the majority of the tokens now, and that is where the saving sits.

The mechanism

Three ways the number comes down.

Illustrative split. Your own traffic sets the proportions.

Model mix

The turns that do not need the frontier model go to a cheaper one, inside a spend mix you set. Quality stays where it earns its price.

discount + quality

Fewer switches

Every switch throws away the warm cache and pays to rebuild it on the next provider. We group turns and switch rarely, spending the expensive model early where it shapes the session.

keep cache warm

Across your providers

Each provider caches differently. We route each session across the providers you already use to where it runs cheapest, so more of your spend lands on cheap cache reads.

across providers

The economics

90%

Off the context you resend every turn.

On Anthropic, a cache read bills at about a tenth of a normal input token: close to a 90% discount on the context you would otherwise resend in full, every turn. The write is a small premium, 1.25× input for five minutes and 2× for an hour, paid once and read back cheaply after.

Each provider caches differently. We route each session to where it pays off most for you, tuned to your traffic.

Source: Anthropic published prompt-cache pricing, verified June 2026.

The product

You pick the point on the frontier.

For a given workload we plot the cost and quality of the routing options. The efficient ones form a frontier, and you choose your point on it: hold more quality, or set a tighter budget. The router stays inside the envelope you set.

We fit the frontier to a sample of your traffic, so the trade you see is your own.

See your saving

See what you would save.

Move the slider for a rough sense of the saving from published cache pricing, then we measure it on a sample of your traffic and show you the figure before any commitment.

Your monthly frontier spend$150,000

Estimated monthly saving$27,000

Illustrative, from published cache pricing. We replace it with the figure measured on your traffic.

No number is final until we have measured it on a sample of your own traffic.