About six months ago the way people use AI changed. Agents started running for many steps. Coding tools began to loop over a task. Assistants held conversations that lasted for hours. The unit of work stopped being a single message and became a session. With that shift, the model API bill became a cost of goods: variable, large, and growing faster than revenue.
For a software business that is an uncomfortable line on the page. Software usually enjoys high gross margins because the cost of serving one more customer is small. When the model bill scales with usage and climbs every quarter, it eats the difference. The margin software is supposed to keep ends up flowing to the provider instead.
Why the bill grew
The growth has a mechanical cause. Every step of a session resends the accumulated context: the system prompt, the conversation history, the tool definitions, the retrieved documents. A single answer might cost very little. A session that runs for forty steps pays for that context forty times, and the context gets longer as the session goes on. Tokens per session climb fast, and the bill climbs with them.
Most of those tokens are the same tokens. The system prompt does not change between turns. The documents in scope rarely change. The early history is fixed the moment it is written. A product that loops is, for the most part, paying to send identical context again and again.
Why it matters to margin
When a cost grows faster than the revenue it supports, it sets the ceiling on the business. A team can win more customers and still watch margin slip, because each new customer brings more sessions, and each session brings more resent context. The pricing page stays the same while the cost underneath it rises.
The encouraging part is that the cost has a clear shape. It is repeated context, billed in full on every turn, and the repetition is exactly what a cache is built for. Once you can see the bill as a stable base resent over and over, the question becomes practical: how do you stop paying full price for the part that never changes. That is where the saving lives, and it is the work the rest of these posts are about.