The all-you-can-eat AI subscription buffet is closing. And if your team runs autonomous agents, the economics you planned around no longer exist.
Effective April 4, Anthropic told Claude Pro and Max subscribers that their subscription limits no longer cover third-party agent harnesses like OpenClaw. Those workloads now move to pay-as-you-go API billing.
Anthropic's stated reason was capacity: third-party harnesses are not optimized for compute efficiency, and sustaining that usage pattern on flat-rate plans is "really hard for us to do sustainably."
OpenAI made the same call at nearly the same moment, shifting Codex in ChatGPT Business to usage-based, token-billed pricing.
Two vendors, same week, same direction.
Subscription pricing hid the true cost of agents. Usage pricing exposes it immediately.
Subscription pricing was built for humans in a UI. Agentic automation runs differently, consumes resources differently, and flat-rate plans were never designed to sustain it. Vendors are now pricing accordingly.
The Iceberg You Didn't See Coming
The difference with agents is not the cost. It's how the cost accumulates.
When a person uses an AI tool, they send a prompt and get an answer. The token cost is visible, bounded, and tied to a human decision. When an agent runs, it plans, loops, calls tools, accumulates context, and retries on failure. Each step generates tokens and downstream costs, including API calls, tool invocations, and human-in-the-loop checks, that never appear on the same line as the original task.
Jensen Huang put a number on it at GTC this year: continuous background agents can generate up to a million times the tokens of a standard prompt.
The math is simple. At roughly $5 per million tokens, a single agent on frontier models can burn about $300 per day, around $100,000 a year, without cost controls.
Jason Calacanis described it on All-In: *"I'm spending $300 a day per agent... $100,000 a year, just for one."* Without token budgets and real limits, spend compounds faster than most teams expect.
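The arithmetic behind those figures can be sanity-checked in a few lines. Note the daily token volume here is an assumption, implied by the stated rate and daily spend rather than quoted anywhere:

```python
# Back-of-the-envelope check of the per-agent numbers above.
# Assumption (illustrative): a continuously running agent consuming
# ~60M tokens/day, which is what ~$300/day implies at ~$5/M tokens.
PRICE_PER_M_TOKENS = 5.00     # dollars, blended frontier-model rate
TOKENS_PER_DAY = 60_000_000   # hypothetical volume implied by the rate

daily_cost = TOKENS_PER_DAY / 1_000_000 * PRICE_PER_M_TOKENS
annual_cost = daily_cost * 365

print(f"${daily_cost:,.0f}/day")    # $300/day
print(f"${annual_cost:,.0f}/year")  # $109,500/year
```

The point of the exercise isn't the exact figure; it's that an always-on agent turns a per-request price into a continuous run rate.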
That's the Iceberg Effect. Tokens are the visible tip. Below the waterline: tool and API costs, downstream service calls, retries, human escalations, context that keeps growing with every turn.
Most enterprises have no visibility into any of it because their subscription bill doesn't itemize it. It just appears as a lump sum, or not at all until someone pulls the invoice.
The problem isn't that agents are expensive. The problem is that nobody knows how expensive they are until after the fact.

How Agent Debt Forms
I've spent 25 years building infrastructure businesses across cloud hosting, managed services, and data centers, and the consistent lesson is that invisible costs do the most damage.
We're watching that pattern repeat with agents. We call it Agent Debt.
Agent Debt is what happens when an autonomous workflow runs without an owner, a pre-execution budget, or a stop condition. Nobody decided to take it on. And unlike technical debt, which compounds quietly in codebases, this kind shows up on a CFO's desk.
Here's how it forms:
No owner. Nobody is accountable for whether the workflow still creates value. The agent keeps running because nobody told it to stop.
No budget. There's no ceiling set before execution. Spend grows with usage, not with ROI. A workflow that cost $500 in testing costs $15,000 in production because the context got longer and the tool calls multiplied.
No stop condition. Without a defined exit, agents retry, loop, and escalate until something external stops them, usually a billing alert that went to the wrong inbox.
No attribution. When the bill comes, nobody can trace it to a workflow, an outcome, or a business decision. It's just a number that nobody owns.
Deloitte's 2026 State of AI in the Enterprise survey found governance maturity lagging behind deployment at most organizations, and the top risk cited was agents doing the wrong thing.
Worth extending that: agents doing the right thing at twice the cost it should take, with nobody watching the meter, produce the same problem with a different invoice.
The market is starting to respond with AI gateways, proxy layers that log per-request tokens, cost, and latency. Visibility matters, and that's a real step. But a gateway is a monitoring tool, not a governance layer. Knowing the cost after the fact doesn't change whether the workflow should have run.
Token caps have the same limitation. They prevent runaway spend. They don't tell you whether the work produced anything worth the cost.
What Economic Governance Actually Means
The word governance gets used loosely in AI discussions, so it's worth being specific about what it actually requires.
Governance isn't a dashboard or a token budget set in a config file. It's enforceable policy applied at execution time, before the agent runs its first tool call and before context starts accumulating.
In practice, it comes down to four things:
A budget owner. Every agent workflow has a specific owner, not just a team, who can answer whether it's worth running.
A pre-execution budget. The spend ceiling is defined before execution and enforced at execution, not on a monthly invoice.
Attribution. Every tool call, token, and downstream cost is linked to the workflow outcome that generated it. When the bill arrives, it traces back to a decision.
Enforcement. Policy violations stop or reroute execution in real time, not just trigger notifications.
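The four requirements fit together as a thin policy layer that sits in front of every spend-incurring call. This is a minimal sketch, not a real product API; the class, the workflow name, and the dollar figures are all hypothetical:

```python
from dataclasses import dataclass, field

class BudgetExceeded(Exception):
    """Raised to halt execution the moment the ceiling would be breached."""

@dataclass
class WorkflowPolicy:
    workflow_id: str
    owner: str                 # a named person, not a team
    budget_usd: float          # ceiling set before execution
    spent_usd: float = 0.0
    ledger: list = field(default_factory=list)  # attribution log

    def charge(self, step: str, cost_usd: float) -> None:
        # Enforcement: refuse the call before the spend is incurred,
        # rather than sending a notification after the invoice lands.
        if self.spent_usd + cost_usd > self.budget_usd:
            raise BudgetExceeded(
                f"{self.workflow_id}: ${self.spent_usd + cost_usd:.2f} "
                f"would exceed ${self.budget_usd:.2f} (owner: {self.owner})"
            )
        self.spent_usd += cost_usd
        # Attribution: every cost traces to a workflow and a step.
        self.ledger.append((self.workflow_id, step, cost_usd))

# Hypothetical workflow with a $1.00 pre-execution ceiling.
policy = WorkflowPolicy("invoice-triage", owner="j.doe", budget_usd=1.00)
policy.charge("llm_call", 0.40)
policy.charge("tool_call", 0.25)
try:
    policy.charge("retry_loop", 0.50)  # would breach the ceiling
except BudgetExceeded:
    pass  # execution stops here, not on next month's bill
```

The shape matters more than the code: the owner and ceiling exist before the first tool call, and the ledger means the bill traces back to a decision.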
Routing matters. If roughly 70% of an agent's tokens go to mechanical work (formatting, extraction, classification), you're paying frontier model prices for tasks smaller models handle just as well.
Routing those sub-tasks to cheaper models can cut blended cost 60 to 80 percent without changing outcomes.
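The blended-rate claim is easy to check with illustrative prices. The specific rates below are assumptions for the sake of the arithmetic, not quotes from any vendor:

```python
# Illustrative blended-cost math for model routing.
# Assumptions: 70% of tokens are mechanical and can run on a small
# model priced at a fraction of the frontier rate; both prices are
# hypothetical examples.
FRONTIER = 5.00          # $/M tokens, frontier model
SMALL = 0.20             # $/M tokens, small model
MECHANICAL_SHARE = 0.70  # share of tokens that are mechanical work

all_frontier = FRONTIER  # baseline: every token on the frontier model
routed = (1 - MECHANICAL_SHARE) * FRONTIER + MECHANICAL_SHARE * SMALL
savings = 1 - routed / all_frontier

print(f"blended rate: ${routed:.2f}/M tokens, savings: {savings:.0%}")
```

At these example prices the blended rate lands around a third of the frontier rate, squarely inside the 60 to 80 percent savings range.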
Routing without attribution only tells you spend went down. It doesn’t tell you which workflows created value or which should have been shut down weeks ago.
For engineering teams, the clearest frame is observability and control across AI workloads, the same pattern applied to every other infrastructure layer over the past decade. Govern the workload and the cost becomes legible.
Connect cost to outcomes and you have something worth optimizing. Identity governance without economic governance leaves enterprises with half the picture.
The Gap Between an Experiment and a Production Agent
The era of unlimited gave teams room to experiment and learn what agents could actually do. That mattered. Agent Debt is what happens when that experimentation carries over into production without the accountability structure production requires.
What separates a successful experiment from a production-ready agent isn't capability. It's the ownership, policy, and attribution built around it before it goes live.
Every production workflow needs a named person, not a team, who approved it, knows what it costs, and decides whether it keeps running. Without that person, the agent runs by default rather than by decision. The spend ceiling needs to be enforced at the point of execution, not discovered on an invoice. And every tool call, token, and downstream cost needs to trace back to the workflow and business outcome it was meant to serve, because without that connection, every scaling decision is a guess.
Metered pricing didn’t create this problem. It exposed it.
Every agent is now a financial decision, whether anyone made that decision or not. The difference between a useful system and a runaway cost center is whether ownership, budgets, and attribution were defined before the first run.
Most teams will discover what their agents cost after the invoice. The ones that win won’t.



