Where There’s Mystery, There’s Margin

09 Dec 2025
Bailey Caldwell

Fresh off AWS re:Invent 2025, where Matt Garman told 60,000 practitioners that "AI agents are starting to give way to agentic systems," let me tell you what wasn't said on stage but is lurking beneath the keynote spectacle: cloud providers have built empires on billing complexity, and agentic AI is about to repeat the same strategy that telecom pioneered and cloud computing perfected.

It’s a familiar pattern.

The Telecom Playbook- Profitable Confusion

I sort of remember 1990s cell phone bills; I certainly had one. Mysterious line items, "government recovery fees" that weren't government fees, and per-minute charges that multiplied inexplicably. Overages were massive and charged by the minute, and there was no real-time view of your consumption or bill; it just showed up and you screamed. The formula was elegant- complexity breeds confusion, confusion breeds acceptance, acceptance breeds profit. When customers can't decode bills, they stop trying. When finance can't reconcile spend, they build in buffer. Where there's mystery, there's margin.

The Cloud War Story- Spontaneous Bill Combustion

Fast forward to cloud, and history remixes the same track. The horror stories are legendary- a misconfigured S3 bucket costing $2.3M, Lambda functions gone rogue hitting six figures, forgotten dev environments running 24/7 as "yacht payments." David Linthicum predicted cloud costs would run 2.5x projections and was right: he notes 81% of companies blew past their cloud budgets in 2024.

Enter FinOps, founded in 2019 because practitioners were drowning in provider-generated chaos with nowhere to compare notes. The Linux Foundation formalized it as a discipline in 2020 because finance, engineering, and procurement needed a common framework for cloud costs that were spiraling out of control.

FOCUS- Admitting There's a Problem

Then came FOCUS, the FinOps Open Cost and Usage Specification, launched in June 2024 after years of wrangling. Think of it as the industry admitting that AWS, Azure, and GCP speaking different billing languages wasn't a feature; it was a margin-protecting bug. That it took until 2024 to get a unified billing format tells you everything about provider investment in maintaining opacity. Support for FOCUS is building, and the spec is working to cover SaaS as well, but I wonder how it will handle the coming explosion of AI costs.

Opacity 2.0- How LLM Providers Inherited Cloud's Billing Playbook

If you thought cloud providers pioneered opacity, wait until you route an agentic AI workflow across OpenAI, Anthropic, and Google's LLMs simultaneously. Each provider speaks a different pricing language. Each one discovered that complexity protects margin.

AWS Bedrock offers three inference tiers per API call- Priority (75-100% premium, guaranteed low latency), Standard (baseline), and Flex (50% discount with unpredictable latency). Choose per request. Build routing logic. Amazon Nova 2 Lite? $0.30/$2.50 per million input/output tokens. Nova Act agent workflows bill at $4.75 per agent hour of real-world elapsed time. Multiple agents running in parallel each generate separate invoices. Nova Forge for model customization runs $100,000 annually, plus compute.
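To make the compounding concrete, here is a minimal sketch of how this style of tiered, per-agent-hour billing can be modeled, using the rates quoted above. The function names and the 1.75x Priority multiplier (the low end of the quoted 75-100% premium) are our illustrative assumptions, not an AWS API.

```python
# Illustrative model of Bedrock-style billing: per-token rates, a tier
# multiplier chosen per request, and separate agent-hour metering.
NOVA_LITE_RATES = {"input": 0.30, "output": 2.50}   # $ per million tokens (quoted above)
TIER_MULTIPLIER = {"priority": 1.75, "standard": 1.00, "flex": 0.50}  # assumed low end of premium
AGENT_HOUR_RATE = 4.75                              # $ per agent-hour (Nova Act, quoted above)

def token_cost(input_tokens, output_tokens, rates, tier="standard"):
    """Token spend for one request, scaled by the chosen inference tier."""
    base = (input_tokens / 1e6) * rates["input"] + (output_tokens / 1e6) * rates["output"]
    return base * TIER_MULTIPLIER[tier]

def agent_hour_cost(parallel_agents, elapsed_hours):
    """Each parallel agent bills its own real-world elapsed time."""
    return parallel_agents * elapsed_hours * AGENT_HOUR_RATE

# Five agents running two wall-clock hours each: 5 * 2 * $4.75 = $47.50
print(agent_hour_cost(5, 2))
```

Note how the same million-token workload swings 3.5x between Flex and Priority before any agent-hour charges land, which is exactly the routing decision pushed onto the caller.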

OpenAI built the token complexity tower- GPT-4o at $1.25/$5.00 per million tokens, o1-pro at $75/$300, and o1 at $15/$60. Audio tokens? $40/$80 per million. Web search tool? $10 per 1,000 calls plus 8,000 search content tokens billed at input rates. Fine-tuning? $25 per million training tokens. Rate limits cascade by tier- Free (3 RPM), Pay-As-You-Go (60 RPM), Enterprise (negotiate).
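The tool-fee stacking is easy to underestimate, so here is a hedged sketch of one o1 request that also invokes the web search tool, using only the list prices quoted above. The function and constant names are ours, not OpenAI's API.

```python
# Illustrative cost of a single o1 request with web search, per the
# quoted prices: $15/$60 per million tokens, $10 per 1,000 search calls,
# plus 8,000 search content tokens billed at input rates per call.
O1_RATES = {"input": 15.00, "output": 60.00}  # $ per million tokens
SEARCH_CALL_FEE = 10.00 / 1000                # flat fee per search call
SEARCH_CONTENT_TOKENS = 8_000                 # extra input tokens per call

def request_cost(input_tokens, output_tokens, search_calls=0):
    billed_input = input_tokens + search_calls * SEARCH_CONTENT_TOKENS
    cost = (billed_input / 1e6) * O1_RATES["input"]
    cost += (output_tokens / 1e6) * O1_RATES["output"]
    cost += search_calls * SEARCH_CALL_FEE
    return cost

# 10K in, 2K out, one search call:
# (10,000 + 8,000)/1e6 * 15 + 2,000/1e6 * 60 + 0.01 = 0.27 + 0.12 + 0.01 = $0.40
print(request_cost(10_000, 2_000, 1))
```

One search call roughly tripled the input-token bill on this request, and none of that shows up as a separate "search" line until you decompose it yourself.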

Anthropic did something revealing in November 2025- they released Claude Opus 4.5 at $5/$25 per million tokens, a 67% price cut from Opus 4.1's $15/$75. For months, mid-tier Claude Sonnet 4.5 had outperformed the premium Opus 4.1, pricing the flagship into irrelevance. The price slash created a pricing paradox where nobody could confidently say which model to route traffic to. Sonnet 4 costs $3/$15, Haiku runs $0.80/$4.

Google Gemini weaponized prompt-length tiering- Gemini 2.5 Pro charges $1.25/$3.75 per million tokens for prompts ≤200K tokens, then jumps to $2.50/$10 beyond 200K. Gemini 2.0 Flash at $0.10/$0.40. Context caching adds $0.025-$4.50 per million tokens plus hourly storage. Grounding with Google Search? 1,500 free, then $35 per 1,000. Image output tokens? $30 per million.
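The threshold mechanic deserves a closer look: once a prompt crosses the line, the whole prompt is billed at the higher rate. A minimal sketch, assuming the quoted Gemini 2.5 Pro rates and a 200K-token threshold (function name is illustrative):

```python
# Illustrative threshold-tiered pricing: the entire prompt re-rates
# once it exceeds the threshold, per the figures quoted above.
def gemini_pro_cost(prompt_tokens, output_tokens, threshold=200_000):
    if prompt_tokens <= threshold:
        in_rate, out_rate = 1.25, 3.75   # $/M tokens, short prompts
    else:
        in_rate, out_rate = 2.50, 10.00  # $/M tokens, long prompts
    return (prompt_tokens / 1e6) * in_rate + (output_tokens / 1e6) * out_rate

# One token over the threshold doubles the input rate on ALL 200K tokens:
short = gemini_pro_cost(200_000, 10_000)   # 0.25 + 0.0375 = $0.2875
long_ = gemini_pro_cost(200_001, 10_000)   # ~0.50 + 0.10  = ~$0.60
```

A single extra token more than doubles the bill for the same request, which is the kind of cliff that routing logic has to know about in advance.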

The math of the mystery- AIMultiple research shows an identical 100K-input, 100K-output task costs $1.125 on GPT-4o, $1.80 on Claude Sonnet 4, $4.20 on DeepSeek, or $108 on a premium model, a 96x spread on identical work. When you deploy an agentic workflow with runtime model selection, you're managing tier decisions (Priority/Flex), model choices (Opus/Sonnet/Haiku/GPT-4o/o1/Pro/Flash), prompt caching eligibility, context window overages, per-call tool fees, audio/video premiums, grounding charges, and parallel agent-hour billing, all with region-specific, model-specific pricing that shifts quarterly.
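The spread arithmetic above can be reproduced directly from the quoted per-task figures; a small sketch (model labels shortened for readability):

```python
# Same 100K-input / 100K-output task, $ per run, per the AIMultiple
# comparison quoted above.
task_cost = {
    "gpt-4o": 1.125,
    "claude-sonnet-4": 1.80,
    "deepseek": 4.20,
    "premium-reasoning": 108.00,
}

cheapest = min(task_cost, key=task_cost.get)
priciest = max(task_cost, key=task_cost.get)
spread = task_cost[priciest] / task_cost[cheapest]
print(f"{cheapest} -> {priciest}: {spread:.0f}x spread on identical work")
```

That 96x ratio is the whole argument in one division: identical work, and the only variable is which meter you happened to be standing next to.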

What About AI- The New Payroll?

McKinsey's latest MGI work frames the future of work as a partnership between people, AI agents, and robots, with today's tech theoretically able to automate ~57% of US work hours. Demand for "AI fluency" in job postings has grown ~7x in two years, and in a midpoint scenario these shifts could unlock ~$2.9T in annual US economic value by 2030, if organizations redesign workflows around human-AI collaboration.[1]

Let's be honest- AI isn't just "helping" with your job; it is becoming a parallel workforce that will eventually touch every use case worth automating. Every process, product, and playbook that can be instrumented will be rebuilt with agents in the loop, and teams will be expected to design, orchestrate, and govern them the way previous generations managed people.

Humans are gloriously constrained. We get 24 hours a day, and a lot of that is spent sleeping, eating, exercising, talking to our kids, doomscrolling, or binge‑watching something we’ll forget in a week. Our capacity is finite and legible. Agents are the opposite. Your new “colleagues,” the ones living just under your keyboard, don’t have circadian rhythms, don’t take PTO, don’t commute, and don’t have a concept of “after hours.” Give them power and endpoints, and they will happily run your backlog all night.

In a fully agentic economy, this becomes the new payroll: instead of headcount, you have an effectively infinite pool of machine labor, priced not by salary bands but by tokens, agent-hours, tool calls, and context windows. Some of that labor will be cheap, some will be premium, and all of it will be metered. Your P&L won't just show "engineering" and "G&A"; it will show a long tail of AI labor lines, each tied to specific workflows, markets, and customers. The companies that win won't be the ones that use the most agents; they'll be the ones that know, precisely, what those agents cost per unit of value delivered.

Agents are also unnervingly good at billable hours. They expand to fill whatever space you give them: more experiments, more variants, more routes, more tools, more environments. When you can spin up a thousand specialized co-workers in seconds, the limiting factor is no longer capacity; it's economic clarity. In a world where the AI workforce is the new payroll, understanding the true cost of your agent workforce isn't a nice-to-have, it's the difference between compounding leverage and compounding surprise.

Remove the mystery before it removes your margin

Today’s AI stack looks exactly like late‑stage telecom and early‑cloud billing- deliberately confusing, always changing, and designed so nobody inside your company can answer the simplest question with a straight face- “What did this decision actually cost?” AI is becoming the new payroll, but unlike humans, your agents don’t sleep, don’t cap out at 40 hours, and don’t file expense reports. They just spin the meter. Where there’s mystery, there’s margin. And right now that margin is not yours.

Revenium.ai exists to flip that script. We connect directly to your AI providers, infra, and workloads to turn token spaghetti, per‑agent hour billing, and tier gymnastics into clear, unit‑level economics- cost per call, cost per workflow, cost per feature, cost per customer segment. Instead of praying you’re on the “right” model or tier, you see, in real time, what each route, cache, and configuration does to both cost and performance. No more “I think this is cheaper.” You get “this variant is 42% cheaper at the same win rate; ship that.”
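The "42% cheaper at the same win rate" decision above is, at bottom, a small piece of arithmetic over per-variant unit economics. A hedged sketch with hypothetical numbers (the variant data and field names are ours, not Revenium's API):

```python
# Hypothetical unit-economics comparison: prefer the cheaper variant
# when its win rate is at least as good as the baseline's.
variants = [
    {"name": "A", "cost_per_call": 0.038, "win_rate": 0.71},  # baseline route
    {"name": "B", "cost_per_call": 0.022, "win_rate": 0.71},  # candidate route
]

baseline = variants[0]
for v in variants[1:]:
    if v["win_rate"] >= baseline["win_rate"]:
        savings = 1 - v["cost_per_call"] / baseline["cost_per_call"]
        print(f"variant {v['name']} is {savings:.0%} cheaper at the same win rate; ship that")
```

The hard part is not this division; it's getting trustworthy per-call cost and outcome numbers into the numerator and denominator in the first place, which is exactly the metering problem.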

We built Revenium with a brutally simple north star: we want your team to see AI cost the way you see headcount cost. That means- understandable, forecastable, explainable to the board. We are not another dashboard for averages; we’re the meter on the agentic workflow itself, so you can design, route, and govern AI the way a CFO wishes cloud had worked from day one.

Where there’s mystery, there’s margin.

Revenium exists so the margin flows back to the builders, not the black boxes.

Ship With Confidence

Real-time AI cost metrics in your CI/CD and dashboards

Catch issues before deploy, stay on budget, and never get blindsided by after-the-fact spreadsheets.