The $1,200 Bill That Was Really $9,000
A fintech startup built an AI-powered loan pre-approval tool and ran the numbers in late 2024. Their LLM bill was $1,200 a month, which felt manageable for the volume they were doing. Finance signed off. The team moved on.
Someone finally added everything else up.
Each loan inquiry did more than hit the model. The agent pulled a live FICO score through a credit bureau API, checked the CRM for returning customers, ran income verification through a banking data provider, triggered fraud screening, and generated a document. By the time a single transaction finished, the real cost was $1.45. The LLM line item still showed $0.02.
The model bill was $1,200 a month. The actual AI infrastructure cost was nearly $9,000. That difference lived entirely in tool calls scattered across vendor systems that no one had connected to the workflow generating them.
That gap has a name: the Iceberg Effect.
The Iceberg Effect is the gap between what your model provider bills you and what your AI product actually costs to run.
The LLM call is the visible tip. Downstream agent actions (tool calls, API lookups, data fetches, service invocations) are the mass below the waterline, and that mass is what determines your margin.
This Is a Margin Visibility Problem
Most conversations about AI costs begin and end with the model bill. Teams track token usage, compare provider pricing, shave prompt length. That is cost management.
Understanding your economics is a different problem.
In traditional SaaS, infrastructure costs scaled in reasonably predictable ways. You could estimate cost of goods sold with confidence, set pricing, and expect the math to hold. AI products do not work that way. Every agent action carries variable cost. Every workflow has a different cost profile depending on which tools it touches, and a single customer running a complex use case can cost ten times what a typical user costs. You usually learn this months later, when bills from five different vendors land with no common thread.
To understand the real economics of an AI product, three numbers have to come together: what each customer pays, what their usage actually costs across the full transaction stack, and the attribution that connects those costs to specific features, workflows, and agents. Most teams have the first number. Almost none have the second. And without both, the margin calculation is fiction.
Three Industries. Three Icebergs.
1. Fintech: AI Loan Pre-Approval Agent
A user submits a loan inquiry. The LLM generates a pre-approval summary in two seconds. Here is what actually ran behind that single prompt:

- A live FICO pull through a credit bureau API
- A CRM lookup for returning customers
- Income verification through a banking data provider
- Fraud screening
- Document generation
The LLM cost was $0.02. The transaction cost was $1.45, a 72x gap. At 10,000 monthly inquiries, the team was looking at $14,500 per month in real AI cost against a pricing model built on a $200 LLM estimate.
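The arithmetic is worth making explicit. A quick sketch using the example's figures (the article's illustrative estimates, not real pricing):

```python
# Unit-economics check using the article's illustrative figures.
llm_cost_per_inquiry = 0.02     # what the LLM invoice shows
true_cost_per_inquiry = 1.45    # LLM call plus all downstream tool calls
monthly_inquiries = 10_000

multiplier = true_cost_per_inquiry / llm_cost_per_inquiry       # ~72x
monthly_llm_bill = llm_cost_per_inquiry * monthly_inquiries     # ~$200
monthly_true_cost = true_cost_per_inquiry * monthly_inquiries   # ~$14,500

print(f"gap: {multiplier:.1f}x, invoice ${monthly_llm_bill:,.0f}, "
      f"real cost ${monthly_true_cost:,.0f}")
```

Pricing a feature off the $200 line item instead of the $14,500 real cost is exactly how a team ends up 72x under on every transaction.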
2. Healthcare: AI Prior Authorization Agent
A health tech company automates insurance prior authorization, one of the most manual and expensive workflows in healthcare administration. Each submission triggers its own stack of downstream calls, none of which show up on the model invoice.
The team priced their per-submission fee at $0.50, based entirely on LLM cost estimates. They were losing money on every transaction and did not know it for six months.
3. B2B SaaS: AI Sales Intelligence Agent
A revenue team deploys an AI agent to research prospects and generate personalized outreach. Every contact enrichment run fans out into its own chain of downstream tool calls.
At 5,000 contacts per month, that is $3,300 in real cost against an LLM bill showing $150. The sales team thought they were running a lean operation. The unit economics said something different.
Why Nobody Sees It Coming
The Iceberg Effect is not a failure of intelligence. It is a failure of tooling, and the blind spots show up differently depending on where you sit in the organization.
Engineering sees token usage and latency from the model provider. They can report P99 response time and cost per thousand tokens. What they do not see are the third-party API calls the agent triggered downstream, because those costs land in entirely separate billing systems: credit bureau invoices, SaaS platform overages, cloud storage line items, none of which are connected to the AI workflow that generated them.
Finance sees the monthly LLM invoice and can report what the organization spent on GPT-4o last quarter. What they cannot reliably say is how much of the credit bureau spend, the CRM overage, or the data enrichment bill was driven by agent activity versus everything else. The costs exist, but they are scattered across a dozen vendor dashboards with no way to attribute them.
Product sees the feature working. Adoption climbs and users are satisfied. But there is no way to calculate true cost-to-serve for a given customer segment, and no signal for whether the feature launched last month is margin-positive or quietly running at a loss on every session.
No single team holds a complete economic picture of an AI transaction. Without that picture, pricing is guesswork, cost allocation is approximation, and margin erosion goes undetected until it becomes a quarterly problem.
Agents Make the Iceberg Bigger
Early AI products were relatively self-contained. A user submitted a query, the model returned a response, and the cost surface was narrow and consistent.
Agentic AI is built to behave differently. Agents use tools, call services, retrieve data, and coordinate workflows across systems. That capability is precisely what makes them worth building. It also makes the Iceberg Effect compound in ways that are hard to track. An agent running a complex research task might hit a search API dozens of times. One handling a financial workflow might call four or five third-party data services per transaction. A multi-agent system, where one agent spawns sub-agents to handle parallel workstreams, can generate hundreds of external calls from a single user action.
Engineers who built these systems understand this at a technical level. That understanding rarely surfaces in a spreadsheet where someone needs to calculate margin per customer or cost per feature. The more capable your agents become, the wider the gap between what the LLM provider bills you and what it actually costs to serve each customer.
Three Capabilities That Close the Gap
Discovering the Iceberg Effect in a quarterly cost review is too late. By that point, a workflow has already run up five figures without anyone catching it, or a feature has been underpriced for six months. The teams that stay ahead tend to share three capabilities:
1. Full Transaction Capture
Every action an agent takes needs to be captured and attributed back to the customer, feature, and workflow that triggered it. Every model call, every tool invocation, every external API hit. Not sampled, not aggregated after the fact. Every event, as it happens. Without this foundation, any downstream cost analysis is built on incomplete information, and incomplete information produces wrong answers at the moments that matter most.
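As a sketch of what that capture layer might look like, here is a minimal, hypothetical event record (the class and field names are illustrative, not any vendor's API): every agent action is appended to a ledger as it happens, tagged with the customer, feature, and workflow that triggered it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class TransactionEvent:
    """One captured agent action, tagged with what triggered it."""
    customer_id: str
    feature: str
    workflow: str
    action: str      # e.g. "llm_call", "credit_bureau_lookup"
    cost_usd: float
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

ledger: list[TransactionEvent] = []

def record(event: TransactionEvent) -> None:
    # Every event, as it happens: not sampled, not aggregated after the fact.
    ledger.append(event)

# One loan inquiry, captured end to end (hypothetical IDs and costs):
record(TransactionEvent("cust-042", "loan-preapproval", "inquiry", "llm_call", 0.02))
record(TransactionEvent("cust-042", "loan-preapproval", "inquiry", "credit_bureau_lookup", 0.85))
```

The point of the structure is the tags: a cost number with no customer, feature, or workflow attached cannot be attributed later, no matter how carefully it was logged.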
2. Cost Attribution, Not Just Cost Aggregation
Total AI spend for the month tells you the size of the problem, not its shape. The questions that actually drive decisions are more specific: which customer segment is driving margin erosion? Which agent workflow is running at a 300% cost overrun against your pricing assumptions? Which product feature is profitable at scale, and which one subsidizes every session it runs? Attribution at the transaction level is what turns a cost number into something you can act on.
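To make the aggregation-versus-attribution difference concrete, a minimal attribution pass over hypothetical per-transaction cost records (illustrative names and figures) might look like:

```python
from collections import defaultdict

# Illustrative per-transaction cost records (hypothetical data).
events = [
    {"customer": "acme",   "feature": "loan-preapproval", "cost_usd": 1.45},
    {"customer": "acme",   "feature": "loan-preapproval", "cost_usd": 1.45},
    {"customer": "globex", "feature": "doc-summary",      "cost_usd": 0.04},
]

def attribute(events: list[dict], dimension: str) -> dict[str, float]:
    """Roll raw event costs up to a decision-relevant dimension."""
    totals: dict[str, float] = defaultdict(float)
    for e in events:
        totals[e[dimension]] += e["cost_usd"]
    return dict(totals)

by_customer = attribute(events, "customer")  # who drives the spend
by_feature = attribute(events, "feature")    # which feature runs at a loss
```

Aggregation stops at `sum(e["cost_usd"] for e in events)`. Attribution answers the question that matters: here, one customer accounts for nearly all of the spend.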
3. Economic Guardrails
Visibility without the ability to intervene is just a clearer view of a problem you cannot stop. The most effective AI teams set per-transaction cost thresholds, per-customer spend limits, and automated stops that prevent runaway agent usage before it reaches the finance team. If a workflow exceeds its expected cost envelope, the system flags it or halts it before the damage accumulates.
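A sketch of one such guardrail, assuming a simple per-transaction cost envelope (the class, threshold, and figures are illustrative):

```python
class CostEnvelopeExceeded(Exception):
    pass

class TransactionGuardrail:
    """Halts a workflow once it exceeds its expected cost envelope."""
    def __init__(self, max_cost_usd: float):
        self.max_cost_usd = max_cost_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        self.spent_usd += cost_usd
        if self.spent_usd > self.max_cost_usd:
            raise CostEnvelopeExceeded(
                f"spent ${self.spent_usd:.2f} against a "
                f"${self.max_cost_usd:.2f} envelope"
            )

guard = TransactionGuardrail(max_cost_usd=1.00)
guard.charge(0.60)       # within the envelope
try:
    guard.charge(0.60)   # would bring the transaction to $1.20: halted here
except CostEnvelopeExceeded:
    pass                 # stop the workflow before the damage accumulates
```

The same pattern extends to per-customer and per-agent limits; the essential property is that the check runs inline with the transaction, not in a month-end report.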
The Next Phase of AI Infrastructure
The first wave of AI infrastructure was about building models: training, fine-tuning, and serving them at scale. The second wave was about deploying AI inside products, with orchestration frameworks, agent toolkits, and retrieval pipelines. Both waves created real value. Neither answered the question that determines whether a product is viable: does this AI feature actually make money?
As AI systems become more agentic and more deeply integrated with external tools and services, their financial behavior becomes harder to observe and harder to control. Every new capability an agent gains expands the iceberg. The gap between what your LLM provider reports and what your AI operations actually cost will keep widening as long as those two things are measured separately.
This points to the need for a new infrastructure layer, one that connects every AI action to its economic impact. Not model observability. Not application monitoring. Economic observability: the ability to see the full cost, margin, and unit economics of every AI transaction your product runs.
Seeing the Whole Iceberg
That is the infrastructure Revenium provides.
We capture every AI transaction end to end, from the first model call through every downstream tool invocation, third-party API hit, and infrastructure event, and trace all of it back to the customer, feature, and agent that triggered it. Not sampled data, not aggregated logs. A complete economic record of every action your product takes.
For finance teams, that means real cost attribution: chargeback models, accurate budgeting, and unit economics that do not depend on estimates built from a single vendor invoice.
For product teams, it means knowing the true COGS on every feature before pricing is set, not after months of margin erosion have already worked their way into the run rate.
For engineering teams, it means guardrails: spending limits, cost alerts, and automated stops that surface runaway agents before they become a finance problem.
The iceberg has always been there. Most teams have been navigating without any way to see it.
The Iceberg Effect is real. It does not have to stay invisible.
See the full cost of every AI transaction. Connect your providers and get attribution in minutes.
revenium.ai → Request a Demo
Note:
Cost estimates in the examples above are illustrative and based on publicly available API pricing at time of publication. Actual costs vary by provider, volume tier, and contract terms. The multiplier ranges cited are directional. Validate against your own transaction data for precision.