Your Agent's Budget Just Grew a Brain. OpenClaw and Hermes Meet Revenium Cost Controls.

18 Jun 2026
John D'Emic
CTO, Co-founder

When we shipped the first OpenClaw and Hermes skills, we made one promise. Your agent gets a budget. That promise was true, but also slightly incomplete. A budget was one number, one period, one threshold. It fired once, told you that you had reached it, and then went quiet.

Good for catching the worst case, but blind to everything else. Blind to which model burned the cash, which kind of work was running, which tool was looping, and whether you wanted a heads up before the wall or a hard stop at it.

Budgets in the real world are not single numbers. They're policy.

Revenium shipped Cost Controls to express that policy, and we rebuilt both skills on top of it. Same one command install. Dramatically more control. And two genuinely different philosophies of how to enforce a rule, one per agent, because a 90 second interactive session and a 6 week autonomous agent deserve different machinery.

What Cost Controls actually are

A Cost Control is a rule evaluated against live usage before the AI provider call. Not a post hoc report. Not a billing dashboard.

Every rule has five parts.

The metric is what gets counted: total cost, token count, error count, tool used, job performed, or a variety of other dimensions.

The window is a rolling, calendar aligned UTC period: daily, weekly, monthly, or quarterly.

The hard limit is the threshold that triggers enforcement.

The action is what the SDK does when the limit is hit, which today means Block, raising a cost limit exception before the provider call.

Finally, the scope is the optional set of filters and group by dimensions that let you slice a rule per model, per agent, or per task type.
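
To make the window concrete, here is a minimal Python sketch of how the start of a calendar aligned UTC window could be computed. It is an illustration, not Revenium's implementation. The function name window_start is hypothetical, the DAILY, WEEKLY, and QUARTERLY labels are assumed by analogy with the MONTHLY value that appears in the status file, and starting weeks on Monday is an assumption.

```python
from datetime import datetime, timedelta, timezone


def window_start(now: datetime, window_type: str) -> datetime:
    """Return the start of the calendar aligned UTC window containing `now`.

    `now` must be timezone aware. The window labels are assumptions.
    """
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if window_type == "DAILY":
        return midnight
    if window_type == "WEEKLY":
        # Assumption: weeks start on Monday (ISO convention).
        return midnight - timedelta(days=midnight.weekday())
    if window_type == "MONTHLY":
        return midnight.replace(day=1)
    if window_type == "QUARTERLY":
        # Quarters start in January, April, July, and October.
        first_month = 3 * ((midnight.month - 1) // 3) + 1
        return midnight.replace(month=first_month, day=1)
    raise ValueError(f"unknown window type: {window_type}")
```

Usage counted toward a rule would then be whatever was metered between window_start(now, ...) and now.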

Two more levers matter for the skills. The warning threshold is an optional soft band below the hard limit. It notifies without blocking. Then there is shadow mode, where rules are still evaluated and any breach is recorded in the log as a "would block" event, without actually stopping the agent.
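
The parts of a rule, plus the warning threshold and shadow mode, can be sketched as a small Python model. This is a sketch under stated assumptions, not the SDK's data model. The Rule class, its field names, and the evaluate function are hypothetical, and treating the limit as reached at "greater than or equal" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical in-memory shape of a Cost Control rule."""
    name: str
    metric_type: str                        # what gets counted, e.g. "COST"
    window_type: str                        # e.g. "WEEKLY" or "MONTHLY"
    hard_limit: float                       # threshold that triggers enforcement
    warn_threshold: Optional[float] = None  # optional soft band below the hard limit
    shadow: bool = False                    # evaluate and log, never block


def evaluate(rule: Rule, current_value: float) -> str:
    """Return 'ok', 'warn', 'block', or 'would_block' for one rule."""
    # Assumption: the limit counts as hit once usage reaches it.
    if current_value >= rule.hard_limit:
        return "would_block" if rule.shadow else "block"
    if rule.warn_threshold is not None and current_value >= rule.warn_threshold:
        return "warn"
    return "ok"
```

For example, a weekly 50 dollar cap with a hypothetical warn threshold of 40 would evaluate to "warn" at 42.18 and to "block" at 50.00. The same cap in shadow mode would evaluate to "would_block" instead of "block".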

The Enforcement Events log records one immutable entry per evaluation, either an actual block or a shadow "would block," including the timestamp, rule, metric values, and a masked key hint. Notification Providers are configured under Settings > Integrations and fire when rules trip, through Slack, webhooks, and other channels.

From one threshold to a two band rule

Budgets used to be a single threshold. You hit it or you did not. There was no middle.

Both skills now create guardrails-native budget rules with a warn band and a hard limit block band. A background cron polls the rule's enforcement state every minute and writes a local guardrail-status.json. The agent reads that file, not the API, so there is zero per turn network latency. We kept the same principle: observe and report, never become the bottleneck.
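
Because the agent reads the file while a poller rewrites it every minute, the write should never leave a half written file behind. One common way to get that is write to a temp file, then rename. The Python sketch below shows that pattern only. Whether the skills do it this way is not stated here, and the function name write_status is hypothetical.

```python
import json
import os
import tempfile


def write_status(path: str, status: dict) -> None:
    """Write the status file atomically so a reader never sees a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(status, tmp)
        os.replace(tmp_path, path)  # atomic swap of old file for new
    except BaseException:
        # Clean up the temp file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```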

The status file resolves to three states. OK, proceed silently. Warn band, surface the breached rule and show current value against the hard limit. Block band, stop. A trimmed real status looks like this.

{
  "halted": true,
  "warned": true,
  "warnedRules": [
    {"name": "weekly-debugging-cap", "currentValue": 42.18, "hardLimit": 50.00}
  ],
  "haltedRule": {
    "name": "monthly-org-cap",
    "metricType": "COST",
    "windowType": "MONTHLY",
    "currentValue": 1003.12,
    "hardLimit": 1000.00
  }
}

The warn band is where a human can still course correct. The block band is where the agent stops doing anything that costs money. The two flags are tracked per rule and are independent, so an agent can be inside a warn band on one rule while another rule is still OK or, as in the status above, already in its block band.
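
A minimal Python sketch of how a reader could collapse that status into the three states. The field names come from the status shown above. The function names resolve_state and describe are hypothetical, and letting the block band win when both flags are set is an assumption about precedence.

```python
import json


def resolve_state(status: dict) -> str:
    """Collapse a parsed status file into 'block', 'warn', or 'ok'."""
    if status.get("halted"):
        return "block"  # assumption: a block band outranks any warn band
    if status.get("warned"):
        return "warn"
    return "ok"


def describe(status: dict) -> list[str]:
    """Build one human readable line per breached rule."""
    lines = []
    halted = status.get("haltedRule")
    if status.get("halted") and halted:
        lines.append(
            f"BLOCK {halted['name']}: "
            f"{halted['currentValue']:.2f} / {halted['hardLimit']:.2f}"
        )
    for rule in status.get("warnedRules", []):
        lines.append(
            f"WARN {rule['name']}: "
            f"{rule['currentValue']:.2f} / {rule['hardLimit']:.2f}"
        )
    return lines
```

Fed the trimmed status above, resolve_state returns "block" and describe lists the monthly-org-cap breach first, followed by the weekly-debugging-cap warning.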

Block before the call, two ways to enforce it

Cost Controls' Block action means stop before the provider call. The skills realize that intent differently, and the contrast matters.

OpenClaw uses procedural enforcement inside the agent loop. A mandatory guardrail check runs before every turn and every tool call. The skill's instructions are injected into AGENTS.md so they are always in context, which forces the agent to read status first. In interactive mode the agent surfaces the breach and asks for permission to continue. A human stays in the loop.

In autonomous mode the agent's entire response becomes the halt message. No tool calls. No further work. Until clear-halt.sh runs, nothing moves.

Newer hardening adds a typed before_agent_finalize plugin that structurally forces the agent to record what it did before finishing a turn, which feeds the attribution work in the next section.

Hermes uses structural enforcement through shell hooks. Three registered hooks carry the load. pre_llm_call injects the halt directive. pre_tool_call blocks every tool call while any rule is in its block band. post_tool_call meters the result.

This path is deterministic regardless of session length, which is the difference that matters for an agent running for weeks. The SKILL.md halt block is still there, now as defense in depth rather than the primary gate.

Same rule. Fit for purpose enforcement. A short lived interactive session can afford a procedural ask. A persistent autonomous agent needs a structural stop that cannot be forgotten 40,000 tokens into a session.

Both skills are fail open. A missing or unreadable status file means proceed with caution, never hard fail the agent. We will not be the reason your agent dies in the night.
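
Fail open is easy to get subtly wrong, so here is a minimal Python sketch of the idea. It is an illustration, not the skills' code. The function names load_status and may_proceed are hypothetical, and the only field it relies on is the halted flag from the status file.

```python
import json
from typing import Optional


def load_status(path: str) -> Optional[dict]:
    """Return the parsed status, or None if the file is missing or unreadable."""
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, ValueError):
        # OSError covers a missing or unreadable file.
        # ValueError covers malformed JSON and bad encodings.
        return None
    return data if isinstance(data, dict) else None


def may_proceed(path: str) -> bool:
    """Fail open: only an explicit, readable halt stops the agent."""
    status = load_status(path)
    if status is None:
        return True  # no usable status, proceed with caution
    return not status.get("halted", False)
```

The design choice is that a broken guardrail degrades to no guardrail, never to a dead agent. The cost is that a deleted status file also removes the stop, which is one reason a notification on halt matters.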

Controls are only as smart as the data feeding them

Cost Controls can scope and group by any dimension that is metered. That’s why the rebuilt skills capture richer, more granular signals: you get slicing, attribution, and insights that actually mean something.

Task type lands on every completion. A controlled vocabulary of labels covers research, analysis, generation, review, code review, refactor, planning, and debugging. Cost Controls can now express a rule like "alert when debugging spend exceeds 50 dollars this week," a per task type rule, not just a session total.
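
To see why the label matters, here is a minimal Python sketch that groups completion spend by task type and checks one group against a limit. The record shape, the function names, and the exact label spellings (for example "code review" with a space) are assumptions for illustration.

```python
from collections import defaultdict

# Labels from the controlled vocabulary. Exact spellings are assumed.
TASK_TYPES = {
    "research", "analysis", "generation", "review",
    "code review", "refactor", "planning", "debugging",
}


def spend_by_task_type(completions):
    """Sum cost per task type, rejecting labels outside the vocabulary."""
    totals = defaultdict(float)
    for completion in completions:
        task = completion["task_type"]
        if task not in TASK_TYPES:
            raise ValueError(f"unknown task type: {task}")
        totals[task] += completion["cost"]
    return dict(totals)


def breached(totals, task_type, limit):
    """True when spend for one task type has reached the limit."""
    return totals.get(task_type, 0.0) >= limit
```

With two debugging completions costing 30 and 25 dollars in the same week, breached(totals, "debugging", 50.0) is true even though no single completion or session crossed 50.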

The two skills classify differently. OpenClaw classifies through the agent, now structurally forced by the finalize gate plugin. Hermes classifies deterministically in a revenium-classifier plugin at on_session_end, reading session data directly so attribution does not depend on a model remembering to label itself. That deterministic classification is the right answer for autonomous runs.

Agentic jobs make goal arcs visible. Discrete jobs are created and tracked via the Revenium CLI: revenium jobs create to define a job, link completions with --agentic-job-id, and then record the final result with revenium jobs outcome (SUCCESS, FAILED, or CANCELLED), immutably and once only. Spend and outcome are now visible per job, not just per session. You can look at a finished job and know what it cost and whether it worked.
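
The "immutably and once only" outcome rule can be modeled in a few lines of Python. This is a local sketch of the semantics, not the Revenium CLI or API. The Job class and its methods are hypothetical, and only the three outcome values come from the text above.

```python
from dataclasses import dataclass
from typing import Optional

OUTCOMES = {"SUCCESS", "FAILED", "CANCELLED"}


@dataclass
class Job:
    """Hypothetical local model of an agentic job."""
    job_id: str
    cost: float = 0.0
    outcome: Optional[str] = None

    def link_completion(self, cost: float) -> None:
        """Attribute one completion's cost to this job."""
        self.cost += cost

    def record_outcome(self, outcome: str) -> None:
        """Record the final result exactly once."""
        if outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome}")
        if self.outcome is not None:
            raise RuntimeError("outcome already recorded")
        self.outcome = outcome
```

A job with completions costing 1.25 and 0.75 that ends in SUCCESS reports a cost of 2.0 and rejects any later attempt to rewrite the outcome.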

Tool events get metered. Every tool call is captured, with name, duration, and success or failure. External tool spend lands in the same ledger as token spend.
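
Capturing name, duration, and success or failure for every tool call is a natural fit for a wrapper. The Python sketch below shows the idea with a decorator that appends to an in-memory list. The decorator name metered, the TOOL_EVENTS list, and the event field names are hypothetical, and a real skill would ship events to the ledger, not keep them in memory.

```python
import time
from functools import wraps

TOOL_EVENTS = []  # stand-in for the real ledger


def metered(tool_name):
    """Record name, duration, and success for each call of the wrapped tool."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            success = False
            try:
                result = fn(*args, **kwargs)
                success = True
                return result
            finally:
                # Runs on both normal return and exception.
                TOOL_EVENTS.append({
                    "name": tool_name,
                    "duration_ms": (time.perf_counter() - start) * 1000.0,
                    "success": success,
                })
        return wrapper
    return decorator
```

The wrapper never swallows the tool's exception, so metering stays an observer and does not change what the agent sees.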

Root session rollup fixes subagent fan out. Subagent ("squad") spend rolls up under the originating root session and job. A fan out of subagents does not fragment attribution any more. That was the blind spot the Hermes post named, and it is closed.
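
The rollup itself is a walk up the parent chain. Here is a minimal Python sketch, assuming each session records the id of the session that spawned it. The parent_of mapping and both function names are hypothetical.

```python
from collections import defaultdict


def root_of(session_id, parent_of):
    """Follow parent links until a session with no parent is reached."""
    seen = set()
    while parent_of.get(session_id) is not None:
        if session_id in seen:
            raise ValueError("cycle in session parents")
        seen.add(session_id)
        session_id = parent_of[session_id]
    return session_id


def rollup(spend, parent_of):
    """Attribute every session's spend to its root session."""
    totals = defaultdict(float)
    for session_id, cost in spend.items():
        totals[root_of(session_id, parent_of)] += cost
    return dict(totals)
```

A root that spends 1.0 and fans out to subagents spending 2.0, 3.0, and 4.0 shows up as a single 10.0 entry under the root, however deep the fan out goes.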

Even enforcement itself is metered. Halt, warn, and shadow onset events are written as GUARDRAIL transactions, so the Enforcement Events log has a spend attributed counterpart. The system can see itself.

Granular metric plus scope equals budgets you can target at what the agent was doing, not just that it was doing something.

Shadow mode, and the gotcha that hides inside it

Turning on a hard block in a production agent is scary. You do not know if your limit is calibrated. Set it too low and you halt healthy work. Set it too high and you discover the wall by hitting it.

Shadow mode is the safe rollout path. The rule still meters usage and writes "would block" events to the Enforcement Events log, but it never actually stops the agent. Run it for a window, read the events, calibrate the limit, then flip to enforcing. Both skills support it.
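
The calibration step can be as simple as replaying observed window totals against candidate limits. The Python sketch below assumes you have exported one total per window from the shadow period. The function names and the numbers in the usage note are hypothetical.

```python
def would_block_count(window_totals, hard_limit):
    """Count observed windows that would have hit the limit."""
    return sum(1 for total in window_totals if total >= hard_limit)


def smallest_quiet_limit(window_totals, candidates):
    """Return the smallest candidate that blocks none of the observed windows.

    Returns None when every candidate would have blocked at least once.
    """
    for limit in sorted(candidates):
        if would_block_count(window_totals, limit) == 0:
            return limit
    return None
```

With weekly totals of 31.0, 44.5, 38.2, and 52.9, a 50 dollar limit would have blocked once and a 60 dollar limit never. Whether that one week was healthy work or a runaway loop is the judgment call shadow mode leaves to you.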

Notifications and the autonomous halt

When a rule trips in autonomous mode, the skill fires a one shot notification through your configured channel. Slack, Discord, Telegram, and similar for OpenClaw. Hermes messaging channels for Hermes. The notification carries the breached rule and the latest enforcement event.

The 3 a.m. logic from the Hermes post still applies. You find out the moment the agent stops, well before an invoice tells you. Resume is one command, clear-halt.sh. The human decides when to reopen the tap.

Where this leaves you

The v1 skills gave each agent a budget. The v2 skills give each agent a policy. Warn band and block band. Scoped and grouped rules. Shadow mode rollout. Full spectrum metering across completions, jobs, tool calls, subagents, and enforcement itself. Procedural enforcement for OpenClaw, structural enforcement for Hermes.

Visibility precedes control. Control should observe and report, never become the bottleneck. Humans keep the final say.

Your agents were powerful, then accountable. Now they are governed. On your terms, by your rules, on one ledger, in real time.

Upgrade the skill with the same one line install. Read the Cost Controls docs at https://docs.revenium.io/track-and-control-costs/cost-controls. Free tier, open source, Discord open.
