AI FinOps Isn’t a Black Box
Unpacking Cost, Speed, and Our Own Accountability
We talk about AI the way people once talked about mainframes. Like it’s a kind of magic. Complex. Hidden. Best left to experts. That was true for a while, until it wasn’t. Someone eventually gave it structure. MIPS. FLOPS¹. Numbers that let people compare. Numbers that made performance measurable, cost understandable, strategy defensible. A framework that challenged assumptions and gave executives a way to think about machines without mysticism.
We’re in that moment again.
But this time the fog isn’t just about technology. It’s about trust. In my earlier piece on Breaking the AI Confidence Recession, I argued that the AI challenge isn’t capability. It’s actually conviction. Too many teams are stuck. Not because they lack tools but because they lack a framework for action. AI has made this worse. Not because its potential is unclear, but because its economics are. You can’t measure ROI without knowing what you’re spending. And with AI, even that basic understanding feels out of reach.
No one can quite agree on what the cost of intelligence actually is. Or who owns it once it’s embedded across the business. Or how to hold it accountable when its value disperses across functions. This isn’t a capability gap. It’s a visibility gap. And it’s eroding the confidence that transformation depends on.
Here is a truth we can all start from. AI introduces a new kind of operating cost. One that’s linguistic, probabilistic, and dynamic.
Traditional FinOps disciplines were built for infrastructure like compute, storage, and bandwidth. They monitor systems that execute instructions. AI, by contrast, performs cognition. Every model interaction consumes tokens. Every inference represents a micro-decision made on your behalf. The problem is, we don’t yet have a shared way to describe that consumption.
We talk about “running” a model, but the units of AI economics are buried under licensing abstractions and pricing bundles. Few leaders could say what that run actually costs, per transaction or per outcome.
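That per-transaction question can be reasoned about with nothing more than published per-token prices. A minimal sketch, where every figure below (token counts, prices, the task itself) is an illustrative assumption rather than any vendor’s actual rate:

```python
def cost_per_call(input_tokens, output_tokens, price_in_per_1k, price_out_per_1k):
    """Estimate the runtime cost of a single model call in dollars,
    given per-1k-token prices for input and output."""
    return (input_tokens / 1000) * price_in_per_1k \
         + (output_tokens / 1000) * price_out_per_1k

# Example: a support-ticket summarisation call, with assumed figures.
call_cost = cost_per_call(
    input_tokens=1200,        # prompt plus retrieved context (assumed)
    output_tokens=300,        # generated summary (assumed)
    price_in_per_1k=0.0005,   # assumed $ per 1k input tokens
    price_out_per_1k=0.0015,  # assumed $ per 1k output tokens
)
print(f"${call_cost:.6f} per transaction")  # → $0.001050
```

Multiply that single-call figure by call volume and the “cost to serve” of an AI-backed workflow stops being abstract.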
This is the next frontier of FinOps. Not a financial exercise, but a leadership one. Because when you adopt AI, you’re no longer just managing infrastructure. You’re managing digital thought. Each model has a runtime signature. A distinct pattern of how it processes context, handles ambiguity, and delivers output. Understanding that signature, and what it costs to maintain, is the new responsibility of executive stewardship.
Part of the answer clicked into place at the Google Cloud Summit in Sydney earlier this year. ANZ Vice President Peter Miglorini flashed a slide showing the exponential growth of AI tokens. Tokens. Not seats, not licenses, not users.
Tokens are fast becoming the new unit of enterprise work. Yet most organisations have no framework for what that means. No shared understanding of how to measure, govern, or plan for the economic implications of tokenised intelligence. That’s the real gap in the conversation. Not how capable AI is, but how accountable it can be once cost becomes measurable.
I am not saying the goal is to audit every token. It is to reclaim confidence by making something hidden visible enough to manage. Too many teams are trying to synthesise before they’ve analysed. This just layers complexity over uncertainty.
AI’s economics will never be as neat as infrastructure metering, and they don’t need to be. What matters is clarity. The ability to compare, not by vendor claim or marketing label, but by observable behaviour. By how models actually perform, and what it costs to make them perform that way.
To do that, we need a language that normalises the chaos. In the mainframe era, MIPS and FLOPS made sense of performance. In the AI era, we need equivalents that make sense of cognition.
One such idea is the Token Resource Unit, or TRU. It is a way of expressing how much work a model performs per fragment of language. It’s not exact, and it’s not meant to be. Precision is less important than comparability. The problem it solves is comparison: TRUs would let leaders reason about AI consumption in relative terms and compare different models, platforms, or configurations on a common scale of resource behaviour.
Alongside TRU sits another idea: Tokens Attributed Per Second, or TAPS. Where TRU helps you think about cost, TAPS helps you think about speed, or how efficiently intelligence is delivered per unit of resource.
Together, they hint at a language that ties cost, performance, and value into a single continuum. They won’t make AI economics precise. They don’t have to. What they offer is the ability to think about it coherently, to turn abstraction into something we can observe, question, and benchmark over time.
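To make the pairing concrete, here is one possible way to operationalise the two metrics. The article deliberately leaves TRU and TAPS loosely defined, so the formulas below (TRU as cost per 1k tokens, TAPS as tokens delivered per second) and all the run figures are my own illustrative assumptions, not a standard:

```python
from dataclasses import dataclass

@dataclass
class ModelRun:
    """Observed behaviour of one model on a benchmark task (illustrative figures)."""
    name: str
    total_tokens: int    # input + output tokens consumed
    wall_seconds: float  # end-to-end latency of the run
    cost_usd: float      # metered cost of the run

def tru(run: ModelRun) -> float:
    """Token Resource Unit: one possible definition, cost per 1k tokens."""
    return run.cost_usd / (run.total_tokens / 1000)

def taps(run: ModelRun) -> float:
    """Tokens Attributed Per Second: delivery rate of the run."""
    return run.total_tokens / run.wall_seconds

# Two hypothetical models doing the same task: one cheaper, one faster.
runs = [
    ModelRun("model-a", total_tokens=1500, wall_seconds=2.5, cost_usd=0.0030),
    ModelRun("model-b", total_tokens=1500, wall_seconds=1.2, cost_usd=0.0075),
]
for r in runs:
    print(f"{r.name}: TRU ${tru(r):.4f}/1k tokens, TAPS {taps(r):.0f} tok/s")
```

Even at this level of approximation, the two numbers let you ask the leadership question directly: is model-b’s extra speed worth two and a half times model-a’s resource cost for this workload?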
This shift also reframes how we think about value. AI cost doesn’t live in procurement or capital budgets. It lives at runtime in the moment a model does the work. That’s where operating expense meets business outcome. That’s where efficiency becomes measurable and ROI becomes tangible.
The question isn’t what you paid to deploy the system, but what you pay every time it runs. And whether that cost is justified by what it delivers.
The truth is, executives don’t need another billing dashboard. They need a way to reason. A structure to ask better questions. What’s our true cost to serve per agent task? Are we paying for cognitive complexity we don’t use? Should AI cost be charged back to IT, HR, or CX? How does our runtime footprint compare to others doing similar work? These questions don’t have fixed answers. But asking them is the beginning of accountability.
AI FinOps, in the end, is not about control. It’s about comprehension. It’s about building confidence through conversation, not compliance. It’s the ability to say, with some structure, that this is what our intelligence layer consumes, this is what it delivers, and this is where we choose to draw the line.
The AI wave won’t slow down. But the fog around cost and value will only lift if we help lift it. Vendors won’t do that for us. Regulators can’t. If AI is truly the next layer of enterprise infrastructure, then we need to own how it runs, and how much that running costs.
So, AI isn’t really a black box unless we let it be. The tools and metrics are still forming, and they won’t be perfect. But that’s okay. The act of looking inside, of reasoning through what AI consumes and not just what it produces, is the real step forward. The rest will follow.
Author’s note:
Somewhere during a long airport layover, I started sketching a framework to make sense of AI cost and performance. What began as a thought exercise turned into a small browser-based benchmark. It was a way to model runtime behaviour across public LLMs using nothing more than the data vendors already publish. It wasn’t a product, just an exercise to prove that it’s possible to reason through AI costs with what’s already in the open. The fog only feels impenetrable until you start mapping it. What I’m saying is, it’s possible.
¹ MIPS (Millions of Instructions Per Second) and FLOPS (Floating Point Operations Per Second) were early benchmarks for computing performance. One was for general instruction speed, the other for mathematical precision. They turned machine power into something measurable and comparable, helping to foster a global benchmarking industry that shaped decades of hardware innovation and procurement standards.