Forget who has the smartest model. I think the agentic AI platform war must ultimately become an infrastructure benchmarking problem, not just a model quality problem. Businesses are really looking for whoever can prove the most efficient execution fabric for orchestrating work at scale. Based on what I saw this week at ServiceNow’s flagship event, Knowledge 26, we have come a long way towards that goal.
For context, consider the mainframe era. The brands that dominated did not do so because people initially loved or understood mainframes, or, for that matter, could easily differentiate between them. They won because enterprises needed confidence that mission-critical transactional workloads could execute reliably, repeatedly, and economically. AI has yet to arrive at that moment.
To achieve that confidence, a key measure of differentiation for mainframes was MIPS. It was the measure of computational throughput for the transactional era, one that helped businesses see that the technology, coupled with their implementation model of it, was competitive. There is not yet an equivalent for the agentic era.
But I think TAPS (Tokens Attributed Per Second) could be that measure. Unlike raw token counts and costs, which are already well understood (call them TRU, token resource units), TAPS would allow the measurement of orchestration, flow, sequencing, exception handling, and execution efficiency across a business process.
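To make the distinction concrete, here is a minimal sketch of how TRU and TAPS might diverge, assuming a hypothetical per-step workflow trace. The schema, numbers and attribution flags below are invented for illustration, not any vendor’s API:

```python
from dataclasses import dataclass

@dataclass
class StepEvent:
    """One agent step in a workflow trace (hypothetical schema)."""
    tokens: int        # tokens consumed by this step
    attributed: bool   # did this step contribute to the completed outcome?
    duration_s: float  # wall-clock seconds for the step

def tru(trace: list[StepEvent]) -> int:
    """TRU: raw token resource units -- every token burned, useful or not."""
    return sum(e.tokens for e in trace)

def taps(trace: list[StepEvent]) -> float:
    """TAPS: tokens attributed per second -- tokens that moved the workflow
    towards completion, divided by total elapsed time."""
    elapsed = sum(e.duration_s for e in trace)
    useful = sum(e.tokens for e in trace if e.attributed)
    return useful / elapsed if elapsed else 0.0

# Two platforms completing the same workflow. B consumes a similar raw
# token budget but wastes more of it on retries and dead-end branches.
trace_a = [StepEvent(1200, True, 2.0), StepEvent(300, False, 1.0), StepEvent(800, True, 1.5)]
trace_b = [StepEvent(900, True, 2.0), StepEvent(900, False, 3.0), StepEvent(700, True, 2.5)]

print(f"A: TRU={tru(trace_a)}  TAPS={taps(trace_a):.0f}/s")  # TRU=2300  TAPS=444/s
print(f"B: TRU={tru(trace_b)}  TAPS={taps(trace_b):.0f}/s")  # TRU=2500  TAPS=213/s
```

On raw TRU alone the two traces look comparable. Only the TAPS view exposes that the second platform spends most of its time and tokens going nowhere.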
Read on if ServiceNow’s hockey stick moment is of interest.
I just spent another week in the US at another amazing AI event. And while it was compelling, the conversations I had with both execs and regular customer delegates on the conference floor were also more evidence that there is still something strangely incomplete about the current AI market.
Every week we hear another declaration that one model is now marginally better than another. Case in point: while I was at Knowledge, Anthropic separately announced “dreaming” as a new feature. And then there are the ever more agent announcements, more autonomy opportunities, more OOTB workflows, and more partnerships. It’s like wading through mud.
The market is behaving as though intelligence, scale and feature adoption alone will determine success. But they won’t. Enterprises do not really buy intelligence. They buy execution.
That distinction matters more than most sellers currently seem to realise. Because buried underneath the noise of the current AI cycle is a much older pattern that has played out across every major transition in technology history.
The early phase of every infrastructure market is dominated by excitement, expansion and marketing narratives as vendors chase adoption velocity. Investors, on the other hand, chase enormous growth curves. Buyers are left with possibility: conversations dominated by potential rather than operational reality. That is exactly where AI sits today.
That matters because pre-IPO and hyper-growth environments reward very different things than mature infrastructure markets do. ServiceNow CEO Bill McDermott calls this the AI blind spot.
Right now the market is rewarding model capability, developer adoption, ecosystem gravity, consumer awareness, token growth, and perceived inevitability. Operational efficiency is discussed but still secondary, though it will ultimately matter more once the dust from the big $1T IPO bubble hopefully settles this year.
In many ways this mirrors the early days of the mainframe era. Mainframes were not initially sold because enterprises deeply understood computational theory. Years ago I worked with a guy who escorted ANZ Bank executives on a multi-month delegation through the United States ahead of their very first major mainframe purchase.
The stories were wild. Endless vendor briefings, demonstrations, dinners, dude ranches, and political theatre. But underneath it all was a very serious question. Could this new technology platform reliably run the bank’s critical transactional workloads repeatedly, safely and at scale? The trip was ultimately successful not because the executives suddenly became computer scientists, but because they gained confidence that the operational heart of the bank could execute on this new infrastructure.
What followed was transformational. Banks needed payment processing. Then airlines needed reservation systems. Then governments needed records management and retailers needed inventory control. The mainframe became the operational backbone of the modern enterprise because it created confidence in execution. But not all mainframes were created equal. So eventually that confidence became competitively measurable. Through MIPS: Millions of Instructions Per Second.
I also worked with one of Australia’s pre-eminent benchmarking specialists who was still generating millions of dollars a year for analyst firms well into the early 2000s doing benchmarking projects for the country’s largest telcos, banks and state governments.
Anyone who has ever seen a benchmarking engagement up close knows MIPS were never perfect. There were endless arguments about weighting, workloads, utilisation patterns and what was really being measured. But that was never really the point.
What MIPS gave the market was a comparative language for discussing throughput, scale and operational capability. A way for organisations to compare not just machines, but the transactional confidence underpinning their operations. Importantly for this story, enterprises were not really buying mainframes. They were buying confidence that the business itself could execute better. That same transition is now emerging with agentic AI.
Today the market still talks about AI primarily as though it is software, largely because it is abstracted through modern PaaS architectures and wrapped in applications, copilots and conversational interfaces. But true end-to-end agentic systems are operational infrastructure.
And like every major infrastructure shift before them, their value will ultimately be judged against the thing they are seeking to replace or outperform. That’s not humans. It’s decision chains and transactional processes and more.
The real promise of agentic infrastructure is not that it thinks better than people. It is that it executes work across complex organisational systems more efficiently, consistently and adaptively than the operational models we have built during the previous technology era.
What organisations actually care about is whether work flows more effectively through their value chain, which, at some point through decomposition, standardisation and transformation, ultimately arrives at workflows. And workflows, including yours, dear reader, are ugly things ripe for disruption.
They are not neat diagrams in strategy decks. They are living operational compromises accumulated over years or decades. They contain approvals, escalations, retries, integrations, policies, exceptions, governance controls (hopefully), human intervention, shadow processes, duplicated decisions, historical baggage and institutional chaos layered on top of one another through successive generations of technology and management thinking.
That’s how we are arriving at a consensus that the hard problem is not generating intelligence through models. It is actually orchestrating the work those models support. Which is why I increasingly suspect the AI market eventually needs its own MIPS moment.
It needs to be something capable of measuring how efficiently tokenised work moves through complex, multi-functional orchestrated operational environments. Something like TAPS. Tokens Attributed Per Second. Not as a metric for models, but as a benchmarking input for workflows executing across agentic platform environments.
Because the real challenge is not isolated prompts or single-agent interactions. It is the coordination and orchestration of work flowing across functions, systems, humans, policies, APIs, approvals, exceptions and decision layers in real operational environments. TAPS would attempt to measure how effectively tokens are consumed, routed, contextualised, validated, transformed and completed as work moves through these interconnected execution chains.
Not just “how many tokens were used”, but how efficiently the operational fabric itself converts tokenised reasoning into completed outcomes across the enterprise. That is the unproven point of differentiation, and why the current market conversation still feels incomplete. It is also part of the reason financial analysts are gutting software stocks (the blind spot).
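As a thought experiment, a TAPS-style benchmark would start from something like the breakdown below: decomposing one workflow run’s token consumption by execution stage so that waste from retries and escalations becomes visible. The stage names and counts are invented, and treating escalations as non-productive is itself an assumption:

```python
from collections import Counter

# Hypothetical event log for a single order-to-cash workflow run.
# Stage names and token counts are illustrative, not from any platform.
events = [
    ("route", 120), ("contextualise", 2400), ("validate", 600),
    ("retry", 900), ("escalate", 450), ("approve", 300),
    ("retry", 900), ("complete", 1500),
]

by_stage = Counter()
for stage, tokens in events:
    by_stage[stage] += tokens

total = sum(by_stage.values())
# Assumption: retries and escalations are non-productive token spend.
waste = by_stage["retry"] + by_stage["escalate"]

print(f"total tokens: {total}")              # 7170
print(f"waste share:  {waste / total:.0%}")  # 31%
for stage, tokens in by_stage.most_common():
    print(f"  {stage:<13}{tokens:>5}  ({tokens / total:.0%})")
```

A rate card tells you what those 7,170 tokens cost. Only a stage-level view tells you that nearly a third of them bought no forward progress at all.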
Right now the industry is obsessing over token pricing, model scale and inference costs. But enterprises run workflows, which create entirely different economic pressures.
What happens when agentic systems begin running finance, procurement, HR, customer operations, compliance, field services and government processes at scale? Suddenly the important questions become which platforms minimise retries, and which orchestration layers, regardless of architectural partnerships, reduce token waste. Or as was evident at Knowledge 26 this week, the current red hot question is which AI platforms govern complexity and risk most effectively. That is not a model intelligence conversation anymore. It is one purely focused on operational throughput.
I thought Paul Fipps, ServiceNow’s President of Global Customer Operations, had a good message: CIOs don’t want ungoverned custom software running around the enterprise. But it doesn’t stop there. CEOs expect that as a baseline, but they are actually looking for something more.
In the transactional era, measurable packets of data moved through enterprise systems. In the agentic era, packets of tokens move through orchestrated workflows. And each of these workflows involves a complex model of multi-vendor agent approvals, retrievals, escalations, classification events, retries and contextual handoffs.
All of it becomes part of a living token economy where attribution and not unit cost will unlock the difference between good and great agentic infrastructure. The enterprise itself will slowly transform into a token-routing environment. And once that happens, measurement and benchmarking becomes inevitable. Not because analysts like me invent new acronyms but because procurement eventually demands operational comparability.
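A crude sketch of why attribution beats unit cost: at a flat token price, the runs below look identical on a rate card, but only an attribution view shows what each completed outcome actually cost. Workflow names, vendors and figures are all hypothetical:

```python
# Why attribution beats unit cost: identical rate card, different outcomes.
# Workflow, vendor and token figures below are invented for illustration.
runs = [
    {"workflow": "invoice_match", "vendor": "model_x", "tokens": 4000, "completed": True},
    {"workflow": "invoice_match", "vendor": "model_y", "tokens": 2500, "completed": False},
    {"workflow": "invoice_match", "vendor": "model_x", "tokens": 3500, "completed": True},
]

PRICE_PER_1K = 0.01  # flat dollars per thousand tokens, same for every run

spend = sum(r["tokens"] for r in runs) * PRICE_PER_1K / 1000
completed = [r for r in runs if r["completed"]]
wasted = sum(r["tokens"] for r in runs if not r["completed"])

print(f"blended spend:    ${spend:.2f}")                   # $0.10
print(f"cost per outcome: ${spend / len(completed):.3f}")  # $0.050 -- the attribution view
print(f"tokens wasted:    {wasted}")                       # 2500 bought nothing
```

Unit cost answers “what did we pay per token?” Attribution answers the question boards will actually ask: “what did we pay per piece of work that got done?”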
Boards and CFOs will not tolerate fuzzy economics forever once these systems become embedded into critical operational processes. They’ll want to know if this new tech can execute work more efficiently than the alternatives. That is when the market changes. That is when the hockey stick happens for ServiceNow. When they don’t just orchestrate and govern better than anyone, but when they unequivocally show they minimise operational drag across tokenised workflows.
In other words, the winners may end up looking less like AI companies and more like operating system companies. And I think that is something that suits a company like ServiceNow. That’s operational reality and totally aligned with the base that propelled them to #1 in category.
So perhaps we simply are not there yet. But I think we are very close. Perhaps the market first needs to pass through this expansion-era phase where narrative dominance matters more than operational efficiency. Perhaps we need to get through the growth cycle, the valuation cycle and the platform land grab before the harder industrial questions (and answers) arrive. Because history suggests they always do.
Eventually every infrastructure market matures. And when it does, the conversation shifts from “what can this technology do?” to “how efficiently can it run the world?”
That was the real story behind mainframes. And I increasingly suspect it will become the real story behind agentic infrastructure platforms too. MIPS helped sell the transactional age. TAPS, or whatever its equivalent may be, may yet define the agentic one.