
Why AI tokens will send your enterprise cloud bill sky-high again

Jun 26, 2026 Twila Rosenbaum 41 views

AI usage is moving to token-based pricing, a model far more expensive than the previous flat-fee or subscription-based access. This transition is reshaping enterprise cloud economics, reminding many customers of the volatile early days of cloud pricing. Underneath the confusion, tokens are quietly standardizing how labs translate scarce GPU capacity into billable units, how enterprises measure AI usage, and how software vendors reprice their products.

Tokens: The atomic units of AI

In this new world, the token is the basic unit of AI work. Tokens serve multiple roles: they are the unit of output from hardware and data centers, the unit in which labs price their inputs and outputs, and the unit of value that enterprises look to monetize. This abstraction is precisely why labs and hyperscalers like it. Instead of charging for GPU types, memory, and power directly, they expose a single unit, the token, priced per million, over a bewildering mix of architectures and deployment topologies. OpenAI, Anthropic, Google, and others now publish per-model rate cards with separate prices for input tokens and output tokens, usually quoted in dollars per million tokens.
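To make the rate-card arithmetic concrete, here is a minimal Python sketch of how a per-request cost falls out of separate input and output prices. The `RateCard` type, the function name, and the prices are illustrative assumptions, not any vendor's published rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateCard:
    """Hypothetical per-model prices in US dollars per million tokens."""
    input_per_million: float
    output_per_million: float


def request_cost(card: RateCard, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one request under the given rate card."""
    total = (input_tokens * card.input_per_million
             + output_tokens * card.output_per_million)
    return total / 1_000_000


# Illustrative prices only (assumption): $3 per million input, $15 per million output.
example_card = RateCard(input_per_million=3.0, output_per_million=15.0)
cost = request_cost(example_card, input_tokens=20_000, output_tokens=2_000)
print(f"${cost:.2f}")  # $0.09
```

Under these assumed prices, output tokens cost five times as much as input tokens, which is why the mix of the two matters as much as the total count.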

An AI token is the smallest unit a word or phrase can be broken down into when being processed by a large language model (LLM). For English, one token is roughly four characters, or about three-quarters of a word, so 100 tokens equals about 75 words. The token hides enormous complexity, from model choice and quantization to how aggressively you use caching or agents. That complexity is exactly what FinOps teams are now being asked to decode.
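The rule of thumb above can be turned into a rough estimator. This is a back-of-the-envelope sketch for English text only; real tokenizers differ by model, and the function names are made up for illustration.

```python
CHARS_PER_TOKEN = 4       # rule of thumb from the text: about four characters per token
WORDS_PER_TOKEN = 0.75    # rule of thumb from the text: about three-quarters of a word


def estimate_tokens_from_chars(text: str) -> int:
    """Rough token estimate from character count (English text only)."""
    return round(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(word_count: int) -> int:
    """Rough token estimate from word count (English text only)."""
    return round(word_count / WORDS_PER_TOKEN)


print(estimate_tokens_from_words(75))          # 100
print(estimate_tokens_from_chars("a" * 400))   # 100
```

An estimate like this is good enough for budgeting conversations; for billing reconciliation, the token counts reported by the provider are the ones that count.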

The end of the all-you-can-eat token era

If 2023 through early 2025 was the era of cheap experiments, the last 18 months have been a rude awakening. Three distinct phases have emerged: the old days of AI before ChatGPT, the good old days when chatbots could write decent code, and the post-November 2025 world when major model releases took AI from pretty good to really good. In the good old days, the era of all-you-could-eat tokens and subscriptions, we went through a brief period of token maxing where everyone was excited about their token leaderboard. Today, token leaderboards are painfully obsolete because no one can afford to waste tokens.

Between June and November 2025, global token usage grew along a steady, roughly linear path. Then new models and agentic patterns landed. Context windows went from a few thousand or tens of thousands of tokens up to millions of tokens in a single conversation, and agentic patterns added loops, retries, and corrections. Companies had happily subsidized that behavior until they saw the bills. Some $200-a-month power users actually cost tens of thousands of dollars a month to serve when running everything on the latest model. Those days and prices are done. Moving forward, companies will have to pay the real cost of AI tokens.
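One way to see why long, looping conversations get expensive: if each turn resends the whole conversation so far as input, billed input tokens grow much faster than the number of turns. The sketch below assumes exactly that (no caching, a fixed number of tokens added per turn); both are simplifying assumptions rather than a description of any particular product.

```python
def cumulative_input_tokens(turns: int, tokens_added_per_turn: int) -> int:
    """Total input tokens billed across all turns, assuming every turn
    resends the full conversation history (no caching)."""
    total = 0
    context = 0
    for _ in range(turns):
        context += tokens_added_per_turn  # the history grows each turn
        total += context                  # and is sent again in full
    return total


print(cumulative_input_tokens(10, 2_000))  # 110000
print(cumulative_input_tokens(50, 2_000))  # 2550000
```

In this toy model, five times as many turns produces roughly 23 times as many billed input tokens, before counting any retries.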

Scarcity keeps token prices from collapsing

If Moore's law and hyperscale competition were the only forces at work, you would expect token prices to keep falling. To some extent, they have: since 2023, token prices have fallen dramatically. However, the trend may be flattening. Token prices have been pretty flat since November 2025, a plateau linked directly to hardware and power constraints. We cannot get enough hardware or enough power; we are seeing backlogs, long commitment periods, and shortages. Intel's CEO expects no real relief in GPU and related component supply until 2028. Supply chain constraints and rising hardware prices mean the cost of new frontier models continues to grow.

The net result is a classic Jevons paradox: falling unit cost but exploding total spend. Even with falling token prices, spend is still rising, and in some months it has doubled. Global usage is estimated to rise from 6 quadrillion tokens today to a forecast 120 quadrillion within about 3.5 years, a 20-fold increase. Even if token prices drop further once supply loosens, they are unlikely to fall as fast as volume grows.
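The arithmetic behind that claim is simple: total spend scales with volume times unit price. The sketch below takes the 6 and 120 quadrillion figures at face value and asks how far prices would have to fall for spend to stay flat; the even annual compounding is an assumption made for illustration.

```python
def spend_multiple(volume_multiple: float, price_multiple: float) -> float:
    """Change in total spend when volume and unit price each change by a factor."""
    return volume_multiple * price_multiple


volume_multiple = 120 / 6                         # forecast growth in tokens: 20x
break_even_price_multiple = 1 / volume_multiple   # price must fall to 5% of today's

# Assumption: growth compounds evenly over the 3.5-year forecast window.
years = 3.5
annual_volume_growth = volume_multiple ** (1 / years)
annual_price_decline_needed = 1 - 1 / annual_volume_growth

print(volume_multiple)                             # 20.0
print(f"{annual_price_decline_needed:.1%}")        # about 57.5% per year
print(spend_multiple(volume_multiple, 0.5))        # 10.0: prices halve, spend still 10x
```

In other words, unit prices would have to fall by more than half every year, for three and a half years running, just to hold total spend level.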

FinOps discovers token economics

For the FinOps community, which cut its teeth on cloud right-sizing and reserved instances, token pricing is both familiar and completely alien. The familiar part is that it is usage-based, the invoices are big, and forecasting is hard. The alien part is that the unit is tied to language, not infrastructure, and it changes as fast as model releases, not as slowly as server depreciation schedules. AI does not just stretch the cloud playbook; it breaks it. Unlike CPUs, AI models have unique strengths and weaknesses and different cost profiles, so swapping out an LLM is not just a pricing decision but also a quality-of-output decision.

Enterprises are retooling to manage this. They are building internal AI FinOps frameworks with three pillars: spend visibility (what is consumed, how, and where), economics (how efficiently AI is used, tracked with token-level metrics like input/output ratios, cached token ratios, and token-to-spend drift), and value (connecting AI spend to business outcomes with cost per use case and inference cost by revenue). Every token needs to earn its cost.
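A few of those token-level metrics can be computed from ordinary usage logs. The sketch below is one possible shape, not an established framework: the `UsageRecord` fields, the use-case names, and the sample figures are all hypothetical, and token-to-spend drift is left out because it is not defined precisely here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One hypothetical usage-log row; field names are assumptions."""
    use_case: str
    input_tokens: int          # all input tokens, cached ones included
    cached_input_tokens: int   # the subset of input tokens served from cache
    output_tokens: int
    cost_usd: float


def input_output_ratio(records: list[UsageRecord]) -> float:
    """Input tokens consumed per output token produced."""
    total_out = sum(r.output_tokens for r in records)
    if total_out == 0:
        return float("inf")
    return sum(r.input_tokens for r in records) / total_out


def cached_token_ratio(records: list[UsageRecord]) -> float:
    """Share of input tokens that were served from cache."""
    total_in = sum(r.input_tokens for r in records)
    if total_in == 0:
        return 0.0
    return sum(r.cached_input_tokens for r in records) / total_in


def cost_per_use_case(records: list[UsageRecord]) -> dict[str, float]:
    """Total spend grouped by use case."""
    totals: dict[str, float] = defaultdict(float)
    for r in records:
        totals[r.use_case] += r.cost_usd
    return dict(totals)


# Made-up sample data for illustration.
records = [
    UsageRecord("support-bot", 80_000, 60_000, 10_000, 0.25),
    UsageRecord("support-bot", 40_000, 20_000, 5_000, 0.15),
    UsageRecord("code-review", 80_000, 0, 25_000, 0.60),
]

print(input_output_ratio(records))   # 5.0
print(cached_token_ratio(records))   # 0.4
print(cost_per_use_case(records))
```

Grouping by use case rather than by team or API key is a design choice here; it lines the spend up with the business outcomes the value pillar is meant to track.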

Tokenomics: beyond just counting tokens

If FinOps is about cost control and accountability, tokenomics is about the full lifecycle of tokens as an economic good. It covers production (taking energy and capital to create tokens), consumption (allocation, forecasting, and optimization), and value (how to monetize tokens, adjust pricing based on cost, and understand labor implications). Token pricing directly collides with SaaS business models. Companies are shifting Copilot toward more explicit usage-based charging, and developers who loved unlimited tokens are now angry because their implicit subsidy vanished.

Labs are also tightening screws in ways invisible at the token level. Some models silently drop users to a different model if they try to build a competing LLM, making naive cost-per-token metrics unreliable. Not all tokens are created equal. A token can cost two cents per million or 35 per million, and one might drive a lot of value while another does not. The point of embracing tokenomics is to harness the fact that the C-suite has already latched onto tokens as a mental model.

Business models are evolving from credits and opaque consumption to hybrid subscription plus usage to direct pass-through models. All are vulnerable to upstream shocks. Any change in the token factory, model routing, cache efficiency, or forecasting affects consumer pricing at the end, cascading into banks and everyone else. The Linux Foundation is spinning up a Tokenomics Foundation to give big consumers and suppliers a vendor-neutral place to hash out specifications.

The human side: AI haves versus have-nots

Beyond spreadsheets, token pricing shapes who gets to use powerful AI. There is a societal divide between those who can afford the AI and those who can't. Certain teams are deemed worthy of getting the latest model, while others are routed to cheaper models. Yet there is also a strong argument against crude caps. One Fortune 100 executive advised looking across usage to find outliers who might be doing something really interesting, rather than capping them. In a world where YC-backed startups receive millions of dollars of tokens to disrupt incumbents, shutting down internal experimentation could be an existential threat.

For individuals and new workers, token pricing feeds into broader anxieties about AI and jobs. The person who is better at AI is coming for the job of the person who is not using AI. If token prices and quotas restrict who can learn and experiment, that divide will only deepen. For both companies and individuals, we are moving quickly into an AI-token-based economy that will be far more expensive than it has been.

Source: ZDNET News
