From Collective Price Cuts to Collective Price Hikes: Why Did "Token Economics" Reverse Course in Just Two Years?
How Are AI Agent Applications Igniting a Surge in Token Demand?
Tokens are the "new currency" of the AI era. In 2024, an AI price war began, with tokens priced in "cents"; by 2026, demand for computing power has exploded, and model vendors and cloud providers are collectively raising their token prices.
Over the past two years, the large model industry has swung dramatically from price wars to value wars, and the value of tokens is being reappraised. Beyond wages, bonuses, and equity, tokens have even become a new bargaining chip in Silicon Valley engineers' salary negotiations. Ecosystem positioning and resource competition around tokens are already underway.
From Price Drop to Price Increase
In 2026, model vendors and cloud providers are collectively raising their token prices. Zhipu has issued two price-increase notices this year. On March 16, Zhipu launched GLM-5-Turbo, a base model optimized for deep scenarios in OpenClaw, with API prices raised by 20%. In the "Lobster" packages for individual and enterprise users, the Claw trial monthly card costs 39 yuan/month and includes 35 million tokens; the Claw advanced monthly card costs 99 yuan/month and includes 100 million tokens. In February, Zhipu had already announced a price adjustment for its Coding Plan: citing "sustained strong market demand for the GLM Coding Plan, with rapid growth in user scale and call volume," it canceled the first-time purchase discount while retaining quarterly and annual subscription discounts, raising overall package prices by at least 30%.
Beyond model vendors, cloud providers are also raising prices. With Coding Plan subscriptions surging, Alibaba Cloud's model API call volume spiked, and on March 4 it announced a phased scale-back of first-time purchase discounts, with limited daily quotas available while supplies last. On March 18, Alibaba Cloud said that the global explosion in AI demand and supply-chain price increases had significantly raised procurement costs for core hardware across the industry, and that from April 18 it would adjust prices for AI computing power, CPFS (Intelligent Computing Edition), and other services. Services based on Pengtougexin Wu 810E and other compute cards rose by 5%-34%, and CPFS (Intelligent Computing Edition) rose by 30%.
Baidu Smart Cloud likewise announced that from April 18, AI computing power-related products and services would rise by roughly 5%-30%, and parallel file storage and other services by about 30%. Tencent Cloud announced that from March 13, public testing of the GLM 5, MiniMax 2.5, and Kimi 2.5 models had ended and they were transitioning to commercial services billed by model calls. Prices for the Hun Yuan series were also adjusted: the Tencent HY2.0 Instruct model's input price rose from 0.0008 yuan/1,000 tokens to 0.004505 yuan/1,000 tokens, and its output price from 0.002 yuan/1,000 tokens to 0.01113 yuan/1,000 tokens.
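To make per-1,000-token pricing concrete, here is a minimal cost sketch using the Tencent HY2.0 Instruct figures quoted above; the call sizes are hypothetical, chosen only to contrast a simple Q&A exchange with a token-hungry agent task.

```python
# Prices quoted in the article (after the increase); call sizes are hypothetical.
INPUT_PRICE = 0.004505   # yuan per 1,000 input tokens
OUTPUT_PRICE = 0.01113   # yuan per 1,000 output tokens

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in yuan of one API call billed per 1,000 tokens."""
    return (input_tokens / 1000) * INPUT_PRICE + (output_tokens / 1000) * OUTPUT_PRICE

# A simple Q&A versus an agent task consuming ~100x more tokens.
print(f"Simple Q&A (1k in / 1k out):     {call_cost(1_000, 1_000):.4f} yuan")
print(f"Agent task (100k in / 100k out): {call_cost(100_000, 100_000):.2f} yuan")
```

At these rates a single question costs a fraction of a cent, but a long-running agent workload multiplies that by the token volume, which is where the article's cost pressure comes from.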
Yet the token "price-cut wave" of just two years ago remains fresh in memory.
In the 2024 “Hundred Model Battle,” the large model industry was still in the midst of fierce price wars, with cloud and model vendors competing by lowering prices and giving away tokens.
In May of that year, ByteDance launched a price war with a price of 0.0008 yuan per 1,000 tokens, followed by Alibaba Cloud, which announced a maximum discount of 97% on Tongyi Qianwen. At that time, the main model Qwen-Long of Tongyi Qianwen, comparable to GPT-4 level, saw input prices drop from 0.02 yuan/1,000 tokens to 0.0005 yuan/1,000 tokens. Meanwhile, Zhipu’s newly registered user bonus increased from 5 million tokens to 25 million tokens.
DeepSeek, which trained high-performance large models at lower cost, disclosed key details of its V3/R1 inference system in March of last year. Thanks to throughput and latency optimizations, it reported that if all tokens were priced at DeepSeek-R1 rates, the cost-profit margin could reach 545%.
Technology is the confidence behind model price reductions. Tan Dai, President of Volcano Engine, ByteDance’s cloud service platform, stated during the 2024 AI price reduction wave that the basic logic of price cuts is confidence in reducing costs through technical means, and the market also needs lower-priced large models.
“Two years ago, demand for computing power was mostly from enterprises; now, individual demand is ‘hungry,’ driving AI startups and large companies to shift their business models toward token consumption,” said Tian Feng, President of the Fast and Slow Think Tank and former founding director of SenseTime’s AI industry research institute.
In the past two years, models have rapidly iterated, and intelligent agent applications have grown significantly, driving continuous increases in computing power demand. High-cost inference GPUs have limited capacity, and costs for core hardware like memory and related infrastructure have risen sharply. Bernard Golden, CEO of Navica, a Silicon Valley tech analysis, consulting, and investment firm, said the entire industry is frantically seeking more computing power.
Under the imbalance of supply and demand, price increases are inevitable.
"A smarter model performs more complex tasks and consumes enormous resources," said Zhang Peng, CEO of Zhipu, in response to the price hikes. He explained that the reasoning and thinking chains behind agent tasks are longer, and agents interact with the underlying infrastructure by writing code, constantly debugging and correcting errors. The token volume needed to complete a task can be ten or even a hundred times that of answering a simple question. The essence of the price adjustment is a change in costs: "bigger models, stronger capabilities, and higher service costs. We want to gradually bring pricing back into a normal commercial value range. Relying on low prices long-term is not good for the industry's development."
Token Call Volume Grows a Thousandfold in Two Years
Over the past two years, software vendors have integrated text, image, and speech generation capabilities into existing products such as customer service platforms, marketing material creation, and service robots through standardized API interfaces. Enterprise users call large model capabilities via APIs, billed by usage or subscription, lowering entry barriers and upfront investments. After all, a single H100 GPU costs about $25,000, and deploying multiple GPUs in one system costs even more.
This service model lets large models reach vast numbers of users quickly, sending token call volumes soaring. Liu Lihong, Director of the National Data Bureau, recently disclosed that by the end of 2025, more than 100,000 high-quality datasets had been built nationwide. By March of this year, China's daily token calls exceeded 140 trillion: a more than 1,000-fold increase from roughly 100 billion in early 2024, and a 40% increase in just three months over the 100 trillion recorded at the end of 2025.
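The growth figures above can be sanity-checked with a bit of arithmetic on the article's reported numbers:

```python
# Daily token-call volumes as reported in the article.
early_2024 = 100e9    # ~100 billion daily token calls, early 2024
end_2025 = 100e12     # ~100 trillion daily token calls, end of 2025
march_2026 = 140e12   # ~140 trillion daily token calls, March this year

growth_two_years = march_2026 / early_2024   # 1400x, consistent with ">1,000-fold"
growth_quarter = march_2026 / end_2025 - 1   # 0.40, i.e. +40% in about three months

print(f"Two-year growth: {growth_two_years:.0f}x")
print(f"Three-month growth: {growth_quarter:.0%}")
```

Both of the article's claims check out: a 1,400-fold rise over two years and a 40% jump in a single quarter.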
Tian Feng told The Paper that in 2024, demand for training compute grew by over 50%, but by 2025 the situation had reversed completely. If two years ago saw a battle of a hundred "models," today it is a battle of a hundred "shrimp."
Reasoning demand has grown explosively, and reasoning services, deeply tied to token consumption, are now the largest and fastest-growing compute scenario. Continuous improvements in model performance send token consumption skyrocketing, and the widespread adoption of AI programming, "OpenClaw" (Lobster), and other intelligent agent applications has caused token demand to explode. OpenClaw is jokingly called a "token black hole." For companies and individuals using Lobster, tokens are the biggest cost bottleneck.
Tian Feng said that the token consumption of intelligent agents executing tasks is 4-15 times that of traditional Q&A. AI entrepreneur Luo Xuan used OpenClaw to complete complex research tasks, consuming millions or even more tokens. To find cheaper tokens, his experience is to register as a new user with cloud or model vendors to get free tokens, but he still laments, “tokens are too expensive.”
Programming, chatting, office work, and other compute tasks also consume tokens. From a broader perspective of compute consumption, image generation priced by image count and video generation priced by duration and resolution also consume massive compute resources. OpenAI shutting down the Sora video app is an example. Running video generation services requires huge computing power and electricity, which is a massive expense for any company, and shutting down Sora frees up substantial compute resources.
Compute demand not only drives up GPU demand but also causes prices of related hardware to fluctuate and become a limiting factor.
"Power for cooling, lighting, and servers accounts for about 60% of data center costs. Now energy prices for oil, natural gas, and other sources are rising, and memory is in a five-year upward cycle," said Tian Feng. Energy and hardware costs are driving compute price increases.
Huang Zhiming, Vice President of Cisco and CEO of Cisco Greater China, told The Paper that in the short term, hardware investment and factory construction cannot be completed in a month or two, so supply-demand fluctuations will persist for some time. Hou Shengli, Senior Vice President and CTO of Cisco Greater China, added that catching up with demand generally takes about two years: "Memory factory adjustments take at least two years, and there won't be improvement before the end of 2027. Rebuilding factories and laying out production lines isn't that fast." Still, Huang Zhiming believes that as the user base expands and applications spread, costs will gradually become more affordable and accessible.
Yao Xin, founder of Piao Cloud Computing (Shanghai), told The Paper that the bottleneck limiting AI and compute power today is not the most advanced chips but ordinary IT technologies and traditional supporting components. Over the past decade, the traditional IT infrastructure industry chain of memory, hard drives, and switches maintained steady growth in line with global GDP, with long-term stable demand driving moderate capacity expansion. The explosive growth of AI has broken this balance: GPU shipments have surged, while supporting peripheral components lag behind this inflection-point surge in demand. "High-end chip capacity has increased, but other capacities haven't kept pace. Everyone has been hit hard, so makers of traditional components like memory and hard drives are expanding production."
Alternating Rise of Supply and Demand, Eventually Stabilizing
"Now tokens are more expensive than interns; in three to five years, they will definitely be cheaper." Tian Feng, too, believes future token prices will decline.
He thinks that, in the short term, the rise in compute power prices stems from supply-demand mismatches. But from a semiconductor cycle perspective, manufacturing has capacity cycles: after expansion, new capacity is released in a concentrated manner, market supply and demand are disrupted, and prices fall, even leading to overcapacity. Regarding energy, China is advancing its new energy transformation, which could further reduce energy costs. In the medium term, prices depend on the capabilities of foundational models—new versions iterated every three months often address unmet needs and release new demands, pushing up compute prices; in the long term, it depends on the evolution of reasoning capabilities, ultimately leading to a sustained decrease in compute costs.
Over the past two years, supply and demand have alternated in prominence. Tian Feng said that DeepSeek represents a peak in cost reduction through innovation, while the explosive productivity of “lobster” models creates a demand peak. “But this doesn’t mean that during demand surges, reasoning costs don’t decrease; it just means that the speed of demand growth exceeds the rate of reasoning cost decline. In 3-5 years, overall compute costs and token fees will drop sharply.”
Yao Xin said that AI has entered a “singularity moment,” “entering a period of tenfold or hundredfold rapid growth within the next one or two years. Industries unprepared for this growth will face shortages in the short term. But like ripples, it will gradually spread and eventually stabilize.”
Behind the rising token prices, the business logic is also changing. Nvidia CEO Jensen Huang has repeatedly mentioned the “five-layer cake” structure of AI: “The five layers are energy, chips, infrastructure, models, and applications, with the top layer providing the greatest economic dividends.”
“Current AI is like the internet in 2000—people didn’t really understand what the internet could do, but countless individuals invested in building various websites,” said Hou Shengli. “As applications and innovations continue, by 2005 or 2006, more ‘Internet+’ scenarios emerged, and various services gradually integrated.” The development of AI is similarly promising. As widely predicted, 2026 will become the year of intelligent agents, with a proliferation of intelligent agent applications this year.
These intelligent agents are now embedded in smartphones, computers, and even factory production lines. “Everyone’s demand for AI to boost productivity is almost endless; the only limit is price. When prices rise, demand drops; when prices fall, demand rises,” Tian Feng said. Even now, large companies do not treat price increases uniformly. “On one hand, they raise cloud computing prices for B-end (enterprise) clients; on the other hand, they offer limited-time free trials or token giveaways to C-end (consumer) markets.” Tian Feng admitted that current situations resemble the early days of the internet: while capturing users is the ultimate goal, the more critical battle is for developers.
In the past, "developers" meant professional programmers around the world; now many non-technical people have Vibe Coding skills, making them both consumers and creators of code. When big companies lock in developers, they ensure that the fruits of that development stay on their cloud.
Major internet companies are providing token quotas to employees to encourage AI use. According to Jiemian News, Alibaba is promoting an internal program that offers token quotas to employees, encouraging them to use advanced AI models and tools in their work. Employees can use paid AI tools like Wukong and the Qoder intelligent agent programming platform for free, with the company providing token quotas. Employees purchasing Balian Coding Plan memberships or external AI development tools can apply for reimbursement.
Use cases for AI efficiency are not limited to programming; broader content creation and professional office tasks also generate token demand. MiniMax has even upgraded its original Coding Plan to support MiniMax multimodal models with a Token Plan, seizing token opportunities.
“Frankly, there aren’t many urgent needs for model development, so most adopt a monthly subscription model. Tokens are gaining attention because metrics like monthly user growth and per-user token consumption directly reflect revenue growth,” Tian Feng said. This creates strong user stickiness: as long as the product is good enough, users are willing to pay a premium for a better experience. Moreover, the same 5 million tokens can be sold for 22 yuan or 400 yuan, with the premium directly linked to the base model and agent capabilities. Tian Feng believes that fundamentally, tokens are like an untapped gold mine.