AI tokens are the basic units of text that large language models process and generate, and they also serve as a measure of compute usage [1, 2, 3].

These units determine how much a company pays for AI services and how much information a model can remember at once. As generative AI usage grows, tokens have shifted from technical markers to central drivers of corporate budgeting and performance decisions [4, 5].

Tokens are often sub-word fragments rather than whole words [1, 2, 3]. This means a single word may be split into multiple tokens depending on the model's design. Developers and vendors use these counts to measure the workload on hardware, such as Nvidia chips, and to set pricing tiers [5].
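To make the sub-word idea concrete, here is a minimal sketch of how one word can become several tokens. It is an illustration only: the vocabulary `TOY_VOCAB` and the function `toy_tokenize` are hypothetical, and the greedy longest-match rule is a simplification rather than the method of any particular model or vendor.

```python
# Toy illustration only: this vocabulary is hypothetical and far smaller
# than the vocabulary of any production tokenizer.
TOY_VOCAB = {"token", "iz", "ation", "un", "believ", "able"}


def toy_tokenize(word, vocab=TOY_VOCAB):
    """Split a word into pieces by greedy longest-match against vocab.

    Characters that match no vocabulary entry become single-character tokens.
    """
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest remaining candidate first, then shrink it.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No vocabulary entry starts here: emit one character.
            pieces.append(word[i])
            i += 1
    return pieces


for w in ("tokenization", "unbelievable", "cat"):
    pieces = toy_tokenize(w)
    print(w, "->", pieces, f"({len(pieces)} tokens)")
```

Under this toy vocabulary, "tokenization" splits into three tokens, while a word with no matching entries falls back to one token per character. Real tokenizers differ in vocabulary and splitting rules, which is why the same text can yield different token counts across models.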

Billing methods vary by region and provider. Some Chinese telecom operators treat AI usage much like mobile data, charging users per million tokens [6]. In the U.S., companies such as Anthropic have shifted away from flat-rate pricing toward models that more closely track token consumption [7].
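The arithmetic behind per-million-token billing is simple, as the sketch below shows. The function name `usage_cost` and the price used in the example are assumptions for illustration; the sources cited here do not give specific rates.

```python
def usage_cost(tokens_used, price_per_million):
    """Return the charge for tokens_used at a given price per one million tokens."""
    if tokens_used < 0 or price_per_million < 0:
        raise ValueError("tokens_used and price_per_million must be non-negative")
    return tokens_used * price_per_million / 1_000_000


# Hypothetical example: 3.5 million tokens at a made-up price of 2.00 per million.
monthly_tokens = 3_500_000
hypothetical_price = 2.00
print(usage_cost(monthly_tokens, hypothetical_price))
```

Because the charge scales linearly with token count, a customer who doubles usage doubles the bill, which is the key difference from a flat-rate plan.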

Industry leaders view the monetization of these units as a significant revenue stream. "This is easy money," Meta CTO Andrew Bosworth said [8].

However, some critics argue that the focus on token volume can lead to inefficiency. A Fast Company author wrote, "The AI industry has a quiet addiction problem: it is addicted to tokens" [9]. This focus can encourage "tokenmaxxing," where employees are incentivized to waste costly AI resources to meet specific goals [10].

While some technical perspectives suggest tokens are primarily for model processing [1], the commercial reality shows they are increasingly used as a billing metric for global consumers [6].

The transition of tokens from a backend technical necessity to a frontend billing unit mirrors the early days of data roaming. By commodifying the smallest unit of processing, AI vendors can scale revenue precisely with compute costs, but this may create a misalignment where users are charged for the model's inefficiency in tokenizing text rather than the actual value of the output.