What it means
An LLM does not process text as characters or words. It processes text as tokens, sub-word units produced by a tokeniser. Common English words are usually one token each ('the', 'cat', 'restaurant'). Rare words, names, and non-English text are often split across multiple tokens. Punctuation and whitespace count too.
A useful rule of thumb: 100 tokens is roughly 75 English words, so 1,000 tokens is about 750 words, or around 1.5 single-spaced pages. Different LLMs use different tokenisers, so token counts differ slightly between models.
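To see this in practice, here is a minimal sketch using OpenAI's open-source tiktoken library (pip install tiktoken). Other providers ship their own tokenisers, so the exact counts are illustrative only.

```python
# Count tokens with tiktoken. cl100k_base is the encoding used by
# GPT-4-era OpenAI models; other models use different encodings, so the
# same text can yield slightly different counts elsewhere.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["the cat sat on the mat", "restaurant", "Zwischenraum"]:
    tokens = enc.encode(text)
    print(f"{text!r}: {len(text.split())} word(s), {len(tokens)} token(s)")
    # Common English words tend to map to one token each; rare or
    # non-English words ("Zwischenraum") tend to split into several tokens.
```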
Why it matters
Almost every LLM cost and limit is measured in tokens, not words or characters. Pricing is quoted per million tokens, with input and output charged separately and often at different rates. Context windows are token caps, and rate limits are typically tokens-per-minute caps.
Practical implication: a long system prompt is not free. A 5,000-word system prompt is roughly 6,500 tokens of input overhead on every request, which adds up quickly at scale. Minimising prompt length while preserving guardrail quality is a real cost lever.
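A quick back-of-the-envelope sketch of that overhead. The word-to-token ratio is the rule of thumb above; the per-million-token rate and the monthly request volume are illustrative assumptions, not any provider's actual figures.

```python
# Cost of carrying a long system prompt on every request.
WORDS_IN_SYSTEM_PROMPT = 5_000
TOKENS_PER_WORD = 4 / 3                  # rule of thumb: ~1.33 tokens per English word
INPUT_RATE_USD_PER_MILLION = 1.00        # placeholder input rate, USD per million tokens
REQUESTS_PER_MONTH = 100_000             # placeholder traffic volume

prompt_tokens = WORDS_IN_SYSTEM_PROMPT * TOKENS_PER_WORD           # ~6,667 tokens
monthly_overhead_tokens = prompt_tokens * REQUESTS_PER_MONTH       # ~667 million tokens
monthly_overhead_usd = monthly_overhead_tokens / 1_000_000 * INPUT_RATE_USD_PER_MILLION

print(f"{prompt_tokens:,.0f} prompt tokens per request")
print(f"~${monthly_overhead_usd:,.2f}/month just for the system prompt")  # ~$666.67 at these assumptions
```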
Example
A high-volume e-commerce brand runs an AI agent that handles 30,000 customer conversations a month. Each conversation averages 3,000 input tokens (system prompt, retrieved knowledge, chat history) and 200 output tokens. Monthly token volume: 90 million input tokens, 6 million output tokens. At GPT-4o-mini's rates that is around USD 18 a month; at GPT-4o's rates, around USD 250. Same agent, vastly different bill, purely from model selection.
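The arithmetic behind that comparison is simple enough to sketch. The rates below are hypothetical placeholders, not any model's actual prices; substitute the current per-million-token rates for whichever models you are comparing.

```python
# Sketch of the monthly-bill arithmetic from the example above.
CONVERSATIONS_PER_MONTH = 30_000
INPUT_TOKENS_PER_CONVERSATION = 3_000    # system prompt + retrieved knowledge + history
OUTPUT_TOKENS_PER_CONVERSATION = 200

input_millions = CONVERSATIONS_PER_MONTH * INPUT_TOKENS_PER_CONVERSATION / 1_000_000    # 90.0
output_millions = CONVERSATIONS_PER_MONTH * OUTPUT_TOKENS_PER_CONVERSATION / 1_000_000  # 6.0

def monthly_bill(input_rate_usd: float, output_rate_usd: float) -> float:
    """Monthly cost given USD-per-million-token rates for input and output."""
    return input_millions * input_rate_usd + output_millions * output_rate_usd

# Hypothetical rate pairs, purely to show the shape of the comparison:
print(f"Model A ($0.20/$0.80 per M): ${monthly_bill(0.20, 0.80):,.2f}")  # 18.00 + 4.80 = 22.80
print(f"Model B ($2.00/$8.00 per M): ${monthly_bill(2.00, 8.00):,.2f}")  # 180.00 + 48.00 = 228.00
```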