Tokens.
Learn what Tokens means in modern search and SEO.
The basic units of text that AI language models process — roughly corresponding to word fragments — used to measure input and output length and compute API costs.
In the context of large language models, tokens are the discrete units into which text is broken before processing. Tokenisers (such as OpenAI's tiktoken library or Anthropic's tokeniser) split input text into tokens. In English a token averages roughly ¾ of a word, though the ratio varies with the text and the tokeniser. For example, 'unhelpful' might be split into two tokens: 'un' and 'helpful'. Punctuation, whitespace, and subword fragments all count as tokens, and many tokenisers merge a leading space into the token for the word that follows.
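To make the idea of subword splitting concrete, here is a minimal sketch of a toy tokeniser. It is not how tiktoken or any production tokeniser works internally (those learn their vocabularies from data, typically with byte pair encoding); the vocabulary and the function name toy_tokenize are invented for this illustration, and the greedy longest-match rule is a simplification.

```python
# Toy vocabulary: hypothetical, chosen only for this illustration.
TOY_VOCAB = {"un", "helpful", "help", "ful", "this", "is", " ", "."}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Split text greedily into the longest vocabulary entries available.

    Characters that match no vocabulary entry become single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until an entry matches.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No vocabulary entry starts here: fall back to one character.
            tokens.append(text[i])
            i += 1
    return tokens


print(toy_tokenize("unhelpful"))  # ['un', 'helpful']
```

With this toy vocabulary the word 'unhelpful' comes out as two tokens, and spaces and punctuation are counted as tokens of their own. A real tokeniser may split the same word differently.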
Why Tokens Matter
LLM APIs are typically priced per token, with input tokens and output tokens billed at separate rates. Context windows, the maximum amount of information a model can consider at once, are also measured in tokens. Understanding tokenisation helps you estimate costs, optimise prompt length, and stay within context limits in large-scale content operations.
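As a rough illustration of how a per-token cost estimate can be put together, the sketch below converts word counts to tokens with the ¾-word rule of thumb and applies separate input and output rates. The function names are hypothetical, and the prices in the usage line are placeholders rather than any provider's real rates; an exact figure requires counting tokens with the provider's own tokeniser.

```python
import math

# Rule of thumb for English: one token is roughly 0.75 words.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(word_count, words_per_token=WORDS_PER_TOKEN):
    """Approximate the token count for a given number of English words."""
    return math.ceil(word_count / words_per_token)


def estimate_cost(input_words, output_words,
                  input_price_per_million, output_price_per_million):
    """Estimate the cost of one call, pricing input and output separately.

    Prices are expressed per one million tokens (an assumed convention).
    """
    input_tokens = estimate_tokens(input_words)
    output_tokens = estimate_tokens(output_words)
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


# Placeholder prices: 3.00 per million input tokens, 15.00 per million output tokens.
print(estimate_cost(1500, 750, 3.00, 15.00))
```

Here 1,500 input words become about 2,000 tokens and 750 output words about 1,000 tokens, so the estimate is roughly 0.021 in whatever currency the prices use.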
Context Window and Long Documents
At the rule-of-thumb rate of ¾ of a word per token, a 100K-token context window holds approximately 75,000 words of English, roughly the length of a novel. Models with larger context windows can process entire codebases, long reports, or extended conversation histories in a single call. However, models tend to 'lose' information from the middle of very long contexts (the 'lost in the middle' problem) and perform better on content placed at the beginning or end.
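The capacity arithmetic above can be sketched in a few lines. The helper names are hypothetical, and the fit check assumes that the prompt and the generated output share one context window, which is common but worth confirming for the model you use.

```python
def approx_word_capacity(context_window, words_per_token=0.75):
    """Approximate how many English words fit in a window of this many tokens."""
    return round(context_window * words_per_token)


def fits_in_context(prompt_tokens, context_window, reserved_output_tokens=0):
    """Check whether a prompt fits, leaving room for the expected output.

    Assumes input and output tokens count against the same window.
    """
    return prompt_tokens + reserved_output_tokens <= context_window


print(approx_word_capacity(100_000))            # 75000
print(fits_in_context(98_000, 100_000, 4_000))  # False
```

Reserving space for the response matters in practice: a prompt that fits on its own can still overflow the window once the expected output is added.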
Tokens Across Languages
Tokenisation efficiency varies by language. English text typically works out to about 1 token per 0.75 words. Languages that are poorly represented in the tokeniser's training data, including many Asian languages and languages written in non-Latin scripts, often require more tokens to express the same meaning. This raises API costs and reduces the effective context window capacity for multilingual applications.
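One way to reason about this effect is with a token ratio: the number of tokens a language needs relative to English for the same content. The sketch below is only illustrative. The function names are hypothetical, and the ratio of 2.0 in the usage lines is a made-up example, not a measurement; a real ratio has to be measured by tokenising parallel text with the tokeniser you actually use.

```python
def english_equivalent_capacity(context_window, token_ratio, words_per_token=0.75):
    """Context capacity expressed in English-equivalent words.

    token_ratio is the number of tokens the language needs relative to
    English for the same content (1.0 means parity with English).
    """
    return round(context_window * words_per_token / token_ratio)


def scaled_cost(english_cost, token_ratio):
    """Scale an English cost estimate by the language's token ratio."""
    return english_cost * token_ratio


# Hypothetical ratio of 2.0: the language needs twice as many tokens as English.
print(english_equivalent_capacity(100_000, 2.0))  # 37500
print(scaled_cost(0.021, 2.0))
```

Under that assumed ratio, the same 100K-token window holds half as much content and the same job costs twice as much as it would in English.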
