Quick Answer
A token is the chunk of text — usually a word, part of a word, or punctuation — that an LLM processes in one step. Billing, context limits, and speed are all measured in tokens.
- 1 token ~ 4 English characters
- 1 token ~ 0.75 English words
- 100 tokens ~ 75 words ~ 1 short paragraph
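The rules of thumb above can be turned into a quick estimator for rough planning. This is a minimal sketch, not a real tokenizer: the function name `estimate_tokens` and the choice to average the two heuristics are assumptions made for illustration, and the result only approximates English text.

```python
import math

CHARS_PER_TOKEN = 4.0    # rule of thumb: 1 token ~ 4 English characters
WORDS_PER_TOKEN = 0.75   # rule of thumb: 1 token ~ 0.75 English words


def estimate_tokens(text: str) -> int:
    """Rough token estimate for English text (hypothetical helper, not a tokenizer)."""
    if not text:
        return 0
    by_chars = len(text) / CHARS_PER_TOKEN
    by_words = len(text.split()) / WORDS_PER_TOKEN
    # Average the two heuristics and round up so budgets err on the high side.
    return math.ceil((by_chars + by_words) / 2)


if __name__ == "__main__":
    sample = "Tokens are the chunks of text that a language model reads."
    print(estimate_tokens(sample))
```

For exact counts, use the tokenizer that belongs to the model you are calling; the estimate is only useful for early budgeting.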
What Does Token Mean?
Before a model sees text, a tokenizer splits it into pieces and maps each piece to a numeric ID the network can process. Different models use different tokenizers: OpenAI's GPT-4 uses the cl100k_base encoding from the tiktoken library, Anthropic uses a variant of BPE, and Google uses SentencePiece (OpenAI tiktoken docs, 2024).
The word "hamburger" may be 1 token in one tokenizer and 3 tokens ("ham", "bur", "ger") in another. Emojis, Chinese characters, and code symbols often cost more tokens than plain English.
How It Works
Most tokenizers use byte pair encoding (BPE) or a similar subword algorithm. During training, the algorithm scans a huge text corpus and repeatedly merges the most frequent pair of adjacent symbols into a single new token. Frequent words like "the" become one token. Rare words split into subword pieces.
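The merge loop described above can be sketched in a few lines. This is a toy, character-level illustration under stated assumptions: the helper names (`most_common_pair`, `merge_pair`, `train_bpe`, `tokenize`) and the tiny corpus are hypothetical, and production tokenizers typically work on bytes, use far larger corpora, and apply many thousands of merges.

```python
from collections import Counter


def most_common_pair(words):
    """Count adjacent symbol pairs across the corpus, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(symbols, pair):
    """Replace every occurrence of `pair` in a symbol tuple with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a list of words."""
    words = Counter(tuple(word) for word in corpus)
    merges = []
    for _ in range(num_merges):
        pair = most_common_pair(words)
        if pair is None:
            break
        merges.append(pair)
        new_words = Counter()
        for symbols, freq in words.items():
            new_words[merge_pair(symbols, pair)] += freq
        words = new_words
    return merges


def tokenize(word, merges):
    """Split a word into characters, then apply the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    merges = train_bpe(["the"] * 3 + ["this"] * 2, num_merges=2)
    print(merges)                    # learned merge rules
    print(tokenize("the", merges))   # frequent word collapses to one token
    print(tokenize("that", merges))  # unseen word splits into pieces
```

With this toy corpus, "the" ends up as a single token after two merges, while a word the corpus never contained, such as "that", stays split into smaller pieces.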
The model then maps each token ID to an embedding vector. All tokens in the prompt pass through the attention layers in parallel, while the response is generated one token at a time.
Examples
- "Hello world" = 2 tokens
- "misarblog.com" = 5 tokens (URLs fragment badly)
- "antidisestablishmentarianism" = 6 tokens
- "你好" (Chinese "hello") = 2 tokens, but 6 bytes
- "print(1)" wrapped in a Python code fence = ~8 tokens (code is token-heavy)
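The gap between characters and bytes in the examples above is easy to check without a tokenizer. The sketch below only counts characters and UTF-8 bytes; the helper name `text_stats` is hypothetical, and the token counts themselves still depend on which tokenizer is used.

```python
def text_stats(text: str) -> dict:
    """Return character and UTF-8 byte counts for a string."""
    return {"chars": len(text), "utf8_bytes": len(text.encode("utf-8"))}


if __name__ == "__main__":
    for sample in ["Hello world", "misarblog.com", "你好"]:
        print(sample, text_stats(sample))
```

Byte-level tokenizers start from UTF-8 bytes, so text whose characters need several bytes each tends to have more raw material to merge, which is one reason it can cost more tokens than plain English.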
Tokens vs Words vs Characters
| Unit | Average size (English text) |
|---|---|
| Character | 1 character |
| Token | ~4 characters |
| Word | ~1.3 tokens |
Languages without spaces (Chinese, Japanese, Thai) and non-Latin scripts (Arabic, Hindi) often cost 2-3x more tokens per character, which is a fairness concern flagged by Stanford HAI (2023).
When to Use This Concept
- Budget planning: multiply input and output token counts by their prices per 1M tokens, then divide by 1,000,000 (input and output are usually priced separately)
- Context limits: fit prompt + history + expected answer under the model's max tokens
- Latency: more tokens = slower response
- Prompt trimming: shorter prompts save money and reduce latency
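The budget and context checks in the list above come down to simple arithmetic. The following is a minimal sketch: the function names are hypothetical, and the prices and context size in the usage example are placeholder values, not any provider's actual rates or limits.

```python
def estimate_cost_usd(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Cost in USD, given token counts and prices quoted per 1M tokens."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


def fits_context(prompt_tokens, history_tokens, expected_answer_tokens, max_context_tokens):
    """True if prompt + history + expected answer stay within the model's limit."""
    used = prompt_tokens + history_tokens + expected_answer_tokens
    return used <= max_context_tokens


if __name__ == "__main__":
    # Placeholder prices (USD per 1M tokens) and context size, for illustration only.
    print(estimate_cost_usd(2_000, 500, input_price_per_m=3.0, output_price_per_m=15.0))
    print(fits_context(1_000, 2_000, 500, max_context_tokens=4_096))
```

Keeping input and output as separate arguments matters because a long answer can dominate the bill even when the prompt is short.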
Conclusion
Tokens are the currency of LLMs: they determine what you pay, how much context fits, and how fast a response arrives. Understanding them can be the difference between a $10 bill and a $1,000 bill. Read more primers on Misar Blog.
