Quick Answer
Temperature is a number (usually 0 to 2) that controls how random a large language model's next-word choice is. Lower = safe and repetitive; higher = creative and unpredictable.
- Range: typically 0.0 to 2.0 (default often 0.7 or 1.0)
- 0.0 = always picks the highest-probability word
- 2.0 = heavily flattened probabilities; word choice becomes erratic and often drifts into gibberish
What Does Temperature Mean?
When an LLM generates text, it computes a probability for every possible next token. Temperature rescales those probabilities before sampling. At temperature 0, the model always picks the single most likely token — fully deterministic. As temperature rises, lower-probability tokens become more competitive, so the model is willing to pick less obvious words (OpenAI API docs, 2024).
Think of it as a "creativity dial." Zero = a strict grammar teacher. One = a thoughtful writer. Two = a caffeinated poet.
How It Works
Internally, the model outputs logits (raw scores) for each token. Temperature divides each logit before the softmax step:
- adjusted_logit = logit / temperature
Dividing by a small temperature (e.g. 0.2) widens the gaps between logits, so the softmax concentrates probability on the top candidate. Dividing by a large temperature (e.g. 1.5) shrinks the gaps, flattening the distribution and giving rare tokens a real chance.
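To make the math concrete, here is a minimal Python sketch of temperature-scaled sampling over a made-up three-token vocabulary. The tokens and logits are invented for illustration; a real model produces one logit per token across a vocabulary of tens of thousands of entries.

```python
import math
import random

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_with_temperature(logits, temperature):
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]  # adjusted_logit = logit / temperature
    probs = softmax(scaled)
    return random.choices(range(len(logits)), weights=probs, k=1)[0]

vocab = ["Paris", "Lyon", "banana"]
logits = [4.0, 2.5, 0.5]  # hypothetical raw scores for the next token

for t in (0.2, 1.0, 1.5):
    probs = softmax([x / t for x in logits])
    print(t, [round(p, 3) for p in probs])
# Roughly: 0.2 -> [0.999, 0.001, 0.0]    (probability piles onto "Paris")
# Roughly: 1.0 -> [0.798, 0.178, 0.024]  (the unmodified softmax)
# Roughly: 1.5 -> [0.683, 0.251, 0.066]  (flatter; rare tokens get a real chance)
```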
Examples
- Temperature 0.0 — "The capital of France is Paris." (Always same answer.)
- Temperature 0.3 — Customer support reply that stays on-brand and factual.
- Temperature 0.7 — Balanced blog draft with some variation between runs.
- Temperature 1.2 — Poetry or brainstorming with surprising word choices.
- Temperature 1.8 — Experimental fiction or intentional chaos.
Temperature vs Top-p
Top-p (nucleus sampling) limits the model to the smallest set of tokens whose combined probability exceeds p. Temperature rescales; top-p truncates. Most teams tune one or the other — not both aggressively. Anthropic's Claude API docs recommend adjusting only one at a time.
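To illustrate the difference, here is a small sketch of nucleus (top-p) truncation on a hypothetical five-token distribution; the tokens and probabilities are made up for the example.

```python
# Hypothetical next-token probabilities after softmax.
probs = {"Paris": 0.70, "Lyon": 0.15, "Nice": 0.08, "banana": 0.05, "xyzzy": 0.02}

def top_p_filter(probs, p=0.9):
    # Keep the smallest set of tokens whose cumulative probability reaches p,
    # then renormalize so the kept probabilities sum to 1.
    kept, total = {}, 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = prob
        total += prob
        if total >= p:
            break
    return {t: pr / total for t, pr in kept.items()}

print(top_p_filter(probs, p=0.9))
# Roughly: {"Paris": 0.753, "Lyon": 0.161, "Nice": 0.086} - the tail is cut off entirely.
# Temperature would instead reweight all five tokens without removing any of them.
```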
When to Use
- Low (0.0 to 0.3): factual Q&A, code generation, structured data extraction
- Medium (0.5 to 0.8): general chat, blog drafts, summaries
- High (1.0 to 1.5): creative writing, brainstorming, marketing taglines
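As a practical sketch, the snippet below shows how these ranges translate into an actual request using the OpenAI Python SDK; the model name is only a placeholder, and most other chat APIs expose an equivalent temperature parameter.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt, temperature):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; substitute whichever model you use
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.choices[0].message.content

# Low temperature for factual extraction, higher for brainstorming.
facts = ask("List the capitals of France, Spain, and Italy.", temperature=0.0)
ideas = ask("Brainstorm five taglines for a coffee subscription.", temperature=1.2)
```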
FAQs
Does temperature 0 guarantee identical output? Usually yes, but tiny floating-point differences across hardware can still cause minor variation.
Can temperature fix hallucinations? Lowering it reduces creative drift but does not eliminate factual errors. For grounding, use retrieval-augmented generation (RAG).
What about temperature for code? Most developers use 0.0 to 0.2 for code completion to avoid syntactic surprises.
Is default temperature the same everywhere? No. OpenAI and Anthropic both default to 1.0, but many tools and frameworks override this to 0.7.
Does higher temperature mean smarter output? No. Higher = more diverse, not more accurate.
Can I go above 2.0? Most APIs cap at 2.0 because output becomes incoherent.
Does it affect cost? No — temperature changes sampling, not token count billed.
Conclusion
Temperature is the simplest lever for shaping AI output. Start with 0.7, drop to 0 for facts, raise past 1 for creativity. Learn more AI concepts on Misar Blog.