Temperature
Temperature controls the randomness of the model's output. It determines how likely the model is to pick less probable tokens during generation. Lower temperature makes responses more focused and predictable; higher values make them more creative and diverse.
How it works
When generating text, the model assigns a probability to each possible next token. Under the hood, the model's raw scores (logits) are divided by the temperature before being converted to probabilities with a softmax. A low temperature sharpens the distribution, so the model picks high-probability tokens almost every time. A high temperature flattens the distribution, making it more likely to sample lower-probability tokens.
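To make this concrete, here is a minimal NumPy sketch of temperature scaling. The logit values are invented for illustration; real models produce one logit per token in the vocabulary.

```python
import numpy as np

def apply_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    """Convert raw logits into a probability distribution at a given temperature."""
    # Dividing logits by the temperature sharpens (T < 1) or flattens (T > 1)
    # the softmax distribution; T = 1 leaves it unchanged. T = 0 is handled
    # separately in practice (greedy argmax), since it would divide by zero.
    scaled = logits / temperature
    scaled = scaled - scaled.max()  # subtract the max for numerical stability
    probs = np.exp(scaled)
    return probs / probs.sum()

logits = np.array([2.0, 1.0, 0.5, -1.0])  # made-up next-token logits
print(apply_temperature(logits, 0.2))  # sharply peaked around the top token
print(apply_temperature(logits, 1.0))  # the model's unmodified distribution
```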
When to use temperature
- Use low temperature (0–0.3) for tasks that need accuracy or consistency (e.g., code, summaries, factual answers)
- Use medium temperature (0.5–0.7) for balanced output that is coherent but slightly varied
- Use high temperature (0.8–1.0) for creative or exploratory tasks like storytelling, brainstorming, or poetry
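As a rough illustration of why these ranges matter, the sketch below samples repeatedly from the same toy distribution at a low and a higher temperature. The low-temperature run almost always picks the top token; the higher one varies. The logits and token strings are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.5, -1.0])      # toy next-token logits
tokens = ["the", "a", "one", "zebra"]          # toy vocabulary

def sample(temperature: float, n: int = 10) -> list[str]:
    # Temperature-scaled softmax, then draw n tokens from the distribution.
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return [tokens[i] for i in rng.choice(len(tokens), size=n, p=probs)]

print(sample(0.2))  # mostly "the": focused and repeatable
print(sample(1.0))  # a mix of tokens: more diverse, less predictable
```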
How to use temperature
- Choose a value between 0 and 1 (some APIs accept values up to 2). Lower values produce focused output; higher values produce more creative, varied output
- Set temperature in your API call (see the sketch after this list)
- Combine it with top_p or top_k for finer control over randomness
- Experiment with different values for your specific use case
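Putting the list above into practice, here is a minimal sketch assuming the official OpenAI Python SDK; the model name and parameter values are placeholders, and other providers expose an equivalent temperature setting.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Low temperature for a factual task; raise it for creative work.
# These values are starting points to experiment with, not prescriptions.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "user", "content": "Summarize the water cycle in two sentences."}
    ],
    temperature=0.2,  # focused, consistent output
    top_p=1.0,        # optionally combine with nucleus sampling (top_p)
)
print(response.choices[0].message.content)
```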
Tips
- A temperature of 0 makes the model always pick the highest-probability token (greedy decoding), so output is effectively deterministic, though serving-side nondeterminism can still cause small variations
- Creative outputs generally benefit from a temperature between 0.7 and 0.9
- For production systems that need reliability, use lower values
- Temperature does not affect the prompt; it only influences how the model generates completions