Temperature
Temperature controls the randomness of the model's output. It determines how likely the model is to pick less probable tokens during generation. Lower temperature makes responses more focused and predictable; higher values make them more creative and diverse.
How it works
When generating text, the model assigns a probability to each possible next token. Under the hood, the model's raw scores (logits) are divided by the temperature before being converted to probabilities with a softmax. A low temperature sharpens the distribution, so the model picks high-probability tokens almost every time. A high temperature flattens the distribution, making it more likely to sample lower-probability tokens.
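To make this concrete, here is a minimal NumPy sketch of temperature scaling. The logit values are invented for illustration; real models produce one logit per token in the vocabulary.

```python
import numpy as np

def apply_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    """Convert raw logits into a probability distribution at a given temperature."""
    # Dividing logits by the temperature sharpens (T < 1) or flattens (T > 1)
    # the softmax distribution; T = 1 leaves it unchanged. T = 0 is handled
    # separately in practice (greedy argmax), since it would divide by zero.
    scaled = logits / temperature
    scaled = scaled - scaled.max()  # subtract the max for numerical stability
    probs = np.exp(scaled)
    return probs / probs.sum()

logits = np.array([2.0, 1.0, 0.5, -1.0])  # made-up next-token logits
print(apply_temperature(logits, 0.2))  # sharply peaked around the top token
print(apply_temperature(logits, 1.0))  # the model's unmodified distribution
```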
When to use temperature
- Use low temperature (0–0.3) for tasks that need accuracy or consistency (e.g., code, summaries, factual answers)
- Use medium temperature (0.5–0.7) for balanced output that is coherent but slightly varied
- Use high temperature (0.8–1.0) for creative or exploratory tasks like storytelling, brainstorming, or poetry
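As a rough illustration of why these ranges matter, the sketch below samples repeatedly from the same toy distribution at a low and a higher temperature. The low-temperature run almost always picks the top token; the higher one varies. The logits and token strings are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.5, -1.0])      # toy next-token logits
tokens = ["the", "a", "one", "zebra"]          # toy vocabulary

def sample(temperature: float, n: int = 10) -> list[str]:
    # Temperature-scaled softmax, then draw n tokens from the distribution.
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return [tokens[i] for i in rng.choice(len(tokens), size=n, p=probs)]

print(sample(0.2))  # mostly "the": focused and repeatable
print(sample(1.0))  # a mix of tokens: more diverse, less predictable
```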
How to use temperature
- Choose a value between 0 and 1 (some APIs accept values up to 2). Lower values produce focused output; higher values produce more creative, varied output
- Set temperature in your API call (see the sketch after this list)
- Combine it with top_p or top_k for finer control over randomness
- Experiment with different values for your specific use case
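Putting the list above into practice, here is a minimal sketch assuming the official OpenAI Python SDK; the model name and parameter values are placeholders, and other providers expose an equivalent temperature setting.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Low temperature for a factual task; raise it for creative work.
# These values are starting points to experiment with, not prescriptions.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "user", "content": "Summarize the water cycle in two sentences."}
    ],
    temperature=0.2,  # focused, consistent output
    top_p=1.0,        # optionally combine with nucleus sampling (top_p)
)
print(response.choices[0].message.content)
```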
Tips
- A temperature of 0 makes the model always pick the highest-probability token (greedy decoding), so output is effectively deterministic, though serving-side nondeterminism can still cause small variations
- Creative outputs generally benefit from a temperature between 0.7 and 0.9
- For production systems that need reliability, use lower values
- Temperature does not affect the prompt; it only influences how the model generates completions