Top P (Nucleus Sampling)
Top P, also known as nucleus sampling, is a decoding parameter that limits the model's choices to the smallest possible set of tokens whose cumulative probability exceeds a threshold P. It helps balance creativity and coherence in the model's responses.
How it works
Instead of picking from a fixed number of top tokens (like Top K), Top P dynamically selects a set of tokens based on their probabilities. For example, if top_p = 0.9, the model will only sample from the most likely tokens whose combined probability adds up to 90%. This set can vary in size at each step of generation.
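To make this concrete, here is a minimal NumPy sketch of a single nucleus sampling step. The function name nucleus_sample and the assumption that you start from raw logits are illustrative choices, not any particular library's API:

```python
import numpy as np

def nucleus_sample(logits, top_p=0.9, rng=None):
    """Sample one token id from raw logits using nucleus (Top P) filtering."""
    rng = rng or np.random.default_rng()
    # Turn logits into a probability distribution.
    probs = np.exp(logits - np.max(logits))
    probs /= probs.sum()
    # Sort tokens from most to least likely.
    order = np.argsort(probs)[::-1]
    sorted_probs = probs[order]
    # Keep the smallest prefix whose cumulative probability exceeds top_p.
    cumulative = np.cumsum(sorted_probs)
    cutoff = np.searchsorted(cumulative, top_p) + 1
    nucleus, nucleus_probs = order[:cutoff], sorted_probs[:cutoff]
    # Renormalize over the nucleus and sample from it.
    return rng.choice(nucleus, p=nucleus_probs / nucleus_probs.sum())
```

Because the cutoff depends on the shape of the distribution, the nucleus may contain only a handful of tokens when the model is confident and many more when it is uncertain, which is exactly what makes Top P adaptive where Top K is fixed.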
When to use Top P
- When you want more flexibility than Top K allows
- When aiming for natural-sounding, diverse outputs without sacrificing quality
- When generating creative content, like stories, brainstorms, or casual dialogue
How to use Top P
- A common top_p value is between 0.8 and 0.95
- Include it as the top_p parameter in your API call (see the example after this list)
- Use temperature along with it for more nuanced control
- Test different values. Higher values give more randomness; lower ones increase focus
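As one concrete example, here is how top_p might be passed using the OpenAI Python SDK; the model name is a placeholder, and other providers expose an equivalent top_p parameter:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use whichever model you target
    messages=[{"role": "user", "content": "Brainstorm five names for a coffee shop."}],
    top_p=0.9,        # nucleus sampling threshold
    temperature=0.8,  # pairs with top_p for more nuanced control
)
print(response.choices[0].message.content)
```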
Tips
- top_p = 1.0 disables nucleus sampling entirely (i.e., all tokens are considered)
- Lower top_p values can limit hallucinations but may sound dull or repetitive
- For most natural outputs, top_p of 0.9 combined with temperature of 0.7-0.9 is a good starting point
- Use in combination with Top K cautiously. If both are set, the two filters compound and restrict token sampling even further (see the sketch below)
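To see how the two filters compound, here is a sketch that applies Top K first and then Top P to the surviving tokens. Implementations differ on whether Top P is measured before or after renormalizing over the Top K survivors, so treat this as one common convention rather than a fixed rule:

```python
import numpy as np

def top_k_top_p_filter(probs, top_k=50, top_p=0.9):
    """Apply Top K, then Top P over the survivors; return a renormalized distribution."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]
    # Top K: keep only the k most likely tokens.
    keep = order[:top_k]
    # Top P: among the survivors (renormalized), keep the smallest
    # prefix whose cumulative probability exceeds top_p.
    kept = probs[keep] / probs[keep].sum()
    cutoff = np.searchsorted(np.cumsum(kept), top_p) + 1
    keep = keep[:cutoff]
    # Zero out everything that failed either filter and renormalize.
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()
```

A token must survive both cuts to remain sampleable, which is why combining aggressive values of both parameters can make outputs noticeably more constrained than either setting alone.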