Top K
Top K is a sampling parameter that limits the model to choosing the next token from only the top K most likely options, rather than considering the full vocabulary. It controls how "focused" or "random" the model's outputs can be.
How does it work?
At each step of generation, the model calculates a probability distribution over all possible next tokens. Top K filters this list down to the K highest-probability tokens, and then samples from just those. A lower K makes the output more deterministic; a higher K allows more variety.
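The filtering-and-sampling step above can be sketched in plain Python. This is a minimal illustration, not any library's actual implementation; the function name `top_k_sample` and the toy logits are made up for the example.

```python
import math
import random


def top_k_sample(logits, k, rng=random.Random(0)):
    """Sample a token index using top-k filtering (illustrative sketch)."""
    # Keep only the indices of the k highest-logit tokens.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the surviving logits so they form a proper distribution
    # (subtracting the max for numerical stability).
    mx = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - mx) for i in top]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sample from the renormalized distribution over the survivors.
    return rng.choices(top, weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0, -3.0]
token = top_k_sample(logits, k=2)  # only indices 0 and 1 can be chosen
```

With k=2, tokens 2 through 4 are filtered out entirely, no matter how the sampling dice land; with k=1 the function reduces to picking the argmax.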
When to use Top K
- When you want to reduce randomness while still allowing some creativity
- When outputs need to be coherent and consistent, such as in summarization or formal tone
- When combined with temperature, to better control variability in open-ended generation
How to use Top K
- A common top_k value is between 20 and 100
- Pass it as the top_k parameter in your model call (if supported)
- Optionally tune in combination with temperature or top_p for better balance
- Monitor outputs; if results feel too repetitive, try increasing K or adjusting other sampling parameters
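To show how top_k interacts with temperature when tuning, here is a small sketch that applies temperature scaling before top-k filtering. The function name and toy logits are illustrative assumptions, not a real client API; check your provider's docs for the actual parameter names.

```python
import math
import random


def sample_with_top_k_and_temperature(logits, k, temperature, rng=random.Random(0)):
    """Combine temperature scaling with top-k filtering (illustrative sketch)."""
    # Temperature rescales the logits first: <1.0 sharpens the distribution,
    # >1.0 flattens it.
    scaled = [l / temperature for l in logits]
    # Then top-k keeps only the k most likely candidates.
    top = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]
    mx = max(scaled[i] for i in top)
    weights = [math.exp(scaled[i] - mx) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


# With k=2 and a moderate temperature, only the two best tokens are reachable.
token = sample_with_top_k_and_temperature([2.0, 1.0, 0.0, -2.0], k=2, temperature=0.7)
```

Note the ordering: temperature changes the relative weights among the surviving tokens, while K fixes how many tokens survive at all, so tuning one without the other moves different levers.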
Tips
- A Top K of 1 means the model always picks the highest-probability token (greedy decoding)
- Top K and Top P are often used together; you don't have to choose one or the other
- Increasing K without adjusting temperature can lead to nonsensical or unfocused outputs
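Combining Top K and Top P, as the tip above suggests, typically means applying both filters before sampling. The sketch below shows one common way to stack them (top-k cap first, then a nucleus cutoff); the function name and example probabilities are assumptions for illustration.

```python
def top_k_top_p_filter(probs, k, p):
    """Return the token indices that survive top-k then top-p filtering."""
    # Top-k: keep at most k candidates, ordered by probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Top-p (nucleus): keep the smallest prefix whose cumulative mass reaches p.
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    return kept


# k=4 would allow four tokens, but p=0.8 trims the pool to the top two,
# since 0.5 + 0.3 already covers 80% of the probability mass.
survivors = top_k_top_p_filter([0.5, 0.3, 0.1, 0.07, 0.03], k=4, p=0.8)
```

Whichever filter is more restrictive in a given step wins, which is why the two parameters complement rather than replace each other.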