Top K
Top K is a sampling parameter that restricts the model's choice of the next token to the K most likely options, rather than the full vocabulary. It is one of the main knobs controlling how "focused" or "random" the model's outputs are.
At each step of generation, the model computes a probability distribution over all possible next tokens. Top K keeps only the K highest-probability tokens, renormalizes their probabilities, and samples from just that set. A lower K makes the output more deterministic; a higher K allows more variety.
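As a minimal sketch of this mechanism (the logits values and vocabulary size below are made up for illustration):

```python
import numpy as np

def top_k_sample(logits: np.ndarray, k: int, rng: np.random.Generator) -> int:
    """Sample one token id from the k highest-scoring logits."""
    # Indices of the k largest logits (order within the k doesn't matter).
    top_indices = np.argpartition(logits, -k)[-k:]
    top_logits = logits[top_indices]
    # Softmax over only the surviving logits renormalizes them to sum to 1.
    probs = np.exp(top_logits - top_logits.max())
    probs /= probs.sum()
    # Sample from the filtered, renormalized distribution.
    return int(rng.choice(top_indices, p=probs))

rng = np.random.default_rng(seed=0)
logits = np.array([2.0, 1.5, 0.3, -1.0, -2.0])  # hypothetical 5-token vocabulary
print(top_k_sample(logits, k=3, rng=rng))  # always one of tokens 0, 1, or 2
```

With k=3 here, tokens 3 and 4 can never be sampled no matter how the randomness falls; with k=1 the function reduces to greedy decoding.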
Top K is typically useful:
- When you want to reduce randomness while still allowing some creativity
- When outputs need to be coherent and consistent, such as in summarization or formal writing
- When combined with temperature, to gain finer control over variability in open-ended generation
To use it in practice:
- A common top_k value is between 20 and 100
- Pass it as the top_k parameter in your model call, if the API supports it (see the example after this list)
- Optionally tune it in combination with temperature or top_p for better balance
- Monitor outputs; if results feel too repetitive, try increasing K or adjusting other sampling parameters
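As one concrete illustration, Hugging Face's transformers generation API accepts top_k directly (the model and prompt below are arbitrary examples):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The weather today is", return_tensors="pt")
# do_sample=True enables sampling; top_k=50 restricts each step to the
# 50 highest-probability tokens before the sampler draws from them.
outputs = model.generate(
    **inputs,
    do_sample=True,
    top_k=50,
    temperature=0.8,
    max_new_tokens=30,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Other providers expose the same knob under the same or a similar name; check your API's sampling parameters, since not every endpoint supports it.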
A few things to keep in mind:
- A Top K of 1 means the model always picks the highest-probability token (greedy decoding)
- Top K and Top P are often used together; you don't have to choose one or the other (see the sketch after this list)
- Increasing K without adjusting temperature can lead to unfocused or nonsensical outputs
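To make the interplay concrete, here is a hedged sketch of one common way to compose temperature, Top K, and Top P in a single sampling step (applying temperature first, then Top K, then Top P, is a typical convention rather than a fixed rule):

```python
import numpy as np

def sample_next_token(logits, temperature=0.8, top_k=50, top_p=0.95, rng=None):
    """One sampling step combining temperature, Top K, and Top P filtering."""
    rng = rng or np.random.default_rng()
    # 1. Temperature scales the logits before any filtering.
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # 2. Top K: zero out everything below the k-th largest probability.
    if top_k < len(probs):
        cutoff = np.sort(probs)[-top_k]
        probs[probs < cutoff] = 0.0
    # 3. Top P: keep the smallest set of tokens whose mass reaches top_p,
    #    shifting the mask by one so the token crossing the threshold stays.
    order = np.argsort(probs)[::-1]
    remove = np.cumsum(probs[order]) > top_p
    remove[1:] = remove[:-1].copy()
    remove[0] = False
    probs[order[remove]] = 0.0
    # 4. Renormalize and sample from whatever survived both filters.
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```

Because Top K caps the candidate count and Top P caps the cumulative probability mass, the effective pool at each step is whichever filter is stricter; a large K paired with a high temperature flattens the distribution, which is where the unfocused outputs mentioned above tend to come from.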