Top K

Top K is a sampling parameter that limits the model to choosing the next token from only the top K most likely options, rather than considering the full vocabulary. It controls how "focused" or "random" the model's outputs can be.

At each step of generation, the model calculates a probability distribution over all possible next tokens. Top K filters this list down to the K highest-probability tokens, and then samples from just those. A lower K makes the output more deterministic; a higher K allows more variety.
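
To make the filtering step concrete, here is a minimal sketch of one top-k sampling step in NumPy. The function name and the toy logits are illustrative, not taken from any particular library.

```python
# A minimal sketch of one top-k sampling step, assuming `logits` is a
# NumPy array of scores over the vocabulary. Names here are illustrative.
import numpy as np

def sample_top_k(logits: np.ndarray, k: int, temperature: float = 1.0) -> int:
    """Return the index of the sampled token, restricted to the top k logits."""
    # Keep only the indices of the k highest-scoring tokens.
    top_indices = np.argsort(logits)[-k:]
    top_logits = logits[top_indices] / temperature
    # Softmax over just the surviving candidates.
    probs = np.exp(top_logits - top_logits.max())
    probs /= probs.sum()
    # Sample one token from the filtered distribution.
    return int(np.random.choice(top_indices, p=probs))

# Toy 5-token vocabulary: with k=2, only the two most likely tokens
# can ever be chosen, no matter how many times we sample.
logits = np.array([2.0, 1.0, 0.5, -1.0, -3.0])
token_id = sample_top_k(logits, k=2)
```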

When to use Top K:

  • When you want to reduce randomness while still allowing some creativity
  • When outputs need to be coherent and consistent, such as in summarization or formal writing
  • When combined with temperature, to better control variability in open-ended generation

How to use Top K:

  1. Start with a common top_k value, typically between 20 and 100
  2. Pass it as the top_k parameter in your model call, if the API supports it (see the sketch after this list)
  3. Optionally tune it in combination with temperature or top_p for a better balance
  4. Monitor outputs; if results feel too repetitive, try increasing K or adjusting the other sampling parameters
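
As one concrete example, Hugging Face's transformers library accepts top_k directly on generate() when sampling is enabled. The snippet below is a minimal sketch; the model name and parameter values are placeholders, not recommendations, and other providers name the parameter similarly where they support it.

```python
# Minimal sketch: sampling with top_k via Hugging Face transformers.
# "gpt2" and the parameter values here are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Top K sampling limits the model to", return_tensors="pt")
output_ids = model.generate(
    **inputs,
    do_sample=True,      # sample instead of greedy decoding
    top_k=50,            # keep only the 50 most likely next tokens at each step
    temperature=0.8,     # rescale logits before sampling
    max_new_tokens=40,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```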

Tips:

  • A Top K of 1 means the model always picks the highest-probability token (greedy decoding)
  • Top K and Top P are often used together; you don't have to choose one or the other (see the sketch after this list)
  • Increasing K without also adjusting temperature can lead to unfocused or nonsensical outputs, since the larger candidate pool lets low-probability tokens back in
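
Since the tips above mention combining Top K with Top P, here is a sketch of how the two filters compose: sort tokens by probability, then keep the shortest prefix allowed by both the top-k cap and the top-p cumulative-probability cutoff. The function name and example values are illustrative.

```python
# A sketch of applying Top K and Top P together: cap the candidate set
# at k tokens, and also trim it to the smallest set whose cumulative
# probability reaches p. Whichever limit binds first wins.
import numpy as np

def top_k_top_p_filter(probs: np.ndarray, k: int, p: float) -> np.ndarray:
    """Zero out tokens outside the top-k / top-p intersection, then renormalize."""
    order = np.argsort(probs)[::-1]          # token indices, most likely first
    keep = np.zeros_like(probs, dtype=bool)
    cumulative = 0.0
    for rank, idx in enumerate(order):
        keep[idx] = True                     # always keep at least one token
        cumulative += probs[idx]
        if rank + 1 >= k or cumulative >= p:
            break
    filtered = np.where(keep, probs, 0.0)
    return filtered / filtered.sum()

probs = np.array([0.5, 0.2, 0.15, 0.1, 0.05])
print(top_k_top_p_filter(probs, k=3, p=0.8))  # only the top three tokens survive
```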