Top K

Top K is a sampling parameter that limits the model to choosing the next token from only the top K most likely options, rather than considering the full vocabulary. It controls how "focused" or "random" the model's outputs can be.

At each step of generation, the model calculates a probability distribution over all possible next tokens. Top K filters this list down to the K highest-probability tokens, and then samples from just those. A lower K makes the output more deterministic; a higher K allows more variety.
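
To make the filtering step concrete, here is a minimal sketch of one top-k sampling step in NumPy. The function name and the toy logits are illustrative, not taken from any particular library.

```python
# A minimal sketch of one top-k sampling step, assuming `logits` is a
# NumPy array of scores over the vocabulary. Names here are illustrative.
import numpy as np

def sample_top_k(logits: np.ndarray, k: int, temperature: float = 1.0) -> int:
    """Return the index of the sampled token, restricted to the top k logits."""
    # Keep only the indices of the k highest-scoring tokens.
    top_indices = np.argsort(logits)[-k:]
    top_logits = logits[top_indices] / temperature
    # Softmax over just the surviving candidates.
    probs = np.exp(top_logits - top_logits.max())
    probs /= probs.sum()
    # Sample one token from the filtered distribution.
    return int(np.random.choice(top_indices, p=probs))

# Toy 5-token vocabulary: with k=2, only the two most likely tokens
# can ever be chosen, no matter how many times we sample.
logits = np.array([2.0, 1.0, 0.5, -1.0, -3.0])
token_id = sample_top_k(logits, k=2)
```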

When to use Top K:

  • When you want to reduce randomness while still allowing some creativity
  • When outputs need to be coherent and consistent, such as in summarization or formal writing
  • When combined with temperature, to better control variability in open-ended generation

How to use Top K:

  1. Start with a common top_k value, typically between 20 and 100
  2. Pass it as the top_k parameter in your model call, if the API supports it (see the sketch after this list)
  3. Optionally tune it in combination with temperature or top_p for a better balance
  4. Monitor outputs; if results feel too repetitive, try increasing K or adjusting the other sampling parameters
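
As one concrete example, Hugging Face's transformers library accepts top_k directly on generate() when sampling is enabled. The snippet below is a minimal sketch; the model name and parameter values are placeholders, not recommendations, and other providers name the parameter similarly where they support it.

```python
# Minimal sketch: sampling with top_k via Hugging Face transformers.
# "gpt2" and the parameter values here are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Top K sampling limits the model to", return_tensors="pt")
output_ids = model.generate(
    **inputs,
    do_sample=True,      # sample instead of greedy decoding
    top_k=50,            # keep only the 50 most likely next tokens at each step
    temperature=0.8,     # rescale logits before sampling
    max_new_tokens=40,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```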

Tips:

  • A Top K of 1 means the model always picks the highest-probability token (greedy decoding)
  • Top K and Top P are often used together; you don't have to choose one or the other (see the sketch after this list)
  • Increasing K without also adjusting temperature can lead to unfocused or nonsensical outputs, since the larger candidate pool lets low-probability tokens back in
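
Since the tips above mention combining Top K with Top P, here is a sketch of how the two filters compose: sort tokens by probability, then keep the shortest prefix allowed by both the top-k cap and the top-p cumulative-probability cutoff. The function name and example values are illustrative.

```python
# A sketch of applying Top K and Top P together: cap the candidate set
# at k tokens, and also trim it to the smallest set whose cumulative
# probability reaches p. Whichever limit binds first wins.
import numpy as np

def top_k_top_p_filter(probs: np.ndarray, k: int, p: float) -> np.ndarray:
    """Zero out tokens outside the top-k / top-p intersection, then renormalize."""
    order = np.argsort(probs)[::-1]          # token indices, most likely first
    keep = np.zeros_like(probs, dtype=bool)
    cumulative = 0.0
    for rank, idx in enumerate(order):
        keep[idx] = True                     # always keep at least one token
        cumulative += probs[idx]
        if rank + 1 >= k or cumulative >= p:
            break
    filtered = np.where(keep, probs, 0.0)
    return filtered / filtered.sum()

probs = np.array([0.5, 0.2, 0.15, 0.1, 0.05])
print(top_k_top_p_filter(probs, k=3, p=0.8))  # only the top three tokens survive
```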