LLM Parameters Guide
Learn how to control and shape language model behavior with core LLM parameters
Structured Outputs
Return model responses in predictable, machine-readable formats like JSON.
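A minimal sketch of the consumer side: once the model is asked for a structured response, the application can parse it and check the expected shape. The `raw` string and field names here are hypothetical stand-ins for a real model reply.

```python
import json

# Hypothetical raw model response, assumed to follow a requested schema.
raw = '{"title": "Dune", "year": 1965, "tags": ["sci-fi", "classic"]}'

def parse_book(text):
    """Parse a structured response and verify its shape (illustrative only)."""
    data = json.loads(text)
    assert isinstance(data["title"], str)
    assert isinstance(data["year"], int)
    assert all(isinstance(t, str) for t in data["tags"])
    return data

book = parse_book(raw)
```

Because the output is machine-readable, downstream code can use fields like `book["year"]` directly instead of scraping free text.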
Top K
Control randomness by limiting token selection to the top K most likely options.
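The mechanics can be sketched in plain Python: mask every logit outside the top K before applying softmax, so only the K most likely tokens can be sampled.

```python
import math

def top_k_filter(logits, k):
    """Keep only the k highest logits; mask the rest with -inf.
    (Ties at the threshold may keep slightly more than k tokens.)"""
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else float("-inf") for x in logits]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # exp(-inf) -> 0.0
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(top_k_filter(logits, k=2))
# Only the two most likely tokens keep nonzero probability.
```

With `k=1` this degenerates to greedy decoding; larger `k` admits more randomness.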
Top P (Nucleus Sampling)
Limit the token selection to a dynamically chosen set based on cumulative probability.
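A minimal sketch of the nucleus: sort tokens by probability, keep the smallest prefix whose cumulative mass reaches p, and renormalize over that set.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability >= p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    kept = [probs[i] if i in keep else 0.0 for i in range(len(probs))]
    total = sum(kept)
    return [x / total for x in kept]  # renormalize over the nucleus

probs = softmax([3.0, 1.5, 1.0, 0.2])
filtered = top_p_filter(probs, p=0.9)
```

Unlike Top K, the number of candidate tokens adapts: a confident distribution yields a small nucleus, a flat one a large nucleus.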
Tool Calling
Let the model return structured tool calls that can trigger backend actions.
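A minimal dispatch sketch: the tool name, the `get_weather` function, and the call payload below are all hypothetical, but they show the common pattern of routing a model-emitted tool call to backend code.

```python
import json

def get_weather(city):
    """Placeholder backend action (hypothetical)."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def dispatch(tool_call):
    """Route a model tool call {"name": ..., "arguments": "<json>"} to a function."""
    fn = TOOLS[tool_call["name"]]
    args = json.loads(tool_call["arguments"])
    return fn(**args)

result = dispatch({"name": "get_weather", "arguments": '{"city": "Paris"}'})
```

In a real loop, `result` would be sent back to the model so it can compose a final answer.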
JSON Mode
Force the model to return only valid JSON without extra text or formatting.
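When JSON mode is on, the body is assumed to be bare JSON and `json.loads` suffices; without it, models often wrap JSON in a markdown code fence. A defensive parser covering both cases (the fence string is built with `"`" * 3` only to avoid nesting fences in this document):

```python
import json

def parse_json_reply(text):
    """Parse a reply that should be JSON, tolerating a markdown code fence."""
    fence = "`" * 3
    body = text.strip()
    if body.startswith(fence):
        lines = body.splitlines()
        body = "\n".join(lines[1:-1])  # drop opening/closing fence lines
    return json.loads(body)

data = parse_json_reply('{"sentiment": "positive", "score": 0.92}')
```

With JSON mode enforced server-side, the fence-stripping branch should never trigger.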
Streaming
Stream model output token-by-token in real time to improve responsiveness.
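The consumption pattern can be sketched with a generator standing in for a streaming API: the client handles each chunk as it arrives instead of waiting for the full reply.

```python
def fake_stream(text):
    """Stand-in for a streaming API: yields one token (word) at a time."""
    for token in text.split(" "):
        yield token + " "

pieces = []
for chunk in fake_stream("Streaming shows partial output immediately"):
    pieces.append(chunk)  # in a real app: render the chunk to the UI here
reply = "".join(pieces).rstrip()
```

The total generation time is unchanged; what improves is perceived latency, since the first tokens appear almost immediately.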
Max Tokens
Limit how many tokens the model can generate in its response.
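Conceptually, the decode loop simply halts once the budget is spent, which is why a too-low limit can cut a response mid-sentence. A toy sketch:

```python
def generate(token_stream, max_tokens):
    """Emit tokens until the model stops or max_tokens is reached (sketch)."""
    out = []
    for token in token_stream:
        if len(out) >= max_tokens:
            break  # budget exhausted: output may end mid-sentence
        out.append(token)
    return out

truncated = generate(["The", "answer", "is", "forty", "two", "."], max_tokens=4)
```

`max_tokens` caps cost and latency; it does not make the model write more concisely.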
Seed
Get repeatable results from the model by fixing the random generation seed.
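The idea can be illustrated with a local random generator playing the role of the API's `seed` parameter: the same seed yields the same sampled sequence.

```python
import random

def sample_tokens(vocab, n, seed):
    """Sample n tokens reproducibly by fixing the RNG seed (sketch)."""
    rng = random.Random(seed)  # local RNG, analogous to an API `seed` param
    return [rng.choice(vocab) for _ in range(n)]

a = sample_tokens(["the", "a", "an"], n=5, seed=42)
b = sample_tokens(["the", "a", "an"], n=5, seed=42)
# a == b: same seed, same sequence
```

Note that hosted APIs typically offer best-effort determinism; backend changes can still alter results across deployments.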
Stop Sequence
Instruct the model to stop generating once a specific string is produced.
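Client-side, the effect is equivalent to truncating at the first occurrence of the stop string; the example text below is hypothetical.

```python
def apply_stop(generated, stop):
    """Truncate output at the first occurrence of the stop sequence."""
    idx = generated.find(stop)
    return generated[:idx] if idx != -1 else generated

text = apply_stop("Answer: 42\nQuestion: what next?", stop="\nQuestion:")
```

This is useful for few-shot prompts, where the model would otherwise keep generating the next example.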
Presence Penalty
Discourage the model from repeating ideas or concepts it has already mentioned.
Frequency Penalty
Reduce how often the same token appears in a single response.
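Both penalties can be sketched together. Following the commonly documented additive form, the presence penalty subtracts a fixed amount from any token that has appeared at all, while the frequency penalty subtracts proportionally to how often it has appeared.

```python
from collections import Counter

def penalize(logits, generated_ids, presence_penalty, frequency_penalty):
    """Additive penalties (sketch):
    logit[t] -= presence_penalty * (t seen at all)
              + frequency_penalty * (times t was seen)."""
    counts = Counter(generated_ids)
    adjusted = list(logits)
    for tok, n in counts.items():
        adjusted[tok] -= presence_penalty + frequency_penalty * n
    return adjusted

logits = [2.0, 2.0, 2.0]
out = penalize(logits, generated_ids=[0, 0, 1],
               presence_penalty=0.5, frequency_penalty=0.3)
# token 0 (seen twice) is penalized more than token 1 (seen once);
# token 2 (unseen) is untouched.
```

Presence penalty nudges the model toward new topics; frequency penalty specifically suppresses verbatim repetition.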
Logprobs
Inspect the model's confidence in each token it generates by returning probabilities.
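The values returned are log-probabilities, i.e. the log-softmax of the logits; a value closer to 0 means higher confidence in that token.

```python
import math

def log_softmax(logits):
    """Convert logits to log-probabilities (numerically stable)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

logprobs = log_softmax([2.0, 1.0, 0.1])
# each value is log P(token); exp(logprob) recovers the probability
```

Applications include confidence scoring, filtering low-certainty answers, and computing perplexity over a generation.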
Temperature
Control the creativity of the model's output by adjusting randomness.
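Mechanically, temperature divides the logits before softmax: T < 1 sharpens the distribution toward the top token, T > 1 flattens it toward uniform. A minimal sketch:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then apply softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cold = softmax_with_temperature(logits, temperature=0.2)  # near-greedy
hot = softmax_with_temperature(logits, temperature=2.0)   # closer to uniform
```

As temperature approaches 0 the sampling becomes effectively greedy; very high values make low-probability tokens much more likely to be chosen.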