Stop Sequence

A stop sequence tells the model when to stop generating text. It acts like a cutoff signal: when the model produces a specific string in its output, it immediately stops responding, even if it hasn't hit the max_tokens limit.

How does it work

You provide one or more strings as stop sequences (e.g., ["\nHuman:", "<END>"]). As the model generates tokens, it checks whether the latest text matches any of these sequences. If it does, generation stops, and the output is returned up to that point; the stop string itself is typically not included in the returned text.
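The check described above can be sketched in plain Python. This is a simplified client-side simulation, not the provider's actual server-side implementation; the token stream here is a hardcoded stand-in for model output.

```python
def generate_with_stop(token_stream, stop_sequences):
    """Accumulate tokens and halt when any stop sequence appears.

    Mirrors, in simplified form, what the API does server-side:
    the matched stop string is not included in the returned text.
    """
    buffer = ""
    for token in token_stream:
        buffer += token
        for stop in stop_sequences:
            idx = buffer.find(stop)
            if idx != -1:
                return buffer[:idx]  # cut just before the stop sequence
    return buffer  # no stop sequence seen; stream ended

# Stand-in for tokens a model might emit.
tokens = ["The answer", " is 42.", "\nHuman:", " next question"]
print(generate_with_stop(tokens, ["\nHuman:", "<END>"]))  # The answer is 42.
```

Note that a stop sequence can span multiple tokens, which is why the check runs against the accumulated buffer rather than each token individually.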

When to use stop sequences

  • When building chatbots or multi-turn dialogues where you want to stop before the next user prompt appears
  • When using function-like patterns in prompts (e.g., Q: and A: pairs)
  • When you need to cut off extra explanation or formatting
  • When you want to extract clean data, especially when paired with structured prompts
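To make the clean-extraction use case concrete, here is a small sketch of truncating a raw model continuation for a Q:/A:-style prompt. The continuation string is hypothetical example output, and apply_stop is an illustrative helper, not a library function.

```python
def apply_stop(text, stop_sequences):
    """Return text truncated at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

# Hypothetical raw continuation a model might produce for the prompt
# "Q: What is the capital of France?\nA:" without a stop sequence set.
raw_continuation = " Paris\nQ: What is the capital of Spain?\nA: Madrid"

# With stop=["\nQ:"], only the first answer survives.
answer = apply_stop(raw_continuation, ["\nQ:"]).strip()
print(answer)  # Paris
```

Without the stop sequence, the model keeps inventing new Q:/A: pairs; with it, you get exactly one clean answer.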

How to use stop sequences

  1. Define your stop strings. E.g., ["\nUser:", "###"]
  2. Add them to your API call as stop: ["<your_string>"]
  3. Choose strings that won't appear in the desired output itself, or generation will cut off earlier than you intend
  4. Test different formats, especially if you're working with structured outputs or custom prompt patterns

Tips

  • You can use multiple stop sequences at once (up to 4 in OpenAI models)
  • Stop sequences are case-sensitive and match exactly; watch for stray whitespace
  • Useful when the model keeps going too long or starts repeating itself
  • Works well with streaming if you want to cut off text live as it comes in
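When cutting off streamed text client-side, one subtlety is that a stop string can straddle chunk boundaries. A sketch of handling that, assuming the chunks are plain text deltas (here hardcoded rather than read from a real stream):

```python
def stream_with_stop(chunks, stop_sequences):
    """Yield streamed text live, stopping when a stop sequence arrives.

    Holds back a tail of buffered text (longest stop length minus one)
    so a stop string split across two chunks is still detected.
    """
    hold = max(len(s) for s in stop_sequences) - 1
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        for stop in stop_sequences:
            idx = buffer.find(stop)
            if idx != -1:
                yield buffer[:idx]  # emit the final piece, minus the stop
                return
        if len(buffer) > hold:
            yield buffer[: len(buffer) - hold]  # safe to emit live
            buffer = buffer[len(buffer) - hold :]
    yield buffer  # stream ended without a stop sequence

# "<END>" is split across the second and third chunks.
chunks = ["Hello ", "world<EN", "D> ignored"]
print("".join(stream_with_stop(chunks, ["<END>"])))  # Hello world
```

Note that provider APIs already apply stop sequences server-side; a client-side filter like this is mainly useful for custom stop strings applied on top of a raw stream.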