Stop Sequence
A stop sequence tells the model when to stop generating text. It acts like a cutoff signal: when the model sees a specific string in its output, it immediately stops responding, even if it hasn't hit the max_tokens limit.
You provide one or more strings as stop sequences (e.g., ["\nHuman:", "<END>"]). As the model generates tokens, it checks whether the latest text matches any of these sequences. If it does, generation stops, and the output is returned up to that point.
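Conceptually, the truncation behaves like the minimal sketch below. This is an illustration, not the provider's actual implementation; the sample text and stop strings are made up:

```python
def truncate_at_stop(text: str, stop_sequences: list[str]) -> str:
    """Return text up to (and excluding) the earliest stop sequence found."""
    cut = len(text)
    for seq in stop_sequences:
        idx = text.find(seq)
        if idx != -1:
            cut = min(cut, idx)  # earliest match wins
    return text[:cut]

# Simulated raw model output that runs past the intended answer:
raw = "Paris is the capital of France.\nHuman: What about Spain?"
print(truncate_at_stop(raw, ["\nHuman:", "<END>"]))
# → Paris is the capital of France.
```

In a real API the matching happens server-side during generation, so you are never billed for tokens after the stop sequence.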
Common use cases:
- When building chatbots or multi-turn dialogues where you want to stop before the next user prompt appears
- When using function-like patterns in prompts (e.g., Q: and A: pairs)
- When you need to cut off extra explanation or formatting
- When you want to extract clean data, especially when paired with structured prompts
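The Q:/A: pattern above is a good illustration. Stopping on "\nQ:" keeps only the first answer; the completion string here is simulated to show what would otherwise come back:

```python
# Hypothetical prompt using a Q:/A: pattern:
prompt = "Q: What is 2 + 2?\nA:"

# Without a stop sequence, the model may keep inventing more Q:/A: pairs.
# Simulated completion illustrating that behavior:
completion = " 4\nQ: What is 3 + 3?\nA: 6"

# With stop=["\nQ:"], only the text before the first "\nQ:" is returned:
answer = completion.split("\nQ:")[0].strip()
print(answer)
# → 4
```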
To use stop sequences:
- Define your stop strings, e.g., ["\nUser:", "###"]
- Add them to your API call as stop: ["<your_string>"]
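Putting the steps together, a request might look like the sketch below. The model name and messages are placeholders; sending the request requires the official `openai` Python package and an API key, so only the payload is shown here:

```python
# Sketch of a Chat Completions request body with stop sequences.
payload = {
    "model": "gpt-4o-mini",  # placeholder model name
    "messages": [{"role": "user", "content": "List three colors."}],
    "stop": ["\nUser:", "###"],  # generation halts at whichever appears first
    "max_tokens": 100,
}
```

With the OpenAI Python client, this maps to `client.chat.completions.create(**payload)`; other providers expose an equivalent parameter (e.g., Anthropic's `stop_sequences`).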
- Make sure the stop sequence doesn't appear too early unless intentional
- Test different formats, especially if you're working with structured outputs or custom prompt patterns
A few tips:
- You can use multiple stop sequences at once (up to 4 in OpenAI models)
- Stop sequences are case-sensitive and match exactly; watch for stray whitespace
- Useful when the model keeps going too long or starts repeating itself
- Works well with streaming if you want to cut off text live as it comes in