Stop Sequence

A stop sequence tells the model when to stop generating text. It acts like a cutoff signal: when the model produces a specific string in its output, it immediately stops responding, even if it hasn't hit the max_tokens limit.

How does it work

You provide one or more strings as stop sequences (e.g., ["\nHuman:", "<END>"]). As the model generates tokens, it checks whether the latest text matches any of these sequences. If it does, generation stops, and the output is returned up to that point; the stop string itself is typically not included in the returned text.
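The check described above can be sketched in plain Python. This is a simplified client-side simulation, not the provider's actual server-side implementation; the token stream here is a hardcoded stand-in for model output.

```python
def generate_with_stop(token_stream, stop_sequences):
    """Accumulate tokens and halt when any stop sequence appears.

    Mirrors, in simplified form, what the API does server-side:
    the matched stop string is not included in the returned text.
    """
    buffer = ""
    for token in token_stream:
        buffer += token
        for stop in stop_sequences:
            idx = buffer.find(stop)
            if idx != -1:
                return buffer[:idx]  # cut just before the stop sequence
    return buffer  # no stop sequence seen; stream ended

# Stand-in for tokens a model might emit.
tokens = ["The answer", " is 42.", "\nHuman:", " next question"]
print(generate_with_stop(tokens, ["\nHuman:", "<END>"]))  # The answer is 42.
```

Note that a stop sequence can span multiple tokens, which is why the check runs against the accumulated buffer rather than each token individually.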

When to use stop sequences

  • When building chatbots or multi-turn dialogues where you want to stop before the next user prompt appears
  • When using function-like patterns in prompts (e.g., Q: and A: pairs)
  • When you need to cut off extra explanation or formatting
  • When you want to extract clean data, especially when paired with structured prompts
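To make the clean-extraction use case concrete, here is a small sketch of truncating a raw model continuation for a Q:/A:-style prompt. The continuation string is hypothetical example output, and apply_stop is an illustrative helper, not a library function.

```python
def apply_stop(text, stop_sequences):
    """Return text truncated at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

# Hypothetical raw continuation a model might produce for the prompt
# "Q: What is the capital of France?\nA:" without a stop sequence set.
raw_continuation = " Paris\nQ: What is the capital of Spain?\nA: Madrid"

# With stop=["\nQ:"], only the first answer survives.
answer = apply_stop(raw_continuation, ["\nQ:"]).strip()
print(answer)  # Paris
```

Without the stop sequence, the model keeps inventing new Q:/A: pairs; with it, you get exactly one clean answer.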

How to use stop sequences

  1. Define your stop strings. E.g., ["\nUser:", "###"]
  2. Add them to your API call as stop: ["<your_string>"]
  3. Choose strings that won't appear in the desired output itself, or generation will cut off earlier than you intend
  4. Test different formats, especially if you're working with structured outputs or custom prompt patterns

Tips

  • You can use multiple stop sequences at once (up to 4 in OpenAI models)
  • Stop sequences are case-sensitive and match exactly; watch for stray whitespace
  • Useful when the model keeps going too long or starts repeating itself
  • Works well with streaming if you want to cut off text live as it comes in
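When cutting off streamed text client-side, one subtlety is that a stop string can straddle chunk boundaries. A sketch of handling that, assuming the chunks are plain text deltas (here hardcoded rather than read from a real stream):

```python
def stream_with_stop(chunks, stop_sequences):
    """Yield streamed text live, stopping when a stop sequence arrives.

    Holds back a tail of buffered text (longest stop length minus one)
    so a stop string split across two chunks is still detected.
    """
    hold = max(len(s) for s in stop_sequences) - 1
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        for stop in stop_sequences:
            idx = buffer.find(stop)
            if idx != -1:
                yield buffer[:idx]  # emit the final piece, minus the stop
                return
        if len(buffer) > hold:
            yield buffer[: len(buffer) - hold]  # safe to emit live
            buffer = buffer[len(buffer) - hold :]
    yield buffer  # stream ended without a stop sequence

# "<END>" is split across the second and third chunks.
chunks = ["Hello ", "world<EN", "D> ignored"]
print("".join(stream_with_stop(chunks, ["<END>"])))  # Hello world
```

Note that provider APIs already apply stop sequences server-side; a client-side filter like this is mainly useful for custom stop strings applied on top of a raw stream.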