Stop Sequence

A stop sequence tells the model when to stop generating text. It acts like a cutoff signal: when the model sees a specific string in its output, it immediately stops responding, even if it hasn't hit the max_tokens limit.

You provide one or more strings as stop sequences (e.g., ["\nHuman:", "<END>"]). As the model generates tokens, it checks whether the latest text matches any of these sequences. If it does, generation stops, and the output is returned up to, but not including, the matched sequence.
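A simplified sketch of that matching rule. Real implementations check at the token level as text streams out, but the effect on the final string is the same: the output is cut at the earliest occurrence of any stop string.

```python
def apply_stop_sequences(text: str, stop: list[str]) -> str:
    """Truncate text at the first occurrence of any stop sequence.

    The stop sequence itself is excluded from the result.
    """
    cut = len(text)
    for s in stop:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# Example: generation is cut before the next "Human:" turn.
result = apply_stop_sequences("Answer: 42\nHuman: next question", ["\nHuman:", "<END>"])
# result == "Answer: 42"
```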

Stop sequences are useful in several situations:

  • When building chatbots or multi-turn dialogues where you want to stop before the next user prompt appears
  • When using function-like patterns in prompts (e.g., Q: and A: pairs)
  • When you need to cut off extra explanation or formatting
  • When you want to extract clean data, especially when paired with structured prompts

To set up stop sequences:

  1. Define your stop strings, e.g., ["\nUser:", "###"]
  2. Add them to your API call as stop: ["<your_string>"]
  3. Make sure the stop sequence can't appear in the output earlier than you intend
  4. Test different formats, especially if you're working with structured outputs or custom prompt patterns
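The steps above come down to passing the strings in the request. A sketch of the request payload for an OpenAI-style chat completions call; the model name and prompt here are placeholders, and the exact client call varies by SDK (Anthropic's API, for instance, names the parameter stop_sequences instead of stop):

```python
# Hypothetical payload for a chat-completions-style request.
# The "stop" field takes a list of strings; OpenAI accepts up to 4.
payload = {
    "model": "gpt-4o-mini",  # placeholder model name
    "messages": [
        {"role": "user", "content": "Q: What is the capital of France?\nA:"},
    ],
    "stop": ["\nUser:", "###"],  # generation halts if either string appears
    "max_tokens": 50,
}
```

The model stops as soon as its output would contain "\nUser:" or "###", so the returned text ends cleanly at the answer.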

A few things to keep in mind:

  • You can use multiple stop sequences at once (up to 4 in OpenAI models)
  • Stop sequences are case-sensitive and match exactly, so watch for stray whitespace
  • Useful when the model keeps going too long or starts repeating itself
  • Works well with streaming if you want to cut off text live as it comes in
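For the streaming case, a client-side cutoff can be sketched like this. The chunks and stop string are made up for illustration; the key detail is buffering a small tail so a stop sequence split across two chunks is still caught:

```python
def stream_until_stop(chunks, stop: str):
    """Yield streamed text, cutting off at the first occurrence of stop.

    A tail of len(stop) - 1 characters is held back each iteration so a
    stop string that straddles a chunk boundary is still detected.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        idx = buffer.find(stop)
        if idx != -1:
            yield buffer[:idx]
            return
        # Emit everything except a tail that could start the stop string.
        safe = len(buffer) - (len(stop) - 1)
        if safe > 0:
            yield buffer[:safe]
            buffer = buffer[safe:]
    yield buffer


# "<END>" arrives split across two chunks but is still caught.
out = "".join(stream_until_stop(["Hel", "lo<EN", "D> extra"], "<END>"))
# out == "Hello"
```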