Contact Support
    Google/
    Gemini-2.5-flash-lite

    Model Card

    gemini-2.5-flash model

    Gemini 2.5 Flash Lite excels at high-volume, latency-sensitive tasks like translation and classification. The model is best for high volume, cost efficient tasks.

    Key Features

    • Optimized for Speed & Cost
      Designed for high-volume, latency-sensitive tasks like translation, classification, and chat—with lower compute and faster responses.

    • Thinking Mode Support
      Enables step-by-step reasoning using thinking budgets for better output quality when needed.

    • Improved Tool Use
      Includes search and code execution tools—bringing it closer to agentic use cases.

    • Enhanced Reasoning & Coding
      Outperforms 2.0 Flash-Lite across reasoning, math, science, and code benchmarks (e.g., SWE-bench, AIME, Aider Polyglot).

    • Multimodal Input Support
      Handles text, image, video, audio, and now PDF inputs with up to 1M input tokens.

    • Large Output Window
      Supports up to 64K output tokens—ideal for long responses and rich code generation.

    • Cost-Efficient Inference
      Most affordable Gemini 2.5 variant, with additional savings from prompt caching and batch processing.

    • Latest Knowledge
      Updated with a January 2025 knowledge cutoff, improving performance on current topics and tasks.

    • Available Everywhere You Build
      Deployable via Google AI Studio, Gemini API, and Vertex AI.

    Meta data

    upto 1M tokens
    $0.1 per million
    $0.4 per million
    Jan 2025
    Create an agent Pipe