Gemini 2.5 Flash Lite excels at high-volume, latency-sensitive tasks like translation and classification. The model is best for high volume, cost efficient tasks.
Key Features
-
Optimized for Speed & Cost
Designed for high-volume, latency-sensitive tasks like translation, classification, and chat—with lower compute and faster responses. -
Thinking Mode Support
Enables step-by-step reasoning using thinking budgets for better output quality when needed. -
Improved Tool Use
Includes search and code execution tools—bringing it closer to agentic use cases. -
Enhanced Reasoning & Coding
Outperforms 2.0 Flash-Lite across reasoning, math, science, and code benchmarks (e.g., SWE-bench, AIME, Aider Polyglot). -
Multimodal Input Support
Handles text, image, video, audio, and now PDF inputs with up to 1M input tokens. -
Large Output Window
Supports up to 64K output tokens—ideal for long responses and rich code generation. -
Cost-Efficient Inference
Most affordable Gemini 2.5 variant, with additional savings from prompt caching and batch processing. -
Latest Knowledge
Updated with a January 2025 knowledge cutoff, improving performance on current topics and tasks. -
Available Everywhere You Build
Deployable via Google AI Studio, Gemini API, and Vertex AI.