    OpenAI/o1

    Model Card

    General Performance Metrics

    Category    Benchmark               o1-2024-12-17   o1-preview
    General     GPQA diamond            75.7%           73.3%
                MMLU (pass@1)           91.8%           90.8%
    Coding      SWE-bench Verified      48.9%           41.3%
                LiveCodeBench           76.6%           52.3%
    Math        MATH (pass@1)           96.4%           85.5%
                AIME 2024 (pass@1)      79.2%           42.0%
                MGSM (pass@1)           89.3%           90.8%
    Vision      MMMU (pass@1)           77.3%           -
                MathVista (pass@1)      71.0%           -
    Factuality  SimpleQA                42.6%           42.4%
    Agents      TAU-bench (retail)      73.5%           -
                TAU-bench (airline)     54.2%           -

    OpenAI o1 model benchmarks

    Comparative PhD-Level Performance

    Subject     GPT-4o    o1
    Chemistry   40.2%     64.7%
    Physics     59.5%     92.8%
    Biology     61.6%     69.2%

    Key Improvements Over Previous Models

    1. Mathematics Performance

      • 96.4% accuracy on the MATH benchmark (vs 85.5% for o1-preview)
      • Large improvement on AIME 2024 (79.2% vs 42.0%)
      • Multilingual grade-school math (MGSM) at 89.3%, slightly below o1-preview's 90.8%

    2. Coding Capabilities

      • 48.9% accuracy on SWE-bench Verified (vs 41.3% for o1-preview)
      • Notable improvement on LiveCodeBench (76.6% vs 52.3%)

    3. Vision and Multimodal Tasks

      • New vision capabilities, with 77.3% accuracy on MMMU
      • 71.0% accuracy on MathVista

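    Against o1-preview, the largest gains are in mathematics (AIME 2024: +37.2 points; MATH: +10.9) and coding (LiveCodeBench: +24.3; SWE-bench Verified: +7.6). General-knowledge and factuality scores are roughly flat (GPQA diamond: +2.4; MMLU: +1.0; SimpleQA: +0.2), and MGSM is slightly lower (-1.5). The vision and agent benchmarks list a single score each, so no comparison with o1-preview is available for them.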
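The size of each change against o1-preview can be computed directly from the two score columns in the benchmark table. The following is a minimal sketch: the scores are copied from the table, while the names `SCORES` and `point_deltas` are illustrative and not part of the model card. Benchmarks that list only one score (vision, agents) are left out because there is nothing to compare.

```python
# Scores copied from the benchmark table: (o1-2024-12-17, o1-preview), in percent.
# Benchmarks with a single listed score are omitted.
SCORES = {
    "GPQA diamond": (75.7, 73.3),
    "MMLU": (91.8, 90.8),
    "SWE-bench Verified": (48.9, 41.3),
    "LiveCodeBench": (76.6, 52.3),
    "MATH": (96.4, 85.5),
    "AIME 2024": (79.2, 42.0),
    "MGSM": (89.3, 90.8),
    "SimpleQA": (42.6, 42.4),
}


def point_deltas(scores):
    """Return (benchmark, delta) pairs, largest gain first.

    Each delta is the percentage-point difference o1-2024-12-17 minus
    o1-preview, rounded to one decimal to match the table's precision.
    """
    deltas = [(name, round(new - old, 1)) for name, (new, old) in scores.items()]
    return sorted(deltas, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, delta in point_deltas(SCORES):
        print(f"{name:<20} {delta:+.1f}")
```

These are differences in percentage points, not relative improvements; on a benchmark that is already near its ceiling, such as MATH, a small point gain can still mean a large cut in the error rate.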

    Metadata

    Context window: 200,000 tokens
    Input price: $15 per million tokens
    Output price: $60 per million tokens
    Knowledge cutoff: Oct 2023
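The pricing figures translate into a per-request cost estimate with simple arithmetic. The sketch below reads $15 per million as the input-token price and $60 per million as the output-token price, and assumes the 200,000-token limit covers input and output together; both readings are assumptions of this example. The function name `estimate_cost_usd` is hypothetical.

```python
# Figures from the metadata section. Treating $15 as the input price, $60 as the
# output price, and 200,000 as a combined input+output limit are assumptions.
CONTEXT_WINDOW_TOKENS = 200_000
INPUT_USD_PER_MILLION = 15.0   # assumed: input-token price
OUTPUT_USD_PER_MILLION = 60.0  # assumed: output-token price


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens + output_tokens > CONTEXT_WINDOW_TOKENS:
        raise ValueError("request exceeds the assumed 200,000-token context window")
    cost = input_tokens * INPUT_USD_PER_MILLION + output_tokens * OUTPUT_USD_PER_MILLION
    return cost / 1_000_000


if __name__ == "__main__":
    # 10,000 input tokens and 2,000 output tokens:
    # 10,000 * $15/1M + 2,000 * $60/1M = $0.15 + $0.12
    print(f"${estimate_cost_usd(10_000, 2_000):.2f}")  # prints "$0.27"
```

Because output tokens cost four times as much as input tokens at these rates, response length tends to dominate the bill for short prompts.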