The following table compares o1-mini with GPT-4o, o1-preview, and o1 on three benchmarks: MMLU, GPQA, and MATH-500.

| Benchmark (score, %) | GPT-4o | o1-mini | o1-preview | o1 |
|---|---|---|---|---|
| MMLU | 88.7 | 85.2 | 90.8 | 92.3 |
| GPQA | 53.6 | 60.0 | 73.3 | 77.3 |
| MATH-500 | 60.3 | 90.0 | 85.5 | 94.8 |
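
To make the gaps in the table explicit, here is a small illustrative sketch that computes o1-mini's margin, in percentage points, over each other model on each benchmark. The scores are copied directly from the table. The `SCORES` dictionary and the `margins` helper are hypothetical names introduced only for this example.

```python
# Scores (%) copied from the benchmark table; the names here are illustrative only.
SCORES = {
    "MMLU":     {"GPT-4o": 88.7, "o1-mini": 85.2, "o1-preview": 90.8, "o1": 92.3},
    "GPQA":     {"GPT-4o": 53.6, "o1-mini": 60.0, "o1-preview": 73.3, "o1": 77.3},
    "MATH-500": {"GPT-4o": 60.3, "o1-mini": 90.0, "o1-preview": 85.5, "o1": 94.8},
}


def margins(model: str = "o1-mini") -> dict[str, dict[str, float]]:
    """Percentage-point difference between `model` and every other model, per benchmark.

    A positive value means `model` scores higher than the other model.
    Results are rounded to one decimal place to hide floating-point noise.
    """
    result: dict[str, dict[str, float]] = {}
    for bench, row in SCORES.items():
        result[bench] = {
            other: round(row[model] - score, 1)
            for other, score in row.items()
            if other != model
        }
    return result


if __name__ == "__main__":
    for bench, diffs in margins().items():
        print(bench, diffs)
```

From these numbers, o1-mini trails GPT-4o by 3.5 points on MMLU but leads it by 29.7 points on MATH-500. It also edges out o1-preview on MATH-500 by 4.5 points.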