⚡️The End of SWE-Bench Verified — Mia Glaese & Olivia Watkins, OpenAI Frontier Evals & Human Data
Mia Glaese and Olivia Watkins discuss OpenAI's decision to stop evaluating models on SWE-Bench Verified because of saturation and contamination. They explain the shift to SWE-Bench Pro as the coding benchmark going forward.