[State of Code Evals] After SWE-bench, Code Clash & SOTA Coding Benchmarks recap — John Yang

2025-12-31

John Yang, creator of SWE-bench, discusses how it evolved into a standard for evaluating AI coding agents. He also covers CodeClash and other coding benchmarks, touching on verification methods, data challenges, and the potential for human-AI collaboration in code evaluation.
