Why AI Evaluation Science Can't Keep Up (with Carina Prunkl)

2026-04-17

Carina Prunkl discusses the challenges in evaluating general-purpose AI, noting how systems excel at complex tasks yet fail simple ones, and how rapid capability gains increase misuse risks. The conversation covers testing limitations, de-skilling, and AI-related risks.

Listen