Benchmarking AI Agents on Full-Stack Coding

2025-03-28

Martin Casado and Sujay Jayakar discuss how Sujay's team benchmarks AI agents on full-stack coding tasks using Fullstack Bench. They explore the difficulties of autonomous software development, the importance of type safety and other guardrails for reducing variance, and…
