Why Your Agent is Cheating
In this episode, Pierce and Richard discuss reward hacking and why models often learn the wrong lessons. They also cover practical fine-tuning, why LLMs operate on tokens rather than words, and how hardware constraints limit context length.