Pretrained

Why Your Agent is Cheating

In this episode, Pierce and Richard discuss reward hacking and why models often learn the wrong lessons from their training signal. They also explain practical fine-tuning, why LLMs operate on tokens rather than words, and how hardware constrains context length.

Listen