The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Teaching LLMs to Self-Reflect with Reinforcement Learning with Maohao Shen - #726

In this episode, Maohao Shen discusses Satori, a system that uses reinforcement learning and a Chain-of-Action-Thought approach to improve LLM reasoning, so that the model can reflect on and correct its own output.
