The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Teaching LLMs to Self-Reflect with Reinforcement Learning with Maohao Shen - #726
In this episode, Maohao Shen discusses Satori, a system that uses reinforcement learning and a Chain-of-Action-Thought approach to improve LLM reasoning by enabling models to reflect on and correct their own reasoning.