LessWrong (Curated & Popular)

"Terrified Comments on Corrigibility in Claude’s Constitution" by Zack_M_Davis

This episode examines AI corrigibility, the property by which an AI allows its creator to modify its preferences. It discusses how this desirable but unnatural property, crucial for correcting initial AI alignment errors, is addressed in Claude's Constitution…
