LessWrong (Curated & Popular)

"Anthropic repeatedly accidentally trained against the CoT, demonstrating inadequate processes" by Alex Mallen, ryan_greenblatt

Anthropic inadvertently trained against the chain of thought (CoT) in approximately 8% of Claude Mythos Preview training episodes. This oversight, noted as the second such incident, raises concerns about AI safety processes and the reliability of monitoring AI reasoning…
