"Anthropic repeatedly accidentally trained against the CoT, demonstrating inadequate processes" by Alex Mallen, ryan_greenblatt

2026-04-14

Anthropic inadvertently trained against the chain of thought (CoT) in approximately 8% of Claude Mythos Preview training episodes. This oversight, noted as the second such incident, raises concerns about AI safety processes and the reliability of monitoring AI reasoning…
