OpenAI分析意外思维链评分对模型影响
作者:OpenAI / @OpenAI
发布时间:2026-05-08T20:19:04.000Z
Chain of thought monitors are a key layer of defense against AI agent misalignment. To preserve monitorability, we avoid penalizing misaligned reasoning during RL.
We found a limited amount of accidental CoT grading which affected released models, and are sharing our analysis.
链接卡片:alignment.openai.com We found limited accidental CoT grading in some released models, fixed the affected reward pathways, and found no clear evidence that monitorability degraded.