OpenAI releases CoT monitoring to prevent malicious behavior of big models
Editor
2 hours ago 8,174
Share to:
According to Golden Finance, OpenAI has released the latest research, using CoT (thinking chain) monitoring method, it can prevent malicious behaviors such as talking nonsense and hiding true intentions, and it is also one of the effective tools to supervise super models. OpenAI uses the newly released cutting-edge model o3-mini as the monitored object and uses the weaker GPT-4o model as the monitor. The test environment is a coding task that requires AI to implement functions in the code base to pass unit testing. The results show that the CoT monitor performed excellently in detecting systemic "reward hackers" behavior, with a recall rate of up to 95%, far exceeding 60% of the monitoring behavior only.