OpenAI's strongest model, o3, accused of fakery after OpenAI gained privileged advance access to the FrontierMath question bank
Editor
2025-01-21 11:01 268
Golden Finance reported that an EpochAI contractor known as "Meemi" revealed on the Less Wrong forum that OpenAI not only funded the FrontierMath benchmark but also obtained privileged access to its question bank. Tamay Besiroglu, deputy director and co-founder of EpochAI, quickly acknowledged the matter on X. He wrote that EpochAI had made a mistake by not disclosing OpenAI's involvement in FrontierMath earlier, that its contract prohibited disclosure until o3 was released, and that in hindsight it should have pushed harder for early transparency. He said EpochAI acknowledges this and commits to doing better in the future.
Elliot Glazer, chief mathematician at EpochAI, acknowledged that the industry funding was not disclosed during the project and apologized to mathematicians who might not have participated had they been informed. Regarding o3's scores, he expressed confidence in the accuracy of the results reported by OpenAI, but emphasized that EpochAI still needs to verify them independently using a holdout test set, and promised that the holdout evaluation scores will be made public. When asked about the status of the holdout set, Glazer clarified that it is still in development and not yet complete.
It is reported that FrontierMath is a benchmark for evaluating advanced mathematical reasoning. It was created by EpochAI together with more than 60 leading mathematicians, including multiple Fields Medal winners and senior problem setters for the International Mathematical Olympiad.