OpenAI employees publicly accuse xAI of publishing misleading benchmark results for its latest AI model, Grok3

Golden Finance reported that an OpenAI employee recently publicly accused Elon Musk's xAI of publishing misleading benchmark results for its latest AI model, Grok3. In response, xAI co-founder Igor Babushkin insisted that the company had done nothing improper. xAI's chart shows two versions of Grok3 (Grok3 Reasoning Beta and Grok3 mini Reasoning) outperforming o3-mini-high, OpenAI's strongest currently available model. However, OpenAI employees quickly pointed out on X that the xAI chart omits o3-mini-high's AIME 2025 score under the "cons@64" condition (consensus over 64 attempts per problem). Babushkin countered on X that OpenAI has published similarly misleading benchmark charts in the past, although those charts compared the performance of OpenAI's own models.
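For readers unfamiliar with the metric at the center of the dispute, "cons@64" is generally understood to mean that the model answers each problem 64 times and the most frequent answer is submitted as its final answer, which tends to raise scores compared with a single attempt. The Python sketch below illustrates that difference. It is a simplified illustration only: the function names and the toy data are made up for this example (using 5 samples per problem instead of 64), and it is not the evaluation code used by either xAI or OpenAI.

```python
from collections import Counter


def single_attempt_score(samples, answers):
    """Fraction of problems solved using only the first sampled answer (@1)."""
    correct = sum(1 for s, a in zip(samples, answers) if s[0] == a)
    return correct / len(answers)


def consensus_score(samples, answers):
    """Fraction of problems solved when the most common sampled answer
    is submitted as the final answer (cons@k, where k = samples per problem)."""
    correct = 0
    for s, a in zip(samples, answers):
        majority_answer, _count = Counter(s).most_common(1)[0]
        if majority_answer == a:
            correct += 1
    return correct / len(answers)


# Hypothetical toy data: 2 problems, 5 sampled answers each (k = 5, not 64).
samples = [
    [17, 42, 42, 42, 9],  # first attempt is wrong, but the majority is right
    [7, 7, 3, 7, 7],      # first attempt and the majority are both right
]
answers = [42, 7]

print("single attempt:", single_attempt_score(samples, answers))
print("consensus:", consensus_score(samples, answers))
```

On this toy data the single-attempt score is lower than the consensus score, which is why a chart that mixes the two conditions across models can make a comparison look more favorable than a like-for-like one.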