From the 12-day OpenAI press conference, we saw four key issues in the industry
Editor
2024-12-24 11:03 5,929


Image source: generated by Unbounded AI

For the first time in history, a company held product launches for 12 consecutive days. When OpenAI announced the decision, expectations across the global technology community ran high. Yet by the time the event wrapped up, the mood had shifted. "That's it?" was how one AI practitioner put it.

This seems to represent a fairly mainstream view: the OpenAI event had few highlights and fell short of expectations.

Over the first eleven days, OpenAI's announcements covered important updates in technology, product form, business model and industry ecosystem: the full o1 reasoning model, reinforcement fine-tuning, the text-to-video model Sora, the upgraded writing and coding tool Canvas, deep integration with the Apple ecosystem, voice and vision capabilities, the Projects feature, ChatGPT search, calling ChatGPT by phone and chatting with it on WhatsApp, and more.

But as the disappointed practitioner above explained, "I thought GPT-5 would be released." The day after the event ended, foreign media reported that OpenAI's GPT-5 development had run into obstacles.

The o3 released on the last day, however, was an exception. The next-generation reasoning model after o1, it performed astonishingly well on tests in mathematics, coding, physics and more. A technical staffer at a major domestic large-model company described the shock o3 gave him: "AGI is here." Technical people speak highly of o3.

Looking back at the 12 days of launches, OpenAI flexed its technical muscles while steadily refining its products and expanding their application space. Some joked that, like a livestream marathon, OpenAI was trying to attract more users and developers to ChatGPT. In the new year, OpenAI may see a leap in daily active users, revenue and other metrics.

o3 Press Conference|Image source: OpenAI

But this process may not go smoothly. Even though model capabilities have grown stronger, data constraints, engineering and packaging gaps, and high model costs still leave a long distance between powerful models and deployed applications.

OpenAI's launch event seems to reveal a trend: competition in the large-model industry now centers not only on model parameters and technical ceilings, but also on user experience and market scale. Both must advance together to stay ahead.

After sorting through the main information from the 12 OpenAI sessions and speaking with people in China's large-model industry, Geek Park summarized the following key points.

01. o3's depth of intelligence is sufficient, but whether it counts as AGI depends on its breadth of intelligence

"Crazy, absolutely crazy." That was the first reaction of the head of a domestic large-model company after seeing o3.

On complex problems such as mathematics, coding and doctoral-level science Q&A, o3 performed beyond some human experts. On GPQA Diamond, a PhD-level science benchmark covering biology, physics and chemistry, o3 reached 87.7% accuracy, while PhD experts in those fields manage only about 70%; on the American AIME mathematics competition, o3 scored 96.7%, missing only one question, roughly the level of a top mathematician.

Its coding ability has drawn the widest discussion. On Codeforces, currently the world's largest algorithm training and competition platform, o3 scored 2727 points, more than 800 points above o1, equivalent to the 175th-ranked human competitor. It even surpassed OpenAI's senior vice president of research Mark Chen, who scored 2500.

Comparison of code capabilities of o1-preview, o1, and o3 | Image source: OpenAI

Since o1-preview launched in September, the o1 series has evolved dramatically in reasoning ability in just three months. The full version of o1, launched on the first day of the event, thinks about 50% faster than o1-preview, makes 34% fewer major errors on difficult real-world problems, and supports multimodal input (it can recognize images). Now o3 has surpassed some human experts on complex problems.

"From o1 to o3, the model's ability improved by scaling up inference compute. Together with the releases of DeepSeek-R1 and Gemini 2.0 Flash Thinking at home and abroad, this shows that large models are beginning to shift from the pre-training scaling law to the inference scaling law," Liu Zhiyuan, a tenured associate professor at Tsinghua University and founder of Mianbi Intelligence (ModelBest), told Geek Park.

Since OpenAI released o1-preview, the technical paradigm of the large-model wave has shifted from the original pre-training scaling law, which keeps expanding training parameters to raise the model's intelligence ceiling, to a new, upgraded paradigm: injecting reinforcement learning into the reasoning stage to improve complex reasoning capabilities.

Under the former paradigm, the model mainly produces answers through next-token prediction, which leans toward "fast thinking." It is like "reading ten thousand books" while "learning without thinking": the model cannot complete more complex reasoning tasks such as mathematics and programming.
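The "fast thinking" mode can be illustrated with a toy sketch: at each step the model picks the single most likely next token and moves on, with no planning or backtracking. The bigram table below is a hypothetical stand-in for a trained model, not anything from OpenAI.

```python
# Toy next-token prediction: greedily extend a prompt using a
# hand-written bigram table (a stand-in for a trained model).
BIGRAMS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.9, "ran": 0.1},
    "sat": {"down": 1.0},
}

def next_token(prev: str) -> str:
    # "Fast thinking": one lookup per token, highest probability wins,
    # no planning and no revisiting earlier choices.
    return max(BIGRAMS[prev], key=BIGRAMS[prev].get)

def generate(start: str, steps: int) -> list[str]:
    out = [start]
    for _ in range(steps):
        out.append(next_token(out[-1]))
    return out

print(" ".join(generate("the", 3)))  # the cat sat down
```

Each output token is committed immediately, which is exactly why this mode struggles with problems that require exploring and discarding intermediate steps.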

Under the latter paradigm, the model does not answer immediately. Instead it "thinks slowly": it first introduces CoT (chain of thought), plans and decomposes a complex problem into simpler steps, and then produces the result. When one approach fails, it tries another, improving its complex reasoning through reinforcement learning. As the model keeps performing "slow thinking" and reinforcement learning, its reasoning ability grows exponentially. This is the inference scaling law.
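The "slow thinking" loop described above can be sketched as a toy search: try a cheap strategy first, check the candidate answer with a verifier, and spend more compute on another strategy only when the check fails. This is a hypothetical illustration of the inference-scaling idea, not OpenAI's actual o-series implementation; the strategies and verifier are stand-ins.

```python
# Toy sketch of inference-time scaling: try progressively more
# expensive "reasoning strategies" until a verifier accepts one.
# Task (stand-in): compute the sum 1 + 2 + ... + n.

def strategy_guess(n):
    # "Fast thinking": one cheap guess (often wrong on hard inputs).
    return n * 2

def strategy_decompose(n):
    # "Slow thinking": break the problem into simple steps.
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

def verify(n, answer):
    # A verifier checks the candidate against the known closed form.
    return answer == n * (n + 1) // 2

def solve(n, strategies):
    # Spend more inference-time compute only when cheaper tries fail.
    for attempts, strategy in enumerate(strategies, start=1):
        answer = strategy(n)
        if verify(n, answer):
            return answer, attempts
    raise RuntimeError("all strategies failed")

answer, attempts = solve(10, [strategy_guess, strategy_decompose])
print(answer, attempts)  # the fast guess (20) fails; decomposition returns 55
```

The more attempts the loop is allowed, the harder the problems it can solve, which is the intuition behind trading extra inference compute for reasoning ability.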

That o3 can surpass human experts in research-grade reasoning shows, in Liu Zhiyuan's view, that it is moving in the direction of a "superintelligent supercomputer."

Many industry insiders believe that this will have a profound impact on the field of cutting-edge science. From a positive perspective, o3's strong research reasoning capabilities can help promote basic scientific research in mathematics, physics, biology, chemistry and other disciplines. However, some people are worried that it will affect the work of scientific researchers.

The astonishing depth of intelligence o3 displays seems to signal the dawn of AGI. But in Liu Zhiyuan's view, just as the symbol of the information revolution was not the mainframe but the popularization of the personal computer (PC), the real intelligence revolution will arrive only when everyone has their own large model to solve their everyday problems.

"After all, we don't need Terence Tao or Hinton (both top scientists) to solve our daily problems for us," he said.

The key question behind this is whether o3's depth of intelligence can generalize to other fields, giving it sufficient breadth of intelligence. In the view of the technical staffer from the large domestic model company quoted above, only by breaking through both depth and breadth can a model be called AGI. He was optimistic: "It's like a transfer student arriving in your class. You've never met him, but he ranks first in the class in math and programming. Do you really think he'll be bad at Chinese and English?"

For China's large-model companies, the core question is how to catch up with o3. Judging from key elements such as training architecture, data, training methods and evaluation datasets, this looks like a problem engineering can solve.

"How far do you think we are before we have an o3-level open source model?"

"Within a year," the company head quoted above replied.

02. The model is just an engine; the key is helping developers use it

Strong as o3's capabilities are, to some at the application layer there is still a long distance between the model and practical application. "OpenAI has trained an Einstein, but there's still a way to go before he can be chief scientist at a listed company," Zhou Jian, founder and CEO of Lanma Technology, told Geek Park.

Positioned in the middle layer of the large-model stack, Lanma Technology is one of the earliest companies in China to explore large-model applications and build AI agents. In Zhou Jian's view, a large model is just infrastructure; a great deal of scenario-specific work is needed before it can be used. The main constraint at present is data.

In many scenarios, it is difficult to obtain complete data, and a lot of data is not even digitized. For example, headhunters may need resume data, but a lot of resume data has not been digitized.

Cost is the most critical factor affecting deployment of the o-series models. By the ARC-AGI benchmark's accounting, each o3-low (low-compute mode) task costs about 20 US dollars, and each o3-high (high-compute mode) task costs thousands of dollars; even the simplest question can cost the equivalent of nearly 20,000 yuan. With benefits and costs so far out of balance, o3 may take a long time to reach real deployment.

Cost calculation of o series models | Image source: ARC-AGI test standard

To help model applications land, OpenAI also released corresponding features at the event. On day two, for example, OpenAI released Reinforcement Fine-Tuning, aimed specifically at developers, which is the feature Zhou Jian cares about most. It lets a model optimize its reasoning and improve performance with only a small amount of data.

This is especially suited to specialized domains. OpenAI's technical staff say it can help in any field where AI models need deep expertise, such as law, finance, engineering and insurance. One example: Thomson Reuters recently used reinforcement fine-tuning on o1-mini to build a useful AI legal assistant that helps its legal professionals complete some of their "most analytical workflows."

On day nine, for example, the o1 model was finally opened to developers through the API. It supports function calling and vision; it introduced WebRTC for building real-time voice applications; it launched preference fine-tuning to help developers customize models; and it released Go and Java SDKs so developers can integrate quickly.
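Function calling works roughly like this: the developer registers tools, the model replies with a tool name plus JSON arguments, and the application executes the call and feeds the result back. A minimal hypothetical dispatcher is sketched below; the model's reply is stubbed, and `get_weather` is an invented example tool, not part of any real API.

```python
import json

# Hypothetical sketch of the function-calling loop: the developer
# registers tools, the model returns a tool name + JSON arguments,
# and the app dispatches the call. The model reply is stubbed here.

def get_weather(city: str) -> str:
    # Stand-in for a real weather lookup.
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def dispatch(model_reply: str) -> str:
    # The model emits something like:
    # {"tool": "get_weather", "arguments": {"city": "Beijing"}}
    call = json.loads(model_reply)
    fn = TOOLS[call["tool"]]
    return fn(**call["arguments"])

stub = '{"tool": "get_weather", "arguments": {"city": "Beijing"}}'
print(dispatch(stub))  # Sunny in Beijing
```

In a real integration, the tool's return value would be appended to the conversation so the model can compose its final answer from it.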

At the same time, OpenAI brought a cheaper, higher-quality 4o speech model. The price of 4o audio dropped 60%, to $40 per million input tokens and $80 per million output tokens; cached audio dropped 87.5%, to $2.50 per million tokens. For budget-constrained developers, OpenAI launched GPT-4o mini, whose audio costs only a quarter as much as 4o's.
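Using the per-million-token prices quoted above, the audio cost of a call is simple arithmetic. The token counts below are hypothetical, and this sketch ignores text tokens and caching discounts:

```python
# Estimate 4o audio cost from the quoted prices:
# input $40 / 1M tokens, output $80 / 1M tokens.
INPUT_PER_M = 40.0
OUTPUT_PER_M = 80.0

def audio_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call at the quoted 4o audio prices."""
    return (input_tokens * INPUT_PER_M
            + output_tokens * OUTPUT_PER_M) / 1_000_000

# A short exchange with, say, 2,000 input and 4,000 output audio tokens:
print(round(audio_cost(2_000, 4_000), 4))  # 0.4
```

At these rates a brief voice exchange costs well under a dollar, which is why the 60% cut (and the quarter-price GPT-4o mini) matters for voice applications that run at scale.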

These new features also interest Zhou Jian. He believes the updated real-time voice, visual recognition and other functions will find good use in scenarios such as marketing, telephone customer service and outbound sales calls. In his experience, when OpenAI ships a leading technology, domestic companies usually catch up within 6-12 months. That gives him confidence in his application business in the new year.

03. Sora's video generation fell short of expectations, but opening the product will improve its physical simulation

When OpenAI released the Sora demo at the start of the year, it shook the global technology community. Over the rest of the year, major domestic model companies raced onto the text-to-video track, so when OpenAI officially released Sora on day three, China's text-to-video companies breathed a sigh of relief.

"Basically nothing exceeded expectations. Realism, physical behavior and other aspects haven't changed much since the February release. In terms of base-model capability, it fell short of expectations," Tang Jiayu, co-founder and CEO of Shengshu Technology, told Geek Park.

Companies including ByteDance, Kuaishou, MiniMax, Zhipu, Shengshu and Aishi have all launched their own text-to-video products. "Sora's overall performance doesn't have a clear lead. We can see that we and OpenAI really are advancing neck and neck," Tang Jiayu said.

In Tang Jiayu's opinion, the slightly brighter part of Sora is that, beyond basic text-to-video and image-to-video, it offers editing features that improve the video-creation experience, a sign that OpenAI really is paying more attention to product experience.

The storyboard function, for example, cuts a story (a video) into multiple story cards (video segments) along a timeline. Users only need to design and adjust each card, and Sora automatically completes them into a smooth story. This is much like storyboarding in film and animation: the director draws the storyboards, the animator fills in the frames, and a film or animation takes shape. It lets creators express themselves better.

It also introduces features such as editing a video directly with text, seamlessly merging two different videos, and changing a video's style, effectively adding "special effects" straight onto the footage. Ordinary text-to-video products cannot modify the original video directly; users can only keep adjusting the prompt and generating new videos.

Sora’s storyboard function | Image source: OpenAI

In Tang Jiayu's view, these functions are indeed designed to give creators greater creative freedom, and similar features are already on the roadmap of Vidu, Shengshu Technology's text-to-video product. "Implementing these Sora features is not hard for us; the implementation path is already very clear," he said.

At the event, Sam Altman explained why OpenAI built Sora. First, instrumental value: it gives creatives a creative tool. Second, interaction value: large models should not interact through text alone but extend into multimodality. Third, and most important, it fits the AGI technical vision: by learning more about how the world works, Sora may eventually become a "world model" that understands the laws of physics.

In Tang Jiayu's view, the videos Sora currently generates still contain many obvious violations of physics, not much progress over the February demo. Once Sora is released, he expects more people to probe its physical-simulation capability, and those test samples may help guide its improvement.

04. With built-in features and an external ecosystem, can ChatGPT become a Super App?

Beyond the o-series models, Sora and developer services, OpenAI's main moves at the event were, on one side, to keep adding product features and polishing the user experience and, on the other, to actively push deep cooperation with Apple and other companies, exploring how to embed AI into devices and operating systems.

From the former, ChatGPT's direction of evolution seems to be an "omnipotent, omnipresent, available to everyone" super AI assistant. As Geek Park understands it, OpenAI's original vision was an all-capable agent that understands human instructions, automatically calls different tools and meets human needs. The end point, it seems, is the starting point.

On day six, for example, ChatGPT added video calling with screen sharing, plus a Santa Claus voice mode. The former lets users hold real-time video calls with the AI, share their screen or show their surroundings, and interact multimodally, replicating the scene from the movie "Her".

On day eight, ChatGPT opened its search feature to all users. Beyond basic search it added voice search; on mobile it integrates map services and can call up Apple and Google maps to display lists of search results; and it has partnered with a number of top news and data providers so users can check stock quotes, sports scores, weather forecasts and other information.

On day eleven, ChatGPT announced expanded integration with desktop software. It can connect to more coding applications, such as BBEdit, MATLAB, Nova and Script Editor; it works with Warp (a file-sharing application), the Xcode editor and other apps; and in voice mode it can collaborate with applications including Notion and Apple Notes.

A live demo showed a user setting up a "Holiday Party Playlist" in Apple Notes and asking ChatGPT for its opinion on the candidate songs. ChatGPT could point out the user's mistakes, such as miswriting the Christmas song "Frosty the Snowman" as "Freezy the Snowman".

ChatGPT pointed out the error of Apple Notes | Image source: OpenAI

"ChatGPT will transform from a simple conversational assistant into a more powerful agent tool," said Kevin Weil, OpenAI's chief product officer.

On the other side, OpenAI is actively expanding its ecosystem, reaching a wider population by embedding itself in the devices, operating systems and applications people use most.

On day five, for example, ChatGPT announced integration with Apple's intelligence ecosystem across iOS, macOS and iPadOS, letting users invoke AI capabilities across platforms and applications, including Siri interaction, Writing Tools, and Visual Intelligence for recognizing scene content. Through the partnership, ChatGPT reaches billions of Apple users worldwide; it also sets a precedent for cooperation among large models, devices and operating systems.

On day ten, ChatGPT announced its own phone number (1-800-242-8478). US users can call it for 15 minutes free each month. A WhatsApp contact at the same number (1-800-242-8478) also went live: any user worldwide can message it through WhatsApp, though for now only text is supported.

ChatGPT announced its own phone contact information | Image source: OpenAI

In some countries and regions, smartphone and mobile-internet penetration is still far from sufficient; through the telephone, a basic communication tool, ChatGPT can reach those people. Through WhatsApp it also reaches that app's nearly 3 billion users.

Whether through built-in features or the external ecosystem, the core aim is for ChatGPT to reach a wider population and become a true Super App. Yet some are skeptical of its approach of endlessly adding features and extending business lines, describing it as "rolling out a big pancake, where every slice is a bit thin and can't go deep." Many businesses only create value at sufficient depth, and dedicated companies are already working on each of them. This may be a challenge OpenAI faces.

Although the o3 model showed the outside world OpenAI's astonishing technical strength, doubts remain about the ceiling the scaling law can reach and the troubled progress of GPT-5. At this event, OpenAI turned its attention to product form, cooperative ecosystem and deployment, which is a reasonable approach. The combination of the two may determine the industry's next direction.
