An article reviewing the OpenAI press conference series: From tools to AGI, OpenAI’s 12-day evolution
Editor
2024-12-23 11:02


Image source: generated by Unbounded AI

OpenAI's 12-day year-end launch marathon has finally come to an end. Watching the press conference each day was like opening a box of chocolates: you never knew what flavor was coming next.

In the first 11 days, most of the press conferences were quite dull; only three products had some genuinely exciting "flavor".

To sum up, the major updates were the official o1 release, Sora, and Canvas, all of which arrived in the first four days.

Of these, the official o1 release is a real step up; Sora added a range of product modes that change how AI-generated video is made; and Canvas can be seen as OpenAI's first product attempt at an AI workbench.

Next in interest are the deep cooperation with Apple, the video call function, and reinforcement fine-tuning for o1-mini.

Reinforcement fine-tuning of o1-mini has great potential in professional fields: a simple fine-tune can yield obvious improvements. The video calling feature is the long-promised "HER" finally shipped. The deep cooperation with Apple is also a big deal for OpenAI, cementing its position as the AI industry's number one player.

The other small product updates just make people wonder: "Is this really worth a launch event?"

These include the "Projects" feature, the official opening of o1 image input and the 4o advanced voice API, the ChatGPT Search upgrade, and the ability to phone GPT. All are relatively minor updates, indistinguishable from competitors' offerings.

On the last day, OpenAI finally delivered a breakthrough: o3. In one stroke it dispelled suspicions that AI development had hit a bottleneck, with benchmark results pointing straight at AGI.

We put together a table, ordered by the importance of each release, to sort out this roller coaster of twelve launch days.

Now, let’s talk about the core points of these updates in a little more detail.

Important Product Updates

o1 Full Version (Day1)


In terms of capability, o1 has indeed made significant progress over the preview version. On the American Invitational Mathematics Examination questions (AIME 2024) and the CodeForces programming test, it improved by 50% over o1-preview, and its rate of major errors on complex problems fell by 34%.

It can also adjust its thinking time to the difficulty of the question, cutting user waiting time by more than 50%.

More importantly, o1 now supports multi-modal input, which greatly increases its usefulness: doctors can use it to analyze medical imaging, engineers can ask it to help read drawings, and designers can ask it for creative suggestions.
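As a rough illustration, multi-modal input in the Chat Completions style is usually sent as a base64 data URL inside the message content. The helper below is a minimal sketch of building such a payload; the exact field layout follows the publicly documented image-input convention, and the placeholder image bytes are, of course, an assumption.

```python
import base64

def build_vision_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build a user message combining text and an inline base64 image,
    following the common Chat Completions image-input shape."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            # Inline the image as a data URL instead of a hosted link.
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

# Example: attach (placeholder) image bytes to a question.
msg = build_vision_message("What does this drawing show?", b"\x89PNG placeholder")
```

Such a message would then go into the `messages` list of a chat request against a vision-capable model.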

But it is also quite expensive. Only subscribers to the US$200 ChatGPT Pro tier get unlimited use; ordinary US$20 subscribers are limited to 20 uses per day.

As the product that opened the series on day one, o1 genuinely made people's eyes light up.

Sora (Day3)

After waiting for 10 months, Sora finally arrived.

But this is not a model-version upgrade; it is more like product polish. The official Sora can generate videos up to 20 seconds long at up to 1080p, and the output quality is not much different from the demo released back in February.

Still, OpenAI has clearly put thought into the product. Storyboard is the most innovative feature of this release and Sora's most ambitious attempt: it gives users a timeline interface similar to professional video-editing software, where they can add multiple scene cards, chain prompts together, and let the system automatically handle the transitions between scenes.

In addition, OpenAI provides three professional tools: Remix, Blend, and Loop, which respectively replace elements in a video, blend two videos together, and extend a clip into a seamless infinite loop.

The product work is quite good, but the un-upgraded model is less impressive. In post-release reviews, Sora flopped frequently: movement, interaction, and physics are often handled poorly, and people and objects sometimes appear out of thin air.

The quota OpenAI provides is also stingy: US$20 Plus users get 50 generations a month, and only US$200-per-month Pro users enjoy unlimited "slow" generation.

Sora finally arrived, but it was quite disappointing.

Canvas (Day4)

To describe it in one sentence, Canvas is OpenAI's AI version of Google Docs.

Canvas has evolved into a complete workbench integrating intelligent writing, code collaboration, and AI agents, showing OpenAI's product ambitions beyond the chatbot.

As a writing assistant, it can provide editorial advice.

On the programming side, Canvas creates a nearly delay-free coding environment through a built-in WebAssembly Python interpreter, and it also demonstrates an ability to understand the intent of your code.

Like the recently updated Cursor and Devin, it supports customizable AI agents that can carry out a series of operations, such as helping you send Christmas letters to your friends.

These three dimensions of Canvas do not operate in isolation; in actual use they tend to work together, and this seamless integration makes Canvas a prototype of a versatile AI-powered creative studio.

Purely in terms of front-end presentation, though, it is not as good as Claude's Artifacts, and for programming convenience it trails Cursor. Its real highlight is the fusion of the three.

General product updates

o1-mini reinforcement fine-tuning (Day2)

If its applicability were not so narrow, this would still count as a major release.

It changes the old fine-tuning logic of simply feeding in professional data: instead, a reasoning model is fine-tuned via reinforcement learning, guiding it to think more deeply when facing complex problems.

Now only "dozens of examples", or even just 12, are needed for the model to effectively learn to reason in a specific domain. According to OpenAI's research data, the test pass rate of a reinforcement-fine-tuned o1-mini is 24% higher than that of the standard o1 model, and a full 82% higher than that of an un-fine-tuned o1-mini.

Unfortunately, only o1-mini can be fine-tuned this way, and it only suits tasks in complex specialist fields such as medicine, law, finance, and insurance. It is not very versatile.
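The "dozens of examples" claim is easiest to appreciate by looking at how small such a training set really is. The sketch below shows a hypothetical prompt/reference-answer dataset serialized as JSONL, the usual upload format for fine-tuning; the field names and example contents are illustrative assumptions, not OpenAI's documented reinforcement fine-tuning schema.

```python
import json

# A hypothetical, deliberately tiny domain dataset: with reinforcement
# fine-tuning, a graded reference answer per prompt is reportedly enough.
examples = [
    {"prompt": "Case: patient with symptom set X. Most likely diagnosis?",
     "reference_answer": "diagnosis A"},
    {"prompt": "Clause: non-compete lasting ten years. Enforceable?",
     "reference_answer": "likely unenforceable"},
]

def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)

training_file = to_jsonl(examples)
```

During training, a grader scores the model's sampled answers against the reference, and reinforcement learning pushes the model toward higher-scoring reasoning, which is why so few examples can go so far.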

Advanced Voice Mode with video (Day6)

This is another cake promised long ago and only now served. On May 13, during the GPT-4o demonstration, OpenAI staff showed a video call with 4o: it could see the content of a phone screen in real time and chat or answer questions based on the live camera feed.

This release simply implements that demo; there is no upgrade. But the function itself is still very important.

However, because the feature took so long to ship, Microsoft's Vision, launched two days earlier, and Google's still-in-testing Astra have caught up. OpenAI's lead is being eroded bit by bit.

Cooperation with Apple (Day5, Day11)

The ChatGPT and Apple Intelligence integration reads more like an official announcement of work long underway: if Apple can't crack AI itself, it can only lean on OpenAI.

The integration covers three aspects. The first is collaboration with Siri: when Siri determines that a task may require ChatGPT's assistance, it can hand the task over to ChatGPT for processing;

Second, the writing tools have been enhanced: users can now use ChatGPT to draft documents from scratch as well as refine and summarize them;

Third is the Camera Control function of the iPhone 16, which lets users understand the subject more deeply through visual intelligence.

The Mac integration on the eleventh day later gave ChatGPT broader permission to invoke Mac tools.

The only thing I don't understand is why these couldn't be announced on the same day rather than spread across two.

Capability completions and minor feature updates (Day 7, 8, 9, 10)

The remaining updates count as filler at best; each can be explained in a single sentence.

"Projects" feature: lets users create a project, upload related files, set custom instructions, and gather all conversations related to the project in one place. Basically the same as Claude's.

ChatGPT Search upgrade: search is now available inside conversations, with multi-modal output. Perplexity's Pro mode has offered this for a long time.

4o hotline: American users can now talk to 4o over a phone call! Very considerate toward the elderly; think of it as a Double Ninth Festival (China's seniors' day) gift for them.

o1 image input and the 4o advanced voice API officially open: this could have been a single closing sentence on the day o1 was released.

These few days really felt like padding out the schedule.

The final king

o3 (Day 12)

If o3 had not debuted on the last day, I would genuinely suspect that OpenAI held twelve consecutive press conferences just to muddy the waters.

During this same period, Google released Gemini 2.0 Flash, which is super fast and powerful; Astra, which looks like a real agent; Veo 2, which crushes Sora; and Gemini 2.0 Flash Thinking, its answer to o1. With just three announcements and a few videos, Google blew away everything OpenAI released in the previous 11 days.

But on Day 12, OpenAI reclaimed its glory, using o3 to prove to the industry that the Scaling Law is not dead and OpenAI is still king.

o3 is the successor to o1. Just three months after o1's release in September, this new model significantly surpasses o1 on multiple benchmarks, including coding, mathematics, and ARC-AGI.

Look at a few data comparisons:

Codeforces rating: 2727, equivalent to roughly the 175th-ranked human competitive programmer worldwide, better than 99% of human programmers.

PhD-level science questions (GPQA): 87.7%; PhD students typically score around 70%.

The notoriously difficult FrontierMath test: 25.2%, where no previous model exceeded 2%. Mathematician Terence Tao said this benchmark "may stump AI for several years".

ARC-AGI, the benchmark meant to probe progress toward AGI: 87.5%, versus o1's 25%.

The most noteworthy is this last test, ARC-AGI, which measures a model's adaptability to novel tasks. For comparison, the ARC-AGI-1 score only climbed from 0% with GPT-3 in 2020 to 5% with GPT-4o in 2024. This suggests the model is not rote-memorizing but actually solving problems.

Although o3 performed well on the ARC-AGI test, this does not mean it has reached AGI: it still fails on some very simple tasks, which is fundamentally different from human intelligence.

But no matter what, this proves that OpenAI’s choice of enhanced reasoning is a successful paradigm shift. The development of artificial intelligence shows no signs of slowing down. Scaling Law is still in effect.

Those concerns about AI stagnation were swept away by OpenAI’s year-end Christmas gift.

Although a single low-compute o3 task costs as much as US$20, and a high-compute run can reach US$3,000, making it nearly unusable at this stage, compute costs will fall and the Scaling Law will carry on.

Three months, two top models: on the last day of these twelve, OpenAI let us relive the pace of AI's leap from ChatGPT to GPT-4 between late 2022 and early 2023.

Perhaps as Noam Brown, an OpenAI scientist who previously participated in the development of o1, said in an interview, "In 2024, OpenAI is experimenting, and 2025 is the year of full speed ahead."

OpenAI's 12-day press conference series was a bumpy ride with a perfect ending, laying down hope for AI in 2025.
