OpenAI officially releases Sora: its text-to-video capabilities explained in one article

Source: Geek Park

As speculated, OpenAI officially released its text-to-video product Sora on the third day of its 12-day livestream series.

At 2 a.m. on December 10, Beijing time, Sam Altman and several OpenAI employees demonstrated Sora's features and practical use cases in a livestream. When OpenAI released video samples in February this year, Sora caused a sensation in the global artificial intelligence community. Since then, foreign artificial intelligence companies have launched text-to-video products of their own. Today, Sora, the pioneer of this field, was finally unveiled.

Overall, the features Sora demonstrated suggest that it surpasses current text-to-video products in video quality, originality of features, and technical sophistication.

On top of the basic text-to-video and image-to-video functions, it adds features such as storyboarding (creating your own story through a sequence of storyboard cards), adjusting an existing video with text, and blending videos of different scenes (equivalent to adding special effects directly to the video). The overall product design seems intended to bring the video closer to the creator's self-expression and help them complete their ideal shot.

Later on December 9, local time, users in the United States and most other countries can visit the official website to try Sora. It is included in ChatGPT Plus and ChatGPT Pro subscriptions at no additional cost. Plus can generate up to 50 premium videos at up to 720p resolution and 5 seconds in duration, while Pro can generate up to 500 premium videos at up to 1080p resolution and 20 seconds in duration, and can also remove watermarks.

Sam Altman gave three main reasons for launching Sora:

First, from a tool perspective, OpenAI likes to make tools for creative people, which is very important to the company’s culture;

Second, from the perspective of user interaction, artificial intelligence systems should not interact only through text; they should also understand and generate video, to help humans use artificial intelligence. This echoes what large-model companies often say: "Every time a model adds a modality, user penetration increases."

Third, from a technical perspective, Sora is crucial to OpenAI's AGI roadmap: artificial intelligence should learn more about the laws of the world, the so-called "world model" for understanding physical laws.

We must not only use technology to change the world, but also use products to promote human creation. This is what Sora is doing.

01 Beyond generating videos: storyboards, special effects, and unlimited creation

Sora's most basic features are, first and foremost, text-to-video and image-to-video.

On the main interface, users can view and manage all generated video content, switch between grid view and list view, create folders and favorites, view bookmarks, and so on. The researchers say the main interface is designed to better help users create stories.

At the bottom center of the main page are Sora's text-to-video and image-to-video functions.

For example, Sam Altman first entered the text prompt, "Woolly mammoths walking in the desert, shot with a wide-angle lens." Next, the user selects the video's aspect ratio, resolution, duration (5-20 seconds), and the number of videos to generate (up to four variations to choose from), and then obtains the generated video.

In the end, the generated video looks very realistic and richly textured, and basically follows the input instructions. By now, people are perhaps no longer surprised by the excellent quality of Sora's video generation.

After the text "Woolly mammoths walking in the desert, shot with a wide-angle lens" was entered, Sora generated four videos | Image source: OpenAI

But this time, Sora also released a series of unique and advanced product features. In Geek Park's view, these features basically focus on more precise expression in video: through storyboarding, adding special effects, and so on, they let people create the story they want through video.

The first is the storyboard, which the researchers call a "new creative tool."

From a product design perspective, it is equivalent to cutting a story (video) into multiple story cards (video frames) along a timeline. Users only need to design and adjust each story card (video frame), and Sora automatically completes them into a smooth story (video). This is much like storyboards in film and draft drawings in animation: once the director has drawn the storyboards, the film can be shot; once the cartoonist has finished the drafts, the animation can be designed.

For example, the first story card the researcher envisioned was, "The beautiful white crane stands in the stream with a yellow tail." The second was, "The crane puts its head into the water and catches a fish." He created these two story cards (video frames) separately and set an interval of about five seconds between them. This gap matters to Sora, giving it room to connect the two actions.

Finally, he got a complete video shot: "The beautiful white crane stood in the creek. It had a yellow tail. Then the crane put its head into the water and caught a fish."

With two story cards (video frames), Sora generates a complete story (video) | Image source: OpenAI
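The storyboard idea described above can be pictured as a small data structure. The following is only an illustrative sketch, not Sora's actual interface: the `StoryCard` class and the `gaps_between` function are hypothetical names invented here, and the two prompts and the five-second spacing come from the crane example.

```python
from dataclasses import dataclass


@dataclass
class StoryCard:
    """Hypothetical story card: a text description pinned to a point on the timeline."""
    start_s: float  # position on the timeline, in seconds
    prompt: str     # what should happen in the video at that point


def gaps_between(cards: list[StoryCard]) -> list[float]:
    """Return the interval in seconds between each pair of consecutive cards."""
    ordered = sorted(cards, key=lambda card: card.start_s)
    return [later.start_s - earlier.start_s
            for earlier, later in zip(ordered, ordered[1:])]


# The two cards from the crane example, placed about five seconds apart.
cards = [
    StoryCard(0.0, "The beautiful white crane stands in the stream with a yellow tail."),
    StoryCard(5.0, "The crane puts its head into the water and catches a fish."),
]

print(gaps_between(cards))
```

In this picture, each gap is the stretch of time the model has to fill in order to connect one card's action to the next.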

What's even more striking is that in the storyboard, the creative elements are not just story cards but can also be images and videos directly. In other words, any image or video can be dragged onto the storyboard, combined with story cards, and built upon.

Taking video as an example, the researchers took the white crane video, imported it into the storyboard, and trimmed it. This left gaps before and after the clip for continued creation, which means the video can be given a new beginning and a new end.

The possibility this opens up is that storyboards can be extended indefinitely. The 20-second video generated by Sora can be continuously extended, cut, and extended again... until the ideal shot is fully achieved. The process resembles the work of an editor or director, who continuously generates and edits storyboard designs and footage, slowly cutting out the film in their mind.

Unlike in the real world, the material Sora provides is unlimited. And unlike other text-to-video products, Sora's videos can be modified and processed. This makes the videos it generates more consistent with the user's imagination and creativity.

This seems to be the core idea of Sora as a product: to make the generated video match the user's creative intent as closely as possible.

This makes Sora's other features easier to understand, such as directly modifying a video through text, seamlessly merging two different videos, and changing a video's visual style, all of which amount to adding "special effects" directly to the video. With typical text-to-video products, by contrast, users may need to keep adjusting the prompt and regenerating the video.

By adjusting the text, users can directly adjust the video | Image source: OpenAI

Sora can merge two videos into one seamless clip | Image source: OpenAI

In general, in addition to its unexpectedly excellent video generation, Sora brings more distinctive video creation features, equivalent to adding storyboarding, editing, and special effects to video. This means everyone has the opportunity to create the expression they really want, and is a step closer to being a director.

“If you go into Sora with the expectation that you can just click a button to generate a movie, then I think your expectation is wrong,” an OpenAI researcher said.

He said that Sora is a tool that allows people to try multiple ideas in multiple places at the same time, and to try things that were completely impossible before. "We actually think this is a super special extension for creators."

02 Serving the public with no separate charge, still relying on the capability of the underlying model

As the originator of the text-to-video field, Sora is among the last to launch. The OpenAI research team said that deploying Sora widely required finding ways to make the model faster and cheaper, and that the team did a lot of work to that end.

During the livestream, OpenAI announced Sora Turbo, a new high-end accelerated version of the original Sora model. It has all the capabilities OpenAI described in its technical report on world simulation earlier this year, plus the ability to generate video from text, animate images, and remix videos. This is the technical basis behind the Sora product features.

Video inference would seem to be more expensive than text, but this time OpenAI is not charging for Sora separately. Sora is included with the $20/month ChatGPT Plus membership and the $200/month ChatGPT Pro membership.

The former includes up to 50 premium videos at up to 720p resolution and 5 seconds in duration. The latter includes up to 500 premium videos and unlimited normal videos at up to 1080p resolution and 20 seconds in duration, plus watermark-free downloads.

Sora usage quotas for different membership tiers | Image source: OpenAI
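The two tiers can also be laid out as a small lookup table. This is merely a sketch for comparing the figures reported in this article; the `PlanLimits` class, the `PLANS` table, and the `fits_plan` helper are hypothetical names and do not correspond to any OpenAI API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanLimits:
    """Hypothetical record of one tier's limits, using the figures quoted in this article."""
    price_usd_per_month: int
    max_premium_videos: int
    max_resolution_p: int   # vertical resolution, e.g. 720 for 720p
    max_duration_s: int     # longest single video, in seconds
    watermark_free: bool


PLANS = {
    "plus": PlanLimits(20, 50, 720, 5, False),
    "pro": PlanLimits(200, 500, 1080, 20, True),
}


def fits_plan(plan: str, resolution_p: int, duration_s: int) -> bool:
    """Check whether a requested resolution and duration stay within a tier's limits."""
    limits = PLANS[plan]
    return (resolution_p <= limits.max_resolution_p
            and duration_s <= limits.max_duration_s)


print(fits_plan("plus", 1080, 5))   # 1080p exceeds the Plus cap of 720p
print(fits_plan("pro", 1080, 20))   # within the Pro limits
```

Put this way, the tenfold price difference buys ten times as many premium videos, a higher resolution cap, and videos four times as long.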

Sora's significance to OpenAI goes beyond this. The team found that video models exhibit many interesting new capabilities when trained at scale, allowing Sora to simulate certain aspects of real-world people, animals, and environments. "Our results suggest that scaling video generation models is a promising path towards building general purpose simulators of the physical world."

Perhaps this is why getting Sora into the public's hands as soon as possible, and using the resulting data to better train the world model, matters so much to OpenAI's ultimate AGI dream.

While iterating on its technology, it also promotes human creation.

“This version of Sora will make mistakes, it’s not perfect, but it’s gotten to the point where we think it will be very useful for enhancing human creativity. We can’t wait to see what the world will do with it,” said OpenAI, which created it.
