OpenAI's Sora officially launches, and the website is overwhelmed

Source: Heart of the Machine

On the third working day of OpenAI's 12 days of consecutive updates, the big release is finally here!

Just as commenters predicted before the livestream, the official version of Sora, OpenAI's large video generation model, has finally arrived!

Nearly 10 months have passed since Sora was first unveiled on February 16 this year.

Now, netizens can finally experience Sora’s powerful video generation capabilities!

OpenAI has also developed a new version of Sora, Sora Turbo, which is much faster than the model previewed in February. It is available today as a standalone product to ChatGPT Plus and Pro users.

According to today's livestream, Sora users can generate videos at up to 1080p resolution and up to 20 seconds long, in widescreen, vertical, or square formats. Users can also bring their own assets to extend, remix, and blend, or generate entirely new content from text. OpenAI has built a new interface that makes it easier to prompt Sora with text, images, and video, and a storyboard tool lets users precisely specify the input for each frame.

First, let's look at a few generated example videos:

Prompt: The lens is foggy and the colors are high-contrast. The footage captures the feel of a low-visibility, low-quality lens, conveying a sense of immediacy and chaos. The scene shows shaky footage from the perspective of a sailor aboard a 17th-century pirate ship. The horizon lurches violently as waves crash against the wooden hull, making it hard to make out details. Suddenly, an enormous sea monster rises from the churning sea, its huge, slick tentacles reaching out menacingly and its slimy appendages wrapping around the ship with terrifying force. The view shifts dramatically as the sailors scramble in panic to confront the terrifying creature. The atmosphere is tense, and the groaning of the ship and the roar of the sea can be heard amid the chaos.

Prompt: Rockefeller Center is full of golden retrievers! Everywhere you look, there's a golden retriever. It's a New York winter wonderland at night, complete with a giant Christmas tree. Taxis and other New York elements can be seen in the background.

Sam Altman said that what excites him most is how easy it is to co-create with other people, which feels like something genuinely new and interesting. Sora can be thought of as the GPT-1 of video.

OpenAI research scientist Noam Brown said that Sora is the most intuitive demonstration of the power of scale.

As for Sora's release, some netizens called it the best Christmas gift, while others said Sora will be a game changer.

Bring your imagination to life through text, pictures or videos

Excited, we at Heart of the Machine wanted to try Sora too! However, so many netizens are trying to get in that we have been unable to log in.

Try it here: https://sora.com/onboarding

First, let's walk readers through the capabilities of the officially released Sora.

Remix: replace, remove, or reimagine elements in your video

Open the main doors leading into the library

Replace the door with a French door

Replace the scene outside the door with a moonscape

Re-cut: find and isolate the best frames, then extend them in either direction to complete the scene

Storyboard: Organize and edit unique sequences of videos on the timeline

The first 114 frames of the video show "a vast red landscape, with a spaceship docked in the distance."

Then, frames 114 to 324 can be changed to: "Looking out from inside the spacecraft, a space cowboy stands in the center of the frame."

Finally, the last segment can be described as "a close-up of a space police officer's eyes, framed by a mask made of knitted fabric."

Loop: use Loop to edit and create seamless, repeating videos.

Blend: merge two videos into one seamless clip

Style presets: Use "Presets" to create and share styles that inspire your imagination

Many more stunning Sora videos are waiting for netizens' imaginations to bring them to life.

The official Sora system card

In February this year, when Sora was first released, OpenAI published Sora’s technical report.

OpenAI believes that scaling video generation models is a promising path toward building general-purpose simulators of the physical world.

Alongside today's official release, OpenAI also published Sora's System Card, where interested developers can dig into the technical details.

Address: https://openai.com/index/sora-system-card/

Sora is OpenAI's video generation model designed to take text, image, and video inputs and generate new videos as output. Users can create videos in various formats up to 1080p resolution (up to 20 seconds).

Sora builds on learnings from the DALL·E and GPT models and aims to give people tools for creative expression.

Sora is a diffusion model: it generates a video by starting from a base video that looks like static noise and gradually transforming it, removing the noise over many steps. By giving the model foresight of many frames at a time, OpenAI solved the challenging problem of keeping a subject consistent even when it temporarily goes out of view. Similar to GPT models, Sora uses a transformer architecture, which unlocks superior scaling performance.
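Sora's actual noise schedule, sampler, and network are not described here, so the following is only a toy sketch of the general idea of iterative denoising. It swaps in a well-known deterministic DDIM-style update, and the `oracle_denoiser` is hypothetical: it simply returns the known clean clip in place of a trained transformer. It does receive every frame at once, loosely mirroring the "many frames at a time" idea.

```python
import math
import random

# A toy "video": a list of frames, each a flat list of pixel values.
# This layout is an illustrative assumption, not Sora's real format.
Video = list[list[float]]


def make_schedule(num_steps: int, min_alpha_bar: float = 1e-4) -> list[float]:
    """alpha_bar[t] = fraction of signal kept at noise level t.

    t = 0 is the clean video; t = num_steps is (almost) pure noise.
    """
    return [1.0 - (t / num_steps) * (1.0 - min_alpha_bar) for t in range(num_steps + 1)]


def oracle_denoiser(noisy: Video, t: int, clean: Video) -> Video:
    """HYPOTHETICAL stand-in for a learned transformer.

    A real model would predict the clean clip from `noisy` and `t`, looking at
    all frames jointly. This toy version ignores both and returns the target.
    """
    return [frame[:] for frame in clean]


def sample(clean: Video, num_steps: int = 50, seed: int = 0) -> Video:
    rng = random.Random(seed)
    alpha_bar = make_schedule(num_steps)
    # Start from a "base video" of pure Gaussian static.
    x = [[rng.gauss(0.0, 1.0) for _ in frame] for frame in clean]
    for t in range(num_steps, 0, -1):
        x0_hat = oracle_denoiser(x, t, clean)
        a_t, a_prev = alpha_bar[t], alpha_bar[t - 1]
        new_x = []
        for x_frame, c_frame in zip(x, x0_hat):
            frame = []
            for xv, cv in zip(x_frame, c_frame):
                # Noise implied by the current sample and the clean estimate.
                eps_hat = (xv - math.sqrt(a_t) * cv) / math.sqrt(1.0 - a_t)
                # Re-mix clean estimate and noise at the next, lower noise level.
                frame.append(math.sqrt(a_prev) * cv + math.sqrt(1.0 - a_prev) * eps_hat)
            new_x.append(frame)
        x = new_x
    return x


target = [[0.1, 0.5, 0.9], [0.2, 0.6, 1.0]]  # two tiny 3-pixel frames
result = sample(target)
```

Each pass through the loop lowers the noise level by one step; at the last step the noise weight reaches zero, so the output equals the denoiser's clean estimate. With a real model, only `oracle_denoiser` would change in this sketch.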

Sora uses the recaptioning technique from DALL·E 3, which involves generating highly descriptive captions for the visual training data. As a result, Sora follows the user's text instructions more faithfully in the generated video.

In addition to generating videos from text instructions alone, the model can take an existing still image and generate a video from it, animating the image's contents accurately and with attention to detail. The model can also take an existing video and extend it or fill in missing frames. Sora serves as a foundation for models that can understand and simulate the real world, a capability OpenAI believes will be an important milestone on the road to AGI.

On the data side, as OpenAI described in its February technical report, Sora draws inspiration from large language models, which gain generalist capabilities by training on Internet-scale data. The success of the LLM paradigm is due in part to its use of tokens, which elegantly unify diverse modalities of text such as code, mathematics, and various natural languages.

With Sora, OpenAI considered how generative models of visual data could inherit these benefits. Where large language models have text tokens, Sora has visual patches. Patches have previously been shown to be an effective representation for models of visual data, and OpenAI found that they are a highly scalable and effective representation for training generative models on diverse types of videos and images.

At a high level, OpenAI turns videos into patches by first compressing them into a lower-dimensional latent space and then decomposing that representation into spacetime patches.
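The compress-then-patchify pipeline can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: a single-channel video stored as nested lists, spatial average pooling as a hypothetical stand-in for OpenAI's learned compression network, and arbitrary patch sizes `pt`, `ph`, `pw`. None of these are Sora's actual settings.

```python
# Single-channel video indexed as video[t][y][x]; the layout is an assumption.
Video = list[list[list[float]]]
Patch = tuple[tuple[int, int, int], list[float]]


def toy_compress(video: Video, factor: int = 2) -> Video:
    """HYPOTHETICAL stand-in for a learned encoder: average-pool each frame by `factor`."""
    latent = []
    for frame in video:
        h, w = len(frame) // factor, len(frame[0]) // factor
        latent.append([
            [sum(frame[y * factor + dy][x * factor + dx]
                 for dy in range(factor) for dx in range(factor)) / factor ** 2
             for x in range(w)]
            for y in range(h)
        ])
    return latent


def to_spacetime_patches(latent: Video, pt: int, ph: int, pw: int) -> list[Patch]:
    """Split a [T][H][W] latent into non-overlapping pt x ph x pw blocks.

    Each block is flattened into one vector (a "visual token") and tagged with
    its (time, row, column) position in the patch grid.
    """
    T, H, W = len(latent), len(latent[0]), len(latent[0][0])
    if T % pt or H % ph or W % pw:
        raise ValueError("latent dimensions must be divisible by the patch size")
    patches: list[Patch] = []
    for t0 in range(0, T, pt):
        for y0 in range(0, H, ph):
            for x0 in range(0, W, pw):
                token = [latent[t][y][x]
                         for t in range(t0, t0 + pt)
                         for y in range(y0, y0 + ph)
                         for x in range(x0, x0 + pw)]
                patches.append(((t0 // pt, y0 // ph, x0 // pw), token))
    return patches


# Example: 4 frames of 8x8 pixels, where every pixel in frame t has value t.
video = [[[float(t)] * 8 for _ in range(8)] for t in range(4)]
latent = toy_compress(video)                       # 4 frames of 4x4
patches = to_spacetime_patches(latent, pt=2, ph=2, pw=2)
print(len(patches), len(patches[0][1]))            # 8 8
```

Because the number of patches simply follows the size of the input, the same routine handles clips of different durations and resolutions; each patch spans both space and time, which is what distinguishes it from a per-image patch.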

Sora is trained on a variety of datasets, including publicly available data, proprietary data acquired through partners, and custom datasets developed in-house:

Publicly available data. This data is mostly collected from industry-standard machine learning datasets and web crawls.

Proprietary data from data partners. OpenAI forms partnerships to access non-public data; for example, it has partnered with Shutterstock and Pond5 to build and deliver AI-generated images. OpenAI also commissions and creates datasets tailored to its needs.

Human data. Feedback from AI trainers, red teamers, and employees.

For more details, readers can consult the full system card.

Pricing and benefits

Along with the official release, OpenAI also announced Sora's pricing, and it does not look cheap:

For US$20 per month, ChatGPT Plus users get the following video generation benefits:

Up to 50 priority videos (1000 points)

Resolution up to 720p, duration up to 5 seconds

For US$200 per month, ChatGPT Pro users get the following video generation benefits:

Up to 500 priority videos (10,000 points)

Unlimited relaxed videos

Resolution up to 1080p, duration up to 20 seconds, and up to 5 concurrent generations

Watermark-free downloads
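The two allowances work out to the same average rate. Taking the listed figures at face value, dividing each plan's points by its maximum number of priority videos gives the implied points per video; the tenfold price step from Plus to Pro buys ten times the points. The sketch below is a back-of-the-envelope check under that assumption, not an official OpenAI rate, since actual point costs may vary with resolution and duration.

```python
# Plan figures as listed in the article. The per-video figure is derived from
# them (an assumption), not an official OpenAI price per video.
PLANS = {
    "ChatGPT Plus": {"points": 1_000, "max_priority_videos": 50},
    "ChatGPT Pro": {"points": 10_000, "max_priority_videos": 500},
}


def implied_points_per_video(plan: dict) -> float:
    """Average points per video if the whole allowance buys the stated maximum."""
    return plan["points"] / plan["max_priority_videos"]


for name, plan in PLANS.items():
    print(f"{name}: {plan['points']} points / {plan['max_priority_videos']} videos "
          f"= {implied_points_per_video(plan):.0f} points per video")
```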

After waiting so long, will you be rushing to try it?
