After comparing the generation effects of Sora and domestic video models, I was disenchanted with Sora.
Editor
2024-12-19 16:03

Image source: Generated by Unbounded AI

Sora has been fully available for a week.

Even though Sora kept everyone waiting for nearly a year, expectations for the product remained high. The servers crashed as soon as it launched, yet the user experience does not seem to be very good. In fact, the results look somewhat underwhelming.

Many people complained that their 20 US dollars had been wasted, and that the generated videos were not as good as those from the domestic models Keling and Jimeng.

Did Sora really get an early start only to arrive late to the market?

AI Big Model Factory splurged on memberships to see which performs better: Sora or the domestic models.

Talk is cheap, so let's get straight to the tests.

New highlights: editing functions

The biggest highlight of this Sora release is that, on top of the basic text-to-video and image-to-video functions, it introduces a variety of advanced editing functions. Let's first look at what these new editing functions do:

1. Remix

Users can use Remix to replace, delete, and reimagine elements in a video. For example, doors and scenes in a video can be replaced or deleted.

2. Re-cut

Find and isolate the best frames, then extend them forward or backward to complete a (new) scene. By picking suitable frames, we can re-extend a video scene.

3. Storyboard

Organize and edit unique sequences of videos on the timeline, and accurately control the development of video shots to tell new stories.

4. Loop

Use Loop to edit and create seamless looping videos.

5. Blend

Merge two videos into one seamless clip.

6. Style presets

Use presets to create and share styles that inspire your imagination. Five styles are currently supported: Balloon World, Stop Motion, Archive, Film Noir, and Cardboard & Paper.

Sora vs. Keling AI, Tencent Yuanbao, Jimeng AI

This part mainly evaluates text-to-video generation. AI Big Model Factory used the same prompt to generate videos in each model to see how they perform.

1. Christmas Tango Scene

Prompt: Beautiful Christmas scene, a pair of tango dancers are dancing tango.

Sora

Is Sora serious about these movements? A little more elegance, please. Don't think you can fool me with moves like these just because I have never learned tango...

Let's look at what Keling generates from the same prompt.

Keling

The Christmas scene and the dancing characters generated by Keling are well coordinated. The characters' limbs are not distorted or deformed during large movements, and the dance movements are coherent. By comparison, AI Big Model Factory prefers the scene generated by Keling.

Jimeng

Jimeng has clearly tried its best to keep the movements elegant, but the details of the characters' hands are not handled well, and the lady's hands seem to have nowhere to rest.

Tencent Yuanbao

The scene generated by Tencent Yuanbao has a Disney-like fantasy feel. Although the characters' movements are small, the details of the fingers are not distorted. Overall, the result is full of atmosphere, but the characters look more obviously AI-generated and lack realism.

2. Romantic cherry blossom scene

Prompt: The beautiful spring Tokyo city is bustling with people. The camera moves through the bustling city streets, following several people enjoying the beautiful cherry blossom weather and shopping at nearby stalls. Gorgeous cherry blossom petals fluttering in the wind.

Sora

Sora's semantic understanding clearly falls short here. There are no cherry blossoms filling the sky, and the characters deform a great deal as they move. The heads of the two girls who are the main characters of the video are severely deformed. The head of the girl in front turns 180 degrees backwards, giving the impression of a horror movie. In addition, the two girls' clothes look very strange, like "a child wearing an adult's clothes".

Keling

The scene generated by Keling, with cherry blossoms filling the sky, is dreamier and has the feel of a Japanese street, but the petals are rather large, like rose petals...

Jimeng

Jimeng's semantic understanding is also clearly insufficient. It likewise fails to show cherry blossoms filling the sky. In several shots even the characters' faces are not rendered accurately and look blurry.

Tencent Yuanbao

In terms of semantic understanding, the cherry blossom scene and characters generated by Tencent Yuanbao perform well, including the shot transitions, with no sense of dissonance. In terms of details, Yuanbao's is arguably the only video that shows several people shopping at a stall, and the camera switching is very natural.

3. Cat hunting scene

Prompt: The cat is running in the residential area. What's incredible is that from the cat's perspective, there is grass underfoot and other cats are lying on it. It looks like it is aiming at birds.

Sora

The cat generated by Sora is a little blurry at the start of the shot, and its running posture is mediocre. The semantic understanding is again incomplete: elements such as the residential area and the birds do not appear.

Tencent Yuanbao

Yuanbao also clearly fails to understand the prompt well. The cat appears and disappears, and the camera work is very rough.

Jimeng

In contrast, Jimeng's semantic understanding is very good, and the birds and the other cats lying in the scene are accurately represented. The shot from the cat's perspective even has a strong cinematic language.

Keling

Keling's scene and shots are very complete, and every element of the prompt is included, down to the cat's expressions and limbs in motion. Deformation occurs.

To be fair, after this comparison I found that although Sora does have many functional highlights from a professional perspective, including the storyboard concept in editing, the model's capabilities clearly have not kept up. Domestic AI video products are now in close pursuit, and the major vendors are competing fiercely with each other. We have seen too many good results to go back to the days of settling for less. Beyond model capabilities, users also care about interaction, barrier to entry, cost-effectiveness, and so on. Sora currently does not have enough of an advantage from any angle of comparison.

Sora differs in its algorithms, but I have not yet felt where Sora's ceiling is. The official demo videos are indeed stunning, but in actual use there are still many barriers for users, at least at the entry level, and these deter many new AI users. The prompts and function panels require more complex and precise operation, and the semantic understanding is not precise enough.
