Domestic AI video first-tier competition! Can the spirits make Sora roll over?
Editor
2024-12-25 12:03


Image source: Generated by Unbounded AI

After a 10-month wait, Sora was finally released, and its real-world performance was disappointing. Google's Veo has impressed everyone, but the waitlist is long and there is no telling when we will get to try it. So let's look at the domestic AI video models that anyone can use right now. Keling recently updated its video generation model to version 1.6, improving it without raising the price: the points needed to generate a video remain unchanged. The most noticeable improvement is in image-to-video generation. So let's hold an image-to-video contest, with challenges covering character performance, physical laws, multiple subjects and more, and have Keling go head-to-head with Conch and Jimeng on the same prompts to see who comes out ahead.

Biting pizza, eating noodles, having a drink: which AI looks the least artificial?

First, a note: this is a just-for-fun comparison with no sponsorship, and each model generated each image-to-video clip only once. To keep things fair, Conch and Jimeng were also run on their strongest versions. The specific models are:

Keling: the 1.6 model.
Conch: the I2V-01 or I2V-01-Live model (the latter is stronger at character performance). For each test, the better of the two results was used.
Jimeng: the P2.0 Pro model. In testing, Jimeng's P2.0 Pro model performed noticeably better than its S2.0 Pro model on the same prompts.

Ever since AI Will Smith ate spaghetti, we have enjoyed making AI sample all kinds of food and judging technological progress by how it eats. First, let's have Joey from "Friends" eat a pizza, with the prompt "The man sat on the sofa, brought the pizza to his mouth, took a bite, and the camera slightly zoomed in to focus on the action of eating the pizza." Keling had Joey bite the pizza successfully. A bite is visibly missing, and the pizza even pulls into stringy strands as it reaches his mouth, though his facial muscles work a little too hard.

Keling Generation

Conch's performance was also very complete. He took a big bite of pizza, but the pizza looked a little awkward the moment it entered his mouth.

Conch Generation

Jimeng followed the prompt and zoomed in the camera, but never actually ate the pizza.

Jimeng Generation

Next, let's watch the three domestic AIs eat pasta. The image is taken from the movie "The Killing of a Sacred Deer". The prompt is "The man lowers his head to roll up the noodles with a fork and eats them in big mouthfuls." Keling and Jimeng pass the test, while Conch's noodles have a mind of their own.

Keling Generation

Conch Generation

Jimeng Generation

After all that dry pasta, let's challenge the AIs to have a drink. This time, we invite the heroine of the Japanese drama "I, Get Off Work on Time". The prompt is "The woman puts down her right hand covering her face, picks up the beer glass and takes a sip, narrows her eyes slightly, and shows a satisfied smile." The prompt calls for changes in both the character's movements and her expressions. Keling's performance is perfect. As she sips, the liquid tilts naturally and foam lingers around her mouth. She even blinks unconsciously while drinking.

Keling Generation

Conch handled the first half well, but the protagonist turned to look at the camera on her own, even though the prompt included no such action.

Conch Generation

Jimeng's hand-lowering motion is quite natural. However, she raises the glass so high when drinking that it looks as if the beer will spill the next second.

Jimeng Generation

Cutting tomatoes and doing gymnastics: why are the laws of physics so hard to learn?

Now let's have Keling, Conch and Jimeng leave the dinner table and take on things that nature takes for granted but that AI finds hard. The recent tomato-slicing comparison between OpenAI's Sora and Google's Veo became yet another reason for people to mock Sora. So let's send the three domestic AIs back to the kitchen.

I uploaded a still image of tomatoes being cut and used the following prompt: "Realistic style, close-up, the chef is cutting tomatoes on the chopping board, his hand movements are smooth, the tomatoes are cut into even slices, and the juice splashes slightly, smooth dynamic effect." Keling cuts the tomato into slices of even thickness. The tomato deforms slightly as the blade presses down, and the slices drop onto the chopping board.

Keling Generation

Conch's hand is very steady, but is it cutting tomatoes? It looks more like sawing wood.

Conch Generation

Jimeng cuts well too, but the blade goes through with suspiciously little resistance. Either the tomato is too soft or the knife is too sharp.

Jimeng Generation

Next, let's see how AI understands running and jumping. I first used AI to generate an ink-wash-style image of a deer glimpsed deep in a forest. Then I entered the prompt "The sika deer ran a few steps to the pond, elegantly jumps across the water and disappears on the left side of the screen." Keling's deer trotted a few steps but did not jump across the water.

Keling Generation

Conch's deer jumps best. It leaps neatly out of the frame, which comes closest to what I wanted.

Conch Generation

Like Keling's, Jimeng's deer did not jump. This made me realize that the prompt may be ambiguous and that the AIs interpreted it in different ways. In the original Chinese prompt, "jump over the water" is not unambiguous. It can mean leaping from one bank to the other, or it can be read as skipping past the water altogether.

Jimeng Generation

Gymnastics is known as the "Turing test" of the AI video industry. The movements are so difficult and the body motion so complex that AI easily produces inaccurate or even horrifying images. I tried it with a screenshot of a gymnast, and sure enough, that is what happened. My prompt was simple: "A female gymnast performs difficult moves on the balance beam." I wanted the AI to have free rein, but the results are hard to describe. In Keling's video, the legs, hands and neck are all indescribably strange, and even after 10 viewings it is hard to make sense of. The balance beam also bends under the gymnast's weight far more than it should. A real balance beam is nowhere near that unstable in competition.

Keling Generation

Conch and Jimeng each have their own abstract moments as well, which are hard to evaluate in human language.

Conch Generation

Jimeng Generation

Outclassing the young idols, rivaling the veteran actors

If AI short dramas and AI movies take off in the future, their acting had better beat that of the young idols in the domestic entertainment industry. So let's test the acting skills of the domestic AIs. I took a screenshot of a famous scene from the movie "Sid Sisters Gang" and entered the prompt "The woman had a cigarette in the corner of her mouth. She smiled, then raised her right hand, took out a metal lighter from her coat pocket, opened the lighter cover to light a fire, and put the flame close to the end of the cigarette." to have the AI smoke a cigarette. Keling's performance is remarkably detailed, and every expression and movement follows the prompt. When taking out the lighter, the protagonist lowers her head, and when lighting it, she looks at the cigarette. The acting feels natural.

Keling Generation

Conch's take is also very complete, with expressions that match the movements. However, the lighter is lit first, and only then does the left hand move into a lighting position.

Conch Generation

Jimeng follows the prompt just as accurately. The protagonist smiles, raises her hand and opens the lighter cover step by step, hitting every beat, although she fumbles a little when lighting the flame.

Jimeng Generation

Next, I used AI to generate a game-CG-style swordswoman to test micro-expressions in close-up. I carefully designed layered expressions and entered the prompt "The girl first showed an expression of shock, then became angry, her eyes became sharp, and finally raised the corners of her mouth, showing a murderous sneer." Let's see whether the AI can act it out. Keling's performance showed the surprise, but the anger and the sneer were not very obvious. At least the expression was lively, and the hair flowed just right.

Keling Generation

Conch's performance can't be called wrong, but it is far too exaggerated, the kind of acting you get after "two and a half years of practice." It looks as if she is yelling at someone, and harshly at that.

Conch Generation

This time, Jimeng's acting is the best of the three AIs. The shock and the sneer come through especially well. Time to get polished up and debut: the domestic entertainment industry needs you.

Jimeng Generation

Complex prompts: who has the best reading comprehension?

Complex scenes with multiple subjects and multiple actions are another hard problem for AI. Starting from the famous boxing-ring scene in the movie "100 Yen Love", can AI generate an exciting bout? I entered this image-to-video prompt: "Two female boxers are fighting in the ring. The boxer in red shorts quickly throws a left hook. The boxer in blue shorts retreats to dodge and immediately counterattacks with a straight punch. Both fighters are agile and powerful." Keling's boxers trade blows back and forth, showing good semantic understanding: it knows who punches first and with which hand.

Keling Generation

Conch's punches are correct, but at one point the scene gets a bit chaotic and the hands leave afterimages.

Conch Generation

Jimeng's punching and dodging look very natural, but the punches do not follow the prompt.

Jimeng Generation

Two people fighting may be a bit much, so let's try a group talent show instead. I chose a still from "Deadpool & Wolverine" and added the prompt "Six superheroes maintained their formation, raised their hands simultaneously to make a heart-shaped gesture, and then turned around in unison." Keling's heart gestures are out of sync, and the video ends before the turn is finished.

Keling Generation

Conch's heart gestures are neat and uniform, but its way of turning around was not what I expected. Let's just say they didn't turn.

Conch Generation

Jimeng performs best: the heart gestures are synchronized, and the turn is smooth and neat.

Jimeng Generation

Finally, let's try something creative. "AI giant" videos have been very popular lately. Borrowing an idea from AI blogger Hyacinth, I first used AI to generate an image of a giant Doraemon standing beside the Sphinx. Then I wrote the prompt "The big blue cat made a dorayaki with its right hand and handed it to the stone statue next to it. The stone statue opened its mouth and ate it." to animate the image. This time Jimeng does best, Conch comes second, and Keling's result is the most abstract.

Keling Generation

Conch Generation

Jimeng Generation

Overall, across this limited set of tests, the three domestic AIs each won some rounds and lost others, but all of them performed well. They are fun to play with and worth exploring in depth. Taking Keling on its own, within the scope of these tests the 1.6 model is strong across the board. Its movements are fairly plausible, it follows prompts and physical laws well, and action and reaction forces look fairly realistic. Sometimes, though, the visuals fall a little short aesthetically, and it still has work to do on difficult athletic movements.

Keling generation, based on a movie still: making horses run.

Keling can't yet hit every target exactly on command, but its image-to-video results show a deeper understanding of prompts. Even when a result is not entirely correct or the image is not beautiful enough, you can at least see many details of the prompt reflected in the video. In short, the points you buy, the images you upload and the prompts you write do not feel wasted. The jump from 1.5 to 1.6 looks like a change of just one decimal place, yet the controllability of AI video has visibly improved. Competition in AI video is heating up. More importantly, the urge to flip the table while generating videos has faded, and the desire to create has grown. It will be worth watching what surprises domestic AI video brings next.
