An in-depth test of Manus: I still think this is the DeepSeek moment of the AI Agent industry

Author: Lan Xi

Manus has been flooding everyone's feeds for a day now: first the overnight fame, then the scramble for invitation codes that were almost impossible to find, then the talk of money being spent on its promotion. Throughout the whole process, FOMO and gut reactions have been endlessly intertwined, which makes it a very interesting case study in communication.

In fact, the AI industry has run on an "explosion-driven" information model in recent years. Those who understand it have long been disenchanted, while those who don't still treat every launch as a rarity. But to be fair, if something "explodes" every day, then objectively some of those explosions will be real and some will only cause confusion.

My evaluation of Manus is that it belongs among the real explosions, and it can be called the DeepSeek moment of the AI Agent industry, though with a patch that I add at the end.

Let's first look at a demonstration of Manus:

The prompt: develop an interactive text game in which you play the CEO of Google. By living through important decisions in the company's history, you not only enjoy the game but also come to understand the company's culture.

After almost an hour, Manus had developed a web game, Google CEO Simulator, with a high degree of completion. Clicking to start the game lets you choose a difficulty. You then face each turning point in Google's development history; your choices change the company's resources and determine the game's ending.
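The mechanics described here (a difficulty setting, a sequence of decision points, choices that change resources, and an ending derived from the final state) can be sketched in a few lines. This is only an illustrative sketch, not the code Manus produced: the resource names, the starting values, the ending thresholds, and the two sample decisions in `DEMO_NODES` are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Choice:
    label: str
    effects: dict  # resource name -> change applied when this choice is picked


@dataclass
class DecisionNode:
    prompt: str
    choices: list  # list of Choice


@dataclass
class GameState:
    resources: dict

    def apply(self, choice):
        # Add each delta to the matching resource.
        for name, delta in choice.effects.items():
            self.resources[name] = self.resources.get(name, 0) + delta


def starting_resources(difficulty):
    # Hypothetical scaling: harder difficulty means fewer starting resources.
    base = {"cash": 100, "talent": 100, "reputation": 100}
    scale = {"easy": 1.5, "normal": 1.0, "hard": 0.5}[difficulty]
    return {name: int(value * scale) for name, value in base.items()}


def ending(state):
    # Hypothetical rules: any exhausted resource ends the company.
    if any(value <= 0 for value in state.resources.values()):
        return "collapse"
    if sum(state.resources.values()) >= 350:
        return "dominance"
    return "survival"


def play(nodes, picks, difficulty="normal"):
    """Apply one pick (an index into node.choices) per decision node."""
    state = GameState(starting_resources(difficulty))
    for node, pick in zip(nodes, picks):
        state.apply(node.choices[pick])
    return ending(state), state.resources


# Made-up decision points, not taken from the actual game.
DEMO_NODES = [
    DecisionNode("Fund a risky new product?", [
        Choice("Fund it", {"cash": -40, "reputation": 30}),
        Choice("Pass", {"cash": 10, "reputation": -10}),
    ]),
    DecisionNode("Hire aggressively?", [
        Choice("Hire", {"cash": -30, "talent": 50}),
        Choice("Freeze hiring", {"talent": -20}),
    ]),
]
```

Calling `play(DEMO_NODES, [0, 0], "hard")` makes the same two choices as on "normal" but starts with half the resources, so the same decisions lead to a different ending.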

One sentence in, one game out, in an hour: this is what an AI Agent can do.

It is different from traditional conversational AI. It no longer just provides answers at the information level, but can operate a computer to complete concrete work tasks, including but not limited to writing programs, building web pages, compiling reports, and screening resumes. It can work through the difficulties it meets along the way on its own and deliver finished results. Of course, there are exceptions, and we will get to one of them later.

There are currently not many mainstream AI Agent services, and they are generally very expensive. For example, ChatGPT Operator requires a US$200-per-month Pro membership, and Devin, an AI engineer focused on the programming market, costs US$500 per month.

Manus' developer is Monica, a large-model team, and the product is currently in a free testing stage. The cost of a single task has been compressed to US$2, one tenth of OpenAI's. At the same time, it has surpassed OpenAI in benchmark rankings and taken the title of the world's strongest.

After getting an invitation code, I exhausted Manus' single-day computing quota within a few hours. I was genuinely excited, and the results were striking.

Here are a few actual test cases:

First, I asked it to make me a linktree-style personal homepage. Manus split this task into 8 steps. It began by collecting information about me from across the web, including my links and representative works on various platforms, and then started writing the web code in the design style of linktree. Half an hour later, it delivered the finished page.

Simple, but it meets the requirements perfectly, and the interaction works without problems. If you want it to look nicer, you can keep writing prompts to have it revise the page.

In the second test, I used Manus to help an engineer in one of my group chats solve a practical problem. The Atlas robot arm he maintains at a factory had developed a minor fault. Calling in after-sales service would cost several thousand yuan, so it made more sense to find a way to fix it himself. He was too lazy to read the documentation, so he sent me a message asking me to have Manus work out how to deal with it.

Note that, in theory, an ordinary conversational AI can also handle this request, but it requires more back-and-forth: you have to feed it the documents and extract the answer step by step. Manus needs none of this. It downloads the documentation from the Atlas official website on its own, reads it, finds the key content needed to solve the problem, analyzes it carefully, and writes the program. I sent the final code to my friend. It had a small flaw, but it was fully usable after manual modification, which directly saved him the after-sales call.

The third test was proposed by my Weibo readers: have Manus make a minimalist chronicle of British history, to which I added requirements for comic-style illustrations and web design. The color scheme of the delivered work is a bit hard to take (AI has no sense of aesthetics, which bears repeating), but by then Manus' server was down and the page could not be modified for the time being, so I am showing the half-finished product.

As you can see, Manus divided the history of Britain into 10 eras, drew SVG pictures in the style of each era, and presented them on an HTML page. It can be called a show home for human-computer collaboration. Whether you need an extracurricular lesson plan or a preview of a piece of work, the barrier to getting started is extremely low.

In the last case, I asked Manus to make a match-3 elimination game whose icons had to be characters from Genshin Impact. It first studied the mechanics and implementation of elimination games, then tried to collect Genshin Impact image assets. At this point an exception occurred. It issued a takeover request for the first time, and the reason left me speechless: it was blocked by a cloud drive and could not register an account, so it could not download the resources and wanted me to download them for it.

It seems that no matter how powerful an AI is, it will still be stopped by a cloud drive's membership wall.

In keeping with the principle of letting the AI Agent complete the work independently as far as possible, I did not do this. Instead, I changed the requirements slightly and asked Manus to use technology companies' logos as the game icons. Because openly licensed SVG material is available all over the Internet, Manus ran without any problems this time and soon finished the elimination game, which plays smoothly.

However, we can also see that when solving relatively complex problems like this, Manus still falls short on details, which is related to how little the human (me) participated. For example, screen adaptation needs to be spelled out in more detail. Manus responds to modification requests quickly, but because this task also ran into the server outage, it has not been improved further for now.

I think these test examples clearly show both the capabilities and the shortcomings of the AI Agent at this stage. It is no longer the kind of product that can only operate a browser: it has a sandbox environment, can run tests on its own before finishing the work, and delivers only after the work passes acceptance. However, it is also limited by the data boundaries of the Internet. If the resources on the network are not enough, it has no way to produce those resources by itself.
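The workflow observed in these tests (split the task into steps, run them in a sandbox, ask the human to take over when a step is blocked, and deliver only after an acceptance check) can be sketched roughly as follows. This is a guess at the general shape of such a loop, not Manus' actual implementation: `Step`, `NeedsHuman`, `run_task`, and the three sample steps are hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable


class NeedsHuman(Exception):
    """Raised by a step that cannot proceed without human help."""


@dataclass
class Step:
    name: str
    run: Callable[[dict], None]  # reads and writes the shared workspace


def run_task(steps, acceptance_test, ask_human=None):
    workspace = {}  # stands in for the sandbox's files and state
    log = []
    for step in steps:
        try:
            step.run(workspace)
            log.append(f"done: {step.name}")
        except NeedsHuman as blocker:
            log.append(f"takeover requested: {step.name} ({blocker})")
            # Without a human (or if the human declines), the task stops here.
            if ask_human is None or not ask_human(step, workspace):
                return {"status": "blocked", "log": log}
            log.append(f"human resolved: {step.name}")
    # Self-test before delivery: only work that passes acceptance is handed over.
    if not acceptance_test(workspace):
        return {"status": "failed_acceptance", "log": log}
    return {"status": "delivered", "result": workspace, "log": log}


# Hypothetical steps loosely modeled on the elimination-game test above.
def research(ws):
    ws["design"] = "match-3"


def fetch_assets(ws):
    raise NeedsHuman("download requires a registered account")


def build(ws):
    ws["game"] = f"{ws['design']} with {ws['assets']}"


def human_supplies_assets(step, ws):
    ws["assets"] = "open SVG logos"
    return True


STEPS = [Step("research", research), Step("fetch assets", fetch_assets), Step("build", build)]


def game_exists(ws):
    return "game" in ws
```

With no human available, `run_task(STEPS, game_exists)` stops at the asset step with status `blocked`; passing `ask_human=human_supplies_assets` lets the remaining steps run and the result pass acceptance.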

I also ran some document-oriented tests, which are useful for comparing the characteristics of an AI Agent with those of other tools:

For example, I asked Manus to distill channel-operation tips from the 10 most popular videos on Bilibili.

Manus really did watch all 10 videos, which took more than an hour, and then distilled each uploader's short essay into the material I wanted, quite accurately. If the same task were handed to an Internet-connected large model, it could be completed, but the probability of hallucination would be very high, so it would not be as reliable as the AI Agent in terms of "honesty".

As another example, I had Manus study the arbitrage possibilities of PolyMarket. I admit I had some hopes of getting a guide to stable returns (don't laugh). Manus did its homework diligently and listed four arbitrage opportunities, so that whenever I see projects that meet the conditions appear on PolyMarket, I can place bets according to the rules without thinking.

Judging from the replay, Manus always starts from the most basic information: it first works out what PolyMarket is, then analyzes how the market is played, and then combines the platform rules to build risk strategies. It is the standard intern style: hardworking and durable.

By the way, the replay design is also one of Manus' highlights in my opinion. It is a bit like reasoning models choosing to expose their chain of thought. Often, the AI's thinking process is more inspiring than the answer it supplies. Every Manus task has a replay function and can be shared. The methods it shows on the way to solving a problem can be called another form of intellectual asset, one that can play the role of a teacher for humans.

So, to return to the point: I evaluate Manus as the DeepSeek moment of the AI Agent industry, but a patch is needed here. It is the DeepSeek-V2 moment. In May 2024, DeepSeek open-sourced its V2 model, and that was the first time it became popular, because the price was very low. But because the model itself had only average capabilities, many people at the time just thought DeepSeek was coming to fight a price war. People were surprised but paid it little attention, and the popularity did not last long.

It was not until DeepSeek-V3 and R1 were released in succession that everyone realized things were completely different, and the cost logic of the entire large-model market was overturned overnight.

"At first, no one cared about this disaster. It was just a wildfire, a drought, the extinction of a species, the disappearance of a city, until the disaster touched everyone." (from "The Wandering Earth")

What I mean is that the development of AI technology is continuous, and along this rising and falling curve, the strength of each signal determines the depth of the breakthrough. Without DeepSeek-V2 there would be no V3, let alone R1. My view of Manus has not changed: at the historical turning point of bringing AI Agent services from professional scenarios to general scenarios, it is the trailblazer that founded the school.

Judging from the use cases, it is very powerful as an AI Agent and highly proficient at breaking tasks down. Observing its CoA (chain of agents) is very much like watching a CoT (chain of thought): you can "see" the AI evaluate multiple options and look for the optimal one.

In theory, it should have a massive amount of built-in CoA to draw on, much as reasoning models like DeepSeek digested enough CoT in advance before being introduced to the mass market, so as to cover mainstream demands as far as possible. This can be seen from the use cases on the official website.