In 2023, Stanford University released a widely discussed AI experiment: a simulated town in which 25 AI characters could hold conversations on their own, build relationships, and make plans, displaying startling social behavior. The experiment gave many people their first real glimpse of AI Agents, and the sense that AI assistants with autonomy and decision-making capabilities were just around the corner.
One year on, the concept of the AI Agent is red-hot across the industry. Technology giants such as Microsoft and Google have made their moves, and startups are rushing to launch all manner of "Agent" products. Yet look closely at these so-called Agents and an awkward reality emerges: most are far from true agents, and closer to conversational bots that merely understand natural language.
This gap between looking the part and being the part keeps recurring in AI hardware as well. In October 2024, the smart-ring brand Oura launched the Oura Ring 4 and, following the trend, added AI features. Soon afterward Oura's valuation exceeded US$5 billion, making it one of the most commercially successful "AI hardware" makers. Yet the consensus is that Oura's success has little to do with AI: its core value still lies in basic health tracking. By contrast, hardware products genuinely built around AI, such as the AI Pin and the Rabbit R1, flopped almost as soon as they launched.
So what exactly counts as an AI Agent? The prompt-based "agents" that fill every large-model app? A professional programming agent like Cursor? Or Iron Man's all-purpose assistant, Jarvis?
Jon Turow, a partner at the American VC firm Madrona, once observed: talk to enough practitioners and you will find a whole series of different concepts, all of them called agents.
If the development of AI Agents is a marathon, where will it run to in 2025?
1. AI Agents in 2024: half seawater, half flame

A lively scene: players from all walks of life take their places

In the first half of 2024, the price war among large models was still raging; by the second half, the battle over AI Agents was already gathering momentum.
In overseas markets, technology giants such as OpenAI, Anthropic, Microsoft, and Google have all announced progress, treating their Agent capabilities as important chips at the card table.
In October, Anthropic launched an AI Agent capability called "Computer Use", claiming Claude can "operate a computer like a human". It is a special API that lets developers direct Claude through common computer tasks: observing screen content, moving the mouse, clicking buttons, and typing. The API automates tasks by translating written instructions into concrete computer operations.
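To make this concrete, here is a minimal sketch of what a Computer Use request looks like, modeled on Anthropic's October 2024 public beta documentation; the model name, tool type, and beta flag follow that beta and may have changed since, so treat it as illustrative rather than authoritative.

```python
# Minimal sketch of a Computer Use request (per Anthropic's Oct-2024 beta;
# parameter names may have changed since). Requires ANTHROPIC_API_KEY.
import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    betas=["computer-use-2024-10-22"],      # opt in to the beta
    tools=[{
        "type": "computer_20241022",        # the virtual "computer" tool
        "name": "computer",
        "display_width_px": 1024,
        "display_height_px": 768,
    }],
    messages=[{"role": "user",
               "content": "Open the browser and search for AI Agent news."}],
)

# The reply contains tool_use blocks such as screenshot, mouse_move,
# left_click, and type; the developer's harness executes each action and
# feeds the result back in a loop until the task completes.
print(response.content)
```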
(Picture: Anthropic developer demonstrating Computer Use)
Microsoft is another important promoter of AI Agents. In October 2024 it announced a plan to develop and deploy 10 AI Agents for the Dynamics 365 business-application platform, serving key areas such as corporate sales, accounting, and customer service. On the announced schedule, these Agents would open for public testing at the end of the year, with the testing phase expected to last into early 2025.
(Picture: Microsoft CEO showing Copilot and AI stack)
Google's response was comparatively slow, but it caught up by year's end. In December, Google released Gemini 2.0, a new multimodal large model, and on top of it built three AI Agents: the "universal assistant" Project Astra, the "browser assistant" Project Mariner, and the "programming assistant" Jules.
The "programming assistant" Jules is able to integrate directly into GitHub's workflow system as an autonomous agent, analyze complex code bases, implement fixes across multiple files, and prepare detailed pull requests without the need for ongoing Manual supervision; in the demo of the game "Clash of Clans", Google AI Agent can not only introduce the characteristics of soldiers to players and give combination suggestions, but can also retrieve information on Reddit to provide players with character selection suggestions.
(Picture: Players interacting with Google AI Agent)
Although OpenAI leads in base models, its Agent rollout has been slightly slow. In July, OpenAI updated its AGI roadmap, placing itself at Level 1 and close to reaching Level 2; Level 3 is the AI Agent.
(Picture: 5 stages of artificial intelligence development defined by OpenAI)
OpenAI is expected to launch a new AI Agent, Operator, in January 2025. The system can automatically perform a variety of complex operations, including writing code, booking travel, and automating e-commerce shopping. Reportedly, Operator may substantially streamline applications built in the Computer Use style, expanding the scope and scenarios in which AI Agents are used.
In the domestic market, major manufacturers such as Baidu, Alibaba, Tencent, and Zhipu have also entered the market.
On the B side, Baidu's Wenxin agent platform, Tencent Yuanqi, iFlytek's Spark agent creation center, Alibaba's Tongyi agents, ByteDance's Coze, and others provide agent-building platforms for enterprise users, and have begun adding AI Agent entry points to their AI assistant interfaces.
On the consumer side, Alipay's AI app Zhi Xiaobao and Zhipu's AutoGLM have fired up consumer users. According to demonstrations, AutoGLM can read and understand on-screen information, plan tasks, and simulate common phone operations: given a simple text or voice command, it can mimic human actions on the phone, liking posts in WeChat Moments, ordering takeout on Meituan, booking hotels on Ctrip, and so on.
A sober reality: when we talk about AI Agents, what exactly are we talking about?

If all you saw were the lively scenes above, you would probably conclude that 2024 was the year of the AI Agent.
However, there are actually very few AI Agents that users can truly rely on.
Just take three seconds to think: which AI Agents do you actually like to use? If you are a programmer, the answer may simply be Cursor. Change the question to which large AI models you like to use, and the answers vary widely: ChatGPT, Gemini, Claude, Kimi, and so on.
At least from a practical standpoint, the current AI Agent boom is still running on false heat: more hype than substance.
The main complaints are "unreliable" and "of little real use". An AI Agent depends on the LLM "black box", which is inherently unpredictable, and agent workflows chain multiple AI steps together, which compounds the problem, especially for tasks that demand precise output. Users find it hard to trust that an Agent will always respond accurately and in context.
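A back-of-the-envelope calculation shows why chaining makes things worse: if each step succeeds independently with probability p, an N-step chain succeeds with probability p^N.

```python
# Why chaining steps amplifies unreliability: end-to-end success decays
# exponentially with chain length, assuming independent per-step success.
for p in (0.99, 0.95, 0.90):
    for n in (5, 10, 20):
        print(f"per-step {p:.0%} over {n:2d} steps -> {p**n:6.1%} end-to-end")
# A 95%-reliable step, respectable in isolation, yields only ~60% over 10 steps.
```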
The State of AI Agents report released by LangChain is a useful reference. Among its more than 1,300 respondents, performance quality (41%) was the top concern, far outweighing cost (18.4%) and safety (18.4%). Even small businesses, traditionally the most cost-sensitive, put performance and quality first (45.8%), with cost at only 22.4%. The report also noted a leading challenge in taking AI Agents to production: developers find it difficult to explain an Agent's capabilities and behavior to teams and stakeholders.
In addition, although the base LLMs behind AI Agents perform well at tool use, they are slow and costly, especially when loops and automatic retries are involved. The WebArena leaderboard benchmarks LLM agents on realistic tasks; even the best-performing system, SteP, succeeds only 35.8% of the time, while GPT-4 manages just 14.9%.
So, can the AI Agents on the market, which cannot yet fully take care of things themselves, really be called agents?
Andrew Ng's framing makes this easier to digest: AI Agents exist on a spectrum. He proposed the idea of the agentic system, arguing that the adjective "agentic" captures the nature of these systems better than the noun "agent". Like the L1-L4 levels of self-driving cars, the evolution of the Agent is a gradual process.
BabyAGI founder Yohei Nakajima's classification of AI Agents is also worth a look.
1. Hand-made Agent: a chain of prompts and API calls. It has a degree of autonomy but operates under many constraints.
Features: like an assembly-line robot, it completes tasks in fixed steps.
Example: think of a dedicated ticket-booking assistant. Tell it your flight requirements and it can call an API to search and complete the booking; but the moment complex itinerary planning is involved, a hand-made Agent gets stuck (readers are welcome to think of products that fit; see the combined sketch after this list).
2. Professional Agent: dynamically decides what to do within a defined set of task types and tools; fewer constraints than a hand-made Agent.
Features: like a skilled craftsman in a specific trade (say, carpentry), who can not only build furniture to order but also adjust the design and materials to actual needs.
Example: AutoGPT uses chain-of-thought (CoT) techniques to decompose complex problems and dynamically select a solution path. Given a market-research task, it can automatically break the work into subtasks such as "search trends", "organize data", and "generate report" and complete them in turn (see the sketch after this list).
3. Universal Agent: the AGI of Agents; still a theoretical concept, not yet realized.
Features: an all-round assistant, like Iron Man's Jarvis. Ask it anything and it will not only understand your needs but dynamically adapt to knowledge and environment to offer creative solutions.
Example: no real product exists yet. Related research includes stronger multimodal interaction and long-term memory optimization.
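As flagged in the examples above, the first two types can be contrasted in a short sketch. Everything here is stubbed: call_llm, the flight "API", and the canned subtasks are invented stand-ins, not any real product's interface.

```python
# Sketch contrasting a "hand-made" Agent (fixed chain) with a "professional"
# Agent (dynamic decomposition). All model and API calls are stubbed.

def call_llm(prompt: str) -> str:
    canned = {
        "decompose": "search trends; organize data; generate report",
        "search trends": "Top trends: A, B, C",
        "organize data": "Table of A/B/C with sources",
        "generate report": "Final market-research report",
    }
    return next((v for k, v in canned.items() if k in prompt), "ok")

# --- Type 1: hand-made Agent: one prompt, fixed API calls, fixed order ----
def search_flights(req: dict) -> list[dict]:
    return [{"flight": "AA100", "price": 320}]   # hypothetical API stub

def handmade_booking_agent(user_text: str) -> str:
    req = {"from": "SFO", "to": "JFK"}           # step 1: parse (stubbed)
    flights = search_flights(req)                # step 2: fixed API call
    if not flights:
        return "Stuck: no re-planning possible." # off the rail = stuck
    return f"Booked {flights[0]['flight']}"      # step 3: fixed action

# --- Type 2: professional Agent: decompose, then execute dynamically ------
def professional_agent(task: str) -> str:
    subtasks = call_llm(f"decompose: {task}").split("; ")
    results = [call_llm(sub) for sub in subtasks]  # steps chosen at runtime
    return results[-1]

print(handmade_booking_agent("Book me SFO to JFK"))
print(professional_agent("market research on smart rings"))
```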
At the current juncture, prompt-based Agents are the most numerous, typified by the "agents" scattered through every large-model app; professional Agents in vertical fields are at an inflection point, favored by capital for their practicality; and the true agent we are all waiting for, the all-round Jarvis, awaits breakthroughs in key technologies. This also means that in the years ahead we will see plenty of evolution between "L1" and "L4".
Under the skin: where did AI Agent technology evolve in 2024?

Start from the formula given by Lilian Weng: Agent = LLM + Memory + Planning + Tool use.
Suppose you are one of the "Five Tigers" of the culinary underworld. The LLM is your knowledge reserve, every cuisine's recipes included; Memory is your chef's notebook, recording the tastes of different diners and the lessons of past defeats to young rivals; Planning is your cooking plan for each request, whether to fry then bake or boil then fry; Tools are your magic kitchen implements, knowing which knife (software) to reach for to pull off a complex dish.
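The formula can also be rendered as a code skeleton. This is a sketch under heavy assumptions, not any framework's actual API: every component is a stub that a real agent would back with a live model, a vector store, a planner, and genuine tools.

```python
# Lilian Weng's formula as a skeleton: Agent = LLM + Memory + Planning + Tools.

class Agent:
    def __init__(self, llm, tools):
        self.llm = llm                  # the "knowledge reserve"
        self.memory = []                # the "chef's notebook"
        self.tools = tools              # the "magic kitchen implements"

    def plan(self, goal: str) -> list[str]:
        # Planning: decompose the goal into steps (stubbed LLM call).
        return self.llm(f"Decompose into steps: {goal}").split("; ")

    def act(self, step: str) -> str:
        tool = self.tools.get(step.split()[0])    # Tool use: pick a tool by verb
        result = tool(step) if tool else self.llm(step)
        self.memory.append((step, result))        # Memory: record what happened
        return result

    def run(self, goal: str) -> str:
        return " | ".join(self.act(s) for s in self.plan(goal))

# Toy wiring so the sketch runs end to end.
fake_llm = lambda p: "search ingredients; cook dish" if "Decompose" in p else f"LLM({p})"
agent = Agent(fake_llm, {"search": lambda s: f"TOOL({s})"})
print(agent.run("make mapo tofu"))
```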
Breakthroughs in AI Agents depend on advances in each of these technologies.
The first is the LLM. Pending the arrival of a more powerful "brain" such as GPT-5, OpenAI has tapped a different lever: inference-time reasoning.
In October 2024, Noam Brown, a senior research scientist at OpenAI and the father of poker AI, argued that letting a model think for 20 seconds can yield a performance gain equivalent to scaling the model 100,000 times and training it 100,000 times longer.
The technique Brown was referring to is System 1/System 2 thinking, the secret behind OpenAI o1's reasoning ability.
System 1 is "thinking fast": see an apple and you know it is a fruit without deliberation. System 2 is "thinking slow": asked to compute 17*24, you must break the problem into steps to reach an accurate answer (17*20=340, 17*4=68, so 408).
Recently, Google DeepMind researchers folded this idea into AI Agents with the Talker-Reasoner framework. System 1 is the "fast mode" that runs by default; System 2 stands by as the "backup engine". When System 1 gets stuck, it hands the task off to System 2. Running the two engines together helps greatly on complex, lengthy tasks, breaking out of the traditional pattern of an AI Agent executing a fixed business process and markedly improving efficiency.
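The shape of the dual-engine arrangement can be sketched as follows; the confidence heuristic and both "engines" are invented stand-ins for illustration, not DeepMind's implementation.

```python
# Sketch of the System-1/System-2 split in Talker-Reasoner style frameworks:
# a fast path answers directly; a slow path deliberates when the fast path
# is uncertain. The confidence heuristic is invented for illustration.

def talker(query: str) -> tuple[str, float]:
    # System 1: one cheap call, returning an answer plus a confidence.
    if "apple" in query:
        return "It's a fruit.", 0.95
    return "Not sure.", 0.2

def reasoner(query: str) -> str:
    # System 2: decompose, deliberate, verify (stubbed as step-by-step math).
    if "17*24" in query:
        return f"17*20=340, 17*4=68, total {340 + 68}"
    return "Deliberate multi-step answer."

def answer(query: str, threshold: float = 0.7) -> str:
    reply, confidence = talker(query)   # fast mode runs by default
    if confidence >= threshold:
        return reply
    return reasoner(query)              # hand off to the backup engine

print(answer("What is an apple?"))   # System 1 suffices
print(answer("What is 17*24?"))      # escalates to System 2 -> 408
```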
The second is the memory mechanism. When generative AI starts rambling, the issue may not be raw capability but poor memory. This is where RAG (Retrieval-Augmented Generation) helps. It acts like a plug-in for the LLM, drawing on external knowledge bases to supply relevant context and stopping the model from bluffing.
Traditional RAG, however, considers only a single external knowledge source and cannot call external tools; it produces a one-shot answer, retrieves context only once, and performs no further reasoning or verification.
Against this backdrop, RAG infused with Agent capabilities has emerged. Agentic RAG follows the same overall pipeline as traditional RAG (retrieve, synthesize context, generate), but adds the Agent's autonomous planning, so it can handle more complex retrieval tasks: decide whether retrieval is needed at all; choose which retrieval engine to use and plan how to use it; evaluate the retrieved context and decide whether to retrieve again; and decide for itself whether external tools are required.
If vanilla RAG is sitting in a library answering a specific question, Agentic RAG is holding an iPhone, free to call on Google, email, and more to chase the answer down.
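A minimal sketch of that loop, covering the decision points listed above; the retrievers, the relevance check, and the generator are all stubs where a real system would make LLM calls.

```python
# Sketch of an agentic RAG loop: decide IF to retrieve, WHICH retriever to
# use, and WHETHER the context is good enough before generating.

def needs_retrieval(q): return "capital" in q           # decide if we retrieve
def pick_retriever(q): return web_search if "news" in q else vector_store
def vector_store(q):   return "Paris is the capital of France."
def web_search(q):     return "Latest headlines about France."
def good_enough(ctx, q): return "capital" in ctx        # evaluate the context
def generate(q, ctx=""): return f"Answer({q}) given [{ctx}]"

def agentic_rag(query: str, max_rounds: int = 2) -> str:
    if not needs_retrieval(query):
        return generate(query)                  # skip retrieval entirely
    context = ""
    for _ in range(max_rounds):
        context = pick_retriever(query)(query)  # choose an engine per query
        if good_enough(context, query):
            break                               # otherwise re-retrieve
    return generate(query, context)

print(agentic_rag("What is the capital of France?"))
```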
In addition, Mem0, an open-source project incubated by YC in 2024, is also expected to become a RAG sidekick, giving AI Agents the wings of personalized memory.
Mem0 acts like the brain's hippocampus, providing an intelligent, self-optimizing memory layer for the LLM. It stores information hierarchically, converting short-term information into long-term memory, much as you consolidate newly learned knowledge; and it builds semantic links, weaving stored knowledge into an association network. Tell the AI you like detective films and it will not just remember; it may infer which true-crime documentaries you would enjoy too.
On this basis, Mem0 can markedly improve an AI Agent's personalized memory, dynamically recording user preferences, behaviors, and needs in a kind of private notepad. Tell the Agent that your mother's birthday is next week and it will not only remind you in time, but also suggest gifts based on the preferences of you and your mother in its "memory", even comparison-shopping across platforms and supplying purchase links.
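A usage sketch based on the open-source mem0 project's published quickstart; exact method signatures vary across versions, and Memory() needs a configured LLM and vector-store backend (for example an OpenAI API key) before it will run.

```python
# Hedged sketch of mem0 usage, per the project's quickstart; signatures may
# differ between versions.
from mem0 import Memory

m = Memory()

# Store a preference: mem0 extracts it and files it as long-term memory.
m.add("I like detective movies", user_id="alice")

# Later, retrieve semantically related memories to personalize a reply.
related = m.search("What should I recommend alice to watch?", user_id="alice")
print(related)  # should surface the detective-movie preference
```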
The breakthroughs in RAG do not stop there. A team from Ohio State University and Stanford University had an intriguing idea: give AI a "memory brain" modeled on the human hippocampus. Drawing on neuroscience, they imitated the hippocampus's role in long-term memory and designed HippoRAG, a model that integrates and searches knowledge roughly the way the brain does. Experiments show clear gains on tasks requiring knowledge integration, such as multi-hop question answering, and the work may open a new path toward giving large models human-like memory.
Third, tool use, whose progress is visible to the naked eye. Claude's Computer Use, for example, exposes an API that converts natural-language prompts into computer operations, letting developers automate repetitive work, run testing and QA, and conduct open-ended research. AI can now drive ordinary software to completion without a bespoke API "key" for each service: writing documents in Word, working spreadsheets in Excel, searching the web in a browser. Even so, today's Computer Use remains incomplete: the capability cannot be trained on private internal data, it is constrained by the context window, and so on. The Anthropic team itself says Claude's computer-operating skill is only at a "GPT-3 era" stage, with ample room to improve.
The visual capabilities of AI Agents have advanced too. GLM-PC, released by Zhipu, brings its general vision-and-action model CogAgent to the PC, simulating human visual perception to take input from the environment for further reasoning and decision-making.
Fourth, planning capabilities. Planning spans task decomposition (splitting a large task into smaller ones) and reflection and refinement (reviewing completed actions, learning from mistakes, and optimizing the next step).
Recent papers propose a finer taxonomy: task decomposition, multi-plan selection, external-module-assisted planning, reflection and refinement, and memory-augmented planning. Multi-plan selection gives the Agent a "selection wheel": generate several plans and execute the best one (see the sketch below). External-module-assisted planning delegates to an external planner, a little like the critic in reinforcement learning. Memory-augmented planning is like memory bread: past experience is recalled to inform future plans. These methods are not isolated; they interleave to jointly raise an Agent's planning ability.
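Here is the multi-plan selection idea in miniature; the plan generator and the scorer are stand-ins for what would really be LLM calls (a judge model) or a learned value function.

```python
# Sketch of multi-plan selection: generate several candidate plans, score
# each, and execute the best. Generator and scorer are invented stubs.
import random

def generate_plans(task: str, k: int = 3) -> list[list[str]]:
    # Stand-in for k diverse LLM-generated plans.
    return [[f"{task}: step {i+1}.{j+1}" for j in range(2)] for i in range(k)]

def score(plan: list[str]) -> float:
    # A real scorer might be an LLM judge or a learned value model.
    return random.random()

def plan_with_selection(task: str) -> list[str]:
    candidates = generate_plans(task)
    return max(candidates, key=score)   # the "selection wheel"

print(plan_with_selection("book a three-city trip"))
```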
Over the past year, Agents have progressed across all these "subcutaneous" abilities: tool use has seen initial real-world deployment; the memory mechanism is well worth watching; progress on LLMs themselves rests with the giants. But for the Agent, capability is not the simple sum of these technologies; a breakthrough in any one of them could trigger a qualitative change.
Looking ahead, the important challenges in Agent evolution include, but are not limited to: achieving low latency and real-time feedback alongside visual understanding; building personalized memory systems; and executing robustly in both virtual and physical environments. Only when the AI Agent turns from "tool" into "tool user" will the real killer Agent appear.
2. The choice of capital: when large models cool off, AI Agents step up

Some say there is nothing left to squeeze out of large models; if you must compete, compete on AI Agents.
In 2024, the large-model companies that once vowed to "become China's OpenAI" were forced to walk back their ambitions. Take the "Six Little Tigers": Zhipu AI, 01.AI, Baichuan Intelligence, MiniMax, Moonshot AI, and StepFun. Most have begun adjusting their businesses and even cutting staff. Deep-pocketed giants can keep funding R&D; more startups must face reality and pivot to the application layer of large models, seeking lower costs and faster returns.
At the same time, keen capital has also set its sights on the AI application layer.
Data from ITJuzi show that in the first nine months of 2024 the domestic AI sector saw 317 financing deals, with average monthly financing of 4.2 billion yuan, less than 20% of last year's level. The five most-funded companies took more than 21.2 billion yuan, equivalent to 63% of total domestic AI financing this year.
Notably, large-model and AI Agent projects drew the most investor attention: 19 financing deals for large models and 18 for AI Agents, followed by AI video generation (10% of deals); the remaining half of the cases were scattered across 19 other directions.
Thus, with large models locked in a winner-takes-all contest, the AI Agent has become not only the best direction for AI startups but also the consensus bet of capital at home and abroad.
YC partner and veteran investor Jared has argued that vertical AI Agents, as an emerging class of B2B software, could become a market 10 times larger than SaaS. With their clear advantages in replacing manual work and raising efficiency, the field could produce technology giants worth more than US$300 billion.
What do the AI Agents that investors are interested in look like?
The hottest is the AI programming tool Cursor. One reason: code is the ability LLMs find easiest to master, since the training data comes largely from open-source code on GitHub, most of it "valid data". Cursor used to suggest code based on your needs; now it can create code files outright and set up a runnable environment in one pass to meet your requirements, so that all you do is click run.
Moreover, even without a true killer Agent in 2024, Agents are already blooming across niche fields.
According to recent sharing from the YC team, most Agent projects that have raised investment so far are in the toB space.
Survey research and analysis: Outset applies AI Agents to questionnaire design and analysis, replacing the traditional manual work offered by companies such as Qualtrics.
Software quality testing: Mtic uses AI Agents for software QA and can replace a traditional QA team outright. Unlike earlier QA software-as-a-service companies (such as Rainforest QA), Mtic does not merely make QA teams more efficient; it replaces manual testing entirely.
Government contract bidding: Sweet Spot uses AI Agents to automatically find and fill in bid documents for government contracts, relieving humans of the tedium.
Customer support: Powerhelp uses AI Agents to answer calls, reply to emails, and resolve issues automatically, offering personalized solutions based on each user's question and history to lift satisfaction.
Talent recruitment: Priora and Nico use AI Agents for technical screening and first-round recruiting, replacing manual work on these tasks.
To borrow Andrew Ng's summary: the road to AGI feels more like a journey than a destination, but agentic workflows can help us take a small step forward on that very long road. In other words, even if an all-round agent is out of reach for now, the steady emergence of professional agents across vertical fields will keep giving us something like the experience of having Jarvis.
3. 2025: expected to be year one of AI Agent commercialization

Recently, Ilya Sutskever, the former OpenAI co-founder and founder of SSI, declared outright that pre-training as we know it will end: we have only one Internet, the massive data needed to train models will soon run dry, and new breakthroughs must be found within existing data. Sutskever drew an analogy with the human brain: just as the brain stopped growing in size while human intelligence kept advancing, AI's future development will shift to building AI Agents and tools on top of existing LLMs. He predicts that future breakthroughs will come from agents, synthetic data, and inference-time compute, and that AI Agents able to complete tasks autonomously are the direction of travel.
Notably, like Andrew Ng, Sutskever reaches for the adjective "agentic" to describe these systems.
Linear Capital's Bolt perspective puts it this way: we can describe Agent applications as having a small, moderate, or high degree of agentic capability. A Router-type system uses an LLM to route inputs to specific downstream workflows: a small amount of agentic capability. A State Machine-type system uses multiple LLMs across multiple routing steps and can decide whether each step continues or completes: a considerable amount. An Autonomous-type system goes further still, using tools, even creating suitable tools, to drive its own further decisions: full agentic capability. A sketch of the two ends of this spectrum follows below.
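In code, with the model calls stubbed out, the contrast looks roughly like this:

```python
# The low end vs. high end of the "agentic" spectrum. route_llm and the
# per-step "decision" are stand-ins for real model calls.

def route_llm(query: str) -> str:
    return "refund_flow" if "refund" in query else "faq_flow"

def router_system(query: str) -> str:
    # Small agentic capability: one LLM decision, then a fixed workflow.
    return {"refund_flow": "Start refund...", "faq_flow": "See FAQ..."}[route_llm(query)]

def autonomous_system(goal: str, max_steps: int = 5) -> list[str]:
    # Full agentic capability: the model decides each next action, including
    # when to stop, possibly creating tools along the way (omitted here).
    history = []
    for i in range(max_steps):
        action = f"step {i+1} toward: {goal}"   # stand-in for an LLM decision
        history.append(action)
        if i == 2:                              # stand-in for "goal reached"
            break
    return history

print(router_system("I want a refund"))
print(autonomous_system("file my expense report"))
```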
On that basis, before vendors trumpet the Agent-ness of their products, they might first answer: how agentic is my system?
At present, professional AI Agents in many fields are still not mature. Surveys point to inaccurate output, unsatisfying performance, and user distrust as obstacles to deployment. But flip the framing: the AI Agent that succeeds commercially in the short term will not necessarily be the product that looks the most "agentic"; it will be the one that balances performance, reliability, and user trust.
By that logic, the most promising path for professional AI Agents may be to focus first on using AI to enhance existing tools, rather than offering broad, fully autonomous standalone services.
That means human-machine collaboration, keeping people in the loop to supervise and handle edge cases; setting realistic expectations grounded in current abilities and limits; and combining tightly constrained LLMs, good evaluation data, human oversight, and traditional engineering to get reliable results on complex automation tasks. A sketch of the pattern follows below.
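An illustrative sketch of that gate; the draft function, confidence values, and threshold are invented:

```python
# Human-machine collaboration pattern: the agent drafts, a confidence check
# routes edge cases to a person before anything ships.

def draft_reply(ticket: str) -> tuple[str, float]:
    # Stand-in for a tightly constrained LLM call that also self-rates.
    if "password reset" in ticket:
        return "Here is the reset link...", 0.92
    return "I think you should...", 0.40

def handle(ticket: str, threshold: float = 0.8) -> str:
    reply, confidence = draft_reply(ticket)
    if confidence < threshold:
        return f"[ESCALATED TO HUMAN] draft: {reply}"   # human handles edge case
    return reply                                        # routine case ships

print(handle("password reset please"))
print(handle("my order arrived damaged and late"))
```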
For example, Rocks, a Sequoia portfolio company, integrates human employees into its agents. Rocks initially built a technology that composed and sent emails automatically, but found that performance improved 333-fold once human salespeople were brought into the loop, so it removed the auto-send feature.
Depending on the business scenario, some companies build agents that complete tasks outright, such as Expo in network security; others use agents to augment human employees, as Rocks does.
So, what will happen in 2025?
First, programming will not be alone: "seed players" will appear in more vertical fields. Sequoia partner Konstantine Buhler predicts that high-service-cost areas such as healthcare and education will be AI's next important battlefields.
Meanwhile, according to the LangChain report, people want to hand time-consuming tasks to AI Agents: a "knowledge filter" that distills key information so users need not sift through masses of data by hand; a "productivity accelerator" that arranges schedules and manages tasks so humans can focus on more important work; and a "customer-service wingman" that helps companies handle inquiries and resolve issues faster, sharply improving response times.
In other words, whatever consumes time, labor, and money is a candidate to be taken over by professional AI Agents in vertical fields.
Second, Agent deployment will move from "single" to "multiple". On one hand, Agents will evolve from lone operators to "group collaboration". In 2025, more multi-agent systems will appear, with multiple agents playing different roles and cooperating on tasks. Tsinghua University's open-source project ChatDev is an example: each Agent is given an identity, some CEOs, some product managers, some programmers, and they cooperate to complete the job.
On the other hand, as large models' ability to process image and video information improves rapidly, richer multimodal interaction will arrive in 2025, with AI drawing on the Internet of Things and other sensory channels. Multimodal input and output make AI products more interactive, more frequently used, and applicable to more scenarios, lifting the overall level of AI products.
At the center of all this sits the Agent: an intelligent entity integrating perception, analysis, decision-making, and execution, whose initiative and automation far exceed existing tools.
As the QbitAI think tank observes: judging by the development of both the technology and its supporting infrastructure, AI Agents will see broad adoption from 2025, and are expected to bring interaction modes, product forms, and business models native to the AI 2.0 era.
Conclusion

At the beginning of the film 2001: A Space Odyssey, a band of herbivorous apes struggles on the edge of starvation. Their leader idly swings a bone and discovers it makes a handy tool. From then on, the apes hunt small animals, become meat-eaters, and gradually climb to the top of the food chain.
If future humans look back on 2025, they may find it was another critical moment in evolution, with the AI Agent as the bone club that turned the tide.
As Andrej Karpathy said, AI Agent represents a crazy future.
Interestingly, the word Agent comes from the Latin Agere, which means "to do".
How to seize this crazy future? Perhaps all you need is an "Agent".