Source: Zuoye Waibo Mountain
A work of art is never completed, only abandoned.
Everyone is talking about AI Agents, but they are not talking about the same thing. This gap separates the AI Agent we care about from the one the public sees, and from the one AI practitioners see.
A long time ago, I wrote that Crypto is an illusion of AI. Since then, the combination of Crypto and AI has remained an unrequited love: AI practitioners rarely mention terms like Web3 or blockchain, while Crypto practitioners are passionate about AI. After witnessing the spectacle of AI Agent frameworks being tokenized, I still don't know whether AI practitioners can truly be drawn into our world.
AI is Crypto's agent. That is the best explanation for this round of AI enthusiasm when seen from a crypto perspective. Crypto's enthusiasm for AI differs from that of other industries: we especially hope to integrate it with the issuance and operation of financial assets.
Agent evolution: the origin beneath the technical marketing
Tracing its roots, the AI Agent has at least three sources. OpenAI lists it as an important step toward AGI (Artificial General Intelligence), which lifted the term beyond the purely technical level. It has become a buzzword, but in essence the Agent is not a new concept; even with AI empowerment, it can hardly be called a revolutionary technology trend.
The first source is the AI Agent in OpenAI's eyes. Like L3 in the autonomous-driving classification, the AI Agent can be regarded as having advanced assisted-driving capabilities, but it cannot completely replace a human.
The second source is the term itself: before the emergence of LLMs, "agent" already referred to software acting on a user's behalf. To give just one example, programmers designing a crawler set a User-Agent header to imitate the browser version, operating system, and other details of a real user. And if an AI Agent were used to imitate human behavior in even finer detail, AI Agent crawler frameworks would appear, making crawlers "more human-like".
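As a minimal sketch of the pre-LLM sense of "agent": a crawler impersonating a browser by overriding the default User-Agent header. The UA string below is an illustrative example, not a recommendation.

```python
import urllib.request

# An illustrative browser User-Agent string; real crawlers copy one
# from an actual browser to blend in with ordinary traffic.
UA = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
      "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36")

def build_request(url: str) -> urllib.request.Request:
    """Return a Request whose headers imitate an ordinary browser user,
    instead of urllib's default 'Python-urllib/x.y' identifier."""
    return urllib.request.Request(url, headers={"User-Agent": UA})

req = build_request("https://example.com")
print(req.get_header("User-agent"))  # urllib capitalizes header names this way
```

Nothing here is "intelligent": the agent simply carries a preset identity on the user's behalf, which is exactly the gap an AI Agent is supposed to fill.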
In such changes, the AI Agent must attach itself to existing scenarios; wholly original fields almost never appear. Even code completion and generation tools such as Cursor and GitHub Copilot build on the thinking of LSP (Language Server Protocol) and similar ideas. Examples of this kind are everywhere:
Apple: AppleScript (Script Editor) -- Alfred -- Siri -- Shortcuts -- Apple Intelligence
Terminal: Terminal (macOS) / PowerShell (Windows) -- iTerm2 -- Warp (AI-native)
Human-computer interaction: Web 1.0 CLI / TCP/IP / Netscape browser -- Web 2.0 GUI / REST API / search engines / Google / super apps -- Web 3.0 AI Agent + dApp?
To explain a little: in the course of human-computer interaction, it was the combination of the Web 1.0 GUI and the browser that truly let the public use computers without barriers, represented by Windows + IE, while the API became the standard for data abstraction and transmission behind the Internet. By the Web 2.0 era the browser meant Chrome, and the shift to mobile changed how people use the Internet. Apps from super platforms such as WeChat and Meta now cover every aspect of daily life.
Thirdly, the concept of intent (Intent) in the Crypto field is the precursor to the explosion of the AI Agent circle, though note that this holds only within Crypto. From incomplete Bitcoin Script to Ethereum smart contracts, the contract itself is a general application of the Agent concept, and the cross-chain bridges, chain abstraction, and EOA-to-AA wallets they spawned are all natural extensions of the same line of thinking. So when the AI Agent "invaded" Crypto and landed in DeFi scenarios, it was no surprise.
Herein lies the confusion around the AI Agent concept. In the Crypto context, what we actually want is an Agent for "automatic asset management and automatic generation of new Memes", but under OpenAI's definition such a risky scenario would require L4/L5 to be truly feasible. Meanwhile, the public is playing with automatic code generation, one-click AI summaries, ghostwriting, and the like. The two sides are not communicating on the same dimension.
Now that we understand what we really want, we can focus on the organizational logic of the AI Agent, leaving the technical details for later. After all, the conceptual muddle around the AI Agent's "agency" is itself an obstacle to the technology's mass adoption, just as the browser was what turned the personal PC industry into gold. Our focus will be two points: the AI Agent seen from the perspective of human-computer interaction, and the difference and connection between the AI Agent and the LLM, leading to the third section: what will Crypto and the AI Agent leave behind in the end?
let AI_Agent = LLM + API;
Before chat-based human-computer interaction models such as ChatGPT, humans interacted with computers mainly through the GUI (graphical user interface) and the CLI (command-line interface). GUI thinking kept deriving concrete forms such as browsers and apps, while the combination of CLI and shell changed little.
But that is only the "front-end" surface of human-computer interaction. As the Internet developed, growth in the volume and variety of data made the "back-end" interactions between apps and data grow as well. The two depend on each other; even a simple act of web browsing requires their collaboration.
If the front end is people interacting with browsers and apps, then the links and jumps between APIs support the Internet's actual operation. This, in fact, is already part of the Agent idea: ordinary users achieve their goals without ever needing to understand terms like command line or API.
The same is true for LLMs. Users can now go one step further and no longer even need to search. The whole process can be described in the following steps:
The user opens the chat window;
Users use natural language, that is, text or voice, to describe their needs;
LLM parses them into streamlined operation steps;
LLM returns the results to the user.
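The four steps above can be sketched as a toy loop. This is a minimal sketch: `llm_parse` is a hard-coded stand-in for a real model, not any actual API.

```python
def llm_parse(request: str) -> list[str]:
    """Stand-in for the LLM: turn a natural-language request into
    streamlined operation steps. A real model would generate these;
    here one illustrative case is hard-coded."""
    if "weather" in request.lower():
        return ["look up today's forecast", "summarize it in one sentence"]
    return ["answer directly from model knowledge"]

def chat_turn(request: str) -> str:
    steps = llm_parse(request)   # step 3: parse into operation steps
    return " -> ".join(steps)    # step 4: return the result to the user

# Steps 1-2: the user opens a chat window and states a need in natural language.
print(chat_turn("What's the weather like?"))
```

The point of the sketch is the shape of the flow: natural language in, structured steps internally, a result back out, with no search engine in between.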
Notice that in this process the party facing the biggest challenge is Google: users no longer open a search engine but various GPT-style dialogue windows, and the traffic entrance is quietly shifting. This is why some people think this round of the LLM revolution spells the doom of search engines.
So what role does AI Agent play in this?
In a word, AI Agent is a specialization of LLM.
The current LLM is not AGI; that is, it is not OpenAI's ideal L5 form, and its capabilities remain severely limited. For example, if the user inputs too much information, hallucinations easily result, and one important reason is the training mechanism. If you repeatedly tell GPT that 1+1=3, there is some probability that when you later ask "what is 1+1+1?" it will answer 4.
That is because GPT's feedback at that moment comes entirely from a single user. If the model is not connected to the Internet, your input can genuinely shift its behavior, leaving a broken GPT that only knows 1+1=3. But if the model is allowed online, its feedback mechanism becomes far more diverse; after all, the vast majority of people on the Internet believe that 1+1=2.
Raising the difficulty further: if we must run the LLM locally, how do we avoid such problems?
A simple, brute-force method is to run two LLMs at once and require them to cross-verify every answer, reducing the probability of error. If that is not enough, there are other ways, such as having two users handle one process: one asks the questions, the other fine-tunes them, making the language more standardized and rational.
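The two-LLM cross-check can be shown in miniature. This is a hedged sketch: the two "models" below are simple lookup tables standing in for real LLM calls, and the disagreement policy (flag and escalate) is one choice among several.

```python
# Two stand-in "models" with overlapping but imperfect knowledge.
MODEL_A = {"1+1": "2", "capital of France": "Paris"}
MODEL_B = {"1+1": "2", "capital of France": "Lyon"}

def cross_checked_answer(question: str) -> str:
    """Only return an answer when both models agree; otherwise flag it.
    This is the 'two LLMs verify each other' idea in miniature."""
    a, b = MODEL_A.get(question), MODEL_B.get(question)
    if a is not None and a == b:
        return a
    return "UNVERIFIED: models disagree, retry or escalate to a human"

print(cross_checked_answer("1+1"))                # agreement
print(cross_checked_answer("capital of France"))  # disagreement, flagged
```

Agreement does not guarantee truth (both models can share a blind spot), but it filters out errors that only one of them makes.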
Of course, being online cannot eliminate every problem. If the LLM retrieves answers from a troll forum, the result may be even worse. But simply filtering out such information shrinks the pool of usable data, so instead the data can be split and recombined, and new data can even be produced from old data, to make answers more reliable. In plain language, this is RAG (Retrieval-Augmented Generation).
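A toy version of that retrieval-and-filter step might look like this. It is a sketch under stated assumptions: the corpus, the keyword match, and the `quality` scores are all invented for illustration; a real RAG system would use embeddings, a vector store, and a proper reranker.

```python
# A toy corpus with a quality score per source; real systems would
# use embeddings and learned relevance/quality signals instead.
CORPUS = [
    {"text": "1+1=2 in ordinary arithmetic.", "quality": 0.9},
    {"text": "1+1=3, trust me.",              "quality": 0.1},
]

def retrieve(query: str, min_quality: float = 0.5) -> list[str]:
    """Retrieve documents matching the query, dropping low-quality sources."""
    return [d["text"] for d in CORPUS
            if query in d["text"] and d["quality"] >= min_quality]

def augmented_prompt(query: str) -> str:
    """Splice the retrieved context into the prompt the LLM actually sees."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(augmented_prompt("1+1"))
```

The troll-forum answer never reaches the model: it is filtered at retrieval time, before generation, which is exactly where RAG earns its reliability.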
Humans and machines need to understand each other, and once we let multiple LLMs understand and collaborate with each other, we are already touching the operating mode of the AI Agent: an agent of the human that calls other resources, which can even include large models and other Agents.
From this we grasp the connection between the LLM and the AI Agent. The LLM is a body of knowledge that humans converse with through a dialogue window; in practice, we found that certain specific task flows can be distilled into small programs, bots, and instruction sets, and these we define as Agents.
The AI Agent still grows out of the LLM, but the two cannot be regarded as the same. The AI Agent's calling method is grounded in the LLM, with special emphasis on collaboration with external programs, the LLM itself, and other Agents; hence the formulation AI Agent = LLM + API.
So, within the LLM's workflow, instructions for the AI Agent can be added. Take calling X's API data as an example:
The human user opens the chat window;
The user describes their need in natural language, that is, text or voice;
The LLM parses it into an API-call-style AI Agent task and hands conversation control to the Agent;
The AI Agent asks the user for their X account and API key, then communicates with X over the network according to the user's description;
The AI Agent returns the final result to the user.
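The hand-off from LLM to Agent can be sketched as parse-then-dispatch. Everything here is hypothetical: `parse_to_tool_call` is a stand-in for the LLM, and the endpoint `api.example.com` is a placeholder, not X's real API; no network request is actually sent.

```python
def parse_to_tool_call(request: str) -> dict:
    """Stand-in for the LLM turning a natural-language need into a
    structured task (step 3); a real model would emit this structure."""
    return {"tool": "x_api", "action": "recent_posts", "user": "@example"}

def dispatch(call: dict, credentials: dict) -> str:
    """The Agent takes over the conversation (step 4). The URL below is
    illustrative; a real agent would send an authenticated HTTP request."""
    if call["tool"] != "x_api":
        return "unsupported tool"
    url = f"https://api.example.com/{call['action']}?user={call['user']}"
    return f"would GET {url} with key {credentials['api_key'][:4]}***"

call = parse_to_tool_call("Show me @example's recent posts on X")
print(dispatch(call, {"api_key": "sk-12345678"}))
```

The structured `call` dict is the seam between the two halves: the LLM owns natural language, the Agent owns credentials and network I/O.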
Remember the evolutionary history of human-computer interaction? The browsers and APIs of Web 1.0 and Web 2.0 still exist, but users can ignore them entirely and interact only with the AI Agent. Processes such as API calls happen conversationally, and these API services can be of any type, including local data, network information, and external app data, as long as the other party opens an interface and the user has permission to use it.
A complete AI Agent usage flow is as shown in the figure above. The LLM can be regarded as separate from the AI Agent, or the two can be seen as sub-links of a single process; however they are divided, they both serve the user's needs.
From the perspective of the human-computer interaction flow, even if the user is effectively talking to himself, he only needs to express what he is thinking, and the AI/LLM/AI Agent will guess his need again and again. Adding a feedback mechanism, and requiring the LLM to remember the current context (Context), ensures that the AI Agent does not suddenly forget what it is doing.
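That context requirement can be shown with a minimal sketch. The trimming policy here is an assumption for illustration: real systems count tokens and often summarize older turns rather than simply dropping them; this sketch counts turns to stay simple.

```python
def trim_context(history: list[str], max_turns: int = 4) -> list[str]:
    """Keep only the most recent turns so the model 'remembers' the
    current task without overflowing its context window."""
    return history[-max_turns:]

history = [f"turn {i}" for i in range(10)]
print(trim_context(history))  # the model only sees turns 6 through 9
```

The trade-off is visible even at this scale: whatever falls off the front of the window is exactly what the Agent can "suddenly forget".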
In short, the AI Agent is a more personalized product, and this is its essential difference from traditional scripts and automation tools: like a personal butler, it considers the user's real needs. But it must be pointed out that this personalization is still the result of probabilistic guessing. An L3-level AI Agent does not possess human-level understanding and expression, so connecting it to external APIs is fraught with danger.
After the monetization of AI frameworks
The fact that an AI framework can be monetized is an important reason I remain interested in Crypto. In the traditional AI technology stack, the framework is not very important, at least not compared with data and compute, and monetizing an AI product starting from the framework is difficult: most AI algorithms and model frameworks are open source, and what is truly closed is sensitive material such as data.
Essentially, an AI framework or model is a container for and combination of a series of algorithms. The iron pot is the same whichever goose you stew in it; the breed of goose and the control of the heat are what distinguish the flavor. The product for sale should be the goose, but now Web3 customers have arrived, and like the man who bought the casket and returned the pearl, they buy the pot and abandon the goose.
The reason is not complicated. Web3's AI products are basically built on others' work: they modify existing AI frameworks, algorithms, and products into "customized" offerings, and the technical principles behind different Crypto AI frameworks barely differ. Since they cannot be distinguished technically, the fuss must be made over names, application scenarios, and so on. Thus minor tweaks to an AI framework become the backing for different tokens, producing the Crypto AI Agent framework bubble.
Since there is no heavy investment in training data and algorithms, distinguishing by name becomes especially important. Even DeepSeek V3, however cheap, still consumed a great many PhDs, GPUs, and kilowatt-hours.
In a sense, this is also Web3's consistent recent style: the token-issuance platform is worth more than the tokens themselves, as with Pump.Fun and Hyperliquid. The Agent should have been an application and an asset, yet the Agent-issuance framework has become the hottest product.
In fact, this is a form of value anchoring. Since the various Agents cannot be told apart, the Agent framework is more stable and can produce the value-siphoning effect of asset issuance. This is version 1.0 of the current combination of Crypto and the AI Agent.
Version 2.0 is emerging, typified by the combination of DeFi and the AI Agent. The DeFAI concept is of course market behavior stimulated by the hype, but considering the following developments, things look different:
Morpho is challenging old lending products such as Aave;
Hyperliquid is replacing dYdX in on-chain derivatives and even challenging the token-listing effect of CEXs such as Binance;
Stablecoins are becoming a payment tool in off-chain scenarios.
It is against this backdrop of DeFi's evolution that AI is improving DeFi's basic logic. If DeFi's biggest contribution before was verifying the feasibility of smart contracts, then the AI Agent is changing how DeFi is manufactured: you no longer need to understand DeFi to create DeFi products. This is deeper underlying empowerment than chain abstraction.
An era in which everyone is a programmer is arriving. Complex computation can be outsourced to the LLM and APIs behind the AI Agent, and individuals need only focus on their own ideas; natural language can be efficiently converted into programming logic.
Conclusion
This article mentions no Crypto AI Agent tokens or frameworks, because Cookie.Fun already does that well enough: first came the AI Agent information-aggregation and token-discovery platforms, then the AI Agent frameworks, and finally the Agent tokens that appear and vanish in a flash. Continuing to list such information here would add no value.
Yet in this period of observation, the market still lacks genuine discussion of what the Crypto AI Agent actually points to. We cannot keep discussing the pointer; what changes in memory is the essence.
This ability to continuously convert all kinds of underlying things into assets is precisely the charm of Crypto.