AI hands out 20,000 yuan in red envelopes on site, ushering in the "Act" era of large models

Image source: Generated by Unbounded AI

Recently, the field of large models has been undergoing a revolution driven by AI agents. The upgraded Claude 3.5 Sonnet from Anthropic caused a stir in the AI community as soon as it launched.

As a new generation of AI agent, it breaks through the usual boundary of large models and can directly operate electronic devices the way humans do: following your natural-language instructions, it moves the cursor, clicks the right positions and on-screen objects, and types on the keyboard, mimicking the way humans interact with computers. People have been eagerly exploring how to use such agents. For example, some are already using them to grind through repetitive daily tasks in games automatically.

In addition to playing games, agents can also take over many daily tasks at work, such as writing emails, arranging meetings, and organizing files. Reportedly, they can handle everything from scientific research to writing code.

Some say the emergence of agent tools marks a step toward a new paradigm of human-computer interaction.

It did not take long for Chinese companies to come up with competing products, and to go a step further by covering mobile phones, PCs, and AI-native hardware all at once.

This morning, Zhipu announced an upgrade to its agent and opened applications for a "million-user private beta", opening a new page in the human-computer interaction experience.

This is Zhipu’s first productized intelligent agent, which allows AI to directly control hardware devices through voice, and can also operate globally across different apps.

At the press conference, Zhipu CEO Zhang Peng demonstrated the agent's capabilities by having it set up a face-to-face group chat with the live audience.

It then sent red envelopes totaling 20,000 yuan.

The red envelopes issued by the AI were snatched up instantly. I have to say: thank you, Mr. Zhang, and thank you, AI agent.

Deep into mobile phones and PCs, and able to make its own decisions

During the private beta, Zhipu's mobile agent AutoGLM and its desktop counterpart GLM-PC cover a number of commonly used apps and applications. AutoGLM supports social platforms such as WeChat, Douyin, Xiaohongshu, and Weibo; food delivery platforms such as Meituan and Ele.me; shopping platforms such as Taobao, JD.com, and Pinduoduo; map and navigation platforms such as Amap and Baidu Maps; and travel booking platforms such as 12306, Qunar, and Ctrip.

After opening AutoGLM, users only need to speak (text input is also supported), and the agent can take over the phone and automatically carry out the requested tasks in these apps, such as summarizing articles from a given official account on WeChat or planning a travel route for you on Amap.

This time, Zhipu has carried out a series of capability upgrades for AutoGLM. Based on these new capabilities, we saw some new ways to use it.

The first is "longer": AutoGLM can understand, follow, and autonomously complete very long and complex instructions, supporting more than 50 steps of uninterrupted, coherent operation and executing long tasks faster than a human can.

The second is "cross-app": backed by stronger generalization and chain-of-thought reasoning, AutoGLM can carry out complex tasks that span multiple apps. With this agent, there is an additional, automatically executing scheduling layer between users and applications, which removes the trouble of switching back and forth between apps and lets those apps work together.

Take information sharing between apps as an example. We told AutoGLM to "find several recommended SLR cameras on Xiaohongshu, and then share them to the 'Editorial Gag' group on WeChat." The instruction is very simple, and the execution is silky smooth.

Another example is shopping across different apps, which AutoGLM can also do in one go.

More new features further expand what AutoGLM can do, including "short commands", which are similar to shortcuts on mobile phones. In this mode, AutoGLM can save a user-defined short command with one tap, and automatically start and execute the associated long task whenever the command is triggered.

Even more interesting is the "blind box" mode. AutoGLM skips the clarifying dialogue by default and lets the AI make the choice for you when your instruction is vague. Along the way, it asks for a second confirmation only when important operations (such as payment) are involved.
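The confirmation rule described above can be sketched as a simple gate. This is a minimal illustration under stated assumptions, not Zhipu's implementation: the `Action` type, the `SENSITIVE_KINDS` set, and the `run_plan` function are hypothetical names, and the article only names payment as an example of an important operation.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical set of action kinds treated as "important".
# The article only gives payment as an example.
SENSITIVE_KINDS = {"payment"}


@dataclass
class Action:
    kind: str          # e.g. "search", "select", "payment"
    description: str   # human-readable summary of the step


def run_plan(actions: List[Action], confirm: Callable[[Action], bool]) -> List[str]:
    """Run actions in order, asking for confirmation only on sensitive ones."""
    log = []
    for action in actions:
        if action.kind in SENSITIVE_KINDS and not confirm(action):
            # The user declined, so stop before the sensitive step.
            log.append(f"stopped before: {action.description}")
            break
        log.append(f"done: {action.description}")
    return log


plan = [
    Action("search", "search for milk tea"),
    Action("select", "pick the top-rated shop"),
    Action("payment", "pay for the order"),
]

# A user who declines the payment step.
print(run_plan(plan, confirm=lambda action: False))
```

Everything before the payment step runs without interruption; only the sensitive step waits on the `confirm` callback, which in a real product would be a prompt shown to the user.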

AutoGLM's autonomous execution capabilities also extend to web pages. Zhipu provides the AutoGLM-Web function in the Zhipu Qingyan browser plug-in (for Google Chrome and Microsoft Edge). This function is adapted to social media websites such as Zhihu, Weibo, X, and Douban; search engines such as Baidu, Google, and Bing; academic websites such as Baidu Academic, Google Scholar, and arXiv; as well as the code hosting site GitHub and news and information websites.

On these websites, the agent follows user instructions and can automatically perform on-site searches, summarize content, generate arXiv daily digests, create GitHub repositories, check in on Weibo Super Topics, and carry out other personalized functions, which makes it a lot of fun to play with. For example, we can have it automatically share news on Weibo for us.

On the desktop, Zhipu has also launched GLM-PC, an application that operates computer software the way a human does. It is built on the understanding and task-planning capabilities of CogAgent, a general-purpose large vision model, and lets users carry out complex tasks with a simple one-sentence instruction.

For example, query and summarize the information on the web page and send it to others via WeChat:

Search for an XL down jacket on Taobao and purchase it:

The upcoming "invisible screen" feature is even more sci-fi. The AI can help without disturbing you: it frees up your screen by completing the work on another, invisible screen.

In terms of implementation principles, GLM-PC plans tasks after fully understanding user instructions, then identifies windows, graphics, text and other information in the computer interface, and then automatically operates the computer. In addition, this AI assistant can change the plan and self-correct according to the page information during use, so as to better complete the task.
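The plan, perceive, act, and self-correct cycle described above can be sketched as a small loop. This is a toy illustration, not GLM-PC's actual code: `plan_steps`, `observe`, and `run_agent` are hypothetical names, the fixed plan stands in for a model call, and the "screen" is just a dictionary of visible elements.

```python
from typing import Dict, List


def plan_steps(instruction: str) -> List[str]:
    """Hypothetical planner: a fixed plan stands in for a model call."""
    return ["open_browser", "search", "summarize"]


def observe(screen: Dict[str, bool]) -> List[str]:
    """Stand-in for recognizing windows, graphics, and text on screen."""
    return [name for name, visible in screen.items() if visible]


def run_agent(instruction: str, screen: Dict[str, bool], max_steps: int = 10) -> List[str]:
    """Execute the plan step by step, re-checking the screen before each action."""
    plan = plan_steps(instruction)
    trace = []
    steps = 0
    while plan and steps < max_steps:
        steps += 1
        visible = observe(screen)
        if "popup" in visible:
            # Self-correction: the screen does not match the plan,
            # so deal with the obstacle before continuing.
            screen["popup"] = False
            trace.append("close_popup")
            continue
        trace.append(plan.pop(0))
    return trace


print(run_agent("summarize this page", {"popup": True, "browser": True}))
```

The point of the sketch is the ordering: the agent observes the interface before every action, so an unexpected element changes what it does next instead of derailing the whole plan.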

According to reports, GLM-PC is particularly good at handling office scenarios and can perform diverse tasks on WeChat, Feishu, DingTalk, Tencent Meeting and other platforms, such as sending information, booking and participating in meetings. At the same time, it supports browser web search, reading summary and translation of web content, and can also perform a variety of document processing, including downloading, sending and summarizing.

Open and join the Feishu meeting.

Send meeting minutes via email.

Not only that, Zhipu also realizes the linkage between GLM-PC and mobile phones. Users can now send messages to GLM-PC remotely on their mobile phones, allowing it to automatically perform computer operations.

Finally, Zhipu stated at the press conference that it would provide free Auto upgrades for one billion apps. Major manufacturers such as Honor, Asus, and Xpeng Motors, as well as hardware and chip manufacturers such as Qualcomm and Intel, also took the stage to introduce their cooperation with Zhipu.

With the emergence of large models with new capabilities such as end-to-end, multi-modal, and video, large models have begun to gain the ability to interact with the physical world.

We can begin to imagine the "unprecedented natural interaction" described by Sam Altman, but many of the products we can actually get our hands on always seem to fall a little short. This may be because building disruptive products requires not only large-model capabilities, but also foresight about technical directions and optimization of the complete system.

Zhipu has a long history of research on large-model agents. Since April 2023, Zhipu has successively released agent research such as AgentTuning, AgentBench, and CogAgent. This year, it released AutoWebGLM and AutoGLM. Its research and development work on AutoGLM and GLM-PC has been under way for more than a year and a half.

In the process of exploring the boundaries of large-model agent capabilities, Zhipu has gradually arrived at two important observations.

First, agents and reasoning essentially obey a scaling law similar to that of large-model training. By interacting with the environment, the model obtains feedback supervision signals from it, and this has a similar scaling effect. This suggests that by expanding the scale of computation, the performance of large-model agents can be improved continuously. Behind this new scaling law is WebRL, a self-evolving online curriculum reinforcement learning framework designed by Zhipu. By introducing a self-evolution strategy specific to large models, using curriculum learning to generalize the agent from easy tasks to difficult ones, and finally applying online off-policy reinforcement learning, AutoGLM realizes the scaling behavior of agents in an online environment.

Second, Zhipu's further exploration revealed an emergent ability of agents. When it was released in October, AutoGLM could only demonstrate its capabilities in single-app, short-horizon tasks. With further training and scaling by its engineers, however, the latest version of AutoGLM has begun to handle cross-app, long-horizon tasks, and can even follow complex instructions to operate apps it has never seen before.
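The easy-to-hard curriculum idea mentioned in connection with WebRL can be illustrated with a deliberately tiny toy. This is not WebRL and involves no reinforcement learning: `train_with_curriculum` is a hypothetical function, "skill" is a single integer, and the rule that every attempt raises skill by one is an assumption standing in for a learning update driven by environment feedback.

```python
from typing import List, Tuple


def train_with_curriculum(
    difficulties: List[int], skill: int = 0, max_attempts: int = 100
) -> Tuple[int, List[int]]:
    """Toy easy-to-hard curriculum.

    A task counts as solved when skill >= difficulty. Every attempt,
    successful or not, raises skill by 1 (a stand-in for learning
    from environment feedback).
    """
    solved_order = []
    pending = sorted(difficulties)  # easiest tasks first
    attempts = 0
    while pending and attempts < max_attempts:
        attempts += 1
        task = pending[0]
        if skill >= task:
            solved_order.append(task)
            pending.pop(0)
        skill += 1  # feedback from the attempt improves the agent
    return attempts, solved_order


attempts, order = train_with_curriculum([3, 1, 2])
print(attempts, order)
```

The toy only shows the scheduling aspect: tasks are attempted in order of difficulty, and harder tasks become reachable because of what was learned on easier ones.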

In fact, in addition to developing basic technologies for large models, Zhipu has been promoting another thing recently: building a system.

Due to the emergence of large multi-modal models, AI now has capabilities such as semantic understanding, screen content analysis, and behavioral semantic understanding. The next thing to do seems to be to find a mechanism that allows the large model to solve the problem step by step.

The AI Agent is used to perform such complex tasks. It is both autonomous and capable of interacting with the environment. It can decompose complex tasks for planning, use professional models or external tools to improve its capabilities, and it also has memory capabilities that far exceed those of large models themselves.

This means that after adding intelligent agents, devices such as mobile phones can use relatively lightweight models to carry out more complex automation tasks.

Previously in the industry, some major technology companies, startups and mobile phone manufacturers have built intelligent agent capabilities on PCs and AI mobile phones, and have achieved good results. But from a technology development perspective, this is often an extension of their respective product lines. On this basis, the solutions provided by Zhipu will also cover AI-native hardware such as cars, smart glasses, smart speakers, and even embodied intelligence robots, reflecting another way of thinking.

Zhipu believes that in the future, different hardware devices may be operated by a unified system of AI agents, improving human-computer interaction. To this end, it has laid groundwork in advance across chips, apps, operating systems, and models.

This includes continued cooperation with chip and device manufacturers, optimizing from the hardware level up, and continuously improving the capabilities of large on-device models. When Qualcomm's Snapdragon 8 Extreme Edition was released in October, Zhipu announced that it had teamed up with Qualcomm to carry out in-depth adaptation and inference optimization of GLM-4V, its latest-generation on-device large vision model. With the model deployed on the device, this year's new generation of flagship mobile phones can already support rich multi-modal interaction, giving people a more contextual and personalized on-device intelligent experience.

Zhipu has also cooperated with many mobile phone and computer manufacturers to deploy large models in AI PCs and mobile smart assistants. Honor, which was the first to demonstrate an AI agent operating a mobile phone, reached a strategic cooperation with Zhipu in September on large-model technology.

This week, Zhipu also jointly released the CODE AI programmer laptop, designed specifically for programmers, with Intel and Mechanical Revolution. It comes pre-installed with an on-device intelligent programming assistant.

Through device-side chip performance optimization and device-cloud integrated architecture, Zhipu’s large-model agent technology will appear on more and more devices in the near future.

What is the endgame for agents?

Although the current technology is still in its infancy, AI agents are already showing promise.

Thinking about it at a deeper level, the physical interfaces of the past, such as keyboards, mice, and touch screens, along with operating systems from DOS and Windows to iOS and Android, were all designed to let people communicate with machines better.

Large models are taking the opposite path: people no longer need to spend a lot of time understanding the complex interfaces of various applications, mechanical labor is reduced, and machines adapt to humans instead.

Zhang Peng, CEO of Zhipu, said at the press conference: "The current Agent capability is more like adding an intelligent scheduling layer between users, applications, and devices. It can be regarded as a prototype of LLM-OS, a general-purpose operating system for large models. This has had a great impact on the form of human-computer interaction. More importantly, we see the possibility of a large-model operating system, LLM-OS, which has the opportunity to realize native human-computer interaction based on large-model intelligence."