Kai-fu Lee and Zhou Zhihua discussed large AI models, SenseTime's Xu Li championed the "slap-in-the-face moment": over 10,000 words of debate among MEET 2025 luminaries, witnessed by 3.2 million viewers
2024-12-18


"Scaling Law" and "Slap in the Face" are definitely the keywords of the year in the field of technological intelligence in 2024.

The bad news is that Scaling Law as traditionally defined is slowing down; the good news is that new Scaling Laws have emerged. Stretch the time dimension, and Scaling Law has in fact always been at work in AI's development. At any point in time, if humanity is suddenly slapped in the face, that is a super moment. Only after being slapped in the face again and again do you finally know which one is the so-called killer app.

This is a topic that more than 20 top figures from industry, academia and even the investment community repeatedly mentioned and discussed at the Qubit MEET 2025 Intelligent Future Conference.

In the packed venue, the guests' in-depth discussions were of course not limited to this:

Standing at the end of 2024, a year in which even the Nobel Prizes favored AI, they looked back on the development of technology, products and business, and spoke without reservation about their plans for the future and the opportunities they have identified. Some stepped forward enthusiastically to answer the questions being hotly debated of late, some admitted they had briefly worried about the slowdown in technology, and some pointed practitioners, enthusiasts and onlookers toward directions worth trying.

Deep, forward-looking, full of thought, and passionate.

More than 3.2 million online viewers and over 1,000 attendees on site witnessed a day packed with highlights.

△ Even "standing-room tickets" were in demand

Revolving around the theme of "Wisdom changes thousands of trades, intelligence reaches hundreds of industries", the guests spoke across four sessions: "When technology evolves", "The infinite future", "The turning point is coming", and "The application is timely".

Follow Qubit's human editors, working alongside large models such as ChatGPT and Claude, through the highlights below.

When technology evolves

Kai-fu Lee: As Scaling Law slows down, the explosion of AI-first applications accelerates

The MEET 2025 Intelligent Future Conference kicked off with an in-depth dialogue between Kai-fu Lee, CEO of Zero One Everything and Chairman of Sinovation Ventures, and Li Gen, Editor-in-Chief of Qubit.

During the conversation, Kai-fu Lee spoke about OpenAI's bottlenecks and challenges: the training of GPT-5 has not been smooth sailing. The declining efficiency of ever-larger GPU clusters, together with data and computing power bottlenecks, means Scaling Law is no longer invincible, and OpenAI also faces a tug-of-war between computing power investment and commercial returns.

The bad news is that the traditional definition of Scaling Law is slowing down, but the good news is that a new Scaling Law (o1 reasoning paradigm) has emerged. But let’s not forget that although the current model has not yet reached AGI, it is good enough to solve many problems.

In Kai-fu Lee's view, the slowdown of the traditional Scaling Law does not mean that the development of large models has hit a ceiling. On the contrary, Chinese AI 2.0 innovators can find opportunities to overtake on the curve.

First of all, AI 2.0 has become the “future battle” for countries around the world and will reshape the economic landscape and innovation landscape. China must not give up on large model pre-training. From the perspective of national technological competitiveness, mastering large model pre-training is equivalent to mastering the upper limit of model capabilities and the bottom line of safety and controllability.

Secondly, the current large models are "good enough and cheap enough". Chinese developers should seize the golden window of the application explosion, combining China's huge market demand and implementation scenarios with the country's world-leading experience from the mobile Internet era. Leveraging engineering capabilities and the capacity for product micro-innovation and rapid iteration, they can create a "Made in China" "ChatGPT moment".

He reminded AI 2.0 entrepreneurs to do the math first: does their base model capability have unique value? Do they have a pre-training technology advantage that yields a model that is fast, cheap and among the best in the world? If a self-developed model cannot surpass open-source models, they might as well focus on application innovation.

In terms of business strategy, Yi-Lightning, the pre-trained model built by Zero One Everything, not only achieved the best result in the history of Chinese large models in the internationally recognized LMSYS "Large Model Arena" blind test, but its inference cost is also only one-thirtieth that of GPT-4o.

Zero One Everything is also actively exploring the implementation of AI applications: domestically it focuses on To B, and overseas on To C. It trains world-class models in a fast, cost-effective way, while empowering application developers with "fast and good" large models to build a healthy large model innovation ecosystem.

Kai-fu Lee believes that, going forward, leading large model players should focus on creating value on the AI-first application side. As in the innovation paths of the PC and mobile Internet eras, it is often the application layer that creates the greatest economic value.

Zhiyuan Wang Zhongyuan: In fact, Scaling Law has always played a role in the development of AI

Dr. Wang Zhongyuan, president of Beijing Zhiyuan Artificial Intelligence Research Institute, pointed out that artificial intelligence is currently at a new turning point.

The emergence of large models marks the transition from weak artificial intelligence to general artificial intelligence. Although the current large model capabilities still have shortcomings, we can already see its profound impact on all walks of life.

He talked about one of the hottest topics at the moment: Has Scaling Law hit a wall/failed?

Looking back over the past seventy to eighty years, there are essential laws behind every new technological wave: as model parameters, training data and computing power increase, model performance also improves greatly. In other words, if you stretch the time dimension, Scaling Law has actually always been at work in the development of artificial intelligence.

Wang Zhongyuan introduced that over the past six years, the Beijing Zhiyuan Artificial Intelligence Research Institute has built a top scientific research team; it was among the first in China to work on large model R&D, assembling a technical team as early as October 2020 and continuing to push large model technology forward ever since.

As for the future development direction of large models, in his view, in addition to text data, there is also a large amount of multi-modal data such as images, audio, and video in the world. How to stimulate the intelligence in these data is an important direction for future large model research.

"A unified multi-modal large model will eventually emerge to realize artificial intelligence's perception, understanding and reasoning of the world." Wang Zhongyuan said.

Ant Group Wang Xu: The open source community provides neutral and extensive information for technical directions


Within Ant Group, the application of large models has penetrated into the field of financial data analysis, greatly improving processing efficiency and depth.

Wang Xu, vice chairman of Ant Group’s Open Source Technology Committee, gave a speech from an open source perspective - after all, ever since ChatGPT set off huge waves, the debate over open and closed sources for large models has never stopped.

Wang Xu emphasized that Ant Group’s open source technology growth team attaches great importance to data insights into the open source community and provides reference for Ant’s technical architecture and technological evolution.

Although community data is not comprehensive, it can reflect external perspectives and provide neutral and extensive information for technical direction.

Community data shows that AI applications and AI application frameworks are emerging in large numbers. In the application direction, the sheer increase in volume and pace alone can trigger significant change, such as Ant's finance-related services and the open source multi-agent framework agentUniverse behind them.

He provided a line chart for reference. The data showed that after the LLaMA model was open sourced, related projects experienced explosive growth. Moreover, most AI projects are developed using Python and even allow users to do without coding themselves. “These AI application frameworks allow users to develop their own AI applications with a very low threshold, which reflects that AI technology is gradually getting closer to application scenarios.”

Another observation is that in addition to changes in hardware resources, software infrastructure is also undergoing subtle changes. Wang Xu said that although the infrastructure of distributed systems has not changed much, application infrastructure and scenarios have created new demands. He mentioned that the AI 2.0 era is forming a new generation of LAMP architecture, and applications will revolve around models, which has triggered profound changes in every aspect of the infrastructure.

Finally, Wang Xu encouraged technology practitioners to adjust the software architecture according to the needs of the times and evolve their own infrastructure.

Huawei Wang Hui: Between the network and AI, there is Network for AI and AI for Network

At the meeting, Wang Hui, President of the NCE Data Communication Domain in Huawei's data communication product line, spoke on the topic "AI large models enable the network to step toward advanced autonomous intelligence", sharing from the perspective of the industrial field and the ToB industry.

He pointed out that currently all walks of life are facing the problem of "how to make their products and industries more intelligent", and the implementation process faces many challenges.

In his speech, Wang Hui summarized the relationship between network and AI into two types:

Network for AI refers to using the network to accelerate AI training and inference; AI for Network refers to using AI to make the network more stable and reliable, helping the development of thousands of industries.

On the Network for AI side, Wang Hui pointed out that the network is the key foundation supporting the growth of AI training scale. Huawei uses real-time, dynamic load balancing for AI cluster networks and AI-based fault identification and early warning to avoid interruptions to AI training, and frees training from cross-data-center and cross-region restrictions, bringing essential improvements to large model scaling, distributed training and inference.

In the field of AI for Network, Wang Hui used the "autonomous driving" form of the network as an analogy to explain the real challenges of AI in industrial vertical scenarios: real-time, rigor and scene generalization capabilities. In critical infrastructure such as the network industry, millisecond response and zero fault tolerance have become rigid requirements for accurate decision-making. To this end, Huawei has proposed a three-layer architecture of "one brain, one map, and one network" to fully empower the network with AI and provide intelligent operational support for industrial applications.

He also emphasized:

In the industrial field, data quality, precise control and mature tools are all indispensable, and large models are a key part of this. As large models are gradually applied at scale, they will also connect and inject the core elements of business management across the industrial field, driving industries toward "autonomous driving".

You Yang of Luchen Technology: Video large models need refined text control, shooting from any angle, and character consistency

You Yang, founder and chairman of Luchen Technology and Presidential Young Professor at the National University of Singapore, shared in-depth insights into the future development of video large models. An expert in distributed training, he has led teams that provided large model training optimization solutions for technology giants such as Google and Huawei.

You Yang believes that the development of large-scale video models will experience leaps and bounds in the next three years:

As Sam Altman said, today is the moment of Video GPT-1, Maybe three years from now will be the GPT-3.5 and GPT-4 moment for large video models.

The most critical thing is to achieve three core capabilities.

The first is refined text control. A video model should accurately understand and render the details of a user's description; everything from character features to scene elements must be precisely controlled.

The second is the ability to shoot at any camera position and any angle. This breakthrough may completely change fields such as live sports events, allowing viewers to choose their own viewing angles, "equivalent to being able to move instantly in the stadium, to the coaching bench, to the last row, to the first row."

The third is to maintain role consistency. You Yang pointed out that this is crucial for commercial realization, "For example, for a product advertisement, the appearance of the video must not change much from beginning to end, whether it is clothes, shoes, or cars."

As for the commercial prospects of video large models, You Yang believes they will bring revolutionary change to film production. With AI, the cost of producing special-effects scenes can be greatly reduced, the need to actually shoot dangerous scenes diminishes, and creation becomes freer.

In the future, AI may only need an actor's ID and portrait rights; AI can handle many dangerous shots, greatly reducing costs and increasing efficiency for the film industry.

The infinite future

SenseTime Xu Li: The super moment can be transformed into another word, called "slap in the face moment"

Dr. Xu Li, Chairman and CEO of SenseTime, decided to start a business ten years ago because he witnessed AlexNet and believed AI had crossed the industrial red line. On the new journey toward AGI, Xu Li laid out his thinking in a conversation with Qubit Editor-in-Chief Li Gen.

Xu Li said that over the past ten years, there are two elements that are the basis for promoting the development and progress of the industry. One is infrastructure and the other is scenario.

In his view, the next AGI era must also be scenario-based to promote the iteration of the entire technology. "Technology itself is just a technology."

Scenario applications must be the driving force; without them, we would not know what the models on the market should look like. Models, in turn, must be the core driver of infrastructure construction: today, any change in the model brings an enormous change in infrastructure costs.

Xu Li then introduced the two "life and death lines" in AI today, namely the life and death line of computing power cost depreciation and the life and death line of open source, and discussed SenseTime's "trinity" strategy of developing large devices, large models and applications.

Interestingly, when asked what event would confirm the arrival of the "super moment", Xu Li gave an answer so well received that several later speakers kept coming back to it.

I think the super moment can be translated into another phrase: the "slap-in-the-face moment". At any point in time, if humanity is suddenly slapped in the face, that is a super moment. What was the "iPhone moment"? Everyone thought a mobile phone had to have a keyboard, and then the iPhone arrived without one. Why is ChatGPT a super moment? Because the people working on AI originally felt natural language was still far away, and then it suddenly appeared and was recognized by the public, essentially settling the Turing test. That is a typical slap-in-the-face moment.

Xiaobing Li Di: "Private domain operation" has become the new blue ocean in the era of large models

In the past year, Xiaobing has been very silent.

But beneath the silence, still waters run deep: for Xiaobing's domestic AI toC products in 2024, the number of paying users is more than 20 times that of Character.AI, and the payment conversion rate is about 8 times that of ChatGPT.

With such results in hand, as the large model craze settled and many people began to fall into FOMO about the next opportunity, Li Di, CEO of Xiaobing, stood up to talk about the opportunities that already exist.

He emphasized that the AI industry is currently in a turbulent period of technological innovation: the entry threshold for large models has been lowered, and it is difficult to form an effective monopoly on basic capabilities. Blindly waiting for a technological singularity therefore creates no actual value for the industry; the real opportunity lies in how to monetize technical capabilities with reasonable business strategies once the technology enters a relatively stable period.

A core entry point is the ratio of GPU computing power cost to revenue (GPU cost vs Revenue). Li Di regards this as the key indicator of whether an AI toC business model can succeed: only when the cost of the content AI produces is significantly lower than what users pay for it can sustainable value distribution be provided for the C-end and for the upstream and downstream of the industry chain.
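As a rough illustration of this indicator, here is a minimal back-of-the-envelope sketch in Python; the token volume, token price and subscription fee are hypothetical values chosen for illustration, not figures from the talk.

```python
# Back-of-the-envelope unit economics for an AI toC product, in the spirit of
# the "GPU cost vs Revenue" indicator. All numbers below are hypothetical.

def gpu_cost_vs_revenue(tokens_per_user_per_month: float,
                        cost_per_million_tokens: float,
                        arpu_per_month: float) -> float:
    """Return the ratio of GPU inference cost to revenue per paying user."""
    inference_cost = tokens_per_user_per_month / 1e6 * cost_per_million_tokens
    return inference_cost / arpu_per_month

# Example: a heavy user consuming 3M tokens/month, at $0.50 per 1M tokens,
# paying a $9.9 monthly subscription (all assumed values).
ratio = gpu_cost_vs_revenue(3_000_000, 0.50, 9.9)
print(f"GPU cost / revenue = {ratio:.2%}")  # ~15%, leaving room for value sharing
```

The lower this ratio, the more room there is to share value with users and with the rest of the industry chain.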

In addition, Li Di shared a judgment: the conversational form and companionship provided by chatbots are no longer scarce to users, and the energy cost of carrying a conversation is significant, so chatbots are destined not to become a mass-market product (unless they can provide very high added value).

On the contrary, "private domain operation" has become a new blue ocean in the era of large models. AI can provide high-concurrency and personalized value content to thousands of private domain users, thereby realizing a closed business loop in high-retention and high-value scenarios.

VAST Song Yachen: AI native 3D creators will explore new content paradigms

What possibilities can be seen in 3D generation from the 3D models generated by 7 million users worldwide? VAST founder and CEO Song Yachen has something to say.

He shared: "3D generation will become a new form of interaction, just as the saying goes, 'speak it and it takes shape'."

VAST is a company that builds its own 3D large models. Its 3D large model Tripo can generate complete 3D models from multi-modal inputs such as text and images, supporting games, animation, the metaverse and more.

Song Yachen said that, judging from the maturity of the technology, the results have improved from the "360p level" at the beginning of the year to the "720p level" now, and are expected to reach the "1080p or even 4K level" next year.

Currently, 3D generation technology has been implemented in many fields: traditional CG industries such as games, animation, film and television; industrial fields such as 3D printing, industrial design and home furnishing; and emerging fields such as the metaverse, XR and digital twins.

Beyond these commercial scenarios, we see that everyone, including everyone here on site and everyone watching the live broadcast online, can express the 3D industrial designs and products they want.

Song Yachen predicts that millions of developers will gather in the field of 3D generation next year, that the number may reach tens of millions by 2025, and that in 2026 these AI-native 3D creators will explore new content paradigms.

As for the technical route, Song Yachen proposed a three-step strategy: the first step is static content generation, the second step is dynamic content generation, and the third step is to achieve zero-threshold 3D creation for everyone.

Zhou Zhihua of Nanjing University: With millions of models in the learnware base system, it becomes possible to do many things we did not expect

Zhou Zhihua, Vice President of Nanjing University and Chair of the IJCAI Board of Trustees, gave a presentation on "Learnware and Heterogeneous Large Models", systematically laying out a new AI technology paradigm.

In Zhou Zhihua’s view, the key to future AI development is not the pursuit of a single huge model, but how to make millions of models work together.

He introduced the concept of "learnware", which can be understood simply as: learnware = model + specification.

If the large model route is a handful of great heroes conquering the world, then learnware holds that power lies with the many. When a learnware base system hosts millions of models, the strength of this route will emerge, and it will become possible to do many things we did not expect.

Zhou Zhihua put forward a refreshing point of view: Effective reuse and collaboration of models can be achieved without obtaining the developer's original training data. This approach not only protects data privacy, but also maximizes model value.

He used a vivid metaphor:

When we want a knife for cutting meat today, we do not go mining and forging iron ourselves; we go to the supermarket and buy one. Likewise, when users want to use AI in the future, they will not have to collect data and train models from scratch. Instead, they submit their requirements, and the "learnware market" finds and combines suitable models according to those requirements and returns them to the user.

In terms of technical implementation, Zhou Zhihua's team designed a specification scheme, including semantic specifications and statistical specifications, and proved that it can effectively prevent developers' data from being leaked.
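To make the idea of matching by specification concrete, here is a minimal conceptual sketch assuming a hypothetical market interface (not the actual Beimingwu API): each learnware publishes only a statistical summary of its training data, and the market matches a user's locally computed requirement against those summaries, so no raw data changes hands.

```python
import numpy as np

# Conceptual sketch of specification-based learnware search (hypothetical interface).
# Each learnware carries a statistical specification: here, simply the mean feature
# vector of its training data. The market matches a user's requirement (computed on
# the user's own data, which never leaves the user) against these specifications.

class Learnware:
    def __init__(self, name, model, specification):
        self.name = name
        self.model = model                  # any trained predictor
        self.specification = specification  # e.g. mean feature vector, not raw data

class LearnwareMarket:
    def __init__(self):
        self.items = []

    def submit(self, learnware):
        self.items.append(learnware)

    def search(self, user_requirement):
        """Return the learnware whose specification is closest to the requirement."""
        return min(self.items,
                   key=lambda lw: np.linalg.norm(lw.specification - user_requirement))

# The user summarizes their own data locally and submits only the summary.
user_data = np.random.randn(100, 8) + 2.0
requirement = user_data.mean(axis=0)

market = LearnwareMarket()
market.submit(Learnware("credit-risk-v1", model=None, specification=np.zeros(8)))
market.submit(Learnware("retail-demand-v3", model=None, specification=np.full(8, 2.0)))
print(market.search(requirement).name)  # -> "retail-demand-v3"
```

Real specifications are richer than a mean vector, but the sketch shows why the developer's original training data never needs to be shared.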

Currently, they have open sourced the Beimingwu learnware base system and invited more developers to participate. Zhou Zhihua said that today's Hugging Face can be regarded as learnware version 1.0, while a complete learnware system will bring many more possibilities.

As a new technology paradigm, the learnware base system can be regarded as a heterogeneous large model: it can not only let large and small models collaborate, but also avoid catastrophic forgetting and achieve lifelong learning.

The turning point is coming

TiDong Technology Chen Depin: Thousands of industries need AI, and what they need more is growth

Chen Depin, CTO of TiDong Technology, shared the company's innovative practice of applying AI to overseas marketing.

A technical expert who spent more than ten years at Alibaba and experienced AI's shift from the 1.0 to the 2.0 era, Chen Depin is confident about the prospects of combining AI with marketing.

In his view, marketing requires content to be produced in batches, industrially and creatively, and the explosion of AIGC can greatly increase content production capacity; the two are a natural match.

Specifically, Chen Depin believes the current wave of going overseas rides on two big tailwinds, the mobile Internet and the supply chain, which have kept the whole track growing 30%-40% a year.

In terms of concrete practice, Chen Depin presented TiDong Technology's core AIGC product, Tec Creative 2.0, which can help merchants produce social media marketing materials within minutes, greatly improving efficiency.

He particularly emphasized a finding:

A law similar to the Scaling Law also exists in marketing applications. When marketing requires the industrialized production of materials, continuously raising production efficiency brings you closer to the probability of finding a hit product. We believe marketing can push efficiency toward the limit, thereby greatly improving results and eventually producing hit products.

Looking to the future, Chen Depin said that TiDong Technology is refining the development path of marketing agents and may also build an Arena for marketing materials to quickly test how well various general-purpose models fit marketing scenarios.

ENN Pan-Energy Network's Cheng Lu: AI disruption in vertical industries will definitely happen

An industry veteran with 17 years in the energy sector, Cheng Lu, Vice President of ENN Energy and President of ENN Digital Energy Technology Co., Ltd. (i.e., President of ENN Pan-Energy Network), shared how the traditional energy industry is embracing AI in practice and in thinking.

As a pioneer in the traditional energy industry, ENN Pan-Energy has explored intelligence for many years, but previously relied mainly on local algorithms and mechanism models. Now, the emergence of large models has changed two important links:

First, they significantly reduce the cost of knowledge learning and reasoning and improve the efficiency of building and optimizing industrial models, lifting model performance by up to 50%. Second, they allow ordinary practitioners to quickly "align" with high-level decision making, massively improving the overall cognition and execution quality of the industry.

So how can the traditional energy industry embrace the AI shift? Cheng Lu said it comes down to four moves, roughly "select, use, train, generate": select open large models, use models in combination with mechanisms, industry cognition and industry algorithms, train professional models, and finally generate usable large models that land in concrete applications, converging into three major intelligences:

Decision intelligence: helps management quickly decide on optimal solutions
Operational intelligence: realizes an autonomous state at the energy-domain operation level
Transaction intelligence: optimizes real-time transactions across source, grid, load and storage

He emphasized that the basis of all this is a powerful simulation model: mapping the physical world into the digital world, so that enterprises can tune parameters or solve problems without paying heavy trial-and-error costs in the physical world. Simulation stresses a large number of operating boundary conditions and industry mechanisms, and must reproduce real-time operating conditions. Cheng Lu noted: "This kind of simulation is more like today's 'autonomous driving system' for cars," which will ultimately significantly improve energy quality and reduce loss costs.

"AI disruption in vertical industries will definitely happen." Cheng Lu believes that as the technical threshold for large models keeps falling and industrial data resources are fully released, disruptive innovation will also emerge in traditional fields such as energy.

Xiaomi Meng Erli: The automotive industry is moving from "software-defined cars" to a new turning point of "AI-defined cars"

Meng Erli, senior technical director of Xiaomi Technology Committee's AI Laboratory, shared Xiaomi's exploration and practice of using large industrial models to empower automobile intelligent manufacturing.

He demonstrated the innovative breakthroughs AI technology has brought to traditional manufacturing from a unique perspective.

Meng Erli first introduced Xiaomi's technology strategy upgrade, which he summarized as the formula (software × hardware)ᴬᴵ, meaning that Xiaomi regards AI technology, including large models, as a new form of productivity and as an underlying track in which Xiaomi invests for the long term.

Xiaomi has been laying out the AI field since 2016 and formed a large model team in 2023 to bring cutting-edge technology into mobile phones, cars and other products. In automobile manufacturing, Xiaomi chose to break through on the "mega die-casting" process, focusing first on two areas: materials R&D and quality inspection.

Traditional new material research and development uses a "trial and error method", and the cycle may be as long as 10 years, which is unacceptable to the business.

In order to solve this problem, Meng Erli's team innovatively proposed the "grey box model" solution:

Combine data-driven AI black-box methods with white-box models driven by materials-science mechanisms: simulation software generates a large amount of low-quality data for pre-training, and a small amount of high-quality experimental data is then used to fine-tune the model.

Ultimately this forms a multi-material AI simulation system. Based on it, the team successfully developed Xiaomi's Titan alloy material from a candidate space of tens of millions of formulations.
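As a schematic of the grey-box recipe described above, the sketch below pre-trains a surrogate on plentiful, noisy "simulation" data and then fine-tunes it on a small, high-quality "experimental" set; the data generators and network are synthetic placeholders, not Xiaomi's actual pipeline.

```python
import torch
import torch.nn as nn

# Schematic grey-box training sketch (synthetic placeholders only):
# 1) pre-train a surrogate on plentiful, noisy simulation data from a mechanism model,
# 2) fine-tune it on a small set of high-quality experimental measurements,
# 3) use the surrogate to screen a large candidate space cheaply.

def mechanism_simulator(x):                # white-box stand-in: cheap, low-fidelity physics
    return x.sum(dim=1, keepdim=True) + 0.3 * torch.randn(len(x), 1)

def experiment(x):                         # stand-in for expensive lab measurements
    return x.sum(dim=1, keepdim=True) + 0.2 * torch.sin(x[:, :1] * 3)

surrogate = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))
loss_fn = nn.MSELoss()

# Stage 1: pre-train on 10k simulated samples.
x_sim = torch.rand(10_000, 4)
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss_fn(surrogate(x_sim), mechanism_simulator(x_sim)).backward()
    opt.step()

# Stage 2: fine-tune on 50 experimental samples with a smaller learning rate.
x_exp = torch.rand(50, 4)
y_exp = experiment(x_exp)
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-4)
for _ in range(300):
    opt.zero_grad()
    loss_fn(surrogate(x_exp), y_exp).backward()
    opt.step()

# Stage 3: screen a (much larger, in practice) candidate space with the surrogate.
candidates = torch.rand(100_000, 4)
best = candidates[surrogate(candidates).argmin()]
print("best candidate:", best)
```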

In addition, in terms of quality inspection, the team also developed a large industrial quality inspection model. It has solved problems in the quality inspection industry and has been reported by CCTV many times as a benchmark for AI+ manufacturing.

Looking to the future, Meng Erli believes that the automotive industry is moving towards a new turning point from "software-defined cars" to "AI-defined cars". He made three suggestions: strengthen digital infrastructure, promote industry standardization, and explore large-model technologies suitable for industrial scenarios.

Shengwang Liu Bin: Real-time requirements and engineering implementation are the key to the implementation of Agent

At the conference, Liu Bin, Chief Operating Officer of Shengwang (Agora), shared a link that seems somewhat removed from large models yet is indispensable: the brand-new value of real-time engagement (RTE) in the era of AI Agents.

In 2020, Shengwang was listed on Nasdaq and is currently the world's largest real-time interactive cloud service provider. The platform's audio and video usage reaches 70 billion minutes in a single month.

Liu Bin emphasized two points regarding the key elements for the implementation of AI Agent.

The first is the real-time requirement. Different from traditional text interaction, multimodal agents require duplex real-time dialogue. According to Agora's test data, to achieve a natural conversation experience, the delay needs to be controlled within 1.7 seconds.

The real implementation of productization is not to make a demo in the laboratory, but to ensure stable operation on various terminals and various network environments. At present, through continuous optimization of audio collection, transmission, playback and other aspects, Shengwang can achieve a voice dialogue delay between humans and AI as low as 500ms.

The second is engineering capabilities. Shengwang has built a global SD-RTN network™ that supports more than 30 platforms and more than 30,000 terminal models, and can achieve end-to-end transmission within 400 milliseconds. These accumulations make it possible to rapidly scale up AI Agents.
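A rough latency budget helps show why these numbers matter. In the sketch below, the 400 ms transmission figure and the 1.7-second naturalness threshold come from the talk, while the split across the other stages is an assumption for illustration only.

```python
# Rough round-trip latency budget for a duplex voice AI Agent (illustrative).
# Figures from the talk: <= 1.7 s feels natural; ~500 ms dialogue latency is achievable;
# end-to-end transmission within ~400 ms. The per-stage split below is assumed.

budget_ms = {
    "audio capture + VAD": 100,
    "network transmission (up + down)": 400,   # end-to-end transmission figure from the talk
    "speech-to-text": 300,
    "LLM first token": 500,
    "text-to-speech (first chunk)": 200,
    "playback buffering": 100,
}

total = sum(budget_ms.values())
print(f"total: {total} ms, natural-conversation target: 1700 ms, "
      f"headroom: {1700 - total} ms")
```

The point of such a budget is that every stage has to be streamed and overlapped; no single hop can be allowed to eat the whole allowance.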

In the past, the interaction between humans and AI was mostly in the form of text, and latency and experience issues were not prominent. But now, large models are rapidly evolving into multi-modal agents. Users can communicate with AI through voice, video, and expect to get a natural feeling like face-to-face conversations. This requires extremely low transmission delay and highly robust network quality support.

"Only when interaction latency is low and features such as intelligent interruption and high anthropomorphism are in place will users get a smooth conversational experience, like talking to a real person." Looking to the future, Liu Bin proposed developing dedicated optimization solutions based on the characteristics of human-machine dialogue.

The application is timely

Zhipu Zhang Fan: AI is beginning to become a basic production factor, or it may bring underlying changes to business

At the conference, Zhipu COO Zhang Fan focused on sharing the new opportunities in the rapid iteration and commercialization of large models in the past two years.

Zhang Fan first pointed out that large models differ from other existing technologies: they are naturally an application-oriented technology. "Generative AI is entering the market much faster than the Internet and the PC did."

Zhang Fan said that in the past two years, the capabilities of all aspects of the model have been improved, and correspondingly, the cost has dropped, which has led to the rapid implementation and application of technical capabilities.

In this process, Zhipu’s understanding of AGI target capabilities is divided into five levels:

The first level is language; the second is solving complex problems, such as the capabilities demonstrated by o1; the third is tool use, for example autonomous agents that can operate mobile phone, PC and even in-car interfaces like humans do to obtain information; the fourth is self-learning; the fifth is surpassing humans, when AI will be able to explore scientific laws and answer ultimate questions such as the origin of the world. The path to AGI is therefore a chain of clear, well-defined stages.

Zhang Fan emphasized that large models are no longer just a technology; they are becoming a new basic factor of production, which may bring many low-level and upper-level changes to business, including ways of working, organizational forms, business models, and even the moats of every business.

Finally, Zhang Fan discussed how companies or individuals should build their own technology strategies in the era of big models. He believed that there are four key elements:

Choose the appropriate base model; build an organization that matches strategic goals and business attributes; redefine data assets around scenarios and AI capabilities; and seamlessly integrate these capabilities into the business, thus forming a flywheel.

There are many things here that require deep thought, such as the base model. Many people ask us whether open source or closed source is better, or whether foreign or domestic models are better. I think whichever fits best is actually the best.

Volcano Engine Zhang Xin: The key to enterprise adoption of large model applications is rapid trial and error and agile action

In the past, programming started with "Hello World"; now, the road to AI should start with "Hi Agent".

Zhang Xin, Vice President of Volcano Engine, shared the current situation and thoughts on the implementation of large model applications in 2024. In his view, 2024 will be a year for various industries to extensively explore the application of large models, and their implementation will show three major characteristics: speed, breadth and depth.

In terms of application scenarios, large models have also completed three stages of jumps: from the initial entertainment chat to the current serious production scenarios, and even begun to enter the field of scientific research to explore and discover new knowledge.

As Dickens wrote in "A Tale of Two Cities", "It was the best of times, it was the worst of times." Zhang Xin believes that large models bring unlimited opportunities for innovation, but companies that cannot iterate at an agile pace may also lose their competitiveness.

Zhang Xin mentioned that he had a new feeling recently:

When an enterprise wants to build a good AI application, the challenge is not that there are no scenarios to pursue, but that there are too many to choose from. How does a slap-in-the-face moment take shape, in our view? Only after being slapped in the face again and again do you finally know which one is the so-called killer app.

HiAgent is an enterprise-focused AI application innovation platform launched by Volcano Engine. It is highly adapted to the individual needs of enterprises, letting business staff easily build agents so that business innovation is not held back by development skills. It provides low-code, scenario-based templates and end-to-end consulting services to help enterprises understand AI transformation; industry plug-ins that connect seamlessly with enterprise business systems to adapt flexibly to their needs; and support for RAG knowledge bases and full-stack private deployment of large models, offering stronger security and protection of enterprise data and knowledge.

In terms of concrete practice, Zhang Xin shared HiAgent's implementation in education, consumer, enterprise services and other industries, along with a practical method. The first step is for an enterprise to draw its own scenario map, which is often a divergent exercise that can end with hundreds of different application scenarios. The next step is to place these scenarios into a magic quadrant according to feasibility and value, and to start with the high-value, technically feasible ones, as in the sketch below.
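As a toy illustration of that triage step, the following sketch scores a few made-up scenarios on value and feasibility and sorts them into quadrants; the scenarios, scores and thresholds are all invented for illustration, not Volcano Engine's actual method.

```python
# Triage candidate AI scenarios on a value/feasibility "magic quadrant" (illustrative data).
scenarios = [
    {"name": "customer-service copilot", "value": 9, "feasibility": 8},
    {"name": "contract review assistant", "value": 7, "feasibility": 6},
    {"name": "fully automated R&D",       "value": 10, "feasibility": 2},
    {"name": "internal FAQ bot",          "value": 5, "feasibility": 9},
]

def quadrant(s, threshold=6):
    high_v, high_f = s["value"] >= threshold, s["feasibility"] >= threshold
    return {(True, True): "do first", (True, False): "watch the tech",
            (False, True): "quick win", (False, False): "park for now"}[(high_v, high_f)]

# Rank by a simple value x feasibility product, then print each quadrant label.
for s in sorted(scenarios, key=lambda s: s["value"] * s["feasibility"], reverse=True):
    print(f'{s["name"]:28s} -> {quadrant(s)}')
```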

The key to enterprises implementing large model applications is rapid trial and error and agile action. The Volcano Engine HiAgent platform helps enterprises solidify best practices, accumulate assets while exploring scenarios, efficiently build enterprise-level agents, and deepen their AI capabilities.

Zhang Yi from Bar-headed Goose: AI applications must be able to be deployed quickly and iterated efficiently

Zhang Yi was a founding team member and vice president of DingTalk. During his eight years there he led teams that built popular products such as DingTalk attendance and approval and smart personnel logs.

Since 2022, Zhang Yi has entered the game as CEO and founder of BetterYeah AI (Bar-headed Goose), devoting himself to helping companies move into the AI era.

Today, hundreds of leading companies have completed the implementation of enterprise-level production-level agents on Bar-headed Goose, involving scenarios including customer service, data, marketing, and business systems. Zhang Yi emphasized that the customer service scenario is the fastest to be implemented, the incremental value of data tasks is obvious, and the trend of Agent integrating into the core business system of enterprises is becoming more and more obvious, which is directly providing productivity to enterprises.

"For Agents, enterprise production-level scenarios are very different," Zhang Yi added. "Deploying Agents in the core business flow to deliver productivity places much higher requirements on integration capabilities, concurrent calls, data security, and collaborative building."

But traveling with cutting-edge technology means greater challenges. Unlike POC verification and lightweight AI application development, production-level Agents place higher demands on enterprise development teams in application construction, performance evaluation, and rapid iteration.

BetterYeah continues to focus on enterprise production scenarios, using a standardized product to provide an AI Agent development platform with flexible integration, larger concurrent call volumes, higher data security and more complex collaboration. Beyond this year, enterprise-level AI platforms are expected to face still more complex application scenarios and the challenge of stronger self-planning capabilities.

When talking about the secret to success for enterprise AI Agents, Zhang Yi emphasized that 70% of the workload in production-level Agent development is testing and debugging; building a data- and AI-driven closed loop of "feedback evaluation, self-learning, verification" and giving full play to the value of AI can effectively improve the efficiency and success rate of Agent development. These methods have been productized and integrated into the BetterYeah platform.
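The sketch below shows the shape of such a "feedback evaluation, self-learning, verification" loop, with a dummy agent and test suite standing in for real components; it illustrates the structure only, not BetterYeah's actual implementation.

```python
import random

# Schematic "feedback evaluation -> self-learning -> verification" loop for Agent
# development. The agent, test cases and scoring are dummies standing in for real
# components; only the loop structure is the point.

test_cases = [{"input": f"ticket-{i}", "expected": "resolved"} for i in range(20)]

def run_agent(prompt_version: int, case: dict) -> str:
    # Dummy agent: later prompt/tool versions succeed more often.
    return "resolved" if random.random() < 0.6 + 0.05 * prompt_version else "escalated"

def evaluate(prompt_version: int) -> float:
    hits = sum(run_agent(prompt_version, c) == c["expected"] for c in test_cases)
    return hits / len(test_cases)

prompt_version, target = 0, 0.9
while (score := evaluate(prompt_version)) < target and prompt_version < 10:
    # "Self-learning": revise prompts/tools based on failure feedback (simulated here).
    prompt_version += 1

# Verification: the full suite was re-run on the final version before promotion.
print(f"promoted version {prompt_version} with pass rate {score:.0%}")
```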

Kunlun Wanwei Fang Han: Use innovation in product form to hit the fundamental point of users

At the conference, Kunlun Wanwei Chairman and CEO Fang Han shared the company's layout and thinking, from technology to products, in the wave of large AI models.

Kunlun Wanwei began to deploy AI in 2020, and has now built full-stack AI capabilities from the computing layer, model layer to application layer. Fang Han introduced that Kunlun Wanwei has large language models, large multi-modal models, large 3D models, large video models, and large music models. Currently, the music model has the best technical indicators.

Fang Han also offered some business reflections from this exploration. He believes everyone keeps puzzling over large AI models, and what business model a company chooses for developing and promoting products is a very important question.

Fang Han said that Chinese AI companies are heavily constrained in computing power: the hardware compute they can obtain is relatively limited, which forces them to invest heavily in algorithm iteration, using software to make up for hardware. At the same time, survival pressure and difficulty raising money are also big problems, "forcing Chinese AI companies to polish their product business models for all they are worth."

He also said that AIGC is giving birth to a new era of "cultural equality" and that the advancement of AIGC technology will greatly reduce the threshold and cost for everyone to create content.

For users, they don’t care at all whether your content is made by AI or humans. They only care about two points: your content is either new or good.

Finally, Fang Han proposed that AI entrepreneurs should pay more attention to product form innovation and use innovation in product form to hit the fundamental point of users, rather than looking at how much AI is used.

Xinyan Group Ren Yongliang: Embodiment and active interaction are the new AI directions for pan-psychological services

Ren Yongliang, founder, chairman and CEO of Xinyan Group, shared, from the perspective of a vertical-field user, how the pan-psychology industry is embracing AI change in practice.

Ren Yongliang first introduced Xinyan Group's AI-driven pan-psychology community, the QiQi app. He said that as early as 2019 the app launched its first pan-psychology question-and-answer model based on BERT, and the user response exceeded expectations.

Talking about the AI transformation process, Ren Yongliang admitted that his mindset went from "shocked" to "worried" to "determined". He believes an industry should be neither too close to AI nor too far from it; the key is finding the right distance. "If you stay too far away, you cannot use such services; if you get too close, you are easily overwhelmed."

Based on the practice of the past two years, Ren Yongliang summarized three insights.

The first is expectation management. It is easy for AI to achieve a score of 60, but it is often difficult to achieve a score of 90, and the team's expectations need to be managed well.

The second is organizational engineering. AI transformation cannot rely on piecemeal efforts; it requires the whole organization to revolve around AI, including an all-round transformation of products, operations, technology and more.

Finally, believe in young people. The successful experience of the mobile Internet era may not be applicable to the AI era. Unfettered young people are more likely to bring innovation.

Looking to the future, Ren Yongliang proposed two key development directions:

Embodiment is the inevitable trend of pan-psychological services. In addition to text and speech, consultants also need facial expressions and a sense of ritual, which requires AI services to also implement multi-modal input and output. Active interaction will become the next breakthrough. Current AI services are all responsive, and in the future they need to be able to proactively ask questions and start conversations based on scenarios.

Embodied Intelligence Roundtable: The road to AI robots

By tradition, the MEET Intelligent Future Conference always offers a roundtable forum full of exciting, information-dense exchange, and this year is no exception.

However, the topic discussed at this conference has been upgraded to the broader and hot field of embodied intelligence.

The guests invited to the Embodied Intelligence Roundtable are:

Tang Rui, chief scientist and vice president of Qunhe Technology and head of KooLab.

Gao Yang, co-founder of Qianxun Spirit AI and doctoral supervisor at the School of Interdisciplinary Information at Tsinghua University.

Li Chao, co-founder and CTO of Yunshen Technology.

Moderated by Qubit Editor-in-Chief Li Gen, the guests crossed swords on topics such as "how to understand embodied intelligence", "what technological breakthroughs have there been", and "what stage of development are we at now".

How to understand or define embodied intelligence?

Tang Rui believes the biggest difference between embodied intelligence and earlier AI is that it steps out of the chip, the monitor, the memory and the video memory: it not only has a brain that interacts with us through a screen, but may also interact with the external physical world we live in. Although embodied intelligence contains the word "body", Tang Rui feels it does not necessarily require a human form, as long as it has this ability; "self-driving cars can also be regarded as a relatively mature, concrete realization of embodied intelligence."

Gao Yang answered the question with a very intuitive example: once, when he was giving a talk on embodied intelligence, a lady in her sixties or seventies listened for a long time and then asked when robots would be able to look after her in old age. That is precisely an application scenario of embodied intelligence. Its goal is to build robots that can help us complete all kinds of tasks, such as helping our grandparents take care of themselves.

Li Chao believes that Yunshen is the first batch of beneficiaries of embodied intelligence. Embodied intelligence gives robots a soul. With the blessing of this soul, the robot's adaptability is enhanced, the progress of large-scale applications is accelerated, and it can face a more open environment.

Why is this year the first year of embodied intelligence?

Li Chao believes that as traditional rule-based control methods give way to newly matured techniques such as learning-based training and reinforcement learning, the intelligence and applicability of robots have improved greatly, breaking through past limits and boundaries.

Gao Yang added that one of the most critical factors in starting an embodied intelligence business now is that OpenAI has proven that pre-training combined with a series of post-training methods can indeed produce something that at least looks like human intelligence, or behaves in a way that appears equivalent to it.

Tang Rui has a graphics background. He pointed out that with the rise of AI deep learning, the iteration of computing power has shifted from instruction-level improvement toward parallel computing, driving the cost of parallel computing down to very low levels. Parallel computing is ultimately about simulating two things: one is simulating the human brain, using prior knowledge learned by deep models to predict the future or different modes; the other is simulating the physical world, where embodied intelligence practitioners use tools such as MuJoCo for physics and interactive simulation. Qunhe Technology does exactly the latter.

2024, representative progress or events in the industry?

Tang Rui has noticed that more and more top scholars and teams originally engaged in graphics and three-dimensional vision research (such as Li Feifei, Leo Guibas, Su Hao, etc.) are beginning to devote themselves to the field of embodied intelligence. They rely on their innate advantages in virtual worlds and environmental simulations to inject new power and perspectives into the development of embodied intelligence.

The progress Gao Yang cares most about is how to bring the large model pre-training paradigm into embodied intelligence using massive Internet data and intermediate-layer representations. This includes not only the maturing use of VLA (Vision-Language-Action) models, but also reducing reliance on manually collected manipulation data by introducing intermediate structures such as trajectory representations and particle simulation, laying the foundation for the sustainable development of embodied intelligence over the next three to four years.

In practice, is data the key challenge now?

Li Chao believes that data is not the main challenge at the robot body and control levels they are currently concerned about, but as more complex scenarios and operational requirements emerge in the future, data issues may gradually become a challenge next year.

Tang Rui believes the big sticking point for embodied intelligence right now is the lack of high-dimensional, physically correct data, and what Qunhe's spatial intelligence platform aims to do is provide an AI-interactive world for embodied intelligence. He also emphasized that the accuracy of physical simulation required by embodied intelligence is much higher than what pure visual content creation needs.

He gave an example. Although video generation tools like Sora can currently reproduce visual effects realistically, they are still insufficient to provide accurate physical parameters and interactive feedback, making it difficult to directly meet the training needs of embodied intelligence. This means that before realizing AGI-level robots, how to obtain high-precision and interactive simulation data is still a key issue that needs to be solved.

Is there a standard division of embodied intelligence similar to L0-L5?

Li Chao said there is, and quite clearly so. Before last year, many systems were at L1, or strictly speaking L0, because many were actually operated by humans. Now it has to be broken down by industry: in fixed, small-scale scenarios, robots can reach L4, making decisions and judgments on their own.

In Gao Yang's view, the original intention of setting a standard is to promote the development of an industry and to measure the level of each embodied intelligence technology. But whatever the standard, objective technical limitations mean it may end up serving mostly as marketing rhetoric; within a limited time, no one can reach L4 or L5 across a wide range of scenarios.

Up to now, what stage has embodied intelligence reached?

Tang Rui compared the parts of a robot to four core human organs: hands, eyes, feet and brain. Taken separately, each part surpasses or comes close to humans, but a highly coordinated, integrated system has not yet formed, so overall it is still at an early stage.

Li Chao is more optimistic. Rather than drawing analogies, he believes embodied intelligence has already brought profound change to specialized scenarios such as industry. Although household demand is not yet clear, practical applications in professional fields have shown strong influence, accelerating changes in the industry landscape and pointing to a more optimistic outlook.
