Source: Zhidongxi Author | Zhidongxi Editorial Department
Zhidongxi reported on December 5 that the 2024 China Generative AI Conference (Shanghai Station), themed "Intelligent Leap Forward, Create Infinity," officially opened today to a packed venue.
The conference runs for two days, with more than 50 guests analyzing generative AI's technological and product innovation, business implementation, future trends, and cutting-edge research from a forward-looking perspective.
On the first day of the conference, led by Zhang Qi, professor at the School of Computer Science and Technology of Fudan University and deputy director of the Shanghai Intelligent Information Processing Laboratory, 17 guests shared their latest research and development and practical experience on cutting-edge topics such as general language models, multi-modal large models, industry large models, vertical large models, agents, embodied intelligence, large model alignment and security, and investment trends.
"As an important branch of artificial intelligence, large model technology is constantly driving innovation and transformation in the industry," Xu Qi, deputy secretary-general of the Shanghai Artificial Intelligence Industry Association, said in his speech. "Against this backdrop, this generative AI conference is being held to further promote the development of the generative AI industry in Shanghai, disseminate the academic achievements of schools and research institutions, and strengthen communication with outstanding enterprises and institutions in the Yangtze River Delta region to jointly explore the future development of artificial intelligence."
▲Xu Qi, deputy secretary-general of the Shanghai Artificial Intelligence Industry Association
Xu Qi pointed out that Shanghai has always played a leading role in the field of AI. In terms of industrial scale, the number of AI companies above designated size in Shanghai has increased from 183 in 2018 to 348 in 2023, and the industry scale has increased from 134 billion yuan to 380.8 billion yuan, ranking among the top in the country. In terms of innovation achievements, 46 large models have been registered in Shanghai, and a number of general-purpose humanoid robot prototypes have been released. In terms of industrial ecology, Shanghai is accelerating the creation of innovative carriers to attract enterprises to gather; it continues to optimize the layout of computing power infrastructure and increase the overall supply of computing power resources; and improves the basic corpus data support system.
Facing the future, Shanghai will firmly seize the development opportunities of general artificial intelligence, work closely with enterprises and institutions at home and abroad, continue to promote the innovative development of artificial intelligence, and accelerate the creation of new momentum and new advantages for high-quality development.
As an industry summit brand created by Zhiyi Technology, the 2024 China Generative AI Conference was jointly hosted by Zhidongxi and Zhixingxing. More than 3,000 people signed up to attend, and the venue was packed. The China Generative AI Conference has been held successfully in Beijing twice; this is the first time it has been held in Shanghai.
Gong Lunchang, co-founder and CEO of Zhiyi Technology, delivered a speech on behalf of the organizer: "Seven years ago, our first AI Industry Conference was successfully held in Shanghai, officially starting our journey of holding industry summits in the AI field. Seven years later, our Generative AI Conference returns to Shanghai." Compared with the Beijing Station, the Shanghai Station has upgraded its content along the two dimensions of industry and technology, focusing on four directions: model, AI Infra, application, and technology.
▲Gong Lunchang, co-founder and CEO of Zhiyi Technology
Gong Lunchang also previewed several important conferences this year and next: early next month, the 4th Global Autonomous Driving Summit will be held in Beijing; an industry summit will be held during the 2025 Shanghai Auto Show; and brand summits in fields such as AI chips and generative AI will continue in 2025. Everyone is welcome to attend.
1. High-end dialogue: The big model is a once-in-a-century new productivity revolution, hotly discussing the new trend of the capital market
The high-end dialogue session, themed "The Era of Big Models, the New Trend of the Capital Market," was hosted by Zhang Guoren, co-founder of Zhiyi Technology and chief editor of Zhichexin Industry Media Matrix. Ren Xiaodong, partner of Jingya Capital, Wen Yongteng, executive director of BV Baidu Ventures, and Zhu Xiang, partner of Dachen Caizhichen Yunzi Fund, shared their views on topics such as generative AI investment strategies, the large model market landscape, and large model commercialization paths.
Zhang Guoren said that from the perspective of social development, this wave of technological development led by generative AI is a once-in-a-century new productivity revolution: a new round of change is under way in personal interaction and companionship as well as in life, work, and study. No matter how the wind shifts, he said, we remain optimistic about the development of new AI technologies.
▲ Zhang Guoren, co-founder of Zhiyi Technology and chief editor of Zhichexin Industry Media Matrix
Ren Xiaodong, partner of Jingya Capital, believes that large model products fall into two categories: public cloud and private deployment. Because public cloud offerings build on open-source technology, they are easy for large vendors to implement, which leaves very few opportunities for startups. In private deployment, startups can customize for each enterprise's scenarios, and large vendors have no obvious advantage. In AI infrastructure software, he added, startups that choose open source will find it difficult to stand out and compete with large vendors.
He also emphasized that compliance is the first principle in AI investment: investors must comply with national policies and regulations as well as the agreements signed with LPs, including which fields are off-limits and what must be disclosed.
▲Ren Xiaodong, partner of Jingya Capital
Wen Yongteng, executive director of BV Baidu Ventures, said that BV has been paying close attention to generative AI startups since 2021 and has believed from the start that generative AI will reshape how content is produced and distributed. The emergence of diffusion models led BV to Shengshu Technology in the multi-modal field and to outstanding companies in other modalities. Now the development of AI agents has prompted it to consider investing in the intelligent workforce.
According to an analysis by Sequoia Capital (US), global commercial revenue from generative AI reached US$3 billion last year. It is rare for a market direction that has only just begun to attract attention to achieve such large revenue growth within a single year. Because large model companies generally operate with heavy R&D investment, financing and commercialization are key issues, and progress in B-side commercialization in particular still requires time and patience. But he believes the day of greater growth will come; it is just a matter of time.
▲Wen Yongteng, executive director of BV Baidu Ventures
Zhu Xiang, partner of Dachen Caizhichen Yunzi Fund, said that domestic large model startups are still in an arms-race stage of catching up with OpenAI, and some companies are developing slowly due to financing problems.
Training directions that differ from OpenAI's have begun to emerge in generative AI, such as world models and embodied intelligence. Recently, Li Feifei and Google DeepMind released new world models. A new scaling law that uses synthetic data to drive more efficient model generation could effectively avoid the slowdown of the existing scaling law.
Zhu Xiang said that embodied intelligence is the key carrier of AGI, but it faces challenges such as high training costs and hardware limitations. He predicted that embodied intelligence may experience a "bottleneck period" of 2-3 years before mass production, during which some companies will withdraw from the market.
▲Zhu Xiang, partner of Dachen Caizhichen Yunzi Fund
2. What is the boundary of the capabilities of large models? Audio and video capabilities have advanced, and innovative architectures have emerged
At today's conference, Zhang Qi, professor at the School of Computer Science and Technology at Fudan University and deputy director of the Shanghai Intelligent Information Processing Laboratory, gave an in-depth explanation of the capability boundaries of large language models and his thinking on their development. MiniMax Vice President Liu Hua discussed the changes in the development focus of large models this year. Zhang Chi, assistant professor at West Lake University, shared a large monocular depth estimation model that generalizes to all scenes. Xu Hua, director of the Large Model Alignment Execution Center of Peking University (Lingang) and CEO of Beijing Alaimen Technology Co., Ltd., discussed model security.
Zhang Qi believes that large models are developing rapidly but are still in the "memory stage." Large model training proceeds through distinct stages: knowledge compression and representation learning, capability injection, and improvement of generative task capabilities. Training can require very little data: as few as 60 training examples are enough to handle knowledge question answering in a given field, but "how to add data" is the hardest question.
▲Zhang Qi, professor at the School of Computer Science and Technology at Fudan University and deputy director of the Shanghai Intelligent Information Processing Laboratory
After having a large model attempt this year's college entrance examination mathematics questions, Zhang Qi found that the model's calculation steps and its chosen answers were inconsistent. The results show that although the model can complete reasoning for specific tasks, it has not truly acquired human-like abilities.
Zhang Qi concluded that there are two paths for the development of large models: one is to follow OpenAI and aim to replace all mental work; the other is not to pursue the replacement of general tasks but to complete specific things. The most critical issues are choosing deployment scenarios and judging the boundaries of large model capabilities.
Regarding the changes in the field of large models this year, Liu Hua, Vice President of MiniMax, believes that compared with the rapid improvement of basic large model capabilities in the text field in 2022-2023, the improvement in 2024 is more comprehensive, spanning text, speech, music, video, and other fields.
He said that multi-modal large models are already empowering thousands of industries in China and being transformed into new productive forces. For example, MiniMax has served more than 30,000 customers in China, and the company's large video models are popular with AI entrepreneurs in 180 countries and are in mature use in cultural creativity, e-commerce live broadcast, and other fields.
▲MiniMax Vice President Liu Hua
He judged that multi-modal large models are still in the rapid development stage, and the upper limit of model capabilities has not yet been seen. Facing the future, MiniMax will continue to rapidly iterate self-developed large multi-modal models, focusing on three aspects: reducing model error rates, achieving infinitely long input and output, and promoting more natural integration of multi-modal models.
Zhang Chi, assistant professor at West Lake University, shared a large monocular depth estimation model that generalizes to all scenes, along with his thoughts on solving the pain points of traditional monocular depth estimation. Traditional methods rely on professional equipment such as lidar, which makes data collection difficult and costly and leaves the available data scarce and scattered. An approach based on large AI models can make more efficient use of large-scale training data, priors from large vision models, and optimized training paradigms to pursue generalization across all scenes.
▲ Zhang Chi, Assistant Professor of West Lake University
He also mentioned that the zero-shot monocular depth estimation method is flexible and portable, and can be applied to robots, autonomous driving, AI text-to-3D and image-to-3D generation, and other fields.
Xingchen, CEO of Xihu Xinchen and head of achievement transformation at the Deep Learning Laboratory of West Lake University, introduced Xihu Xinchen's exploration and achievements in AI emotional understanding and multi-modal long-range dialogue. Since its founding, the team has been committed to developing super-anthropomorphic emotional intelligence models that adapt to human-computer interaction scenarios involving complex emotions. Its self-developed multi-modal general-purpose base model, the "West Lake Large Model," uses deep alignment technology and multi-modal emotion recognition technology to strengthen AI's emotion recognition and understanding of user needs, making long-range dialogue between humans and machines a reality.
▲Xingchen, CEO of Xihu Xinchen and head of achievement transformation at the Deep Learning Laboratory of West Lake University
This year, the company launched Xinchen Lingo, the country's first end-to-end universal voice model, which rounds out its voice interaction capabilities and makes AI more human-like, able to understand people's feelings and speak in a human way. These "super-anthropomorphic" technologies have been applied to Xinchen's AI psychological counseling companion product "Liaohui Xiaotian."
The evolution of innovative large model technology is also accelerating. RockAI CTO Yang Hua shared the on-device practice of Yan, a large model with a non-Transformer architecture. Although the Transformer architecture has achieved great success in the field of large models, people have begun to ask whether the field is overly dependent on it and whether the current form of large models is sustainable.
▲RockAI CTO Yang Hua
The Yan architecture includes a brain-like activation mechanism and MCSD. The former is modeled on the neural networks of the human brain, and the latter can make full use of GPU computing power during training while reducing power consumption. Multi-modal large models based on this architecture can be deployed on end-side devices such as mobile phones, computers, robots, drones, and Raspberry Pi, and the models have strong instruction-following capabilities and many application scenarios. Autonomous learning and swarm intelligence are also part of RockAI's thinking and exploration in the field of large models.
The security of AI applications is crucial. Xu Hua, director of the Large Model Alignment Execution Center of Peking University (Lingang) and CEO of Beijing Alaimen Technology Co., Ltd., analyzed the contradiction between the safety and practicality of large models and shared the exploration of multi-modal alignment.
▲Xu Hua, director of the Large Model Alignment Execution Center of Peking University (Lingang) and CEO of Beijing Alaimen Technology Co., Ltd.
Xu Hua said that excessive pursuit of safety may sacrifice practicality. To this end, he proposed a value alignment plan with the "3H principles" (Helpful, Honest, Harmless) as goals to ensure that the model conforms to human values. He emphasized that the Aligner solution balances safety and practicality in multi-modal scenarios. The next step will focus on improving the model's adaptability in medical, education, and other fields, breaking through the upper limit of human experts, and promoting the development of AGI.
3. At the inflection point of AI’s implementation, agents, 3D generation, and embodied intelligence become the focus
The implementation of large models is a hot topic in 2024, with innovative approaches such as embodied intelligence, 3D generation, AI agents, and music generation emerging one after another.
1. AI Agent has implemented specific algorithms, and its application value in business scenarios is highlighted
AI agents' multi-modal perception, memory enhancement, and reasoning capabilities are gradually improving. Zhao Tiancheng, CEO and chief scientist of Lianhui Technology, said that the industry is shifting from "LLM-First" to an "Agent-First" architecture that is more in line with human cognition. With a new algorithm, an AI agent can dynamically zoom in on an image and analyze it when the visual information is unclear, improving multi-modal perception and enabling a 7B model's inference accuracy to surpass the GPT-4o large model and approach the human benchmark.
▲Zhao Tiancheng, CEO and Chief Scientist of Lianhui Technology
In the three core scenarios of reasoning, memory and perception, AI Agent has implemented specific algorithms. Lianhui Technology has launched a comprehensive open source Agent framework to support the continuous optimization of AI Agents by building a standardized basic framework.
WeMeet Huishen has built a multi-agent business interconnection platform based on a large model. Gu Xuebin, founder of WeMeet Huishen, noted that AI applications in business scenarios offer several important kinds of value.
▲Gu Xuebin, founder of WeMeet Huishen
Examples include AI assistants for business people; support for business activities in different language environments, helping people communicate across language barriers; business opportunity generation, which brings potential buyers and industry sellers closer together; and rapid generation of conference applications. Finally, he emphasized security issues and the need to register generative AI services to ensure the stable and reliable development of AI applications in business scenarios.
2. End-to-end embodied multi-modal large model, targeting robot generalization
Dr. Zhang Zhizheng, co-creation partner and person in charge of large models at Galaxy Universal Robots and embodied intelligence PI of the Beijing Zhiyuan Artificial Intelligence Research Institute, said that on the path from models to products to new productivity, it is not enough for embodied intelligence to focus on "task automation." What the company is pursuing is "process automation." The key to achieving this is to use large-scale simulated synthetic data to drive robots from the bottom up toward breakthroughs in environmental perception and action skill learning. From a large model system that combines 3D small models with large action models to end-to-end embodied multi-modal large models, the company has a comprehensive layout focused on improving the generalization capabilities of robots in real scenes.
▲Dr. Zhang Zhizheng, co-creation partner of Galaxy Universal Robots, person in charge of large models, embodied intelligence PI of Beijing Zhiyuan Artificial Intelligence Research Institute
Looking ahead, he believes that embodied intelligence will develop through the collaborative evolution of the "robot brain, cerebellum, and hardware body," with a focus on generalization breakthroughs in "process automation," so that robots can reason about and carry out more complex mobile manipulation tasks more efficiently and intelligently.
3. 3D and music generation have reached an explosive point, showing the potential of commercial application in multiple scenarios
In terms of 3D generation, VAST CTO Liang Ding analyzed the development and application of 3D AIGC supported by large models. In his view, 3D follows a development path similar to that of other large multi-modal models: a period of technology accumulation followed by an explosion at a certain point in time. 3D has now reached that explosion point.
▲VAST CTO Liang Ding
He believes that 3D AIGC can be applied commercially in many scenarios: reducing costs, improving efficiency, and enabling new kinds of gameplay in traditional games, film, television, and animation; enabling customized production through 3D printing in industry; serving metaverse fields such as social networking, live broadcast, and e-commerce; and customizing toys and integrating with education.
2024 is the breakout year for AIGC music. Jia Shuo, vice president of Quwan Technology, believes that the innovative development of artificial intelligence has greatly lowered the threshold for music creation, and that the naturalness of singing in domestic AI music has passed the threshold at which the human ear can tell the difference, with results comparable to leading US models. He shared how AI has changed the way people interact with music, from text-to-music, to three-key music, to humming-to-music. In June this year, Tianpu Music launched the world's first multi-modal music generation model, supporting video scoring and picture-to-music functions that generate a complete piece of music from a user's video or picture with one click.
▲Jia Shuo, Vice President of Quwan Technology
In addition, Jia Shuo previewed Tianpu Music's new function, MidiRender, for the first time at the event. The model works like a precise, controllable ControlNet for music: it can add lyrics and complete arrangements based on original music clips.
4. The legal and medical vertical track models are implemented, and Ant accelerates the commercialization of AI
Cai Hua, head of large model and knowledge reasoning algorithms at Huayuan Computing, broke down the underlying technical architecture of Huayuan's legal large model and its five main implementation scenarios. A general-purpose large model cannot cover the needs of the legal field. To make the large model better suited to the legal profession, the company collected multi-source heterogeneous knowledge data, including six major types of basic knowledge, and built a relationship graph with laws and cases as the central nodes.
▲Cai Hua, head of Huayuan Computing Large Model and Knowledge Inference Algorithm
At present, its application scenarios fall into two major areas: rule-of-law business and rule-of-law decision-making. Specific applications include similar-case recommendation, legal article recommendation, judgment document generation, an all-in-one digital machine for legal education and anti-fraud publicity, and the Little Snowman legal intelligent assistant.
Wu Xian, head of the Tianyan Research Center of Tencent Youtu Laboratory and an expert researcher, described the current top ten application scenarios for medical large models, including department guidance, doctor recommendation, pre-consultation, doctor-patient dialogue, disease inquiry, medical record generation, discharge summary generation, medical knowledge answering, clinical practitioner examination, internal efficiency improvement at pharmaceutical companies, and generation of medical popular science articles.
▲Wu Xian, head of Tianyan Research Center of Tencent Youtu Lab and expert researcher
He also introduced work on mitigating large model hallucination, language imbalance, and large model evaluation, as well as the latest research progress on multi-language, multi-modal medical tasks.
Zhao Yao, Director of Ant Group's Basic Intelligence Technology Department, shared how large language models are applied in business and how technical means can address inference efficiency, reliability, and availability. Ant Group balances inference efficiency and accuracy through knowledge distillation and knowledge transfer. Distillation transfers knowledge from large models to small models, reducing the amount of computation while maintaining accuracy; knowledge transfer helps models adapt quickly to different scenarios and improves application results.
▲Zhao Yao, Director of the Basic Intelligence Technology Department of Ant Group
In addition, Ant Group also uses compression and pruning technology to reduce costs and energy consumption, improve computing efficiency, and reduce hardware investment. The company's goal is to promote the commercialization and popularization of AI.
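The distillation idea described above, where a small model learns from a large model's outputs, can be sketched in a few lines. This is a minimal illustration of the widely used soft-target formulation, not Ant Group's implementation: the function names, the temperature value, and the logits are all assumptions made for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    A lower value means the small (student) model's output distribution
    is closer to the large (teacher) model's.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2  # conventional T^2 scaling


# Hypothetical logits for one input over three classes.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]  # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # disagrees with the teacher

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
```

In training, a loss like this would be minimized with respect to the student's parameters, usually alongside the ordinary task loss, so that the smaller model reproduces the larger model's behavior at a lower inference cost.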
Conclusion: Landing in Shanghai for the first time to explore the pulse of China's generative AI industry
In addition to the guests above, six young scholars and technical experts gave talks and took part in a roundtable discussion at the end-side generative AI technology seminar in the afternoon.
After two consecutive high-profile innovation summits on generative AI in Beijing, Zhidongxi and Zhixingxing have come to Shanghai for the first time to jointly hold an industry event focused on generative AI. We hope that the rich agenda and the guests' diverse experience and clashing viewpoints make the trip worthwhile for everyone.
The excitement will continue tomorrow. 25 representatives from industry, academia and research will explore the pulse of China’s generative AI industry around topics such as AI Infra, AI video generation, and embodied intelligence.