What will happen to AI in 2024? (Part 2)

Image source: generated by Unbounded AI

July event: WAIC stars shine, DeepSeek rises to fame

If June was about undercurrents, then July was about showing strength and exchanging ideas in the spotlight. The World Artificial Intelligence Conference (WAIC), held in Shanghai every year, is the highest-level AI event in China. The 2024 conference was unprecedented in scale and attracted global attention.

In the run-up to the conference, on June 13, Step Star officially launched its Yuewen App, which integrates photo-based question answering, intelligent search and other functions to improve work and study efficiency and simplify daily life. Built on the Step series of large models, the application optimizes Internet search and document parsing, and supports photo recognition, voice input and document analysis in multiple formats, giving users a convenient AI assistant.

On July 4, at the 2024 WAIC, Step Star released three new large models in the Step series: a trillion-parameter language model, a multi-modal large model and an image generation large model. The release marked a leap from hundreds of billions to trillions of parameters and a breakthrough in the multi-modal field.

The next day, Premier Li Qiang of the State Council visited the 2024 World Artificial Intelligence Conference in person and toured the exhibition hall, making a point of visiting the Step Star booth. Step Star showed the Premier the latest progress of its Step series of general-purpose large models, including the trillion-parameter language model and its multi-modal understanding and generation technology.

On July 5, Zhipu AI also released the fourth-generation CodeGeeX model at the World Artificial Intelligence Conference and announced that it was open source. The CodeGeeX4-ALL-9B model integrates multiple functions such as code completion, question and answer, and interpreter, making it the most powerful all-round code model with less than 10 billion parameters.

In addition, on July 30, Kimi intelligent assistant launched a PPT production tool to improve office efficiency. Since then, the PPT generation function has gradually become a standard feature of domestic AI tools. Kimi’s move also reflects the penetration and popularity of AI technology in the office field.

The successful hosting of WAIC not only showcases the latest progress in China's AI technology, but also promotes exchanges and cooperation in the field of AI at home and abroad. Shortly after WAIC, news from an authoritative international evaluation agency further affirmed the strength of China’s AI technology.

On July 16 (US time), the update results of the large model arena organized by LMSYS showed that DeepSeek-V2-0628 surpassed many top models and topped the list of global open source models. This achievement proves the competitiveness of China's open source large models on the global stage and has also won international reputation for China's AI industry.

August: Ready to go—technological innovation and application expansion

The heat of July's WAIC had not yet dissipated, and competition in the AI field did not pause. Instead, with the arrival of late summer, it entered a more intense phase. If the previous period was one of preliminary exploration and positioning by the various players, late summer marked the official start of the real fight. Talent kept moving, technological breakthroughs emerged one after another, and every company was trying to find its own foothold.

In August, capital remained focused on the AI track. Moonshot AI (Dark Side of the Moon) received over US$300 million in financing, raising its valuation to US$3.3 billion. This large round injected confidence and financial support into the company's future development, and also suggested that it would play a more important role in the next round of competition.

At the same time, technological breakthroughs also began to emerge. On August 6, Zhipu AI achieved an important breakthrough in the field of video generation and open sourced the CogVideoX video generation model. This lightweight model that only requires 18GB of video memory can generate a 6-second video, greatly lowering the threshold for developers and allowing more people to participate in AI video creation.

What is even more surprising is that less than a month later, on August 28, the CogVideoX-5B model, with more parameters and stronger performance, was also announced as open source, and the minimum video memory requirement was reduced to just 11.4GB. Zhipu AI's continuous breakthroughs in video generation not only demonstrate its strong technical strength in this field, but also accelerate the popularization of AI video generation technology.

In addition, on August 6 ByteDance launched a one-stop AI creation platform called "Jimeng AI", directly benchmarking itself against Kuaishou's Keling and Sora, to further expand its presence in AI creation and secure a more favorable position in this emerging market.

At the technical level, on August 2 DeepSeek introduced an innovative hard disk caching technology that significantly reduced API service latency and cost, greatly improved the user experience, and laid the foundation for subsequent larger-scale applications.

All in all, August is a month of technological innovation and application expansion, accumulating strength for the next competition.

September: A hundred flowers bloom - breakthroughs in multiple fields and pattern evolution

In September, various companies have launched more active explorations at the technical and application levels, and competition has become more intense, showing a situation where a hundred flowers are blooming.

On September 6, Zhipu announced that the video call function of its AI product "Qingyan" was fully open, with a limited-time free trial. This new feature breaks through the traditional limits of typing and voice interaction, enabling the AI to "see" the world and understand users' expressions and emotions, and thereby provides a more natural and fluid interactive experience. This signifies that Zhipu's large model has made important progress in multi-modal interaction and has caught up with the level of GPT-4o, released by OpenAI in May, demonstrating how quickly Chinese AI companies are catching up in technology.

Also on September 6, DeepSeek released the V2.5 model, which not only integrates general conversation and code processing capabilities, but also significantly optimizes human preference alignment, writing and instruction following. It continues to support practical functions such as Function Calling, FIM completion, and Json Output, improving the overall performance of the model.

DeepSeek V2.5 lived up to expectations: in the subsequent global large model competition it ranked first in the country, surpassing even the strongest domestic closed-source model and leading the country in 8 individual capabilities. The model once again demonstrated its technical strength to the world and earned international recognition for China's open source models.

On September 10, the Kimi API began to support online search, making Kimi the first Chinese AI company to launch a function similar to OpenAI Search. It provides users with a more convenient and smarter conversation experience, sets a new benchmark for other enterprises, and promotes the development of AI applications.

More importantly, on the same day, Apple officially launched "Apple Intelligence" at its autumn product launch. This event has epoch-making significance: it marks AI's official entry into the mobile operating system level and opens the door to a new era of AI mobile phones. Apple Intelligence is deeply integrated into iOS, bringing users intelligent features such as notification summaries, automatic email replies and photo editing.

This move quickly sent shockwaves through the entire mobile phone industry. In the following months, Chinese mobile phone manufacturers followed suit collectively, launching AI operating systems to compete with Apple Intelligence and trying to get a head start on the new track. The release of Apple Intelligence is undoubtedly one of the most important events in the mobile phone industry in 2024. It not only changes the way users interact with mobile phones, but also opens up new application scenarios for AI technology.

On September 12, OpenAI launched o1-preview and the faster and cheaper o1-mini, once again pointing out a new direction for the industry. Both products emphasize spending more "thinking time" before answering in order to improve their ability to solve complex problems, offering new ideas for the development of large models.

More importantly, the launch of OpenAI o1 marks that the development of AI has officially entered the "reasoner" stage. The previous AI was more of an "executor", able to complete tasks according to instructions, while o1 began to show certain reasoning capabilities, able to better understand problems, analyze information, and give more reasonable answers.

Once again, Chinese companies saw a new target and began to actively explore technological breakthroughs in the direction of "reasoning", striving for a leading position in the next wave of AI technology.

September was also a critical month for the video generation track. MiniMax released its Hailuo ("Conch") video generation model abab-video-1, which attracted much attention at home and abroad. It was popular among domestic netizens and highly praised by foreign users, demonstrating the potential of China's AI in video generation.

However, it is regrettable that Zhang Qianchuan, the product leader of MiniMax and the helmsman of "Hoshino" and "Talkie", also left the company this month due to personal reasons and changed his position to a product consultant. This undoubtedly adds a touch of uncertainty to the future development of MiniMax, and also triggers the industry’s thinking on the stability of talent in AI startups.

On September 19, Alibaba Cloud's Tongyi Wanxiang was officially unveiled at the Yunqi (Apsara) Conference. It showed distinctive strengths in styles such as Chinese style, 3D animation and thick-paint CG, attracted a lot of attention, and opened up more possibilities for AI artistic creation.

At the same conference, Alibaba Cloud announced that the Qwen2.5-72B model was being open sourced globally and stated that its performance surpasses Llama 405B. The model supports a 128K-token context and can generate up to 8K tokens of content. Alibaba Cloud presented this as a major advance in programming and multi-modal capabilities, and the release further promoted the development of the open source ecosystem.

On September 20, the Tencent Yuanqi AI agent was officially released, bringing new possibilities to the creation of public accounts. It also marked the further application of AI in content creation and heralded changes in the way content is produced.

On September 24, ByteDance released the Doubao video generation model, claiming that it overcomes the difficulty of multi-subject interaction and supports multi-style, multi-aspect-ratio, consistent multi-shot generation, making it suitable for e-commerce marketing, animation, education and other fields. This will undoubtedly further intensify competition in the video generation track and promote technological progress and application innovation in this field.

On September 25, Baidu AI’s Wenxin Quick Code won first place in two authoritative evaluation reports, Sullivan and SuperCLUE, ranking first among domestic AI code products with a total score of 87.55.

September can be said to be a month when a hundred flowers bloom, and various companies have made remarkable progress in different directions.

From August to September, China's AI industry showed a booming trend in technological innovation, application expansion and talent flow. Each company is actively exploring its own advantages and breakthroughs, and jointly promotes the progress of China's AI industry. What new stories will happen in the next few months?

Video generation: from catching up to surpassing—China’s AI breakthrough path

Among the many branches of AI technology, video generation is undoubtedly one of the focuses that has attracted the most attention in recent years. On this track full of challenges and opportunities, Chinese AI companies have experienced a journey from catching up to surpassing.

During the Spring Festival, OpenAI’s Sora was released shockingly, which brought a huge impact to the global AI community and once put Chinese AI companies under great pressure. However, this pressure has instead stimulated Chinese companies' enthusiasm for innovation and speed of catching up.

Just a few months later, Chinese companies proved their strength with practical actions. On June 6, Kuaishou took the lead by quietly launching its self-developed large video generation model "Keling". At launch, the product showed impressive capabilities: 1080p resolution, generation of videos up to 2 minutes long, and freely adjustable aspect ratios. These key indicators were significantly ahead of the industry level at the time, even surpassing Sora, which had not yet been officially released.

The development trajectory of "Keling" has been steady: image-to-video generation was launched in June, the web version opened in July, and the "AI Director Co-Creation Plan" and version 1.6 were launched in December. Content generated with it, such as AI films and TV series, has gone viral on major social platforms, and the product has firmly held a leading position in video generation.

In September, MiniMax's Hailuo ("Conch") video generation model abab-video-1 emerged suddenly. It not only received praise domestically, but also gained high recognition among overseas users. At the same time, startups such as Vidu and Pixverse demonstrated outstanding technical strength, and Tencent's open source Hunyuan video model even surpasses Sora in output quality.

When OpenAI finally officially released Sora after waiting for nearly 10 months, it brought unexpected disappointment to the market. Due to various reasons, the actual effect of Sora is far from the original demonstration video. Not only does it lag behind Google's Veo2, but it is also left behind by many Chinese products. This marks the first time that a Chinese company has truly surpassed OpenAI on the video generation track.

Success on this track has given Chinese AI companies great confidence. In July, Zhipu AI released "Qingying", which generated millions of videos within 6 days of its launch. To maintain its competitive advantage, Zhipu quickly open sourced the CogVideoX model in August. In November, "Qingying" was upgraded to support 4K, 60-frame ultra-high-definition video generation and added the CogSound sound effect model. Alibaba Cloud's Tongyi Wanxiang, for its part, chose in September to seek breakthroughs in vertical areas such as Chinese style and 3D animation.

The success of the video generation track not only proves that Chinese AI companies can surpass international giants in niche fields. More importantly, it breaks the curse of "always catching up", injects new confidence into the industry, and indicates that China's AI will move towards a new stage of independent innovation and leadership.

In the aftermath of the price war, China's AI enterprises began to enter a deeper competition: the track of technological innovation and global competition. In November, this smokeless competition quietly heated up, and every small breakthrough had the potential to redefine the industry landscape.

November: Acceleration period of technological innovation

As the year drew to a close, Chinese AI companies launched their final sprint of 2024.

On November 19, Step Star's Step-2 ranked fifth in the world on the authoritative international benchmark LiveBench, second only to OpenAI's o1-mini. This achievement shows that the strength of Chinese AI companies on the international stage is steadily improving. During the same period, its Step-1V ranked first in China on the latest Chatbot Arena list, on par with Gemini-1.5-Flash, demonstrating its technical strength.

In terms of model open source and multi-modal application, Tencent took the lead in launching an attack. On November 5, Tencent’s Hunyuan language model and 3D model were officially open sourced. Its latest MoE model "Hunyuan Large" has a parameter size of 389B and is in a leading position in multi-disciplinary evaluations.

"Hunyuan3D-1.0" supports generating 3D assets from text and images, providing a powerful tool for developers and researchers. On November 14, Tencent Yuanbao 2.0 was fully upgraded, adding a dedicated section for AI applications. The Hunyuan model architecture supports multi-modal understanding and generation, further expanding the boundaries of its applications.

However, the road to technological innovation is not always smooth. On November 19, Liu Wei, the technical director of Tencent Hunyuan Large Model, chose to resign. This personnel change caused the industry to pay attention to the flow of talents.

At the same time, Baidu demonstrated new technological breakthroughs at its Baidu World conference. Robin Li announced the launch of iRAG, a retrieval-augmented text-to-image technology, and the no-code tool "Miaoda". iRAG is dedicated to solving the hallucination problem in AI image generation, while "Miaoda" allows non-programmers to easily realize their ideas, marking a step toward the popularization of AI applications.

In terms of mathematics and reasoning capabilities, Kimi Intelligent Assistant released a new generation of mathematical reasoning model, k0-math, on November 17, with mathematical problem-solving capabilities benchmarked against the OpenAI o1 series. The Kimi Discovery Edition launched at the same time strengthens search intent understanding, source analysis and chain-of-thought capabilities to give users more intelligent solutions to their problems.

On November 20, the preview version of DeepSeek's new reasoning model, DeepSeek-R1-Lite, was released, and users can try it on the official website. The model performs well in fields such as mathematics and programming. Its reasoning process includes reflection and verification, and its chain of thought can reach tens of thousands of words, demonstrating reasoning ability beyond models such as GPT-4o. It currently supports web use only; it will be open sourced and offered through API services in the future.

Throughout November, the common goal of Chinese AI companies seems to be very clear: to catch up with the o1 version released by OpenAI in September before the Spring Festival. The user base of Baidu Wenxin Yiyan has reached 430 million, and Alibaba Cloud's QVQ-72B-Preview is comparable to OpenAI o1 and Claude3.5 Sonnet in visual understanding and reasoning capabilities for the first time. These developments have confirmed the determination of domestic enterprises to catch up.

From technology evaluation to model open source, from multi-modal applications to reasoning capabilities, China’s AI scene in November showed unprecedented activity and competition. Companies are narrowing the gap with international giants at an unprecedented speed, showing exciting innovation potential.

Various signs this month indicate that Chinese AI companies are no longer satisfied with imitation, but have begun to actively speak out on the global stage. In December, this competition will enter a more intense stage.

December: A comprehensive breakthrough in innovation

If November was the prelude to Chinese AI companies accelerating their catch-up, then December was the key chapter of a comprehensive breakthrough. This month, Chinese AI companies showed unprecedented aggressiveness in technological innovation, model development and business strategy.

Step Star was the focus of this month. On December 13, the company launched Step-1o, China's first end-to-end speech large model with 100 billion parameters. The model not only supports mixed speech and text input and output, but also has high IQ and EQ: it can understand emotional information and provide professional advice and emotional companionship.

The launch of Step-1o marks that this latecomer has fully aligned with the GPT-4o released by OpenAI in May and achieved a major breakthrough in the field of voice interaction. Immediately afterwards, the company completed hundreds of millions of dollars in Series B financing, with investors including Tencent Investment, Wuyuan Capital and Qiming Venture Partners, highlighting the capital market's confidence in its technological potential.

Kimi Intelligent Assistant released the visual thinking model k1 on December 16, a breakthrough model based on reinforcement learning. k1 supports end-to-end image understanding and chain-of-thought reasoning, covering basic science fields such as mathematics, physics and chemistry. In multiple benchmark tests, the k1 model surpassed global benchmark models, giving Kimi momentum in the field of visual thinking.

DeepSeek launched a dense series of major models in December. On December 10, it released the final fine-tuned version of V2.5, which improved capabilities across dimensions such as mathematics, coding and writing through post-training. On December 13, DeepSeek-VL2 was officially unveiled, introducing a dynamic image tiling strategy and an MoE architecture that significantly improved visual capabilities. On December 26, DeepSeek-V3 was released with 671B parameters; it performed well in evaluations across multiple fields, especially mathematics and Chinese, and its generation speed increased threefold.

ByteDance continues to make efforts in the AI ecosystem this month. On December 4, Doubao AI Assistant added a new image understanding function, allowing users to upload images and obtain content analysis. On December 11, the company upgraded Jimeng’s product priority and committed to creating “Douyin in the AI era.” On December 19, it was even reported that the company is in talks with Apple to integrate its AI model into the iPhone in the Chinese market. If this news comes true, it will be a major breakthrough in cross-border cooperation.

On the last day of 2024, Zhipu delivered an impressive result. The preview version of GLM-Zero not only catches up with OpenAI's o1, but also makes innovative attempts in reasoning methods. Based on extended reinforcement learning technology, the model is on par with o1-preview in multiple evaluations, marking the transformation of Chinese AI companies from "followers" to "innovators".

Throughout December, Chinese AI companies seemed to have found a delicate balance: while catching up with OpenAI, they began to establish their own technical features and innovation paths. From voice interaction to visual thinking, from multi-modal models to reasoning technology, these breakthroughs are not just technological iterations, but also the exploration of a new technological paradigm.

As the last day of 2024 came to an end, Chinese AI companies were already standing at a new starting point. In 2025, this global AI competition, fought without gunsmoke, will become even more complex and unpredictable.

2024: The Dilemma and Breakout of the Chaser

Looking back at 2024, Chinese AI companies have gone through a journey full of ups and downs. The catch-up effort that started at the beginning of the year weathered the impact of Sora, the challenge of GPT-4o, and the new targets set by the o1 series. In this endless pursuit, Chinese companies have shown remarkable execution and rapid iteration. Every OpenAI innovation is met with a rapid response from Chinese companies.

However, this "you make a move and I follow" model also exposes a shortage of original breakthroughs. In terms of basic models and product innovation, Chinese companies mostly play the role of "chaser". If anyone broke this cycle in 2024, it may have been DeepSeek with its several original attempts. This low-key company not only cultivates AI talents like Luo Fuli, but also continues to work hard on basic research, showing a different path to innovation.

Looking forward to 2025, Chinese AI companies face a greater challenge: how to cultivate the soil for real innovation while maintaining the ability to catch up quickly. ByteDance and Xiaomi's aggressive moves in the talent market, Step Star's recruitment of top research talent, and Zhipu AI's innovative attempts in reasoning technology all indicate that the industry is going through the pains of transformation. From "catching up" to "surpassing", the road may still be long, but the direction has become increasingly clear.