The big model of 2024: We are desensitized to the development of AI
Editor
2024-12-26 15:03

Image source: Generated by Unbounded AI

AI has desensitized people to technological progress, even as everything develops faster.

In our imagination, technological progress changes how we live almost without being noticed. Yet the buzz around artificial intelligence seems to stay on Weibo and Zhihu, as if it had nothing to do with the wider public. Amid the noise, we are becoming increasingly desensitized.

This is especially evident in which AI topics actually catch on. Looking back over the whole year, it is not hard to see that two stories generated the most discussion:

a ByteDance intern sabotaging large-model training, and the capital tussle between Dark Side of the Moon (Moonshot AI) and Zhu Xiaohu.

But this is by no means the truest picture of China's AI scene. We can casually say that some AI feature is "nothing remarkable" or that some technological breakthrough is "just that". Yet looking back at 2024 as a whole, it was still an out-and-out technological boom.

01. Large models are more practical, but no longer stunning

At the beginning of 2024, the domestic large-model field was a scene of "big players vying for the throne". According to statistics from "Every Classic", 305 large models had been released as of April 2024. The term "Battle of a Hundred Models", coined the year before, still applied, but the outbreak of price wars and the clarifying needs of the application side effectively retired most of the models that never needed to exist.

The first trend is small-parameter, on-device models. The "medium cup, large cup, extra-large cup" tiering of models can no longer cover the differing needs of every scenario: large-parameter models are powerful, but their training and inference costs are high, and they are hard to popularize where hardware capability is limited.

The emergence of on-device models brings simple AI applications within reach of daily life. The most typical cases are on-device models for phones and PCs, such as Xiaomi's MiLM and vivo's BlueLM. They retain key capabilities on the phone while cutting resource consumption, and deploying them has become a key step in AI's penetration of daily life.

On this basis, another major trend is mixture-of-experts (MoE) technology, an approach that makes model inference cheaper while staying effective. An ordinary large model is like a single omniscient expert who knows everything but is expensive to consult (high compute requirements). An MoE model is more like hiring a team of experts who each specialize in a different field, and only the relevant experts are called on when needed. This mechanism greatly reduces the model's compute requirements and cost. Mixtral-8x7B, for example, is not far behind GPT-4 in performance, yet its resource requirements are far lower.
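To make the routing idea concrete, here is a minimal, illustrative sketch of sparse MoE routing in PyTorch; the layer sizes, expert count, and class name are invented for illustration and are not taken from Mixtral or any specific model:

```python
# A minimal sketch of mixture-of-experts (MoE) routing, in the spirit of models
# like Mixtral-8x7B. All names and sizes here are illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        # Each "expert" is a small feed-forward network.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        # The router scores every expert for each token.
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):                      # x: (n_tokens, d_model)
        scores = self.router(x)                # (n_tokens, n_experts)
        weights, idx = torch.topk(scores, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize only the chosen experts
        out = torch.zeros_like(x)
        # Only top_k experts run per token, so compute grows with top_k,
        # not with the total number of experts.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

tokens = torch.randn(4, 64)
print(SparseMoELayer()(tokens).shape)          # torch.Size([4, 64])
```

Only `top_k` of the experts actually run for each token, which is why MoE inference can be far cheaper than a dense model with a comparable total parameter count.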

In addition, multimodal research became an important direction for large models in 2024. Humans perceive the world through multiple modalities such as vision, sound, and touch; if a large model is to be truly intelligent and deliver real application value, relying solely on text input and output is clearly not enough. Take generating an accompanying image for a piece of text: the AI must not only understand the text but also grasp the visual context. With the release of Google's natively multimodal model Gemini, multimodal capability became a research focus for the major AI companies.

For ordinary users there is no precise yardstick for judging the quality of a large model's answers, but the more content a model can take in, the stronger it feels. In March, Kimi from Dark Side of the Moon bet on "extremely long text". Getting a large model to read an entire book or long article used to require all kinds of prompt tricks; Kimi simply pushed the model's context to 2 million Chinese characters, roughly three copies of "A Dream of Red Mansions". Kimi's influence in China then soared; even the brand of mineral water served to visitors at Dark Side of the Moon got picked up and hyped, creating so-called "Kimi concept stocks".

The real flash point of the large-model industry came in May 2024. DeepSeek launched a price war; big players such as ByteDance and Alibaba followed with price cuts, and Baidu and iFlytek made models free. On the technical side, methods such as model compression and mixed-precision training helped vendors cut training and inference costs, creating room for price adjustments. On the market side, the price war plainly imitated the Internet-era playbook: grab market share fast by cutting prices, while also collecting more user data to improve model training.
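As a rough illustration of one of the cost levers mentioned above, here is a generic mixed-precision training loop in PyTorch; the tiny model and data are placeholders rather than any vendor's actual pipeline:

```python
# Rough illustration of mixed-precision training with PyTorch AMP; the tiny
# model and random data here are placeholders, not a production setup.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(32, 512, device=device)
y = torch.randint(0, 10, (32,), device=device)

for step in range(3):
    optimizer.zero_grad()
    # The forward pass runs in reduced precision where safe, cutting memory
    # and compute versus full float32.
    with torch.autocast(device_type=device, enabled=(device == "cuda")):
        loss = nn.functional.cross_entropy(model(x), y)
    # The scaler guards against float16 gradient underflow.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```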

After this round of price war, the ecosystem of the domestic large-model industry was reshaped. Many small and medium-sized players were forced out of the market, while the remaining big vendors consolidated their dominance through price advantages.

But technological progress at the leading edge did not stop. In September 2024, OpenAI released o1, a model whose reasoning ability is markedly improved through reinforcement learning and chain-of-thought techniques, especially on math problems, programming tasks, and scientific reasoning. Domestic players Kimi and Zhipu released similar products almost simultaneously, and reasoning capability became the focus of large-model research in the second half of the year.

Still, however easy to use and cheap today's large models are, none of them is the dreamed-of GPT-5. At the end of 2024, the most-watched large-model news was arguably that GPT-5 is proving hard to deliver. According to a Wall Street Journal report on December 20, OpenAI's GPT-5 project has been in development for more than 18 months; it was supposed to be finished by mid-2024 but is now seriously behind schedule.

One reason is the sheer training cost: the compute for training GPT-5 is estimated to run as high as 500 million US dollars (roughly 3.66 billion yuan). Another is that high-quality training data for GPT-5 is scarce, so OpenAI has had to hire people to write training data for it from scratch.

Hopefully, we can see GPT-5 released in 2025.

02. Cramming functions into one app

Large-model capability is the foundation, but for everyday use, the Internet era has trained everyone to expect a single app to solve all their problems, and the AI era is no exception. From the software perspective, the clearest trend of 2024 was cramming more functions into one app.

AI search: content is king

AI search is seen as one of the most promising directions for large-model applications and became the first field to reach large-scale deployment. A generative large model is itself a content library, and its training requires vast amounts of data. Moreover, the most common way to interact with a generative model is conversational, which maps naturally onto users' search needs.

In Robin Li's words, "generative AI and search are a natural match." Against this backdrop, AI search became an industry focus. Perplexity, built around AI search, saw its valuation hit new highs and drew technology giants such as OpenAI and Google into active deployment, setting off an AI-search boom at the start of the year.

In its early days, AI search was treated more as a standalone product whose main job was providing search services. Companies such as MiTa AI and Tiangong AI took the "traditional search engine + AI" approach: the user types a question into a search box, and the AI summarizes the retrieved results into a direct answer.

This model faces heavy costs, especially for companies without an existing search engine, which must invest a lot of resources in building or buying a web index. As traditional search engines such as Baidu and Google added AI features of their own, AI-search startups gradually lost their competitive edge.
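The "traditional search engine + AI" pattern described above is, at its core, retrieval-augmented generation. A minimal sketch follows; `web_search` and `llm_complete` are hypothetical stand-ins for a real search index and a real large-model API:

```python
# Minimal retrieval-augmented "AI search" sketch. `web_search` and `llm_complete`
# are hypothetical placeholders, not any specific vendor's API.
from typing import List

def web_search(query: str, top_k: int = 5) -> List[dict]:
    """Pretend search backend: returns documents with a title, url and snippet."""
    raise NotImplementedError("plug in a real search index or API here")

def llm_complete(prompt: str) -> str:
    """Pretend LLM call: returns generated text for the prompt."""
    raise NotImplementedError("plug in a real large-model API here")

def ai_search(question: str) -> str:
    # 1. Retrieve candidate pages from a conventional search index.
    hits = web_search(question)
    # 2. Build a grounded prompt so the model answers from the retrieved
    #    snippets and cites them, instead of free-associating.
    context = "\n".join(f"[{i + 1}] {h['title']}: {h['snippet']}" for i, h in enumerate(hits))
    prompt = (
        "Answer the question using only the sources below, citing them as [n].\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    # 3. Let the large model summarize the results into a direct answer.
    return llm_complete(prompt)
```

The expensive part, as the paragraph above notes, is the `web_search` side: without an existing index, a startup has to build or buy one.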

Faced with this dilemma, Tencent and ByteDance sought differentiation through their unique content ecosystems (WeChat official accounts and Douyin), folding AI search into their own AI assistants. This strategy lets them leverage their huge existing user bases and content ecosystems, sidestep head-on competition with traditional search engines, and carve out their own positioning.

AI voice: more like a human

Getting artificial intelligence to hold natural conversations the way humans do has always been a key test of its capability, and many people dream of an intelligent assistant like Jarvis in "Iron Man".

However, current interaction still mostly runs on text under the hood. A typical "audio" large model actually transcribes speech into text, lets the large model understand and respond in text, and then converts the generated text back into speech. This cascade inevitably inherits problems peculiar to text interaction, such as difficulty understanding dialects, inaccurate emotion recognition, and users being unable to interrupt effectively mid-conversation.
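That cascaded pipeline can be sketched in a few lines; `transcribe`, `chat`, and `synthesize` below are hypothetical placeholders for an ASR model, a text LLM, and a TTS engine, not any product's real API:

```python
# Sketch of the cascaded speech pipeline (ASR -> text LLM -> TTS) that most
# "voice" assistants still use; every function here is a hypothetical stand-in.

def transcribe(audio_in: bytes) -> str:
    """ASR: speech to text. Dialect errors and lost emotional cues start here."""
    raise NotImplementedError

def chat(text: str) -> str:
    """Text LLM: understands and replies purely in text."""
    raise NotImplementedError

def synthesize(text: str) -> bytes:
    """TTS: text back to speech, only after the full reply has been generated."""
    raise NotImplementedError

def voice_turn(audio_in: bytes) -> bytes:
    # Each stage must finish before the next begins, which adds latency and
    # makes mid-sentence interruption hard -- the problems listed above.
    return synthesize(chat(transcribe(audio_in)))
```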

After ChatGPT's advanced voice mode was demonstrated at OpenAI's spring launch event in May, the huge leap in AI voice capability drew widespread attention.

In August, Volcano Engine used a press conference to demonstrate Doubao's AI voice technology, which supports features such as emotion understanding and mid-conversation interruption. In October, Zhipu also launched an end-to-end voice model, focused on making human-machine exchanges flow like a normal conversation.

These breakthroughs rest on BigTTS and RTC (real-time communication) technology. BigTTS gives the AI richer emotion and intonation, making its speech more vivid and natural; RTC sharply reduces latency in mixed Chinese-English conversations and improves the fluency of real-time interaction. In addition, with Seed-TTS the AI can quickly clone the characteristics of a sample voice, producing more personalized and realistic speech for a given scene.

AI video is becoming a productivity tool

Before OpenAI unveiled its video model Sora on February 15, 2024, AI video was still experimental and abstract; being able to swap backgrounds like a slideshow was already considered leading-edge, with Runway, Pika and others among the representative products and companies.

The arrival of Sora, however, dramatically raised expectations for AI video. The scaling law was shown to work in the video domain as well, so the major companies piled in, competing on clip length, camera movement, style, and image quality.

Yet the biggest problem with AI video is commercialization, which many believe is why Sora took so long to ship. Conveniently, the short-drama market is booming, and short dramas do not require seamless scene continuity, which suits AI video's weak consistency. Douyin and Kuaishou began trying an "AI + short drama" model to promote their own AI video software: Douyin's "Sanxingdui: Future Apocalypse" and Kuaishou's "Mountains and Seas Strange Mirror: Cutting Waves" drew 135 million and 52 million views on their respective platforms.

While domestic AI video descended into a scrum of free apps with no clear profit model, in September a joke video of celebrity chef Gordon Ramsay doing "alchemy" in the kitchen, generated with MiniMax's Conch AI, went viral on overseas social platforms.

Overseas media headlines called the phenomenon "Chinese applications scoring an early victory in AI video". Before Sora officially launched, AI video products such as Keling, PixVerse, and Vidu were aggressively seizing overseas markets, and startups were opening overseas offices one after another; MiniMax's Talkie reached 11 million monthly active users worldwide.

The key to commercializing AI video is selling the software itself, especially while the "best" AI video model, Sora, was still on the drawing board. Compared with the domestic market, where willingness to pay is weak, overseas users have stronger paying habits and the market is clearly larger.

To make AI video a genuine productivity tool rather than a novelty, AI short-drama platform products have emerged, further lowering the production threshold for AI short dramas. These platforms fold script writing, storyboard design, video generation and the other steps of short-drama production into a single application, greatly simplifying the creative process. In August, for example, Kunlun Wanwei released SkyReels, billed as the world's first AI short-drama platform integrating video large models and 3D large models, letting creators "make a drama with one click".

AI Agents with a lower threshold

The AI video, AI voice, AI search and other functions above can all be grouped under the banner of AI Agents. In short, an AI Agent is an AI-driven agent that can complete a variety of tasks on a human's behalf. In March 2023, the release of the AutoGPT framework set off a wave of AI Agents, and similar projects such as BabyAGI and AgentGPT followed one after another.
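In the spirit of AutoGPT-style frameworks, an agent boils down to a loop in which a language model repeatedly picks a tool to call until the task is done. A minimal sketch, with `llm_decide` and the toy tools as hypothetical placeholders:

```python
# Minimal AI-agent loop sketch in the spirit of AutoGPT-style frameworks.
# `llm_decide` and the tools are hypothetical placeholders, not a real API.
from typing import Callable, Dict

def llm_decide(goal: str, history: list) -> dict:
    """Ask a language model for the next step, e.g.
    {"tool": "search", "arg": "..."} or {"tool": "finish", "arg": final_answer}."""
    raise NotImplementedError("plug in a real large-model API here")

TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda q: f"(search results for {q!r})",   # toy tool
    "calculator": lambda expr: str(eval(expr)),           # toy tool; never eval untrusted input
}

def run_agent(goal: str, max_steps: int = 10) -> str:
    history = []
    for _ in range(max_steps):
        action = llm_decide(goal, history)                # the model picks the next step
        if action["tool"] == "finish":
            return action["arg"]                          # the model says the task is done
        observation = TOOLS[action["tool"]](action["arg"])
        history.append((action, observation))             # feed the result back to the model
    return "stopped: step limit reached"
```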

However, the high development threshold kept the number of users relatively limited. In 2024, the AI Agent trend shifted toward lowering that threshold and popularizing the technology.

At its Create conference in April, Baidu released the agent development tool AgentBuilder and the AI-native application development tool AppBuilder, pitching the idea of building AI applications just by talking. In December, ByteDance's AI development platform Coze (Button) also shipped frequent updates to low-code tooling such as Project IDE and UI Builder.

Although many users have become "developers" through AI Agents, "how to help these AI developers make money" has become the latest headache for the big platforms.

Most current AI Agents handle only a single task, such as programming, image editing, or writing. Another major trend is moving AI Agents from single tasks toward general use. In November, Zhipu AI released its headline product AutoGLM, an AI Agent that can genuinely complete a variety of tasks for users automatically. At the launch event, Zhipu AI CEO Zhang Peng handed out 20,000 yuan in red envelopes to the audience through AutoGLM, declaring it "the first time in history that an AI has sent red envelopes to humans."

03. AI hardware is no longer "All in AI"

If "Follow OpenAI" is the main line of the AGI track, so in the field of AI hardware without a main line, the entire market seems to be blooming. At the CES show at the beginning of the year, the slogan of AI subverting everything was shouted, and the Rabbit R1 and AI were launched. Pin set off a wave of native AI hardware, but as a result, all the first-generation native AI hardware overturned.

Then "all things + AI" began to rise. AI learning machines, AI headphones, AI toys, and even AI mice and computer massage chairs emerged in endlessly. However, in addition to price, for the time being,No subversion was found.

The AI hardware concept also gave traditional PC OEMs an opening. Their iterative upgrades used to be dictated by CPU and GPU refresh cycles, leaving them little autonomy and tightly capped profits, whereas the AI PC concept maps neatly onto the idea of "software-defined hardware".

Consumers, however, are gradually discovering that AI PCs differ little from traditional PCs. Their NPUs are not powerful enough to run capable local models, so an internet connection is still needed to use large models. However loudly AI PC compute is advertised, in actual use buying a discrete graphics card is more practical.

At the end of the year, AI glasses suddenly broke out. The AI glasses from Ray-Ban and Meta sold a million units in a short time and quickly ignited the market, making smart glasses the AI hardware track most favored by capital and the hottest for investment and financing.

Technically, these glasses contain no disruptive breakthrough; they are popular because they are, above all, a good pair of glasses. Meta partnered with Ray-Ban and chose the classic Wayfarer design, which looks fashionable and understated. Nor did Meta let the technology compromise the weight: official figures put it at just 48 grams, about the same as ordinary glasses and extremely comfortable to wear.

Lu Yong, Vice President of Interstellar Meizu, believes the core qualities of smart glasses are, first, good looks and, second, light weight; AI features are not strictly necessary, since 70% of the time users spend with the glasses goes to listening to music and taking photos. The popularity of Meta's AI glasses also points the way for all AI hardware: get the basic functions right first, then layer AI technology on top.

04. AI goes from tool to part of the game

AI games arguably belong under AI software, but if "generating worlds and simulating the universe" is the ultimate goal of artificial intelligence, then AI games are currently the closest thing to that vision. Besides, the game industry underwrote AI's compute needs long ago, and AI's capabilities were first validated in games such as chess, Go, and "Dota".

In 2024, AI is no longer just a tool that assists design; it has become part of the game itself.

Perhaps you still remember "Coax Simulator", which went viral across the Internet at the start of the year. The game is built around AI dialogue, and its success spawned a wave of AI conversation games. Games exist to delight players, but every player's tastes are different; studios usually prepare only a limited number of branching side plots to choose from. Large models, by contrast, can tailor the experience to each of a thousand different players, giving everyone something unique.

There are also more mature AI games, such as "Turtle Mushroom Soup" and "One Thousand and One Nights". In "Turtle Mushroom Soup", no matter what the player says, the AI steers the plot back to the main storyline; in "One Thousand and One Nights", the AI can generate any weapon the player can imagine.

Even more free-form and customizable than AI-driven gameplay is having all of a game's content generated by AI.

In early November, the first real-time generated AI game sparked a craze in the industry. Two startups, Decart and Etched, announced that they had jointly built the world's first real-time, playable, interactive world model, Oasis. Trained on millions of hours of gameplay video, Oasis can generate an open-world game on the fly from the user's keyboard input.

In December, Google DeepMind released Genie 2, a large-scale foundation world model. Given a single image, Genie 2 can generate a corresponding virtual world that users can explore and interact with via mouse and keyboard.

Google's technology is undoubtedly powerful, but the most popular AI game of 2024 was "Shawarma Legend", a tongue-in-cheek "AAA masterpiece" that folds in AI-generated art, AI dubbing, AI-composed music and other techniques.

The game took off at the end of September, quickly overtaking hits such as "Honor of Kings" to sit firmly atop the iOS free chart, where it held first place for at least 16 consecutive days. The gameplay is very simple: the player runs a shawarma shop, slicing meat, frying chips, adding sauce and wrapping orders to satisfy different diners.

Compared with traditional AAA titles, "Shawarma Legend" is modest in graphics and music, but its charm lies in being fun. As with AI hardware, its success reminds developers that AI games are still games first, and the most important thing is that they are fun to play.

From foundation models to breakthroughs edging toward AGI, from abstract AI videos to an explosion of AI-generated short dramas, from more "realistic" digital humans to new ways of playing AI games: changes that once took years of accumulation now happen within a single year. The pace of technological progress keeps rewriting what we know.

In this industrial revolution that follows the Internet, cloud computing, and smartphones, people in China dearly hope to lead the wave of artificial intelligence rather than once again wear the label of "the one catching up".

It is true that whenever we talk about AI, we also talk about the tough domestic environment, weak financing sentiment, technology gaps, and disappointing commercialization. But look around this round of the AGI revolution: Japan and South Korea, which once led the electronics industry, are nowhere to be heard from, while Europe has only Mistral left, its founders having returned home from Silicon Valley to start the company.

Look at China: in the scale and quality of its AI talent pool it is second only to the United States, and in the year before ChatGPT appeared, the number of Chinese AI papers was twice that of the United States.

Kevin Kelly posed a question at the 2024 Bund Summit in Shanghai: "Imagine the world 100 years from now. What kind of environment would you like to live in?" Yet in an era changing this fast, even the world one year from now is hard to predict.
