In 2024, the first echelon of startups that settled out of the hustle and bustle of the "Battle of a Hundred Models" has roughly taken a "6+2" shape.
"6" stands for Zhipu, MiniMax, Baichuan Intelligence, Dark Side of the Moon (Moonshot AI), Step Star (StepFun), and 01.AI (Zero One Everything), which the industry often calls the "Six Little Tigers" of large models; "2" refers to two slightly smaller but distinctive companies: DeepSeek and ModelBest (Mianbi Intelligence).
Not long after ChatGPT was released, the large-model Six Little Tigers were the brightest stars on the entrepreneurial track. But in the second half of this year, the situation has been quietly changing.
Recently, several investors told "Jiazi Guangnian" that two of the little tigers show signs of falling behind, and every investor named the same two companies. "But I haven't said which one is doing particularly well. Let's see next year. At current valuations, whichever one you pick is below expectations," one of the investors added.
Another investor said that the six companies' overall capabilities are roughly at the same level, but their financing scales differ.
In addition, DeepSeek, under the quant fund High-Flyer, has become a breakout hit this year. The recently released DeepSeek-V3 has surpassed Alibaba's Qwen-2.5 and Meta's Llama 3.1 across multiple evaluations, becoming the new king of open-source models. Some believe DeepSeek has effectively become one of the "Six Little Tigers."
The competition facing the Six Little Tigers is also expanding without limit. The tech giants that moved late in 2023 caught up in 2024. Overseas, Google has issued an internal battle cry for a counterattack in the make-or-break year of 2025; at home, ByteDance is going all in on its AI strategy.
How will large model startups respond in 2025?
1. Where the AGI dream begins
Stretching the time scale a little makes the undercurrents easier to feel. Let's first review the starting points of the large-model Six Little Tigers.
Zhipu and MiniMax are the only two of these companies founded before the release of ChatGPT. They saw the technological inflection point earlier than most people.
Zhipu was founded in June 2019, one of the first companies in China, and possibly the very first, to explore large models. On its first day, Zhipu wrote down its vision: "make machines think like humans."
Zhipu's first anniversary coincided with OpenAI's release of GPT-3. That same day, Zhang Peng and invited academician Zhang Bo had an in-depth discussion of GPT-3's technical prospects. Zhang Peng vaguely sensed that this technology called the "large model" would be the technological direction of the future. As he put it: "What OpenAI does is what we have been longing to do. We must pursue it; it has to be done."
MiniMax was founded two years after Zhipu. In 2021, in a room of less than 100 square meters, Yan Junjie wrote down MiniMax's founding intention and path: to realize "Intelligence with Everyone." The three judgments he wrote down then remain unchanged today: build the next generation of AI; build agents that approach the Turing test; create the ultimate experience with intelligence.
Yan Junjie sharing MiniMax's Day 1 at the DingTalk Ecosystem Conference; photo by "Jiazi Guangnian"
After ChatGPT's release, "large model" quickly went from an obscure technical term to the hottest topic in the investment and financing market. Many people could not grasp the concept, but that did not stop it from becoming a consensus about where technology was headed. Carrying hopes of becoming "China's OpenAI," the large-model Six Little Tigers came into being.
In the open letter announcing the founding of Baichuan Intelligence, Wang Xiaochuan wrote excitedly: "We are so lucky to live in the early 21st century. The magnificent Internet revolution has not yet ended, and the era of general artificial intelligence is arriving. Years ago I asserted that when machines master language, the era of general artificial intelligence will come, and that the future of search is question answering. With the emergence of ChatGPT, all of this has started to become reality. Only 131 days have passed since ChatGPT's release, and new advances and breakthroughs arrive every day!"
Yang Zhilin was equally excited. In his view, the advanced reasoning ability ChatGPT demonstrated was unimaginable three to five years earlier. It would create variables in capital and talent, the core production factors of AI, and open up a possibility: building, from 0 to 1, an organization dedicated to AGI (artificial general intelligence). Yang Zhilin named the company after his favorite rock band Pink Floyd's album "The Dark Side of the Moon," which also represents the company's spirit of exploring the mysterious and unknown.
In December 2022, the first question Jiang Daxin asked ChatGPT was: "How old are you?" In the past, this question, simple for humans, would have stumped every machine. ChatGPT answered that it was trained in 2019 and the current year was 2022, so it was 3 years old. Jiang Daxin then asked: "How old will you be next year?" The difficulty here is getting the machine to understand that next year means "this year + 1," which involves numerical reasoning. ChatGPT again answered correctly. The answers gave this veteran technologist goosebumps; Jiang Daxin realized an epoch-making technological shift was coming.
Kai-Fu Lee is the oldest of this group of AGI entrepreneurs. The doctoral application letter he submitted to CMU (Carnegie Mellon University) 40 years ago was about exploring AI. He could have stayed behind the scenes as an investor, but when, after 40 years, he finally saw his AI dream stand a chance of coming true through AGI, he could not hold back his enthusiasm and entered the arena himself, issuing a recruiting call on March 20, 2023 to prepare for the founding of 01.AI.
CMU PhD application letter from Kai-Fu Lee in 1983
In just three months, the large-model Six Little Tigers fell into place one after another: Baichuan Intelligence was founded in March 2023, Dark Side of the Moon and Step Star in April 2023, and 01.AI in May 2023.
Yang Zhilin once explained the company's founding timing and the urgency of the financing window: "The spread of ChatGPT takes time. Some people learned of it early, some late; some were skeptical at first, then shocked, then convinced. Finding people and finding money is tightly bound to timing. We started focusing on the first round of financing in February 2023. Delayed to April, there would have been no chance. But December 2022 or January 2023 wouldn't have worked either; the epidemic was still on and nobody would respond. So the real window was one month."
In hindsight, Yang Zhilin's judgment was essentially correct. AI companies founded after the second half of 2023 either found it hard to raise large rounds and break into the first echelon, or, like ModelBest, dug deep into the vertical niche of on-device large models, or, like DeepSeek, leaned on a deep-pocketed parent, the quant fund High-Flyer, to pursue a relatively idealistic "deep search" for AGI.
The six tigers share the same ideal, AGI. But on how to get there, the companies gradually formed diverging strategies, and the divergence became increasingly visible in 2024.
2. Strategic differentiation of the "Six Little Tigers"
At the Zhiyuan Conference on June 14, 2024, four founders of "Tsinghua-rooted" large-model unicorns, Yang Zhilin, Wang Xiaochuan, Zhang Peng, and Li Dahai, made a rare joint appearance and shared their respective views on the "road to AGI." Moderated by Wang Zhongyuan, president of the Zhiyuan Research Institute, the founders had almost no direct confrontation, but it was still a rare moment on the same stage.
Picture from Zhiyuan Conference
AGI is everyone's common goal, but there are a thousand AGIs in a thousand practitioners' eyes, and different routes to realizing it.
To turn AGI from a qualitative description into a quantitative one, DeepMind, OpenAI, and Zhipu have successively defined levels of AGI.
In Zhipu's definition, L1 is AI learning to use language; L2 is logical thinking and multimodal understanding; L3 is learning to use tools (agents); L4 is the ability to learn autonomously, that is, the internationally discussed "superalignment"; L5 is comprehensively surpassing humans and exploring scientific laws, approaching AGI.
Zhipu has also defined a progress bar for today's AGI: it believes L1 is about 80% done; L2 about 60%, with o1 representing a new inference-model paradigm; L3 only about 40%, with agent capabilities still at a very early stage; and L4 and L5 have only just begun.
AGI-oriented roadmap announced by Zhipu
If one word could sum up Zhipu's strategy, it might be "winning through steadiness."
Over the past two years, Zhipu has taken the safest, most certain route: keeping a close eye on OpenAI, the industry's best, and benchmarking it comprehensively, from the underlying pre-training framework to the model to the application layer on top. But the best a perpetual follower can achieve is second place. Since the second half of 2023, Zhipu has stressed on many occasions that becoming China's ChatGPT is far from the company's goal.
OpenAI's pace slowed this year. Beyond benchmarking OpenAI, Zhipu increased its investment in L3 agents. At a press event in November 2024, Zhang Peng demonstrated live how AutoGLM could create a WeChat group and send out 100 red envelopes totaling 20,000 yuan. Zhang Peng believes an agent is like an intelligent scheduling layer inserted between users and applications, linking all applications and even all devices, which can be seen as a prototype of a large-model general operating system (LM-OS).
The other large-model company explicitly benchmarking OpenAI is Step Star, the last of the Six Little Tigers to appear publicly. From the company's first day, Step Star has drawn on the display board in its exhibition hall a business line similar to OpenAI's: from single-modality models to the world model.
Jiang Daxin once told "Jiazi Guangnian" that OpenAI's model matrix may look complicated, but the logic behind it is actually simple. The mainstream models OpenAI has released include the language generation models of the GPT-4 series, the multimodal generation models DALL-E and Sora, the multimodal understanding model GPT-4V, the end-to-end speech model GPT-4o, and the newly released o-series inference models. In addition, OpenAI is actively laying out embodied intelligence, one of the core carriers of the world model.
Jiang Daxin believes the evolution of large models will pass from independent development of the various modalities, language, video, and speech, in the early stage, to gradual fusion, and finally to full integration. In his words: "Scaling Law and the unification of multimodal understanding and generation are the core insights for realizing AGI."
This technology roadmap was released by Step Star in March 2024; the "Super Q* and System 2" in the "Alignment" section were later confirmed to be the o-series inference models released by OpenAI
Although MiniMax was founded earlier, it has always kept a low profile. In February 2023, MiniMax held a small communication meeting at which Yang Bin, one of the core founders, introduced to "Jiazi Guangnian" and others the three base models MiniMax had developed: text-to-visual, text-to-audio, and text-to-text, making MiniMax China's first multimodal large-model startup. By then MiniMax's first application, Glow, had gained nearly 5 million users. The product was later renamed Xingye, with Talkie as its overseas version.
MiniMax began developing an MoE (mixture-of-experts) architecture in the summer of 2023, committing 80% of its computing power and R&D resources. After two failed attempts, MiniMax officially launched China's first MoE large model in January 2024. In April 2024, MiniMax dug into linear attention and successfully developed a new generation of models based on MoE plus linear attention, reaching a level comparable to GPT-4o.
At the "MiniMax Link Partner Day" on August 31, 2024, Yan Junjie said that "fast" is the core R&D goal of MiniMax's underlying large models. He also shared three key factors for improving the penetration and depth of use of AI applications: continuously reducing the model's error rate, infinitely long input and output, and multimodality.
Yan Junjie sharing MiniMax's models and products; picture from MiniMax
Dark Side of the Moon has never disclosed its technical roadmap, nor released any information about its underlying base model.
In October 2023, Dark Side of the Moon released its first AI assistant, Kimi, which became an instant hit with its 200,000-character long-context input. At the time, Anthropic's Claude-100k supported only about 80,000 characters, and OpenAI's GPT-4-32k only about 25,000. The distinctive long-context capability won Kimi a large user base in a short time and made it the most popular AI assistant in China.
The more users, the higher the inference cost. How does Kimi carry this windfall of traffic? In July 2024, Dark Side of the Moon and a Tsinghua University team released Mooncake, a disaggregated inference architecture centered on the KVCache. Xu Xinran, Dark Side of the Moon's vice president of engineering and head of AI infra, revealed that the system carries 80% of Kimi's online traffic.
Mooncake inference system architecture diagram; picture from Dark Side of the Moon
Xu Xinran also offered some "provocative claims," the first about saving money: "Now, right away, you really can save a lot of money (after all, since the scale and daily request patterns can't be disclosed, if you say it can't save money, you're right too)."
After Kimi, Dark Side of the Moon is also exploring more technical routes, including a video generation model currently in internal testing and the already released math model k0-math and visual thinking model K1.
The interface of Dark Side of the Moon's video generation model under internal testing
Wang Xiaochuan's path to AGI is the most unexpected, yet it also makes sense. Last year, Baichuan Intelligence finished assembling a general artificial intelligence team; this year, it recruited a large number of medical professionals and began narrowing its general-AI strategy to focus on healthcare.
Focusing on healthcare looks like narrowing the road, but Wang Xiaochuan disagrees. He takes the ability to build an AI doctor as a key yardstick for AGI. In his view, the first change AGI brings is that machines begin to have the ability to think, learn, communicate, and empathize, along with multimodal image-processing capabilities. In terms of the capability demands of the learning paradigm, we are in fact evaluating it as we would a human: the evaluation metric and the learning paradigm are to learn from humans, the data comes from what human society generates, and the doctor is among the most knowledge-dense professions of all.
"If even doctors can't make it, then don't talk about AGI." Wang Xiaochuan said firmly.
Zhang Bo, academician of the Chinese Academy of Sciences and professor of computer science at Tsinghua University, recently mentioned Baichuan Intelligence when sharing with "Jiazi Guangnian" the large-model companies he is optimistic about: "From an enterprise perspective, it (Baichuan Intelligence) may survive; it is working hard to solve China's healthcare problems. Right now you can only judge domestic large models from the application perspective."
Wang Xiaochuan is the only founder among the Six Little Tigers who has made clear he will not follow Sora into video models. When Sora stunned the world in early 2024, it also stunned Baichuan's engineers, but the urge to chase it was quickly suppressed by Wang Xiaochuan. In his mind, language is the "bible" of intelligence; Sora represents neither AGI nor real scenarios, only a transitional product.
On the point of not building a video generation model, Wang Xiaochuan and Robin Li, who usually hold opposing views, reached a rare consensus. At this year's Baidu World Conference, Robin Li told "Jiazi Guangnian": "We will not do Sora, but we are very optimistic about multimodality."
Zhang Yijia, founder and CEO of Jiazi Guangnian, and Luo Yihang, founder of Silicon Star, in conversation with Robin Li; picture from Baidu
3. Advances and retreats in pre-training
This year, major overseas technology companies have engaged in a very new kind of "acquisition": not buying the company outright, but acqui-hiring its CEO along with a small core team. There are three well-known cases: Amazon with Adept, Microsoft with Inflection AI, and Google with Character.ai. This is read as a signal of a reshuffle in Silicon Valley's large-model startup landscape.
On August 3, after news broke of Google taking over Character's pre-training team, Yuan Jinhui, founder and CEO of SiliconFlow, commented: "Product-model integration has taken another hit."
"Product-model integration" means one company builds both the product and the model. In fact, OpenAI, Anthropic, and the late-arriving xAI all follow this integrated route. So the problem is not the route itself, but whether non-leading startups have enough money to sustain it.
Pre-training models is expensive. Meta trained Llama 3.1 on 16,000 H100s; Colossus, the world's largest AI training cluster, which Musk built in 122 days this year, contains 100,000 H100s. Counting GPU purchase cost alone, at roughly US$30,000 per H100, the hardware behind Llama 3.1 cost as much as US$480 million, and xAI's as much as US$3 billion. That is not something domestic startups can afford.
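The arithmetic behind these figures can be sanity-checked in a few lines (the US$30,000-per-H100 price is the article's working assumption, not a quoted vendor price):

```python
# Back-of-envelope GPU acquisition cost, as described above.
# Assumes ~$30,000 per H100 and counts purchase cost only.
H100_UNIT_PRICE_USD = 30_000

def gpu_purchase_cost(num_gpus: int, unit_price: int = H100_UNIT_PRICE_USD) -> int:
    """Purchase cost of the GPUs alone; ignores power, networking, and operations."""
    return num_gpus * unit_price

llama31_cluster = gpu_purchase_cost(16_000)    # Meta's Llama 3.1 training cluster
colossus_cluster = gpu_purchase_cost(100_000)  # xAI's Colossus cluster

print(f"Llama 3.1 cluster: ${llama31_cluster:,}")   # $480,000,000
print(f"Colossus cluster:  ${colossus_cluster:,}")  # $3,000,000,000
```

Note that this is hardware acquisition cost, not the cost of a single training run; the real bill also includes power, networking, and engineering.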
According to public information, the valuations of the large-model Six Little Tigers sit around the 20-billion-yuan level, with financing around the 10-billion-yuan level. Is that enough money to keep pre-training?
In October this year, a 36Kr report claimed that at least two of China's little tigers would give up pre-training, though no company has admitted it.
Kai-Fu Lee published a specific rebuttal of the rumor, and 01.AI released a new model, Yi-Lightning, on October 16. The model at one point ranked sixth in the world on the widely recognized Chatbot Arena leaderboard, behind only OpenAI and Google and tied with xAI.
Zhang Peng said at the 2024 Jiazi Gravity year-end ceremony on December 10 that Zhipu has kept up pre-training, releasing its latest model iteration, GLM-4-Plus, in August this year; Zhipu iterates a new model version roughly every four to six months.
Wang Xiaochuan also addressed the question at the Geek Park IF2025 innovation conference on December 14. He first affirmed that China must master pre-training itself, but added that, constrained by chips and computing power, pre-training aimed at super platforms is unrealistic in China. Baichuan has chosen "scenario-driven pre-training," and when building super applications the model's progress must lead the way.
After closing a Series B round worth hundreds of millions of dollars on December 23, Step Star said the financing would go toward continued investment in foundation-model R&D, strengthening multimodal and complex reasoning capabilities, and using products and ecosystem to expand coverage of consumer application scenarios.
That the Six Little Tigers have not given up pre-training may have something to do with the slowdown of the pre-training Scaling Law. OpenAI has been slow to release its next-generation pre-trained model, shifting its focus to inference models instead, which makes people wonder whether the pre-training Scaling Law has hit a wall. Ilya Sutskever, OpenAI's former chief scientist, put it bluntly at this year's NeurIPS 2024 conference: "Pre-training as we know it will end."
Ilya speaking at the NeurIPS 2024 conference, picture from X
Seen from another angle, if the pre-training Scaling Law really is slowing, that may be bad news for realizing AGI, but it is not necessarily bad news for domestic large-model companies short of chips and cards. As the returns from stacking more cards diminish, the value of engineering is amplified.
01.AI's Yi-Lightning is a reference case: its training run used only 2,000 GPUs and cost US$3 million, roughly 3% of OpenAI's training cost. Kai-Fu Lee believes that as long as China's large-model companies have good enough talent and the determination to keep pre-training, neither financing amounts nor chips will be the bottleneck.
The recent release of DeepSeek-V3 also corroborates Kai-Fu Lee's view. The new open-source king surpassed Qwen-2.5-72B and Llama-3.1-405B across multiple evaluations and runs neck and neck with the world's top closed-source models, GPT-4o and Claude-3.5-Sonnet. V3 was trained on only 2,048 H800s at a total training cost below US$6 million, with GPU-hours only about one-tenth of Meta's.
Beyond pre-training, the paradigm of large models is also shifting dramatically. After OpenAI released the o1 inference model, OpenAI research scientist Noam Brown said that o1 represents a new scaling paradigm for inference, and that we are no longer limited by the bottleneck of pre-training.
Now that inference has its own Scaling Law, the front line for the large-model Six Little Tigers has stretched again. The most immediate question is how to split computing power between pre-training and inference.
Zhang Peng revealed at the 2024 Jiazi Gravity year-end ceremony that Zhipu's computing power is split roughly half and half between pre-training and inference.
Zhang Peng, CEO of Zhipu, picture from the 2024 Jiazi Gravity Year-end Ceremony
At the 2024 Yunqi Conference, Jiang Daxin and Yang Zhilin had a conversation about the o1 model. Yang Zhilin believes the ratio of computing power between training and inference will change: pre-training will not necessarily shrink, but inference will certainly grow. In the past, only companies above a certain computing-power threshold could innovate on pre-training algorithms; now companies with relatively little computing power can explore more opportunities through post-training.
Jiang Daxin believes the computing-power demands of reinforcement-learning training on the inference side are not necessarily smaller than pre-training's, because self-play (self-reinforcement) has no upper limit in theory; but he is unsure whether the main model in self-play should keep scaling and whether the ROI would be positive. If so, the demand for computing power will grow quadratically.
For now, Dark Side of the Moon has taken the lead in releasing inference models, launching the math model k0-math and the visual thinking model K1 in succession, the fastest startup to follow up on inference models. Alibaba's QwQ, DeepSeek's R1, and Kunlun Wanwei's Skywork o1 also quickly followed. And today, the last day of 2024, Zhipu released its first inference model, GLM-Zero-Preview, trained with scaled reinforcement-learning techniques.
The outcome of pre-training is undecided, and the inference-model battle is already in full swing. The second half of the large-model game has quietly begun, testing each company's strategic execution.
4. Difficult commercialization
Compared with differences in technical routes, commercialization is even more urgent, a matter of life or death. Sequoia Capital once posed the famous "AI's $600 billion question," pointing at the serious mismatch between AI's commercial revenue and the huge investment pouring in.
Kai-Fu Lee once framed the market opportunities for China's large models this way: "If open source versus closed source is one split, domestic versus overseas another, and to-B versus to-C a third, there are at least 2x2x2 = 8 opportunities, and the number of winners may narrow further."
Open source versus closed source is not just a technical choice but a business choice. This year, Robin Li's claim that "open source is an IQ tax" made the topic controversial. Overseas, large-model startups such as OpenAI and Anthropic have gone closed-source, while Meta is firmly open-source; Google does both, but mainly closed-source.
In China, only Alibaba's Qwen and DeepSeek have fully adopted a Meta-style open-source route. Other open-source players mostly follow the Google route: open-sourcing smaller-parameter or non-latest models while keeping larger, stronger models closed. Zhipu, Baichuan Intelligence, and 01.AI all do this.
Zhang Peng once explained the significance of open-sourcing models: first, open source lets everyone know what Zhipu is doing; second, it lets more people participate in large models and harnesses the community's enthusiasm to push the field forward, which is the most important part. Open source is not about winning the market or pursuing commercial interests; otherwise Zhipu would not have chosen it. That has long been Zhipu's positioning on open source.
Zhipu, the earliest to open-source, reaped a first-mover dividend and is the most successful startup on the open-source front. According to Zhipu's own figures, its open-source model series has been downloaded more than 30 million times worldwide, and it has been named the most popular AI institution on the Hugging Face platform.
Baichuan Intelligence announced in September 2023 that downloads of its open-source models in the community had neared 5 million. 01.AI has not released download figures; on Hugging Face, cumulative downloads of its open-source models are around 200,000.
On commercialization strategy, to-B versus to-C is not entirely an either-or question; the Six Little Tigers basically all have layouts on both sides. In most investors' minds, to-B offers less room for imagination but a clearer path to revenue; to-C offers enormous room, the biggest opportunity since the mobile Internet era, but how to do it, and whether it is an opportunity for startups at all, remains an open question.
On the consumer side, Dark Side of the Moon and MiniMax are currently the two strongest product companies, and the data backs this up. According to Sensor Tower, as of June 2024 Talkie had 11 million global monthly active users, more than half of them in the United States, going head-to-head with Character.ai; in November 2024, Yang Zhilin announced that Kimi's monthly active users across all platforms had exceeded 36 million in October, competing directly with the popular Doubao in China.
Step Star and Zhipu are also eager on the consumer side, each with its own strengths. The multimodal visual search feature "Photo Ask" in Step Star's assistant Yuewen is the first domestic large-model product capability integrated with the iPhone 16's camera control button; Zhipu has brought in Hu Yunhua, former senior technical expert at Alibaba DAMO Academy and former chief data officer of Alipay China, to head Zhipu Qingyan. According to Zhipu, Qingyan will reach 25 million users in 2024, with annualized revenue (ARR) exceeding 10 million yuan.
01.AI focuses its consumer products overseas. Kai-Fu Lee previously revealed that its overseas productivity-tool applications have nearly 10 million users in total, with revenue expected to exceed 100 million yuan this year.
On the enterprise side, large models cannot dodge the customization problem. Customized models struggle to make money because when products and services are insufficiently standardized, the business degenerates into one billed by man-days.
Today the Six Little Tigers generally look for answers in "MaaS" (Model as a Service) open platforms. Each tiger has its own MaaS platform exposing API interfaces for calling its models. At present, apart from Kimi, which offers only standardized APIs, the other large-model companies all go more or less deep into specific industries and provide more tailored industry solutions.
Zhipu once shared its three to-B commercialization models: first, standardized APIs; second, cloud-based private deployment; third, fully private deployment, the most China-specific of the three.
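The "standardized API" tier typically means an OpenAI-style chat-completions interface that any customer can call the same way. As a minimal sketch (the endpoint, model name, and key below are placeholders, not any vendor's actual values):

```python
import json

API_URL = "https://example-maas-platform/v1/chat/completions"  # placeholder endpoint
API_KEY = "sk-placeholder"  # placeholder key, not a real credential

def build_chat_request(model: str, user_message: str) -> dict:
    """Assemble an OpenAI-style chat-completions payload for a MaaS endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": 0.7,
    }

# A real client would POST this payload to API_URL with an Authorization header;
# here we only print it, since the endpoint above is hypothetical.
payload = build_chat_request("example-model", "Summarize this contract clause in one sentence.")
print(json.dumps(payload, ensure_ascii=False, indent=2))
```

Because every customer hits the same interface, this tier scales without per-customer engineering, which is exactly why it is the most standardized of the three models; the private-deployment tiers trade that scalability for data control.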
Zhipu positions itself as a base large model rather than a vertical model for specific industries. Zhang Peng once told "Jiazi Guangnian": "We will not dive directly into specific scenarios to build applications. Many industries have technical and data barriers that a startup's scale cannot overcome. We would rather count on partners who cultivate their vertical industries deeply."
But some tigers chose to keep drilling down and build industry and vertical models. In the first half of 2024, 01.AI said it would "resolutely do to-C and not do unprofitable to-B"; in the second half it released a digital-human solution for e-commerce livestreaming and an AI infra solution for intelligent computing centers. Beyond its core medical focus, Baichuan Intelligence released "Baichuan4-Finance," a full-chain enhanced large model for finance.
Qi Ruifeng, co-founder of 01.AI, once told "Jiazi Guangnian" and others that the key to making money in to-B is returning to the business itself: getting large models into customers' core business scenarios to form standardized, scalable application products.
People often compare today's "Six Little Tigers of large models" with the earlier "Four Little Dragons of AI." The latter never solved the to-B customization problem. If the commercialization problem can be solved, the Four Little Dragons may be the floor for the Six Little Tigers; if not, the Four Little Dragons may be their ceiling.
5. More uncertainties

In addition to technology route selection and commercialization progress, many other factors affect the progress of the large model companies.
In 2024, the large model company that drew the most gossip, controversy, and attention may well have been Dark Side of the Moon.
In February 2024, Dark Side of the Moon completed a US$1 billion financing round led by Alibaba, a record for the largest single financing in the domestic large model field.
In March 2024, after it released a 2-million-character long-text feature, Dark Side of the Moon was hyped in the secondary market and "Kimi concept stocks" were born, bringing huge exposure to both the Kimi smart assistant and the company itself.
Soon after these two pieces of good news, however, Dark Side of the Moon was drawn into a huge whirlpool of public opinion.
First the media reported that "Yang Zhilin cashed out tens of millions of dollars"; then Zhu Xiaohu took legal action, and Dark Side of the Moon engaged lawyers to handle the matter. This young star founding team has reached the global forefront in technology, but its immaturity and shortcomings in company management and operations were laid bare.
Through this storm, the core team of Dark Side of the Moon has remained very stable. The other Little Tiger companies, by contrast, have all suffered losses of core personnel to varying degrees, from heads of core business lines up to the co-founder level. Some departed to start new ventures; others joined stronger companies.
The flow of talent is itself normal, but where the talent goes reflects the flow of market resources. In 2024, the siphon effect that big companies, with their money and compute, exert on talent became increasingly obvious.
The big players' siphon effect extends beyond talent to direct business competition. The most representative case is ByteDance, which turned itself around 180 degrees in just one year. Kunlun Wanwei founder Zhou Yahui commented on WeChat Moments on November 28 this year: "At the beginning of the year we said ByteDance's 2023 AI strategy had failed, but that does not at all diminish the full-marks performance of its 2024 AI strategy."
Image source: Zhou Yahui's WeChat Moments
On the technical side of foundation models, ByteDance has filled out its model technology tree. At a launch event yesterday, ByteDance announced that only seven months after the Doubao model's debut on May 15, its capabilities in general language, video generation, voice dialogue, visual understanding, and more have entered the global first echelon, with overall capability benchmarked against GPT-4o. The Doubao large model now handles long texts of 3 million characters, with a processing latency of only 15 seconds per million tokens, a context window length and latency level at the current limit of the industry.
On the AI product side, ByteDance has not only launched more than a dozen AI applications in the past year but also holds a natural advantage in the battle for advertising: it controls Pangle, China's largest ad aggregation platform, as well as super-apps with huge traffic such as Douyin. Since April this year, Douyin has stopped accepting ad placements for other companies' AI products.
On the B side, the large model business is also gradually coming to be dominated by the big players. In May 2024, Volcano Engine launched a price war, the whole industry quickly followed, and large model API prices dropped again and again.
This year, more and more big players have appeared in large model procurement bids. According to one tally covering the first half of the year, China Telecom, iFlytek, Zhipu, Baidu Cloud, and China Mobile took the top five spots in large model project wins. In these hidden corners where large models actually land, the big players are displacing large model startups and taking the dominant position.
In 2024, the breakneck pace of "AI days, human years" slowly faded, and the much-watched large model companies gradually slowed down.
There is a recurring pattern in the development of technology: people tend to overestimate its short-term effects and underestimate its long-term impact. If the target is still AGI, then we are still at the very beginning.
In 2025, there will be more stories about large models, and more changes. By then, we will also see more clearly what AGI really looks like.