Luo Fuli, a former DeepSeek model trainer and a "post-95 genius" personally recruited to Xiaomi by Lei Jun, has given outsiders a glimpse of the tip of the iceberg of DeepSeek's talent profile: young, outstanding fresh graduates.
It was this group of "unfathomable wizards" (as Jack Clark, OpenAI's former policy director, called them) who trained DeepSeek-V3, a model whose performance surpasses GPT-4o and Claude 3.5 Sonnet, for only about 6 million US dollars.
DeepSeek founder Liang Wenfeng once sketched a general portrait of these employees in an interview with 36Kr: "They are all fresh graduates from top universities, PhD students who have not yet graduated and are working as interns, and some young people who graduated only a few years ago."
But assembling a talented team alone is not enough to realize DeepSeek's AGI ambitions.
Through interviews with many people familiar with the company, "Intelligent Emergence" found that DeepSeek's ability to make good use of these young geniuses depends heavily on how the team is managed.
As their teams expand rapidly, many AI companies have had to adopt more efficient, vertical (hierarchical) management models.
DeepSeek, by contrast, has kept its team at around 150 people since its founding in May 2023, and relies on an extremely flat culture that downplays rank when identifying research topics and mobilizing resources.
It is among this group of unproven young geniuses, in a company organized unlike a typical internet firm, that DeepSeek's innovation has taken place.
About a hundred young talents: no horse racing, no team hires
Recruiting veterans with AI experience is the hiring strategy of most AI companies.
For example, when Wang Xiaochuan started Baichuan Intelligence, he brought in his old Sogou team, colleagues going back some 20 years; Jiang Daxin, a Microsoft veteran, likewise recruited former colleagues from Microsoft Research Asia when he founded StepFun. The co-founder lineup of 01.AI was even more star-studded at the outset, including Huang Wenhao, from Microsoft Research Asia; Pan Xin, Google Brain's first research software engineer and former head of ByteDance's AI platform; and Li Xiangang, former head of the Strategy Algorithm Center at Beike (KE Holdings).
But DeepSeek prefers young people with no work experience.
A headhunter who has worked with DeepSeek told "Intelligent Emergence" that DeepSeek does not want senior technical staff: "At most 3-5 years of work experience. Anyone with more than 8 years is basically a pass."
For example, three of the core authors of DeepSeekMath, Zhu Qihao, Shao Zhihong, and Peiyi Wang, did the relevant research during their PhD internships. Likewise, V3 research member Dai Damai received his PhD from Peking University only in 2024.
△Dai Damai. Source: Internet
With no work history to go on, DeepSeek judges how "excellent" young graduates are not only by their universities but also by their competition results. Several of DeepSeek's third-party partners said the company places great weight on competition results and "basically won't take anyone below gold-medal level."
One DeepSeek member once posted his résumé online: he graduated from Peking University and won gold medals in three ACM/ICPC (International Collegiate Programming Contest) competitions. During his undergraduate studies he published six papers in total, two of them as a co-author, almost all in top venues.
According to "Intelligent Emergence", High-Flyer Quant began assembling an AI team for DeepSeek in 2022. When DeepSeek was officially established in May 2023, the team had nearly a hundred engineers.
Today, excluding the infrastructure team in Hangzhou, the Beijing team has about a hundred engineers. The acknowledgments in the DeepSeek-V3 technical report list 139 engineers who took part in the research.
Next to the model armies of ByteDance, Baidu and others, which number in the thousands, a team of about a hundred looks thin. But in AI innovation, where "talent density" matters far more than "headcount," many people described DeepSeek to "Intelligent Emergence" as a team made up entirely of elites.
How does DeepSeek manage and retain these young talents? On the one hand, by bluntly throwing money and GPUs at them.
Informed sources told "Intelligent Emergence" that DeepSeek benchmarks its pay against ByteDance's R&D salaries: "It takes the ByteDance offer a candidate could get, then raises the price."
Meanwhile, as long as Liang Wenfeng judges a technical proposal to have potential, DeepSeek gives its researchers computing power "without limit."
On the other hand, DeepSeek adopts a rather flat and "academic" management style.
The headhunter mentioned above said DeepSeek members do not lead teams of their own; instead, they are divided into research groups according to specific goals. Within a group there is no fixed division of labor and no superior-subordinate relationship. "Everyone takes charge of the part they are best at solving. When they hit difficulties, they discuss them together or ask experts from other groups for advice."
In an interview with 36Kr, Liang Wenfeng described this form of organization as "bottom-up" with a "natural division of labor": "Everyone has their own unique growth experience and comes with their own ideas. There is no need to push them... When an idea shows potential, we allocate resources from the top down."
Many entrepreneurs in the industry also regard a "flat" structure as the organizational model best suited to innovative businesses. "Equal communication is very important to building a learning organization. Downplaying rank encourages everyone to speak freely," Wang Huiwen told "Intelligent Emergence" when he had just founded the AI company Light Years Away.
OpenAI co-founder Greg Brockman has also said that OpenAI does not divide roles into researchers and engineers; everyone holds the same title, "Member of Technical Staff." This means that engineers considered "junior" in the conventional sense can still lead research projects.
A typical product of this "natural division of labor" is MLA (Multi-head Latent Attention), the key architecture that greatly reduced V3's training costs. Liang Wenfeng has said that MLA began as the personal interest of a young researcher: "We formed a team around it, and it took several months to get it working."
DeepSeek also does not run internal "horse races" (pitting several teams against each other on the same problem). According to an AI practitioner who has dealt with the DeepSeek team, this avoids the waste of manpower and resources that horse racing causes. "It is also bad for retaining and developing talent. The internal friction from a horse-racing mechanism does too much damage to building team consensus."
"To innovate, teams must shed inertia"
In 2023, the portrait of China's top AI talent carried several labels (academic expert, big-tech executive, veteran entrepreneur), all pointing to the same hiring standard: such talent had to be validated by workplace yardsticks such as rank and product impact.
Since 2024, however, the AI industry's hiring standards have clearly been shifting. More and more fresh graduates, still untested in the workplace, are taking center stage.
Aditya Ramesh, one of the leads of Sora, said at the 2024 BAAI Conference that OpenAI's recruiting strategy is very different from other organizations': "We pay more attention to people who have high potential but may not have had the opportunity to earn formal academic credentials."
Similarly, Xie Saining, an author of DiT (the architecture underlying Sora), noted that many very successful researchers have never gone through so-called traditional, formal research training.
△Xie Saining in conversation with Aditya Ramesh at the BAAI Conference. Source: BAAI
The same hiring philosophy shows up in DeepSeek's selection strategy. Many of the young people who join DeepSeek have no prior experience in model training, and some did not even major in computer science.
A DeepSeek member with a physics degree once mentioned publicly that he had taught himself computing by chance. "Because the work was too cutting-edge, there was almost no