The DeepSeek-V3 large model turned out to be open source and to outperform Llama 3 with only 1/11 of its compute, shocking the entire AI community.
Soon after, the rumor that "Lei Jun offered an annual salary of tens of millions to poach DeepSeek researcher Luo Fuli" also drew attention to DeepSeek's talent.
Now not only the tech community but the entire Internet is curious. Someone even posted on Xiaohongshu asking: what kind of team is this?
Internationally, some have translated founder Liang Wenfeng's interview into English and annotated it, looking for clues to the company's rise.
QbitAI compiled information from various sources and found that the most striking feature of the DeepSeek team is its youth.
Fresh graduates and current students, especially those from Tsinghua and Peking University, are very active in it.
Some of them were still doing research at DeepSeek in 2024 while their freshly minted doctoral dissertations were winning awards.
Some have participated in everything from DeepSeek LLM V1 to DeepSeek-V3, while others interned only briefly and still produced important results.
Key innovations such as the new MLA attention mechanism and the GRPO reinforcement learning alignment algorithm were proposed almost entirely by young people.
DeepSeek core members revealed

DeepSeek-V2, released in May 2024, was a key step in this large-model company's rise to prominence.
Its most important innovation is a new type of attention: on top of the Transformer architecture, MLA (Multi-head Latent Attention) replaces traditional multi-head attention, greatly reducing computation and inference memory.
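For readers unfamiliar with MLA, the core idea fits in a few lines of PyTorch. The following is a minimal illustrative sketch, not DeepSeek's actual implementation: all class and parameter names are made up, the dimensions are arbitrary, and details from the DeepSeek-V2 paper such as the decoupled RoPE positional path and causal masking are omitted. The point it shows is that only a small latent vector needs to be cached during inference, instead of full per-head keys and values.

    import torch
    import torch.nn as nn

    class SimplifiedMLA(nn.Module):
        """Sketch of Multi-head Latent Attention: compress the hidden state
        into a small shared latent, cache only that latent, and reconstruct
        per-head keys/values from it with up-projections."""
        def __init__(self, d_model=512, n_heads=8, d_latent=64):
            super().__init__()
            self.n_heads, self.d_head = n_heads, d_model // n_heads
            self.w_q = nn.Linear(d_model, d_model)         # query projection
            self.w_down_kv = nn.Linear(d_model, d_latent)  # down-projection; its output is all that gets cached
            self.w_up_k = nn.Linear(d_latent, d_model)     # reconstruct keys from the latent
            self.w_up_v = nn.Linear(d_latent, d_model)     # reconstruct values from the latent
            self.w_o = nn.Linear(d_model, d_model)

        def forward(self, x, latent_cache=None):
            B, T, _ = x.shape
            c_kv = self.w_down_kv(x)                       # (B, T, d_latent)
            if latent_cache is not None:                   # append to previously cached latents
                c_kv = torch.cat([latent_cache, c_kv], dim=1)
            q = self.w_q(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
            k = self.w_up_k(c_kv).view(B, -1, self.n_heads, self.d_head).transpose(1, 2)
            v = self.w_up_v(c_kv).view(B, -1, self.n_heads, self.d_head).transpose(1, 2)
            attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
            out = (attn @ v).transpose(1, 2).reshape(B, T, -1)
            return self.w_o(out), c_kv                     # c_kv is the new, compact KV cache

With d_latent much smaller than 2 x n_heads x d_head, the cached state per token shrinks accordingly, which is where the inference-memory savings come from.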
Among the contributors, Gao Huazuo and Zeng Wangding made key innovations in the MLA architecture.
Gao Huazuo keeps a very low profile; at present, all that is known is that he graduated from the Department of Physics at Peking University.
The same name also appears in patent filings from StepFun, one of the six prominent Chinese large-model startups, though it is not yet confirmed whether it is the same person.
Zeng Wangding studied at Beijing University of Posts and Telecommunications, where his graduate advisor was Zhang Honggang, director of the university's Artificial Intelligence and Internet Search Teaching and Research Center.
The DeepSeek-V2 work also builds on another key result: GRPO.
Three months before DeepSeek-V2's release, DeepSeekMath came out, and it was there that GRPO (Group Relative Policy Optimization) was proposed.
GRPO is a variant of the PPO reinforcement learning algorithm: it drops the critic model and instead estimates the baseline from scores within a group of sampled outputs, significantly reducing the resources needed for training.
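To make the "group baseline" idea concrete, here is a minimal sketch of GRPO's advantage computation. Function and variable names are illustrative, not from DeepSeek's code, and it assumes simple outcome-based rewards; the full algorithm then plugs these advantages into a PPO-style clipped objective, just without a value network.

    import torch

    def grpo_advantages(group_rewards: torch.Tensor) -> torch.Tensor:
        """For each prompt, sample a group of G outputs and score them;
        the group's own mean and std serve as the baseline, replacing
        the learned critic that PPO would normally require."""
        mean = group_rewards.mean(dim=-1, keepdim=True)
        std = group_rewards.std(dim=-1, keepdim=True)
        # Advantage of each output = its reward, normalized within its group.
        return (group_rewards - mean) / (std + 1e-8)

    # Example: 4 sampled answers to one prompt, scored by a reward model.
    rewards = torch.tensor([[0.1, 0.9, 0.4, 0.6]])
    print(grpo_advantages(rewards))  # answers above the group mean get positive advantage

Because the baseline comes for free from group statistics, no separate critic model has to be trained or held in memory, which is the source of the resource savings mentioned above.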
GRPO has received widespread attention in the industry; the technical report for Qwen 2.5, another major domestic open-source model from Alibaba, also disclosed its use of GRPO.
DeepSeekMath has three core authors who completed their work during their internship at DeepSeek.
One of the core authors, Shao Zhihong, is a doctoral student in Tsinghua Interactive Artificial Intelligence (CoAI) research group, studying under Professor Huang Minlie.
His research covers natural language processing and deep learning; he is particularly interested in building robust and scalable AI systems that can use diverse skills to integrate heterogeneous information and accurately answer a wide range of complex natural language questions.
Shao Zhihong also worked at Microsoft Research before.
After DeepSeekMath, he also participated in DeepSeek-Prover, DeepSeek-Coder-v2, DeepSeek-R1 and other projects.
Another core author, Zhu Qihao, received his PhD in 2024 from the Institute of Software, School of Computer Science, Peking University. Advised by Associate Professor Xiong Yingfei and Professor Zhang Lu, his research focuses on deep code learning.
According to the official introduction from the Peking University School of Computer Science, Zhu Qihao has published 16 CCF-A papers; he won the ACM SIGSOFT Distinguished Paper Award once and was nominated once, at ASE and ESEC/FSE respectively, and one of his papers was among the three most-cited papers from its year's ESEC/FSE.
In the DeepSeek team, Zhu Qihao also led the development of DeepSeek-Coder-V1 based on his doctoral thesis work.
His doctoral dissertation, "Language-Definition-Aware Deep Code Learning Techniques and Applications," was also selected for the 2024 Doctoral Dissertation Incentive Program of the CCF Software Engineering Technical Committee.
△Picture source: Peking University School of Computer Science official account
The third core author is also from Peking University.
Wang Peiyi is a doctoral student at Peking University, advised by Professor Sui Zhifang of the Key Laboratory of Computational Linguistics there.
Beyond the two key breakthroughs of MLA in DeepSeek-V2 and GRPO in DeepSeekMath, it is worth noting the members who have been involved all the way from V1 to V3.
One representative figure is Dai Damai, who received his PhD in 2024 from the Institute of Computational Linguistics, School of Computer Science, Peking University; his advisor is also Professor Sui Zhifang.
△Picture source: Peking University School of Computer Science official account
Dai Damai has a long list of academic achievements, including the EMNLP 2023 Best Long Paper Award and the CCL 2021 Best Chinese Paper Award, and has published 20+ papers at major top conferences.
Only 10 doctoral dissertations from universities in mainland China were selected for the 2024 Doctoral Dissertation Incentive Program of the Chinese Information Processing Society of China, and his thesis, "Mechanism Analysis and Key Technologies for Capability Enhancement of Knowledge Memorization in Pre-trained Language Models," was among them.
Another such member is Wang Bingxuan of Peking University's Yuanpei College.
Wang Bingxuan is from Yantai, Shandong Province and entered Peking University in 2017.
After graduating with a master's degree, he joined DeepSeek and participated in a series of important work starting from DeepSeek LLM v1.
The representative figure from Tsinghua University is Zhao Chenggang.
Zhao Chenggang was a member of the informatics competition class at Hengshui High School and won a silver medal at CCF NOI 2016.
He then entered Tsinghua University, became an official member of the Tsinghua student supercomputing team in his sophomore year, and won world university student supercomputing competitions three times.
Zhao Chenggang works as a training/inference infrastructure engineer at DeepSeek and has internship experience at NVIDIA.
△Picture source: Tsinghua News Network
What kind of team is DeepSeek?

These vivid individual stories are enough to inspire admiration.
But they are not enough to answer the original question: what kind of team is DeepSeek, and how is it organized?
The answer may have to be found in founder Liang Wenfeng.
As early as May 2023, when DeepSeek had just announced it would pursue large models and had yet to release any results, Liang Wenfeng revealed his hiring criteria in an interview with 36Kr's "Undercurrent":
"Look at ability, not experience. Our core technical positions are basically filled by fresh graduates and people who graduated within the last year or two."
The author lists of the papers published over the following year or so show this is indeed the case: current PhD students, fresh graduates, and people one or two years out of school account for a large share.
Even the team leads skew young, mostly people who graduated four to six years ago.
For example, Wu Yu, who leads DeepSeek's post-training team, received his PhD from Beihang University in 2019 and worked on the XiaoIce and Bing Encyclopedia projects at Microsoft's MSRA.
During his doctoral studies, Wu Yu was jointly supervised by Professor Li Zhoujun of Beihang University and Dr. Zhou Ming, former vice president of MSRA.
Sharing half an academic lineage with him is Guo Daya, who was jointly supervised by Professor Yin Jian of Sun Yat-sen University and Dr. Zhou Ming of MSRA, and received his PhD in 2023.
In July 2024, he joined DeepSeek and was mainly involved in a series of mathematical and code large model work.
Guo Daya has another notable story from his student days: while interning at MSRA as an undergraduate, he published two top-conference papers in a single year, and joked that he had "fulfilled the doctoral graduation requirements of Sun Yat-sen University on the third day of enrollment."
Beyond the youth of its team members, DeepSeek has another trait that stands out among domestic AI companies: it attaches great importance to the close cooperation between model algorithms and hardware engineering.
The DeepSeek-V3 paper has 200 authors in total, and not all of them work on AI algorithms or data.
A group of them has been involved from the early V1 of DeepSeek LLM through V3, focusing on the compute side and responsible for optimizing the hardware.
Under the DeepSeek AI name, they published the paper "Fire-Flyer AI-HPC," which cuts training costs through software-hardware co-design and addresses the shortcomings of traditional supercomputing architectures for AI training workloads.
Fire-Flyer is the "Firefly 2" 10,000-GPU cluster built by High-Flyer AI. It uses NVIDIA A100 GPUs, yet offers cost and power-consumption advantages over NVIDIA's official DGX-A100 servers.
Some members of this team have worked or interned at NVIDIA, some came from Alibaba Cloud, also based in Hangzhou, and many were seconded from High-Flyer AI or transferred outright to DeepSeek, taking part in every large-model project along the way.
The payoff of this emphasis on software-hardware collaboration: DeepSeek-V3 was trained to higher performance with 1/11 of the compute of Llama 3 405B.
Finally, we also noticed an outlier among DeepSeek's open-source projects: one that is not language-model work but 3D generation.
This work was completed by Sun Jingxiang, a doctoral student at Tsinghua University, together with his advisor Liu Yebin and DeepSeek members during Sun's internship at DeepSeek.
Another intern who produced important results at DeepSeek is Xin Huajian, a logic major from Sun Yat-sen University.
During his internship at DeepSeek, he worked on DeepSeek-Prover, which uses large models to prove mathematical theorems; he is now pursuing a PhD at the University of Edinburgh.
With these examples in mind, returning to Liang Wenfeng's interview makes it easier to understand how this team operates.
"There is no predefined division of labor, only natural division of labor. There is no cap on allocating GPUs or people; everyone can call on the training cluster at any time, and as long as a few people are interested, a project can get started. When an idea shows potential, resources are also allocated from the top down."

This inevitably brings to mind another force that cannot be ignored in the AI world. Yes, OpenAI.
OpenAI likewise ignores experience in hiring, recruiting undergraduates and dropouts as long as they have the ability.
Likewise, fresh graduates and people born after 2000 were able to mobilize resources to build Sora from scratch.
Likewise, when a direction shows potential, the whole company designs the layout and pushes resources from the top down.
DeepSeek may be the Chinese AI company most similar to OpenAI in terms of organizational structure.
Reference links:
[1] https://mp.weixin.qq.com/s/Cajwfve7f-z2Blk9lnD0hA
[2] https://mp.weixin.qq.com/s/r9zZaEgqAa_lml_fOEZmjg
[3] https://mp.weixin.qq.com/s/9AV6Qrm_1HAK1V3t1MZXOw
[4] https://mp.weixin.qq.com/s/y4QwknL7e2Xcnk19LocR4A
[5] https://mp.weixin.qq.com/s/C9sYYQc6e0EAPegLMd_LVQ