DeepSeek’s secret recipe is Silicon Valley-flavored
2025-01-08 14:02



DeepSeek is not a product of "Chinese-style innovation"

The Hangzhou-based artificial intelligence startup DeepSeek has lately been haunting AI researchers and developers in Silicon Valley. The large language model DeepSeek-V3 it released in December 2024 is considered to have achieved several things previously thought impossible: an open-source model trained for about US$5.5 million on roughly 2,000 NVIDIA H800 GPUs (the lower-spec version of NVIDIA's GPUs built for the Chinese market) that surpasses top open-source models such as Qwen2.5-72B and Llama-3.1-405B on multiple benchmarks and is comparable to the world's leading closed-source models, GPT-4o and Claude 3.5 Sonnet, whose training costs are conservatively estimated at hundreds of millions of dollars and hundreds of thousands of the most powerful NVIDIA H100s. You can imagine the shock this caused in the AI community, especially in Silicon Valley, where AI researchers, entrepreneurs, funding, computing power, and resources are most concentrated. Many important figures in Silicon Valley's AI world have praised DeepSeek without reservation, among them OpenAI founding member Andrej Karpathy and Scale AI founder Alexandr Wang. Although OpenAI CEO Sam Altman posted a tweet that seemed to insinuate that DeepSeek had merely copied and borrowed others' advances (it was quickly met with the reply, "You mean like using the Transformer architecture invented by Google?"), the praise DeepSeek has received is broad and sincere, especially in the open-source community, where developers vote with their feet.

Andrej Karpathy praises DeepSeek's technical report as worth reading

Many Chinese see DeepSeek-V3 as a "light of domestic technology" and a model of Chinese-style innovation. It is true that China's clever researchers and engineers are very good at doing big things "more, faster, better, and cheaper," achieving results beyond expectations by innovating and refining technical methods under tight resource constraints (often not by choice). DeepSeek-V3 relies remarkably little on high-end computing power, treats training and inference as a single system, contributes many new technical ideas, and uses engineering thinking to solve algorithmic problems efficiently, concentrating its firepower on the things that matter. These are indeed things that Chinese companies, Chinese teams, and Chinese researchers are especially good at. The lesson Alexandr Wang drew from DeepSeek is that while Americans rest, the Chinese keep grinding, catching up at lower cost, with greater speed and fighting strength.

Interestingly, people in the US technology community who are friendly toward China, Musk included, often attribute China's success in certain fields to being smart, diligent, and methodical. That is fair as far as it goes, but at least in AI it cannot explain one thing: China's other large-model companies and AI talents are just as smart, diligent, and good at methodological innovation, and many of their technical innovations have also been very successful (the first place I noticed something similar to DeepSeek's distributed inference was Mooncake, from Moonshot AI), yet none of them triggered a world-class sensation like this one. Perhaps they will in the future, but why, this time, is it DeepSeek?

Comparing DeepSeek to "the Pinduoduo of the AI world" is biased, and the notion that its secret recipe is simply being faster and cheaper is incomplete. Most AI companies in China are short of GPUs, and all of them are pushing hard on architectural innovation precisely because of that shortage; in this respect DeepSeek is no different. Besides, the attention DeepSeek commands in Silicon Valley did not appear only in the past two weeks. As early as May 2024, when DeepSeek-V2 was released, its multi-head latent attention (MLA) architecture caused a small sensation in Silicon Valley, and the V2 paper was widely shared and discussed in the AI research community. An interesting phenomenon at the time: while AI practitioners on X and Reddit were discussing DeepSeek-V2, domestic public opinion was describing DeepSeek as the "instigator of the large-model price war," as if the two conversations were happening in parallel universes. This suggests that DeepSeek holds a code for talking with Silicon Valley, and that its secret recipe is Silicon Valley-flavored.

DeepSeek, and OpenAI and DeepMind before 2022

If I must find a benchmark for DeepSeek among global artificial intelligence players, allow me to add one qualifier: DeepSeek is a bit like OpenAI and DeepMind, the OpenAI and DeepMind of before 2022. What did they look like before 2022? Like non-profit academic research institutions. Although OpenAI had taken Microsoft's investment and restructured as a for-profit company, its overall working style at the time, at least among the group represented by chief scientist Ilya Sutskever and founding member Andrej Karpathy, was still that of a non-profit, and the company had no official external product; GPT-3, announced in 2020, was presented as an academic research result in a paper rather than as a product. DeepMind, though nominally a startup and later an industrial company, operated more like a research institution, whether during its independent years in London or after its acquisition by Google but before its merger with Google Brain; AlphaGo and AlphaFold were research projects, not products.

Does DeepSeek have a "product" of its own? Not exactly none; ordinary users can chat directly with its models, and it sells low-cost APIs to developers. But it does not even have a mobile app, and it does not seem to operate the product at all: no paid traffic acquisition, no social media marketing, no thoughtfully prepared prompt templates for users. A website that ordinary people can use is enough. On this point alone, DeepSeek is very unlike a Chinese AI company. On the enterprise and developer side, apart from the architecture-driven cost reductions that let it slash API prices, you will find none of the "acceleration programs," "developer competitions," or "industry ecosystem funds" that so many companies run. That can only mean it really does not intend to do business for now.

On the other hand, the density of researchers at DeepSeek is striking. QbitAI's recent review of the authors of the DeepSeek-V3 paper offers a valuable look at the composition and character of the company's researchers: fresh PhD graduates from top Chinese universities such as Tsinghua, Peking University, and Beihang, authors of papers in top venues, and winners of informatics competitions form the backbone of the research team, which also includes master's and doctoral students and is extremely young. DeepSeek founder Liang Wenfeng revealed in an interview with 36Kr's "Undercurrents" (暗涌) that hiring is based on ability rather than experience, and that core technical positions are filled mainly by fresh graduates or people one or two years out of school. That is a standard tailored to recruiting researchers, not product, marketing, or engineering hires. It also closely resembles the early talent structure of OpenAI and DeepMind: take the youngest, smartest, least constrained minds and let them create something that has never existed. It creates an atmosphere in which the brightest young people enter an institution that looks like a company and continue their academic careers there, with far more mobility, computing resources, and research data than a purely academic setting such as a university laboratory could offer.
The research arms of technology companies are a "state within a state" for scientists, and the trend of them replacing universities as the main source of academic breakthroughs is increasingly clear. The less a lab is distracted by its parent company's commercial goals, the better its odds of producing disruptive academic results. Google researchers proposed the Transformer architecture, the foundation of generative AI, in 2017, when Google's AI commercialization goals were still vague; in the past two years comparable results from Google have been scarce. OpenAI's GPT-3 and GPT-3.5 were both born at moments when the company was still out of the spotlight; once it began to look more and more like a company, everything became far messier.

This is where DeepSeek differs from most Chinese AI startups and looks more like a research institution. The founders in this wave of AI startups are mostly scientists and researchers, but after taking round upon round of money from VC and PE investors, they can no longer do research and publish papers as they please; they must focus on productization and commercialization (probably not what they are best at). Technology giants can afford to support research institutes and scientists, but once results must be turned quickly into products and revenue, the team grows more complicated and loses the simplicity and clarity of pure researchers. Some American technology giants do run research institutions shielded from commercial goals, yet over time these have inevitably taken on the airs of academic seniority. Research organizations inside commercial companies composed entirely of the brightest young people have appeared at only a few points in time: OpenAI and DeepMind a few years ago, and DeepSeek now.

One piece of evidence: apart from its models, DeepSeek's best "product" is its papers. For both the V2 and V3 releases, the corresponding papers were carefully read, shared, cited, and strongly recommended by researchers around the world. By contrast, what OpenAI has published since GPT-4 can hardly be called papers. These days model makers race for positions on benchmark leaderboards, and few care about the quality of their papers, so a paper that is detailed, rigorous, and rich in experimental detail can still earn extra respect from the field. An important premise for all of this, of course, is that DeepSeek has money: as much ammunition as the giants and far more than other startups. But not every giant is willing to keep a DeepMind of its own.

Open source is always right

At the beginning of 2023, the technology publication The Information ran an inventory of which artificial intelligence star startups might emerge in China. Zhipu and MiniMax, which had already made their marks, were on the list; the newly founded Baichuan Intelligence, 01.AI, and Light Years Beyond were also mentioned, as was Yang Zhilin, then still little known and preparing to start another company. DeepSeek was nowhere to be found. A year and a half ago, hardly anyone regarded DeepSeek as an AI insider. Although there were industry rumors that DeepSeek's parent company, the quantitative private fund High-Flyer (幻方), held a large stock of high-performance NVIDIA GPUs, few believed it would end up building large models. Now everyone is talking about DeepSeek, which has followed the old path of "blooming inside the wall while the fragrance drifts outside": recognition abroad came first.

It is fair to say that from day one, DeepSeek chose a battlefield different from that of most domestic large-model newcomers. It did not need financing (at least not at the beginning), did not need to compete for a seat among the "four little dragons" or "six little tigers" of large models, did not court domestic public opinion (the only interview it accepted was aimed at recruiting the most passionate and smartest researchers), and did not bother with product launches and distribution. It chose the path that best fits the nature of a research institution: join the global open-source community, share its models, research methods, and results in the most direct way, absorb feedback, then iterate, optimize, and improve. The open-source community is still the most enthusiastic, generous, free, and borderless venue for AI research, sharing, and discussion, and the least "involuted" corner of the AI field.

Being open source from day one must have been a considered choice for DeepSeek. Open source means truly open source, done thoroughly: model weights, training methods, and technical details are all made public, and a high-quality paper is itself part of open source. Young, smart researchers who show up, share, and stay active in the open-source community are highly visible, and among those watching are some of the most important promoters in global AI. Smart young researchers, plus the atmosphere of a research institution (with big-company pay packages), plus sharing and exchange in the open-source community: together these have compounded DeepSeek's influence and reputation in the global AI field. For an organization whose main goal is to produce AI research results rather than to ship commercial products, Hugging Face and Reddit are the best conference venues, data sets and code repositories are the best demos, and papers are the best press releases. DeepSeek does essentially this, and does it very carefully. So even though DeepSeek's researchers and CEO rarely give interviews and almost never share technical experience at forums and events, you cannot say it does no marketing. On the contrary, DeepSeek's "marketing" is extremely precise and effective for its two real goals: proving that original AI research from China can lead global trends, and recruiting the smartest researchers. It is worth adding that China's major open-source large-model players have indeed earned a great deal of respect in global AI research and products over the past year.
An increasingly common view is that, compared with some open-source models from the United States and Europe, China's open-source large models are more thoroughly open and easier for researchers and developers to use directly, whether to study them or to build and fine-tune their own models. DeepSeek is a typical representative. Besides DeepSeek, Alibaba's Tongyi (Qwen) is also widely regarded in the AI research community as having a more sincere attitude toward open source. ModelBest's (面壁智能) MiniCPM-Llama3-V 2.5 was even directly repackaged by a Stanford undergraduate team, an incident that unexpectedly made it famous. So there is an interesting split: the international AI community, Silicon Valley in particular, considers the representative players of China's large models to be DeepSeek and Alibaba, while at home we tend to name Doubao, Kling, and the so-called six little dragons of AI. Objectively speaking, DeepSeek and Alibaba have done more to get the international AI community, and Silicon Valley especially, to take a fair and positive view of China's AI innovation capabilities and its contributions to the global community. Open source is always the right thing to do.

V3 is DeepSeek's GPT-3 moment

The V3 model triggered an international response that broke out of the AI circle; CNBC's coverage treated V3, and DeepSeek behind it, as a sign that China's AI is catching up with the United States. Look closely and it is not hard to see that DeepSeek's shift from secretive and low-key to widely watched, over the three iterations from Coder to the V3 model, closely mirrors OpenAI's upgrade rhythm from GPT-1 to GPT-3 and the reactions each step drew.

OpenAI first. In 2018, OpenAI released GPT-1, its first pre-trained model based on the Transformer architecture. It showed that language modeling is an effective pre-training objective, but the quality and diversity of its output were limited; it drew some academic attention, and the overall response was muted. In early 2019, OpenAI launched GPT-2, which significantly improved the quality and diversity of generated text, largely validated the language-model approach, and triggered wide discussion and attention in the AI field. In June 2020, OpenAI released GPT-3, then the largest language model in the world at 175 billion parameters; beyond generating text, it could translate, answer questions, and carry on extended dialogue, becoming a milestone in the development of generative artificial intelligence. Even so, GPT-3 was still a laboratory project.

Now DeepSeek. In November 2023, DeepSeek released two open-source models, DeepSeek Coder and DeepSeek LLM. Few people paid attention, and the models still faced challenges in computing efficiency and scalability. In May 2024, DeepSeek released V2, which combined a Mixture-of-Experts (MoE) architecture with multi-head latent attention (MLA) to sharply cut the cost of training and, above all, inference, while matching the world's top models on many dimensions (a simplified sketch of the MLA idea appears below). It began to attract extensive discussion and recommendations from AI academics and developers, and this is when DeepSeek entered more people's field of view. In December 2024, DeepSeek released V3, which, at roughly one percent of the cost incurred by OpenAI, Anthropic, and Google, surpassed comparable open-source models Llama 3.1 and Qwen 2.5 and rivaled the closed-source GPT-4o and Claude 3.5 Sonnet. The result caused a sensation and became a milestone in the development of large language models worldwide.

It is fair to say that V3 is DeepSeek's "GPT-3 moment." The difference, of course, is that on the way to that milestone OpenAI was committed to scaling computing resources and spending without limit, while DeepSeek has been committed to wringing the highest possible efficiency out of the lowest possible compute cost. OpenAI took two years to reach its GPT-3 moment; DeepSeek took one year to reach V3. OpenAI has always concentrated on advancing pre-training along the GPT route, while DeepSeek gives equal weight to training and inference, which is also where global model technology is heading. If V3 really is DeepSeek's GPT-3 moment, what comes next? Will it be DeepSeek's GPT-3.5, its ChatGPT moment, or something else? No one knows, but the interesting part may be yet to come.
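To make the MLA idea above slightly more concrete, here is a deliberately simplified PyTorch sketch of the general principle of latent key-value compression: cache one small latent vector per token instead of full per-head keys and values, and re-expand it at attention time. The class name, dimensions, and structure are illustrative assumptions made for this article, not DeepSeek's actual implementation, which also handles details such as rotary position embeddings, causal masking, and query compression.

```python
# Minimal sketch of latent KV compression in the spirit of multi-head latent attention (MLA).
# All names and sizes are illustrative assumptions; this is not DeepSeek's implementation.
import torch
import torch.nn as nn

class LatentKVAttention(nn.Module):
    def __init__(self, d_model=4096, n_heads=32, d_head=128, d_latent=512):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_head
        self.q_proj = nn.Linear(d_model, n_heads * d_head, bias=False)
        # Down-project the hidden state into a small latent; at inference only this is cached.
        self.kv_down = nn.Linear(d_model, d_latent, bias=False)
        # Up-project the cached latent back into per-head keys and values on the fly.
        self.k_up = nn.Linear(d_latent, n_heads * d_head, bias=False)
        self.v_up = nn.Linear(d_latent, n_heads * d_head, bias=False)
        self.out_proj = nn.Linear(n_heads * d_head, d_model, bias=False)

    def forward(self, x, latent_cache=None):
        # x holds only the new tokens: (batch, new_tokens, d_model).
        # Causal masking is omitted for brevity; assume one new token per decoding step.
        b, t, _ = x.shape
        latent_new = self.kv_down(x)
        latent = latent_new if latent_cache is None else torch.cat([latent_cache, latent_new], dim=1)

        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(latent).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)

        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.out_proj(out), latent  # only the latent is carried forward as cache

# Per-token cache size: standard attention stores keys and values for every head,
# while the latent variant stores a single d_latent vector.
n_heads, d_head, d_latent = 32, 128, 512
print("standard KV cache per token:", 2 * n_heads * d_head)  # 8192 values
print("latent KV cache per token:  ", d_latent)               # 512 values, 16x smaller
```

The point of the sketch is only the bookkeeping: the smaller the cached latent relative to the full key-value tensors, the cheaper long-context inference becomes in memory and bandwidth, which is the cost lever the article attributes to V2 and V3.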
DeepSeek should not forever be defined only by doing more with less; it should go on to make greater contributions to the cause of artificial intelligence for all humankind. In any case, DeepSeek is already one of the most globalized AI companies in China, and its secret recipe for winning respect from peers and even rivals around the world is Silicon Valley-flavored.