Today, DeepSeek-V3, the first model in our new series, goes live and is open-sourced.
Log in to the official website chat.deepseek.com to chat with the latest V3 model. The API service has been updated in sync, and no changes to the interface configuration are required. Note that the current version of DeepSeek-V3 does not yet support multimodal input or output.
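As a minimal sketch of an unchanged API call (assuming the OpenAI-compatible endpoint and the deepseek-chat model name from the current documentation), an existing client keeps working as-is:

```python
# Minimal sketch: calling the updated API. DeepSeek's API is
# OpenAI-compatible, so existing client configuration keeps working.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",           # replace with your DeepSeek API key
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-chat",            # now served by DeepSeek-V3
    messages=[{"role": "user", "content": "Hello, DeepSeek-V3!"}],
)
print(response.choices[0].message.content)
```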
Performance on par with leading overseas closed-source models

DeepSeek-V3 is a self-developed MoE model with 671B total parameters, of which 37B are activated per token, pre-trained on 14.8T tokens.
Paper link:
https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSeek_V3.pdf
DeepSeek-V3 surpasses other open-source models such as Qwen2.5-72B and Llama-3.1-405B on many evaluations, and its performance is on par with the world's top closed-source models GPT-4o and Claude-3.5-Sonnet.
Encyclopedic knowledge: On knowledge tasks (MMLU, MMLU-Pro, GPQA, SimpleQA), DeepSeek-V3 improves significantly over the previous-generation DeepSeek-V2.5 and approaches the current best-performing model, Claude-3.5-Sonnet-1022.
Long text: On long-text evaluations, DeepSeek-V3 outperforms other models on average across DROP, FRAMES, and LongBench v2.
Code: DeepSeek-V3 is far ahead of all non-o1 models on the market in algorithmic coding scenarios (Codeforces), and approaches Claude-3.5-Sonnet-1022 in engineering coding scenarios (SWE-bench Verified).
Mathematics: DeepSeek-V3 significantly surpasses all open-source and closed-source models on American mathematics competitions (AIME 2024, MATH) and the Chinese National High School Mathematics Olympiad (CNMO 2024).
Chinese ability: DeepSeek-V3 performs similarly to Qwen2.5-72B on evaluation sets such as C-Eval (education) and pronoun disambiguation, but leads on C-SimpleQA (factual knowledge).

Generation speed tripled

Through algorithmic and engineering innovations, DeepSeek-V3's token generation speed has jumped from 20 TPS to 60 TPS, a 3x improvement over the V2.5 model, giving users a faster and smoother experience.
API service price adjustment

With the stronger, faster DeepSeek-V3 now online, our model API service pricing will be adjusted to 0.5 yuan per million input tokens (cache hit), 2 yuan per million input tokens (cache miss), and 8 yuan per million output tokens, so that we can continue to provide better model services to everyone.
At the same time, we have set up a 45-day promotional trial period for the new model: from now until February 8, 2025, the API service price of DeepSeek-V3 remains the familiar 0.1 yuan per million input tokens (cache hit), 1 yuan per million input tokens (cache miss), and 2 yuan per million output tokens. Both existing registered users and new users who register during this period can enjoy these promotional prices.
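To make the pricing concrete, here is a small illustrative calculation (the token counts are hypothetical; the cache-hit/cache-miss split applies to input tokens only):

```python
# Illustrative cost calculation for the prices above (yuan per million tokens).
def request_cost(hit_tokens: int, miss_tokens: int, output_tokens: int,
                 price_hit: float, price_miss: float, price_out: float) -> float:
    """Return the cost in yuan for one request."""
    million = 1_000_000
    return (hit_tokens * price_hit
            + miss_tokens * price_miss
            + output_tokens * price_out) / million

# Standard pricing after February 8, 2025:
print(request_cost(30_000, 5_000, 2_000, 0.5, 2.0, 8.0))  # 0.041 yuan
# Promotional pricing until February 8, 2025:
print(request_cost(30_000, 5_000, 2_000, 0.1, 1.0, 2.0))  # 0.012 yuan
```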
Open-source weights and local deployment

DeepSeek-V3 was trained in FP8, and we have open-sourced the native FP8 weights.
Thanks to the support of the open-source community, SGLang and LMDeploy immediately added support for native FP8 inference with the V3 model, while TensorRT-LLM and MindIE implemented BF16 inference. In addition, to facilitate community adaptation and broaden application scenarios, we provide a script that converts the FP8 weights to BF16.
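As an illustration of what such a conversion involves, here is a minimal sketch of dequantizing a single weight tensor. It assumes block-wise (128x128) scales stored alongside each FP8 weight as a companion weight_scale_inv tensor; the conversion script in the repository is the authoritative reference:

```python
# Sketch: FP8 (e4m3) -> BF16 dequantization for one weight tensor,
# assuming per-block inverse scales of block size 128x128.
import torch

BLOCK = 128  # assumed block size for the per-block scales

def fp8_to_bf16(weight: torch.Tensor, scale_inv: torch.Tensor) -> torch.Tensor:
    """Dequantize an FP8 weight to BF16 using per-block inverse scales."""
    w = weight.to(torch.float32)
    rows, cols = w.shape
    # Expand each block scale to cover its BLOCK x BLOCK tile of the weight.
    s = scale_inv.repeat_interleave(BLOCK, dim=0)[:rows]
    s = s.repeat_interleave(BLOCK, dim=1)[:, :cols]
    return (w * s).to(torch.bfloat16)
```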
For model weight downloads and more local deployment information, please refer to:
https://huggingface.co/deepseek-ai/DeepSeek-V3-Base
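For example, the weights can be fetched with the standard huggingface_hub snapshot_download call (the local target directory below is hypothetical):

```python
# Minimal sketch: fetching the open-source base-model weights from Hugging Face.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="deepseek-ai/DeepSeek-V3-Base",
    local_dir="./DeepSeek-V3-Base",   # hypothetical local target directory
)
print(f"Weights downloaded to {local_dir}")
```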
"Pursuing inclusive AGI with an open source spirit and long-termism" has always been DeepSeek's firm belief. We are very excited to share the staged progress in model pre-training with the community, and we are also very pleased to see that the capability gap between open source models and closed source models is further narrowing.
This is a new beginning. Going forward, we will continue to build richer capabilities such as deep thinking and multimodality on top of the DeepSeek-V3 base model, and will keep sharing our latest explorations with the community.