Another Chinese model has prompted soul-searching in Silicon Valley's large-model industry! This time, it is Baidu's Wenxin X1.
Bill Gurley, a partner of Benchmark, a well-known venture capital firm, commented on social media: "American artificial intelligence companies should spend 100% of their time on development and innovation, rather than lobbying for protection in Washington, D.C. to avoid competition."
Meanwhile, overseas users are asking on social platforms how to get a Baidu account. Tech KOL Alvin Foo commented after hours of use: "Baidu made a major update to Wenxin...its performance is impressive. Its performance is better than ChatGPT 4.5 in multiple benchmarks, and the price is only 1%." Tech writer Robert Scoble said bluntly: "We have an AI price war!"
All of this stems from the release of Wenxin large model 4.5 and Wenxin large model X1 on March 16. Both flagship models are now available to users for free, ahead of schedule, on the official Wenxin Yiyan website.
As Baidu's new-generation foundation model, Wenxin large model 4.5 outscored GPT-4.5, DeepSeek-V3, and others, with an average score of 79.6, higher than GPT-4.5's 79.14. As Baidu's first deep-thinking model, Wenxin X1 focuses on cost-effectiveness: at 0.002 yuan per thousand input tokens and 0.008 yuan per thousand output tokens, it costs only half as much as DeepSeek-R1 while delivering comparable performance.
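To make the pricing concrete, here is a minimal sketch that turns the two quoted per-thousand-token prices into the cost of one request. The function name and the example token counts are our own illustrative assumptions, not anything published by Baidu.

```python
# Prices quoted in the article, in yuan per 1,000 tokens.
X1_INPUT_PRICE_PER_1K = 0.002
X1_OUTPUT_PRICE_PER_1K = 0.008


def request_cost_yuan(input_tokens, output_tokens,
                      input_price_per_1k=X1_INPUT_PRICE_PER_1K,
                      output_price_per_1k=X1_OUTPUT_PRICE_PER_1K):
    """Return the cost in yuan of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1000 * input_price_per_1k
    output_cost = output_tokens / 1000 * output_price_per_1k
    return input_cost + output_cost


# Example: a 3,000-token prompt with a 1,000-token answer.
cost = request_cost_yuan(3000, 1000)
print(f"{cost:.3f} yuan")  # prints "0.014 yuan"
```

Because output tokens cost four times as much as input tokens, long answers dominate the bill even at these prices.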
More importantly, Wenxin X1 has made breakthroughs that span logical reasoning, trending-topic analysis, and coordinated calls to multiple tools, becoming the first deep-thinking model that can use tools on its own. It can call 11 tools, including search, AI drawing, and code execution. The model can plan its own course of action and produce solutions that can be put into practice directly.
Suppose a deep-thinking model can call enough tools to carry out the instructions users give it, for example calling programming tools directly to write code, or calling Word documents, archives, and knowledge bases to form a chain of workflows. Isn't that a native agent?
With this question in mind, we put X1 and Wenxin 4.5 through a full round of testing.
1. Can X1, which can call tools, be compared to an agent?
In our evaluation, we first tested X1's basic programming ability by asking it to help design a Snake game. After thinking it over, X1 quickly produced design ideas and code.
The code is clearly structured and thoroughly commented. It covers not only the core game logic but also details such as collision detection, score calculation, and game state management. X1 itself, however, has a thoroughly "no-frills programmer" style and has no idea how to make the little snake look pretty. (LOL)
Next, we tested X1's logical reasoning with a classic reasoning puzzle from our question bank. We have put it to almost every reasoning model, including DeepSeek R1, Kimi1.5, and OpenAI o1, and without exception they all chose the answer "abba". In fact, the first three answers are correct, but the answer to the last question, on motive, is c: the self-protection of a patient with persecutory delusions.
It seems that large models understand logic but still struggle to understand human nature.
The biggest difference between X1 and earlier reasoning models is that, after thinking, it can call tools to meet users' more practical needs. Combining this with its multimodal abilities, we tried several very practical scenarios.
For example, we found a picture of a room and asked X1 to redecorate the room with new soft furnishings and generate a rendering. This involves picture understanding, decoration suggestions, and AI image generation, so three tools are called to produce the final rendering.
When the rendering came out, I was shocked! Home decoration designers are in danger! Not only can you customize the style, you can also adjust any furniture you are not satisfied with. X1 can handle more complex needs as well, such as room feng shui, with no limit on the number of pictures, so you can keep generating until you are satisfied!
In our testing, X1 responded very quickly, without lag, even on complex tasks. X1 can also actively identify what a task requires and accurately assess the nature and complexity of the problem, understanding the real intention without explicit guidance from the user.
Its most prominent feature is the ability to select tools on its own. The model chooses the best combination of tools based on the characteristics of the task, rather than simply applying a fixed process. In actual use, X1 can flexibly call search, drawing, code execution, and other tools to work together within a single interaction, going beyond the tool-use limits of traditional models.
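The idea of picking a tool combination per task, instead of running a fixed process, can be sketched as follows. This is a toy illustration only: the `Tool` class, the capability names, and the stand-in lambdas are all hypothetical, and in X1 the model itself decides which capabilities a task needs, whereas here they are passed in by hand.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Tool:
    name: str                    # tool name, loosely following the article
    capability: str              # hypothetical label for what the tool does
    run: Callable[[str], str]    # stand-in for the real tool call


# Hypothetical toolbox indexed by capability.
TOOLBOX: Dict[str, Tool] = {
    tool.capability: tool
    for tool in [
        Tool("image_understanding", "describe_image", lambda text: f"description({text})"),
        Tool("search", "look_up", lambda text: f"search_results({text})"),
        Tool("ai_drawing", "draw_image", lambda text: f"rendering({text})"),
        Tool("code_execution", "run_code", lambda text: f"program_output({text})"),
    ]
}


def select_tools(needed: List[str]) -> List[Tool]:
    """Pick one tool per needed capability, in the order given."""
    missing = [capability for capability in needed if capability not in TOOLBOX]
    if missing:
        raise KeyError(f"no tool for: {missing}")
    return [TOOLBOX[capability] for capability in needed]


def run_chain(tools: List[Tool], request: str) -> str:
    """Feed each tool's output into the next tool."""
    result = request
    for tool in tools:
        result = tool.run(result)
    return result


# A room-makeover request needs understanding first, then drawing.
chain = select_tools(["describe_image", "draw_image"])
print([tool.name for tool in chain])  # ['image_understanding', 'ai_drawing']
print(run_chain(chain, "room.jpg"))   # rendering(description(room.jpg))
```

A different request, such as one that needs fresh data, would produce a different chain from the same toolbox, which is the contrast with a fixed process.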
Overall, its reasoning and analysis reach the level of R1, and it can call other tools without lag, which is really appealing!
2. How to have high performance and low cost?
So, how does X1 cut the price while maintaining model performance?
This breakthrough is closely tied to Baidu's years of accumulated technology. Through joint optimization of the PaddlePaddle deep learning platform and the Wenxin large model, Wenxin X1 has been tuned across the full stack, greatly reducing inference cost.
At the level of model compression, Wenxin X1 is deeply optimized with cutting-edge techniques. Blocked Hadamard quantization is used to control the balance between parameter precision and model size. For long-sequence scenarios, the team specially optimized the quantization scheme for the attention mechanism, which significantly reduces compute requirements while maintaining inference accuracy. These compression techniques shrink the model while keeping performance at a high level.
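Baidu has not published the details of its scheme, so the following is only a generic sketch of what block-wise Hadamard quantization usually means: each block of weights is rotated with an orthonormal Walsh-Hadamard transform, which spreads outliers across the block, and the rotated values are then quantized to a few bits. The block size, bit width, and all function names are our assumptions.

```python
import math


def fwht(block):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of two)."""
    n = len(block)
    if n == 0 or n & (n - 1):
        raise ValueError("block length must be a power of two")
    out = list(block)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [value * scale for value in out]


def quantize_block(block, bits=4):
    """Rotate one block, then apply symmetric integer quantization."""
    rotated = fwht(block)
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(value) for value in rotated)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(value / scale))) for value in rotated]
    return codes, scale


def dequantize_block(codes, scale):
    """Undo the quantization, then the rotation (the orthonormal transform is its own inverse)."""
    return fwht([code * scale for code in codes])


def blocked_hadamard_roundtrip(weights, block_size=8, bits=4):
    """Quantize and restore a weight list block by block."""
    if len(weights) % block_size:
        raise ValueError("len(weights) must be a multiple of block_size")
    restored = []
    for start in range(0, len(weights), block_size):
        codes, scale = quantize_block(weights[start:start + block_size], bits)
        restored.extend(dequantize_block(codes, scale))
    return restored


# A single outlier is spread evenly over the block by the rotation.
print(fwht([1.0, 0.0, 0.0, 0.0]))  # [0.5, 0.5, 0.5, 0.5]
```

The rotation is what makes low bit widths workable: without it, one large weight would set the quantization scale for the whole block and the small weights would collapse to zero.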
The inference engine is another key breakthrough. The Baidu team optimized low-precision, high-performance operators to make full use of hardware characteristics, developed a dynamic adaptive decoding technique, and carried out deep custom optimization based on a neural network compiler, all of which speed up inference.
Finally, system-level optimization comes from co-optimizing the framework and the chip, a separated deployment architecture, and efficient resource scheduling.
Of course, beyond Baidu's accumulated AI technology, there are also engineering innovations.
According to Silicon Stars, X1 adopts a progressive reinforcement learning training method. Unlike traditional reinforcement learning, which trains models through a "trial and error + reward" mechanism, the "progressive" approach emphasizes phased, step-by-step training strategies to make training more efficient.
This method resembles the way humans learn, "walk first, then run", and aims to improve the model's overall ability in more complex task areas such as creation, search, tool calling, and reasoning. In our tests, when X1 was asked to analyze an image of a financial report containing a chart and to generate investment advice, the model decided on its own to use the image understanding tool first, then call search to obtain relevant industry data, and finally produce a visual data analysis through the code interpreter. The whole process resembles the workflow of a professional analyst.
Second, X1 achieves a breakthrough in end-to-end training that combines the chain of thought with the chain of action.
Put simply, the model does not learn thinking and acting separately but integrates the two into a complete decide-and-execute closed loop. This lets X1 adjust its thinking and action strategies dynamically based on the result of each action. For example, in a complex market analysis task, X1 first works out through the chain of thought what data it needs, then calls the search tool through the chain of action to obtain the latest market data. If it finds the data insufficient, it adjusts its strategy and turns to more specialized data analysis tools, finally generating a comprehensive analysis report. This flexibility is hard to achieve in traditional models.
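The market-analysis example can be sketched as a small think-act-observe loop. This is a toy illustration of the closed loop only, not X1's training procedure or runtime: the function names, the hard-coded tool results, and the `sufficient` flag are all hypothetical.

```python
def think(observations):
    """Toy 'chain of thought': choose the next action from what has been observed."""
    if not observations:
        return "search"              # nothing known yet, so start with search
    if observations[-1]["sufficient"]:
        return "write_report"        # enough data, so finish
    return "data_analysis"           # search fell short, so switch tools


def act(action):
    """Toy 'chain of action': return a canned result for each hypothetical tool."""
    results = {
        "search": {"data": "latest market headlines", "sufficient": False},
        "data_analysis": {"data": "detailed market dataset", "sufficient": True},
    }
    return results[action]


def run_closed_loop(goal, max_steps=5):
    """Alternate thinking and acting until a report is written or steps run out."""
    observations, trace = [], []
    for _ in range(max_steps):
        action = think(observations)
        trace.append(action)
        if action == "write_report":
            sources = ", ".join(obs["data"] for obs in observations)
            return trace, f"Report on {goal} based on: {sources}"
        observations.append(act(action))
    return trace, None


trace, report = run_closed_loop("the EV market")
print(trace)   # ['search', 'data_analysis', 'write_report']
print(report)
```

The key point is that `think` sees the result of every earlier `act`, so the plan is revised after each step rather than fixed in advance.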
Third, X1 addresses the limitations of a single reward metric by building a diverse yet unified reward system. Multiple types of reward mechanisms are combined into one comprehensive reward signal that guides model optimization from every angle.
The biggest advantage of this system is that it keeps the model from becoming lopsided, for example pursuing accuracy so hard that the content turns dull, or pursuing creativity at the expense of accuracy. X1's output is closer to balanced human judgment and can adapt its style of expression to different scenarios. Of course, this also brings challenges: the weights have to be adjusted dynamically (creative scenarios emphasize creativity, code generation scenarios emphasize logic), and the approach relies on training with massive amounts of scenario data.
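One simple way to picture a unified reward with scenario-dependent weights is a weighted average of several scores. The three score names, the two scenarios, and every weight below are invented for illustration; the article does not say how X1 actually combines its rewards.

```python
# Hypothetical per-scenario weights for three hypothetical reward components.
SCENARIO_WEIGHTS = {
    "creative_writing": {"accuracy": 0.2, "creativity": 0.6, "logic": 0.2},
    "code_generation": {"accuracy": 0.3, "creativity": 0.1, "logic": 0.6},
}


def unified_reward(scores, scenario):
    """Combine component scores (each in [0, 1]) into one weighted-average reward."""
    weights = SCENARIO_WEIGHTS[scenario]
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[name] * scores[name] for name in weights)
    return weighted_sum / total_weight


# The same output is rewarded differently depending on the scenario.
scores = {"accuracy": 0.5, "creativity": 1.0, "logic": 0.0}
print(round(unified_reward(scores, "creative_writing"), 2))  # 0.7
print(round(unified_reward(scores, "code_generation"), 2))   # 0.25
```

An imaginative but illogical answer scores well as creative writing and poorly as code, which is the kind of balance a single reward metric cannot express.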
3. The model is given away, and Wenxin 4.5 is here too
It is worth mentioning that, alongside the powerful Wenxin X1, Baidu has also released the previously announced foundation model: Wenxin large model 4.5.
In our testing, Wenxin 4.5 showed excellent multimodal understanding and an extremely low hallucination rate. For example, we sent the model a Douyin video introducing an electronic product. Wenxin 4.5 not only accurately identified the technical terms and key figures in the video but also gave purchase recommendations for the product. When faced with mixed inputs from multiple sources (pictures, tables, text), the model correctly identifies and separates information from each source, avoiding the usual mix-ups and fabrication, thanks to its strong resistance to hallucination.
Baidu improves the hallucination resistance and accuracy of its model family through iRAG technology, together with FlashMask dynamic attention masking, multimodal heterogeneous expert expansion, spatiotemporal representation compression, knowledge-point-based large-scale data construction, and self-feedback-based post-training. These technologies not only ensure that the model understands accurately and outputs stably but also lay a solid foundation for later industry applications.
Large models can now move from entertainment settings into business settings, for example helping home decoration designers design styles, analyzing video scripts for imitation, and generating e-commerce product images, all of which have become capabilities of a general-purpose large model.
Through these in-depth technical innovations, Baidu has not only created a high-performance, low-cost Wenxin X1 but, more importantly, has explored a distinctive development path for large models, one that balances practicality and economy while pushing the technology to its limits.
This balanced approach meets enterprises' high demands on AI performance and also solves the problem of application cost, allowing AI to create real practical value across all walks of life.