DeepSeek R1 is having a huge impact across the technology industry, upending assumptions about AI. On the mobile side, innovation is moving just as fast.
On February 20, Qualcomm released its latest AI white paper, "AI Change Is Promoting Terminal Inference Innovation," outlining the prospects for high-quality small language models and multimodal reasoning models on device.
As AI is deployed at scale, it has become clear that running large-model inference on device offers higher reliability and better data security, and as the technology matures, more advantages keep emerging.
Qualcomm pointed out that four major trends are driving the shift to on-device AI:
Today's advanced small AI models already deliver strong performance. New techniques such as model distillation and novel AI network architectures simplify development without sacrificing quality, letting new models rapidly approach the performance of cloud-based large models.

Model parameter counts are shrinking fast. Advanced quantization and pruning techniques let developers cut a model's parameter count without materially affecting accuracy.

Developers can build richer applications at the edge. The rapid surge in high-quality AI models means features such as text summarization, coding assistants, and real-time translation are taking off on devices like smartphones, enabling AI to support commercial applications deployed at scale across the edge.

AI is becoming the new UI. Personalized multimodal AI agents will simplify interaction and complete tasks efficiently across a wide range of applications.

While cutting-edge large-model technology continues to make breakthroughs, the industry has also begun to pour its energy into efficient deployment at the edge. Driven by lower training costs, rapid inference deployment, and innovations tailored to edge environments, this push has produced a wave of smarter, smaller, and more efficient models.
These technological advances are gradually reaching chipmakers, developers, and consumers, shaping new trends.
Small models have become a necessity of development

Looking at the development of large language models in recent years, several clear trends stand out: a shift from differentiation by parameter scale to differentiation by application, a move from single modality to multimodality, the rise of lightweight models, and a tilt toward on-device deployment.
In particular, the recent launches of DeepSeek V3 and R1 reflect these trends in the AI industry. The resulting drop in training costs, rapid inference deployment, and innovations tailored to edge environments are driving a surge in high-quality small models. This shift toward small models is the combined result of several factors.
First, model network architectures keep evolving: from the initially dominant Transformer to a landscape where mixture-of-experts (MoE) and state space models (SSM) now coexist, steadily lowering the compute overhead and power consumption of developing and running large models. As a result, more and more models are adopting these new architectures.
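To make the MoE idea concrete, here is a minimal, illustrative sketch (not any specific production architecture) of a top-k routed mixture-of-experts layer in PyTorch, showing how only a fraction of the parameters are active for each token:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Minimal mixture-of-experts layer: a router picks the top-k experts
    per token, so only a fraction of the parameters are active per step."""

    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts)            # gating network
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                           nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:      # x: (tokens, dim)
        gate_logits = self.router(x)
        weights, idx = gate_logits.topk(self.top_k, dim=-1)  # top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, k] == e                        # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * self.experts[e](x[mask])
        return out

x = torch.randn(16, 64)                                      # 16 tokens, width 64
print(TinyMoE(64)(x).shape)                                  # torch.Size([16, 64])
```

The key point is in the forward pass: although eight experts exist, each token only pays the compute cost of two, which is how MoE models hold down inference cost as total parameter counts grow.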
The second is knowledge distillation, which has become key to developing efficient small models, both foundation models and task-specific ones. By transferring the knowledge of a large, complex teacher model into a smaller student model, the student's parameter count, compute, and storage footprint are significantly reduced and its training process simplified, making it suitable for deployment on resource-constrained devices; at the same time, the student absorbs the teacher's rich knowledge, preserving accuracy and generalization.
Average LiveBench AI benchmark results for Meta's 70-billion-parameter Llama model and the corresponding DeepSeek distilled model. Source: LiveBench.ai
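As a sketch of one common form of knowledge distillation, the classic soft-label objective (Hinton et al., 2015), here is a minimal illustration assuming a generic classification setup; DeepSeek's own distillation pipeline differs in detail, so this is only the underlying idea:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Soft-label distillation: blend the usual hard-label loss with a
    KL term pulling the student's softened distribution toward the teacher's."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between softened distributions; T^2 rescales gradients
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)              # hard-label loss
    return alpha * kd + (1 - alpha) * ce

# Toy usage: batch of 4, 10-way classification
student = torch.randn(4, 10, requires_grad=True)
teacher = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student, teacher, labels)
loss.backward()
```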
The third is the steady improvement of model optimization and deployment techniques such as quantization, compression, and pruning, which further shrink model size. These techniques can dramatically cut a model's compute and storage requirements while preserving high performance.
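As a simple illustration of quantization, here is a toy symmetric int8 post-training quantizer; production toolchains use per-channel scales, calibration data, and finer-grained schemes, so treat this as a sketch of the principle only:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Minimal symmetric post-training quantization: map float32 weights
    to int8 plus one float scale, shrinking storage roughly 4x."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(512, 512).astype(np.float32)
q, scale = quantize_int8(w)
err = np.abs(w - dequantize(q, scale)).mean()
print(f"int8 storage: {q.nbytes / w.nbytes:.0%} of fp32, mean abs error {err:.4f}")
```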
With these innovations in underlying architecture and technology, the capabilities of small models are approaching, and in some cases surpassing, those of far larger cutting-edge models. For example, on the GPQA benchmark, DeepSeek's distilled versions based on the Qwen (Tongyi Qianwen) and Llama models achieved performance comparable to or better than GPT-4o, Claude 3.5 Sonnet, and o1-mini.
Source: DeepSeek, January 2025.
From an industry perspective, these technological advances have driven a surge in high-quality generative AI models. According to Epoch AI statistics, more than 75% of the AI models released in 2024 had fewer than 100 billion parameters, making models below that scale the mainstream.
Source: Epoch AI, January 2025.
As a result, driven by cost, compute requirements, and performance trade-offs, small models are replacing large ones as the first choice for many companies and developers. Mainstream model families including DeepSeek R1 and Meta Llama now ship small versions that perform well on mainstream benchmarks and domain-specific tasks.
In particular, the faster inference, smaller memory footprint, and lower power consumption of small models make them the first choice for deployment on devices such as smartphones and PCs.
On-device models typically have between 1 billion and 10 billion parameters, and some recently released models have dropped below 2 billion. As parameter counts keep shrinking while small-model quality keeps improving, parameter count is no longer a key measure of model quality.
Meanwhile, flagship smartphones now ship with more than 12 GB of RAM, which is in principle enough to run many of these models, and small models tailored to mainstream phone configurations keep emerging.
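To make the memory arithmetic concrete, here is a rough, illustrative estimate of weight-only footprints at common quantization levels; actual usage is higher once the KV cache, activations, and runtime overhead are included:

```python
# Rough, illustrative weight-only memory estimate for on-device LLMs.
# Real usage is higher: KV cache, activations, and runtime overhead all add up.
def weight_footprint_gb(params_billion: float, bits_per_weight: int) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1024**3

for params in (1, 3, 7):                     # typical on-device sizes, in billions
    for bits in (16, 8, 4):                  # fp16, int8, int4
        gb = weight_footprint_gb(params, bits)
        print(f"{params}B params @ {bits}-bit: {gb:5.2f} GB")
# A 7B model at 4-bit needs ~3.3 GB of weights, comfortably within a 12 GB phone.
```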
As high-quality small models accelerate large-scale deployment on devices such as smartphones and PCs, AI inference features and multimodal generative AI applications (document summarization, AI image generation, real-time translation, and so on) are reaching the device side at scale, providing important support for bringing AI technology to ordinary users at the edge.
In bringing on-device AI to market, Qualcomm has been paving the way for the industry.

In the era of AI inference, Qualcomm will lead the industry shift
Qualcomm is positioned to lead this shift, and to benefit from it, through its expertise in high-efficiency chip design, a mature and widely deployed AI software stack, and comprehensive development support for edge applications.
Durga Malladi, senior vice president and general manager of technology planning and edge solutions at Qualcomm Technologies, said that today's small models already outperform the cloud-based large models launched a year ago. "Our focus is no longer on the models themselves, but on developing applications on the device. As more and more high-quality AI models become able to run on device, AI applications are starting to emerge. AI is redefining the user interface of every device, which means AI is becoming the new UI on the device side."
Qualcomm believes that in this new AI-defined era, multi-sensor inputs including voice, text, and images will first be processed by an AI agent rather than going directly to an app. Once the agent has understood the request, it assigns tasks to different applications in the background, invisibly to the user.
Unlike conventional phone interfaces, this model gives developers a growing pool of on-device models, from which the AI agent selects the one it needs to complete each task. This approach dramatically reduces interaction complexity, enables highly personalized multimodal capabilities, and can complete tasks across a wide range of applications.
For end users, the AI agent is the only UI they interact with on the front end; the actual applications do their work in the background.
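As an illustration of this agent-as-UI pattern (the class, intent names, and the trivial keyword-matching rule here are all invented for the sketch), a front-end agent might route user requests to registered background apps like this:

```python
# Hypothetical sketch of the agent-as-UI pattern described above:
# one front-end agent classifies a request, then dispatches it to a
# background app the user never sees directly. All names are illustrative.
from typing import Callable

class AgentUI:
    def __init__(self):
        self.apps: dict[str, Callable[[str], str]] = {}

    def register(self, intent: str, handler: Callable[[str], str]):
        self.apps[intent] = handler                 # background app handler

    def classify(self, request: str) -> str:
        # Stand-in for an on-device model that maps a request to an intent.
        for intent in self.apps:
            if intent in request.lower():
                return intent
        return "chat"

    def handle(self, request: str) -> str:
        return self.apps[self.classify(request)](request)

agent = AgentUI()
agent.register("translate", lambda r: f"[translator app] {r}")
agent.register("summarize", lambda r: f"[summary app] {r}")
agent.register("chat", lambda r: f"[assistant] {r}")
print(agent.handle("Please translate this menu"))   # routed to translator app
```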
Building on the capabilities of high-quality small models, devices like smartphones can reinvent user interaction. Qualcomm holds clear strategic advantages in AI's shift from training to large-scale inference and its expansion from the cloud to the device:
High-performance, energy-efficient chip design: Qualcomm offers industry-leading systems-on-chip that integrate custom CPUs, NPUs, GPUs, and low-power subsystems, delivering high-performance, high-efficiency AI inference on device and handling complex AI tasks while preserving battery life and overall energy efficiency.

Scalability across all key edge segments: Qualcomm's scalable hardware and software solutions already power billions of smartphones, cars, XR headsets and glasses, PCs, and industrial IoT devices, providing the foundation for a broad range of transformative AI experiences.

An active ecosystem: through the Qualcomm AI software stack, Qualcomm AI Hub, and collaboration with strategic developers, Qualcomm provides the tools, frameworks, and SDKs for deploying models across different edge device categories, helping developers accelerate the adoption of AI agents and applications at the edge. A generic sketch of the first step in such a pipeline follows below.

Qualcomm not only anticipated the explosion of on-device models, but has also driven the rollout of edge AI inference across device types.
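As a device-agnostic illustration of that first deployment step (this is a generic sketch, not Qualcomm's specific API), a small PyTorch model can be exported to ONNX, the interchange format most edge toolchains accept, before being compiled for a target NPU:

```python
# Illustrative first step of an edge-deployment pipeline: export a trained
# PyTorch model to ONNX, the interchange format most edge toolchains
# (including vendor compilers) accept. The model here is a toy stand-in.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10)).eval()
example_input = torch.randn(1, 64)

torch.onnx.export(
    model,
    example_input,
    "tiny_model.onnx",
    input_names=["input"],
    output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}},   # allow variable batch size
)
print("exported tiny_model.onnx")
```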
Cristiano Amon, president and CEO of Qualcomm, shared his view of current AI industry trends on a recent quarterly earnings call: "DeepSeek R1 and other recent similar models demonstrate that AI models are evolving faster and faster: they are becoming smaller, more capable, more efficient, and able to run directly on device. In fact, distilled DeepSeek R1 models were running on smartphones and PCs powered by Snapdragon platforms within just days of its release."
As we enter the era of AI inference, model training will still happen in the cloud, but inference will increasingly run on device, making AI more accessible, customizable, and efficient. This will drive the development and adoption of more targeted, purpose-built models and applications, and in turn fuel demand for computing platforms across device types.
The rise of DeepSeek R1 aptly validates Qualcomm's earlier judgment on on-device AI. With its advanced connectivity, computing, and edge AI technologies and a unique product portfolio, Qualcomm not only maintains strong differentiation in on-device AI but also gains solid support for its vision of hybrid AI.
Going forward, on-device AI will play an increasingly important role across industries.