Highlights
① OpenAI remains the most commonly used large language model provider among the LangSmith user base; its usage is more than six times that of second-ranked Ollama. ② Adoption of open-source models has grown significantly: Ollama and Groq, two companies that let users run open-source models, both broke into the top five providers this year. ③ Agents are attracting attention: when building large language model applications, developers are increasingly inclined to use multi-step agents, raising application complexity.
On December 20, the American artificial intelligence company LangChain released its "State of AI Report 2024", the follow-up to last year's edition and a popular weathervane for the industry. In this year's report, the LangChain team draws on usage patterns from LangSmith, its large-model application development platform, to reveal how the AI ecosystem, and the way people build large language model applications, is evolving. The team notes that as users traced, evaluated, and iterated in LangSmith, several notable shifts emerged, including a sharp rise in the adoption of open-source models and a move from retrieval-focused workflows toward multi-step, agentic workflows.
The LangChain team dug into the following statistics to sort out what developers are building, testing and prioritizing.
Analysis of the use of large language models

Large language models are rapidly gaining popularity around the world, prompting a common question reminiscent of the queen's question to the magic mirror in the fairy tale: "Of all the models, which one is used the most?" The LangChain team answers this question through an in-depth analysis of the data it collected.

(1) Top large language model providers:
Note: Ranking of the top ten large language model providers in 2024

As in the previous year's data, OpenAI remains the most commonly used large language model provider among the LangSmith user base; its usage is more than six times that of second-ranked Ollama.
Particularly noteworthy in the 2024 ranking of the top ten providers is that Ollama and Groq (both of which let users run open-source models: Ollama focuses on local execution, while Groq focuses on cloud deployment) grew rapidly this year and broke into the top five. This trend reflects growing market demand for more flexible deployment options and customizable AI infrastructure. Among open-source model providers, the top of the ranking is relatively stable compared with last year: companies such as Ollama, Mistral, and Hugging Face give developers convenient platforms for running open-source models with ease. Together, these open-source providers account for 20% of usage among the top 20 large language model providers.
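For a concrete sense of the local-deployment option the report describes, here is a minimal sketch of running an open-source model through Ollama's LangChain integration; the `langchain-ollama` package, the model name, and the prompt are illustrative assumptions, not details from the report.

```python
# A minimal sketch (not from the report) of querying an open-source model
# served locally by Ollama. Assumes the `langchain-ollama` package and a
# running Ollama server; the model name "llama3.1" is an assumption and
# must already be pulled locally.
from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3.1", temperature=0)
reply = llm.invoke("In one sentence, why run models locally?")
print(reply.content)
```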
(2) Top vector retrieval/storage systems:

Note: Ranking of the top ten vector retrieval/storage systems in 2024
Efficient retrieval remains a key component of many generative AI (GenAI) workflows. This year's ranking of the top vector storage systems is stable compared with last year, with Chroma and FAISS continuing to hold the two most popular positions. In addition, Milvus, MongoDB, and Elastic's vector databases made the top ten this year, reflecting the industry's growing interest in flexible deployment options and customizable AI infrastructure.
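To illustrate the kind of retrieval step these rankings refer to, here is a minimal sketch using FAISS through LangChain; the packages and the embedding model choice are assumptions for illustration, not from the report.

```python
# A minimal retrieval sketch (illustrative, not from the report). Assumes
# the `faiss-cpu`, `langchain-community`, and `langchain-openai` packages
# plus an OPENAI_API_KEY; the embedding model choice is an assumption.
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

texts = [
    "LangSmith traces and evaluates LLM applications.",
    "Chroma and FAISS are widely used vector stores.",
]
# Embed the texts and index them in an in-memory FAISS store.
store = FAISS.from_texts(texts, OpenAIEmbeddings())

# Retrieve the document most similar to the query.
docs = store.similarity_search("Which vector stores are popular?", k=1)
print(docs[0].page_content)
```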
Building applications with LangChain products

Note: How organizations build applications with LangSmith
As developers become more experienced with generative AI, they are building more dynamic applications. From increasingly complex workflows to the rise of AI agents, LangChain has observed several trends that point to an ecosystem that is constantly innovating and evolving.
(1) Observability is not limited to LangChain applications
Although the open-source LangChain framework is the first choice of many developers for building large language model applications, LangSmith's tracing data this year shows that 15.7% of traces come from non-LangChain frameworks. This reveals a broader trend: whatever framework is used to build a large language model application, the need for observability is universal. LangSmith addresses this need by supporting interoperability across frameworks.
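As a rough illustration of framework-agnostic observability, here is a minimal sketch using the LangSmith Python SDK's tracing decorator; the traced function is hypothetical, and the sketch assumes the `langsmith` package with LangSmith API-key and tracing environment variables configured.

```python
# A minimal sketch of framework-agnostic tracing with the LangSmith SDK.
# Assumes the `langsmith` package and LangSmith API-key/tracing environment
# variables are configured; `summarize` is a hypothetical function.
from langsmith import traceable

@traceable  # each call is recorded as a trace in LangSmith
def summarize(text: str) -> str:
    # Plain Python, no LangChain required: any function can be traced.
    return text[:100]

summarize("LangSmith can observe applications built without LangChain.")
```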
(2) Python continues to dominate, and JavaScript usage is steadily increasing
In debugging, testing, and monitoring, the Python SDK is favored, accounting for 84.7% of usage. At the same time, as developers increasingly turn to web-first application development, interest in JavaScript has grown significantly: the JavaScript SDK's share of LangSmith usage reached 15.3% this year, a three-fold increase over last year.
(3) Agents are gradually gaining attention
As enterprises increasingly focus on deploying agents across industries, adoption of LangGraph, LangChain's controllable agent framework, is also on the rise. Since its launch in March 2024, LangGraph's popularity has grown steadily: 43% of organizations using the LangSmith platform now send LangGraph trace data. These traces represent complex, coordinated tasks that go beyond basic large language model interactions.
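For a concrete sense of the kind of graph that produces such traces, here is a minimal LangGraph sketch with two nodes; the state fields and node logic are placeholder assumptions, not an example from the report.

```python
# A minimal LangGraph sketch (illustrative assumptions throughout): a
# two-node graph whose runs appear in LangSmith as multi-step traces.
# Assumes the `langgraph` package.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict, total=False):
    question: str
    context: str
    answer: str

def retrieve(state: State) -> dict:
    # Stand-in for a real retrieval step.
    return {"context": f"background for: {state['question']}"}

def respond(state: State) -> dict:
    # Stand-in for a model call that uses the retrieved context.
    return {"answer": f"answer based on {state['context']}"}

builder = StateGraph(State)
builder.add_node("retrieve", retrieve)
builder.add_node("respond", respond)
builder.add_edge(START, "retrieve")
builder.add_edge("retrieve", "respond")
builder.add_edge("respond", END)
graph = builder.compile()

print(graph.invoke({"question": "What is LangGraph?"}))
```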
This rise is consistent with the increase in agent-like behavior. The LangChain team found that an average of 21.9% of traces now involve tool calls, up from an average of just 0.5% in 2023. Tool calls let the model autonomously invoke functions or external resources, a signal of more agentic behavior, i.e. the model decides when to take action. Greater use of tool calls enhances an agent's ability to interact with external systems and perform tasks such as writing to a database.
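Here is a minimal sketch of what such a tool call looks like in LangChain; the tool, model name, and prompt are illustrative assumptions, not the report's example.

```python
# A minimal tool-calling sketch. Assumes the `langchain-core` and
# `langchain-openai` packages plus an OPENAI_API_KEY; the tool, model
# name, and prompt are illustrative assumptions.
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

@tool
def save_record(entry: str) -> str:
    """Write an entry to a (hypothetical) database."""
    return f"saved: {entry}"

# Expose the tool to the model; the model decides whether to call it.
model = ChatOpenAI(model="gpt-4o-mini").bind_tools([save_record])
msg = model.invoke("Save the note 'agents are on the rise'.")
print(msg.tool_calls)  # populated when the model chooses to call the tool
```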
Performance and Optimization

Balancing speed and complexity, especially in applications that draw on large language model resources, is a core challenge of application development. The LangChain team analyzed how organizations interact with their applications to see whether the complexity of their needs is matched by performance efficiency.

(1) Increased complexity has not affected task-processing efficiency
Note: The LangChain team has observed a significant increase in the average number of steps per trace.

Over the past year, the average number of steps per trace has grown significantly, rising from 2.8 steps in 2023 to 7.7 steps in 2024. The LangChain team defines these steps as distinct operations within a trace, including calls to large language models, retrievers, or tools. This trend shows that organizations are adopting more complex, multi-stage workflows: the systems users build have moved beyond simple question-and-answer interactions to chaining tasks together, such as retrieving information, processing it, and producing actionable results.
At the same time, the average number of large language model calls per trace rose more modestly, from 1.1 to 1.4. This suggests that developers are designing systems that do more while limiting the number of large language model calls, preserving functionality while keeping costly large language model requests under control.
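Here is a sketch of that "more steps, fewer model calls" pattern: several traced steps, only one of which spends a large language model call. The packages, function names, and logic are illustrative assumptions.

```python
# A sketch of the "more steps, fewer model calls" pattern: three traced
# steps, only one of which spends a large language model call. Assumes the
# `langsmith` and `langchain-openai` packages; names are illustrative.
from langsmith import traceable
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

@traceable
def retrieve(query: str) -> str:
    return "relevant context"  # stand-in for a vector-store lookup

@traceable
def format_prompt(query: str, context: str) -> str:
    return f"Answer using this context: {context}\n\nQuestion: {query}"

@traceable
def answer(query: str) -> str:
    # Only this step issues an LLM request; the others are cheap local work.
    return llm.invoke(format_prompt(query, retrieve(query))).content

answer("How do teams control LLM costs?")
```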
Large language model testing and evaluation

Note: Ranking of the top evaluation criteria

What steps are organizations taking to ensure that large language model applications do not produce inaccurate or low-quality responses? Maintaining high quality standards for these applications is difficult, but the survey found that organizations are leveraging LangSmith's evaluation tools to automate testing and to build user feedback mechanisms, yielding more robust and reliable applications. Through LangSmith's evaluation capabilities, organizations can automate testing and collect user feedback to ensure the quality of their applications' output. This means not only testing the accuracy and quality of model-generated responses, but also continuously tuning application performance based on user feedback, so that applications stay performant while meeting complex requirements.

(1) Large language models as judges: scoring criteria are embedded in a large language model's prompt, and that model is used to judge whether outputs meet the criteria. The LangChain team observed that during testing, developers focus most on the following qualities: relevance, correctness, exact match, and helpfulness. These priorities show that most developers run first-pass response-quality checks to ensure that AI-generated content does not deviate significantly from the intended goal; a minimal judge sketch follows below.
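```python
# A minimal LLM-as-judge sketch: the scoring criterion is embedded in the
# judge prompt and a model grades the output. Uses the OpenAI Python
# client directly; the prompt wording, model name, and 0-1 scale are
# assumptions, not the report's setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def judge_correctness(question: str, answer: str) -> str:
    prompt = (
        "You are a strict grader. Score the answer's correctness from 0 to 1 "
        "and reply with only the number.\n"
        f"Question: {question}\nAnswer: {answer}"
    )
    result = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return result.choices[0].message.content

print(judge_correctness("What is 2 + 2?", "4"))  # expected: "1"
```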
(2) Iterating with human feedback

Human feedback plays a crucial role in building large language model applications. By speeding up the collection of human feedback and its integration into traces and runs (i.e. execution spans), LangSmith helps users build richer datasets for improving and optimizing applications; a sketch of attaching feedback programmatically follows below. Over the past year, the number of annotated runs has grown 18-fold, an increase directly proportional to the growth in LangSmith usage.

Although the number of feedback entries per run rose slightly, from 2.28 to 2.59, the amount of feedback remains small relative to each run. This may mean that users reviewing runs prioritize speed over detailed feedback, or that they comment only on the runs that are most critical or problematic.
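```python
# A minimal sketch of attaching feedback to a LangSmith run via the Python
# SDK. Assumes the `langsmith` package and API-key environment variables;
# the run id is a placeholder and the feedback key is an assumption.
from langsmith import Client

client = Client()
client.create_feedback(
    run_id="00000000-0000-0000-0000-000000000000",  # placeholder run id
    key="helpfulness",
    score=1,
    comment="Accurate and concise answer.",
)
```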
Conclusion

In 2024, developers building large language model applications are more inclined to use multi-step agents and embrace greater application complexity; they improve efficiency by reducing the number of large language model calls, and they introduce quality checks, through feedback and evaluation, to ensure the quality of outputs. As the use of large language models continues to grow, we look forward to seeing how developers further explore smarter workflows, improve performance, and enhance application reliability.