Cursor is popular, but Cursor is not the way out for AI programming

In 2021, Microsoft launched GitHub Copilot, which has become the most popular AI tool in the programming world.

GitHub Copilot can automatically generate complete code functions based on contextual information provided by users, such as function names, comments, code snippets, etc., and is known as a "game changer" in the programming world.

What makes it so impressive is that it is built on OpenAI's Codex model. Codex has 12 billion parameters; it is a descendant of GPT-3 that has been fine-tuned specifically for coding tasks. This was the first time that a large-parameter model based on the Transformer architecture truly "emerged" in the code field.

GitHub Copilot ignited the enthusiasm of developers around the world for AI programming. Four MIT undergraduates also got together and founded a company called Anysphere in 2022 with the dream of changing software development.

Anysphere once "openly challenged" Microsoft, saying that Microsoft is its main competitor. Anysphere co-founder Michael Truell made it clear that while Microsoft's Visual Studio Code dominates the integrated development environment (IDE) market, Anysphere sees an opportunity to offer a different product.

Microsoft may not have imagined that in less than three years, this little-known team would drop a "bombshell" on the industry and trigger a new AI programming craze around the world, or that the company would become a unicorn with a valuation of US$2.5 billion within four months.

Why Cursor became so popular

In August 2024, Tesla's former AI director Andrej Karpathy posted several times on X praising a code editor called "Cursor", saying it had surpassed GitHub Copilot by a wide margin.

In the same month, Anysphere, the company behind Cursor, completed a US$60 million Series A financing with a valuation of US$400 million.

What makes Cursor impressive is its feature set: multi-line editing, cross-file context completion, question answering, and next-action prediction. Developers only need to keep pressing the Tab key to apply code changes across an entire file, and Cursor's results are more accurate and faster, with almost no delay.

Anyone who programs knows how much subtlety is involved here.

"Completing and predicting multiple edits across files is a very subtle requirement. It may be difficult for developers themselves to express it accurately, but after actually using it, they will feel it is very 'cool'," said Zhang Hailong, founder and CEO of Gru.

Tom Yedwab, who has decades of development experience, also wrote in an article that tab completion is the feature that best fits his daily coding habits and saves him the most time. "This tool is like reading my mind and can predict my next operations, allowing me to focus less on code details and more on building the overall architecture," Tom Yedwab wrote.

The key to Cursor’s success does not lie in the high technical barriers, but in the fact that they were the first to discover a subtle new need and dared to bet on a road that no one had ever traveled before.

Cursor is built on top of VS Code (Visual Studio Code), a free, open source, cross-platform code editor developed by Microsoft that has some basic code completion functions.

Previously, developers would create various plug-ins to expand the functional boundaries of VS Code, but VS Code's own plug-in mechanism had many limitations. For example, when dealing with large projects, some plug-ins may slow down code indexing and analysis; for some complex plug-ins, the configuration process is cumbersome and requires users to manually modify configuration files, which in effect raises the barrier to use.

Therefore, to eliminate these limitations, the Cursor team adopted a very bold approach. Instead of building a plug-in for VS Code in the traditional way, they heavily modified the VS Code codebase itself so that it supports multiple AI models at a low level, and improved the user experience of the entire IDE through a large amount of engineering optimization.

Zhang Hailong said that in the early days of Cursor's development, many practitioners, including him, were not optimistic about it because this road was difficult and ran strongly against the consensus. The internal structure of VS Code is complex, involving multiple modules such as code editing, syntax analysis, code indexing, and the plug-in system, and these may differ between versions of VS Code, so compatibility must be considered when modifying it. In addition, when multiple AI models are built into VS Code, the interaction between the models and the editor has to be solved. For example, how can the code context be passed to the model effectively? How should the model's output be processed and applied to the code? And how can code generation latency be minimized?

Solving this series of problems involves a complicated system of engineering optimizations. In 2023 alone, Cursor went through three major version updates and nearly 40 feature iterations. This was a huge test of patience for the entire R&D team and the investors behind the company.

In the end, Silicon Valley once again proved to the world its ability to nurture disruptive innovation. The success of Cursor follows a classic Silicon Valley entrepreneurial template: a group of obsessive technology geeks with a grand vision bravely ventured into no man's land with the support of Silicon Valley's mature VC system and were the first to try something untested despite countless doubts. In the end, the product was an instant hit.

"This is the charm of entrepreneurship. Even such an 'unreliable' project managed to break out," Zhang Hailong said with a sigh.

Recently, Anysphere announced the completion of a US$100 million Series B financing, with a valuation reaching US$2.6 billion. According to Sacra estimates, Cursor's annual recurring revenue (ARR) reached US$65 million in November 2024, a year-on-year increase of 6400%. Founded in 2022, Anysphere has only 12 employees.

Copilot is clear, Agent is confused

Cursor is not the first product on the AI programming track to break out.

In March 2024, Devin, billed as "the world's first AI programmer", was born, igniting the industry's enthusiasm for AI programming for the first time.

Devin is an autonomous agent that is billed as having full-stack skills: it can learn independently, build and deploy applications end-to-end, fix bugs by itself, and even train and fine-tune its own AI models. The company behind it, Cognition AI, is also a star-studded AI "dream team".

However, Devin was initially released only as a demo, and developers could not get started with it until December 11, 2024. Its monthly subscription fee is as high as US$500. In comparison, Cursor's monthly subscription fee of US$20 seems far more affordable.

In contrast to the near-universal praise for Cursor, Devin has drawn mixed reviews. Some users think it performs very well at code migration and at generating PRs (Pull Requests, code change requests submitted by developers during collaboration so that other team members can review and merge the code) and can greatly reduce developers' repetitive work. Other users, however, pointed out that Devin is not good at handling complex business logic and still requires a lot of manual intervention, especially when project documentation is insufficient or code quality is poor.

Zhang Hailong said that the fundamental reason for the difference in "reviews" between Cursor and Devin is the difference in failure rates and failure costs after developers use the products.

Currently, the failure rate in the Copilot scenario is relatively low: accuracy on the corresponding benchmark, HumanEval, has approached 100%. The benchmark corresponding to the Agent scenario is SWE-bench, where current accuracy is below 60%.

In addition, the results of AI work require human acceptance and confirmation. The interaction model of Copilot products means that the cost for developers to review AI-generated results is very low, and after a failure, the cost of modifying or simply not adopting the result is also very low. For Agent products, however, the user's confirmation cost is significantly higher than with Copilot, and after a failure, the cost of modification is also higher.
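The failure-rate and failure-cost argument above can be read as an expected-cost comparison. The following is a minimal sketch, assuming a simple model in which the human overhead per task is the review cost plus the failure rate multiplied by the fix cost. The class name, field names, and every number are hypothetical placeholders for illustration, not measurements of Cursor, Devin, or any benchmark.

```python
from dataclasses import dataclass


@dataclass
class ProductForm:
    name: str
    failure_rate: float  # probability that the AI result is unusable (hypothetical)
    review_cost: float   # minutes a human spends checking each result (hypothetical)
    fix_cost: float      # minutes spent fixing or discarding a failed result (hypothetical)

    def expected_overhead(self) -> float:
        """Expected human minutes per task: always review, sometimes fix."""
        return self.review_cost + self.failure_rate * self.fix_cost


# Illustrative values only: a cheap-to-check completion vs. an expensive-to-check agent task.
copilot = ProductForm("Copilot-style completion", failure_rate=0.05, review_cost=0.1, fix_cost=0.5)
agent = ProductForm("Agent-style task", failure_rate=0.40, review_cost=10.0, fix_cost=60.0)

for form in (copilot, agent):
    print(f"{form.name}: {form.expected_overhead():.3f} expected minutes of human overhead")
```

Under these made-up inputs the agent's overhead is dominated by the review and fix terms, which is the qualitative point being made: both a higher failure rate and a higher cost per failure push the Agent form's total human cost up.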

The two trajectories of Cursor and Devin also largely reflect the current status of the two product forms, Copilot and Agent, in general scenarios.

Cursor is the representative of Copilot, which requires AI and humans to work simultaneously, with humans leading and AI assisting.

At present, Copilot is the form that has really achieved PMF (product-market fit). A Copilot can live inside IDEs such as VS Code in the form of a plug-in and assist human developers in completing various coding actions. Since the emergence of GitHub Copilot, users have gradually become accustomed to the Copilot form of collaboration. The emergence of GPT-3.5 truly transformed Copilot from a demo into a usable product.

However, Zhang Hailong once wrote an article describing the "hidden worries" of Copilot products. "The real moat is VS Code. VS Code has transformed from a simple editor into a platform. The reason users can easily migrate from GitHub Copilot to Cursor is that both live on VS Code, so user habits, experience, and functions/plug-ins are all the same. Cursor also proves that there is no 'data flywheel' in Copilot products: the data you can get is also available to the large models and is already part of the model."

In contrast, Agent is a new species spawned by GPT-3.5, a new concept that is more stimulating to the sensitive nerves of entrepreneurs and VCs. Devin is a representative of the Agent form, which requires AI and humans to work asynchronously. AI has greater initiative and can independently complete some decisions and executions.

Zhang Hailong believes that Agent is the opportunity for entrepreneurs. But he is not optimistic about the all-round Agent vision advocated by Devin. "Doing everything means nothing can be done. Agent applications in subdivided fields have higher value."

However, because the Agent concept is still at a very early stage, everyone is exploring, and the environment agents should live in and the boundaries of their capabilities are still unclear. Teams are working in directions such as code generation, code completion, unit test generation, and defect detection.

Gru chose to start from unit testing. Before the official launch of its product, Gru also went through a period of internal trial and error, attempting automatic file generation, bug fixing, and E2E testing. However, these efforts stalled because of pain points such as model capabilities and the later iteration and maintenance of the software.

In the end, Gru settled on unit testing, a common but inconspicuous need. Zhang Hailong said that many developers don't like writing unit tests because it is boring. In addition, for projects with low requirements, unit testing is not a strict requirement of software engineering. However, Gru believes that from the perspective of technical capabilities, putting AI products into practice must solve the problem of coherence between business context and engineering context. Unit testing depends the least on these two contexts and is the link best suited to current model capabilities.

However, both Copilot and Agent are means rather than ends. The two are not an "either-or" relationship, but will coexist to solve different problems.

For many individual developers and some small and medium-sized enterprises, general products such as Cursor or some open source models may be enough to meet most needs. But for many large enterprises and complex business scenarios in different fields, it is difficult to meet the needs simply through a general product in the form of a "Copilot" or "Agent", which requires technology vendors to have stronger domain-specific service capabilities.

The latter is where the opportunities lie for domestic AI programming companies.

Domestic opportunities are in the vertical field

Looking back at 2024, AI programming was undoubtedly one of the hottest venture capital directions in Silicon Valley, and it has already produced unicorns such as Cursor, Poolside, Cognition, Magic, Codeium, and Replit.

In contrast, major domestic Internet companies and large model manufacturers have basically launched their own "code models", but there are few entrepreneurial projects that have developed well. According to Silicon Star, Qiji Ventures invested in six start-ups in the field of AI programming last year, and almost all of them have been wiped out since then. Most of the more than 10 coding teams that briefly surfaced last year have withdrawn this year.

After the emergence of ChatGPT, Qingliu Capital looked at dozens of projects on the AI programming track, but in the end it invested in only one: Silicon Heart Technology ("aiXcoder").

For domestic AI programming projects, many people think that the products are relatively "shallow". "There are developers in the community complaining that many products now generate code in a few minutes, but it takes half a day or more to debug." said Liu Daoquan, founder and CEO of Shizhi AI.

Behind the "shallow" products are environmental differences that have formed between the Chinese and American to-B markets over the years. Zhang Hailong gave three reasons. First, the United States has a large pool of junior programmers and higher labor costs, so introducing AI products can help companies cut costs significantly. Second, the U.S. SaaS market has embraced the PLG (product-led growth) model, and companies are more willing to pay for general-purpose products. Third, the exit path in the overseas to-B market is clear: investors have a strong willingness to invest, the takeover logic in the primary market is well defined, and angel investors are numerous and very active, so start-ups can almost always get a first round of funding to verify their ideas.

Zhang Hailong has also worked in the domestic to B market for many years, working in open source communities and SaaS. In his view, the technology wave of large models will not change the current situation of the domestic to B market. "The difference may be that the technology sold has changed. In the era of cloud computing, cloud services were sold, but now AI has arrived and AI is sold," he said.

So this time, he wants to enter the overseas market. However, although Gru is Zhang Hailong's fourth start-up, it is the first time he has started a company in Silicon Valley. When he first arrived in Silicon Valley, he felt a strong sense of unfamiliarity. "This is the first time I have felt that I don't know anyone, in a physical sense," Zhang Hailong said. Throughout 2024, he spent half of his time in Silicon Valley, actively socializing, participating in various activities, and trying to get to know more people in a shorter period of time.

In September 2024, Gru launched Gru.ai and ranked first with a score of 45.2% on the SWE-bench Verified evaluation released by OpenAI. Zhang Hailong clearly felt that once he had a product, it was easier to be accepted in Silicon Valley.

For the domestic to-B market, the familiar problems still exist. "It is more difficult to do to-B in China, and the sales chain involved is relatively long. In the end, most of those who can pay the bill are large companies, but large companies will sometimes buy only if your product is really good," Liu Daoquan said. Fu Rui, investment manager of Qingliu Capital, also said: "Many companies have a large number of security compliance requirements. For example, because of concerns about the risk of information leakage, they cannot call cloud-hosted products and need locally deployed coding tools."

Therefore, domestic AI programming companies must stay grounded and solve specific problems in individual industries.

"Business continuity must be considered during the actual implementation of the model. Judging from the evaluation results, the performance of domestic code models has improved, but specific application scenarios still need to be analyzed case by case." Liu Daoquan said that after earlier discussions with an industrial manufacturing company, he found that the language used by some software systems in industrial scenarios is not the common Python or C++ but industry-specific coding tools, which requires technology vendors to make targeted adjustments.

This demand is not unique to industrial scenarios. Each industry has its own domain characteristics, and each company has specific business logic and engineering systems, which requires AI programming companies to have stronger domain service capabilities.

After studying dozens of companies, Fu Rui found: "For various software development needs, in addition to code generation, the functions of AI programming include at least a series of tasks such as search, defect detection and repair, and testing. Beyond functions, you also need to consider how to combine these capabilities with the customer's own business logic so that the model has deeper domain knowledge, which actually has a high threshold."

Therefore, Qingliu Capital is more optimistic about the deep coupling of models and products with a company's internal private knowledge, data, and software development framework. It invested in aiXcoder in September 2023.

"For this verified demand, aiXcoder is the team that matches best both technically and commercially. At the same time, many key members of the company's commercial team have more than ten years of experience selling to large enterprise customers at home and abroad, and have in-depth insight into customers and the market. In the second quarter of 2023, they proposed a 'domainization' implementation plan, that is, a strategy in which AI programming should be deeply coupled with the company's internal private knowledge, data, and software development framework. Judging from the implementation results, it has also been recognized by a large number of leading enterprise customers," Fu Rui said.

aiXcoder was incubated at the Software Engineering Institute of Peking University. It is the first team in the world to apply deep learning technology to code generation and code understanding, and also the first team to apply deep learning to programming products. The team has published more than 100 papers in top international journals and conferences, many of which are the first papers in the field of intelligent software engineering and the most cited.

Liu Dexin, business partner and president of aiXcoder, said that the company targets to-B private deployment scenarios. In these scenarios, a general-purpose large model has not learned the enterprise's private-domain data, so it is not deeply integrated with the enterprise's internal business needs, industry specifications, software development frameworks, and operating environments, and enterprise domain knowledge such as requirements analysis and design documents has not been incorporated into model training. As a result, the generated or completed code lacks pertinence and reliability at the business logic level.

The result is that the accuracy and usability of large models in enterprise applications are lower than expected. "Many large models perform remarkably well in general scenarios or on mainstream evaluation sets, with accuracy of up to 30%, but when deployed within an enterprise, the accuracy often plummets to less than 10%. Conventional fine-tuning methods also struggle to achieve the results enterprises expect. Therefore, learning and mastering 'domain-specific' knowledge is the key to the successful implementation of AI programming systems in enterprises. Solving domain-specific problems for enterprise customers is our differentiated value," Liu Dexin said.

In response to these pain points, aiXcoder conducts targeted incremental training based on various internal data provided by the company, including code, business documents, requirements documents, design documents, and test documents, as well as industry business terminology and process specifications, industry technical standards and specifications, and enterprise technology stacks and programming frameworks. In addition to model training, the solution combines multi-Agent, RAG, software development tools, and an "engineered prompt system" that fits the enterprise software development framework, in order to improve the quality of code generation and the entire R&D process.

In terms of delivery form, Liu Dexin said that domain-based solutions are not equivalent to traditional, highly customized project-based delivery. Instead, capabilities and tools are distilled into standardized products and processes and delivered to customers. At the same time, aiXcoder maintains high-frequency communication with customers through regular meetings, not only helping customers solve problems at each stage but also continuously iterating its products based on the real needs customers have in common.

The AI industry has cried wolf too many times

From a results-oriented perspective, whether you sell to small or large businesses, "train the model" or "don't train the model", build a Copilot or an Agent, there may not be an optimal answer. It all depends on the actual needs of the customer and on the entrepreneurial team's own resource endowment.

No matter which path they take, AI programming companies have a simple and direct goal, which is to improve software development efficiency. However, the current market is still in its early stages, and correctly guiding customer needs is a problem that every company entering the market must face.

Zhang Hailong admitted that the biggest problem at present is how to make customers realize the value of agents for specific segments. "Even in Silicon Valley, when many potential customers hear about new AI products, their first reaction is skepticism, not excitement. One bad thing about the AI track is that there have been too many 'cry wolf' stories in the past. There are many unusable demos." Currently, Gru is spending a lot of energy contacting customers and building a reputation among seed users, which will become the basis for large-scale commercialization in the future.

For the domestic market, buyers of AI programming systems must also be clear about their own needs and the boundaries of model capabilities. "Currently, AI programming systems driven by large models have promising prospects for improving software productivity," Liu Dexin said. "To truly unleash the value of this technology in an enterprise environment, it is necessary to deeply combine the large code model with the enterprise's own domain knowledge, and to keep iterating and verifying in specific business scenarios."

In fact, at this stage in the development of large models, market sentiment has basically returned to rationality, but noise still exists. For example, in 2024, bidding announcements for large-model projects were common, but some of the data may be "misleading."

“The ecological division of labor abroad is relatively clear, but many domestic to B projects will eventually turn into bidding, and many companies are scrambling for bidding.” Liu Daoquan said. However, in the field of AI programming, judging from the public bidding information, even a few major manufacturers have not received many orders.

The reason is that winning a bid does not mean that the model or product can be successfully put into use.

On the one hand, at many purchasing organizations, the people responsible for purchasing and the people who actually use the products are often not the same group, which may lead to a disconnect between purchasing decisions and actual business needs. On the other hand, these implementations often rely on standardized products and fine-tuning, without in-depth domain training and adaptation to the enterprise's business scenarios and internal logic, so programmers may find the results unsatisfactory in use.

An industry insider revealed that most of the orders for hardware in the current bidding market are in the millions, while pure software orders, such as intelligent software development, code assistants and other projects, are mostly around 300,000. Many companies find that they cannot solve the problem after purchasing, so they can only go back to the market to find more suitable manufacturers, resulting in a waste of resources.

However, once the noise is filtered out, some consensus is also forming. More and more companies are realizing that "decoupling" product capabilities from model capabilities is the general trend.

In the first half of 2024, Zhang Hailong realized that as model capabilities become stronger and stronger, the programming capabilities of different companies' models will converge. Products should therefore no longer be built around the capabilities of a particular model; instead, the product should be "model-agnostic". "Starting from the first half of 2024, we have basically stopped making specific optimizations for different models and instead improve the capabilities of our product architecture. Any model on the market can be connected as long as it passes our benchmark test," Zhang Hailong said.
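The idea of admitting any model that passes a benchmark can be sketched in a few lines. This is an illustrative sketch only, not Gru's actual architecture: it assumes a model is any callable from prompt to text, and the names `ModelRegistry`, `pass_rate`, `echo_model`, and `empty_model`, along with the toy benchmark cases, are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: a "model" is any callable that maps a prompt to generated text.
Model = Callable[[str], str]

# Hypothetical benchmark case: a prompt plus a checker for the model's output.
BenchmarkCase = Tuple[str, Callable[[str], bool]]


def pass_rate(model: Model, cases: List[BenchmarkCase]) -> float:
    """Fraction of benchmark cases whose output satisfies its checker."""
    if not cases:
        return 0.0
    passed = sum(1 for prompt, check in cases if check(model(prompt)))
    return passed / len(cases)


class ModelRegistry:
    """Admits any model that clears the benchmark threshold; callers only use names."""

    def __init__(self, cases: List[BenchmarkCase], threshold: float) -> None:
        self.cases = cases
        self.threshold = threshold
        self.models: Dict[str, Model] = {}

    def register(self, name: str, model: Model) -> bool:
        """Run the benchmark and keep the model only if it meets the threshold."""
        ok = pass_rate(model, self.cases) >= self.threshold
        if ok:
            self.models[name] = model
        return ok

    def generate(self, name: str, prompt: str) -> str:
        return self.models[name](prompt)


# Toy stand-ins for real model backends.
def echo_model(prompt: str) -> str:
    return f"def test_stub():\n    assert True  # for: {prompt}"


def empty_model(prompt: str) -> str:
    return ""


cases: List[BenchmarkCase] = [
    ("write a unit test for add()", lambda out: "assert" in out),
    ("write a unit test for parse()", lambda out: out.startswith("def test_")),
]
registry = ModelRegistry(cases, threshold=1.0)
print(registry.register("echo", echo_model))    # admitted: passes both toy cases
print(registry.register("empty", empty_model))  # rejected: fails both toy cases
```

The design point is that product code depends only on the registry interface, so swapping or adding a model backend requires no model-specific optimization as long as the benchmark gate is passed.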

Liu Dexin also emphasized: "Enterprise customers should pay full attention to business continuity and should not be tied to any single large model vendor. At present, it is difficult to truly meet enterprise customers' needs for implementing large models only by purchasing standardized products. Enterprises need to achieve architectural decoupling across large models, the data layer, domainization, and engineering, and flexibly choose the models and service providers that better suit their needs. The most important thing is to effectively solve the practical problems of domainization in enterprises' internal software development and help enterprises reduce costs and increase efficiency."

Speaking from a third-party perspective, Liu Daoquan believes that in the future, connecting to a model will be only one part of industrial implementation. "There are still 100 kilometers from model to application. If technology vendors standardize the capabilities of the first 95-99 kilometers and turn them into infrastructure, the remaining 1-5 kilometers can be done by the application side."