In July this year, Microsoft open-sourced GraphRAG, its large-scale knowledge-indexing framework, for the first time. In just over four months it has passed 19,000 stars on GitHub, making it one of the most popular RAG frameworks available.
However, GraphRAG is very expensive when handling global queries over a dataset, especially when paired with large-parameter AI models. Its queries can also suffer from latency and inaccuracy.
Early this morning, Microsoft Research released an iteration of GraphRAG: LazyGraphRAG. Its biggest highlight is very low cost: data indexing costs only 0.1% of what the full version of GraphRAG requires. It also adopts a new hybrid search method that produces results with better accuracy and efficiency. An open-source version will be released soon and added to the GraphRAG library.
Open source address: https://github.com/microsoft/graphrag?tab=readme-ov-file
Below, "AIGC Open Community" draws on the content of Microsoft's official blog post to explain the technical differences of LazyGraphRAG in detail and to review GraphRAG along the way.
LazyGraphRAG technical features
In the data indexing stage, Microsoft's previously open-sourced GraphRAG relies mainly on large models to extract and describe entities and their relationships, and to generate a summary for each entity and relationship.
This process also uses graph statistics to optimize the entity graph and extract the hierarchical community structure. Because it requires a large amount of language-model processing, GraphRAG's data indexing is very expensive.
Unlike GraphRAG, LazyGraphRAG does not perform any pre-summarization or embedding generation during the data indexing stage. Instead, it uses NLP noun-phrase extraction to identify concepts and their co-occurrence relationships, then uses graph statistics to optimize the concept graph and extract the hierarchical community structure. This makes LazyGraphRAG's indexing cost extremely low: only 0.1% of GraphRAG's, or a 1,000-fold reduction.
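The co-occurrence idea can be illustrated with a minimal Python sketch. It is not Microsoft's implementation: the regex below is a naive stand-in for a real NLP noun-phrase extractor (it simply treats runs of capitalized words as concepts), and the function names `extract_concepts` and `build_cooccurrence_graph` are hypothetical. The point is only that the concept graph can be built by counting, with no language-model calls.

```python
import re
from collections import Counter
from itertools import combinations

# ASSUMPTION: a naive stand-in for NLP noun-phrase extraction.
# Runs of capitalized words are treated as "concepts".
CONCEPT_PATTERN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def extract_concepts(chunk: str) -> set[str]:
    """Return the set of candidate concepts found in one text chunk."""
    return set(CONCEPT_PATTERN.findall(chunk))


def build_cooccurrence_graph(chunks: list[str]) -> Counter:
    """Count how often each pair of concepts appears in the same chunk.

    Keys are alphabetically ordered concept pairs (the graph's edges);
    values are co-occurrence counts (the edge weights).
    """
    edges: Counter = Counter()
    for chunk in chunks:
        concepts = sorted(extract_concepts(chunk))
        for a, b in combinations(concepts, 2):
            edges[(a, b)] += 1
    return edges


chunks = [
    "Alice met Bob in Paris.",
    "Bob later moved to Paris.",
]
edges = build_cooccurrence_graph(chunks)
for pair, weight in sorted(edges.items()):
    print(pair, weight)
```

In a fuller pipeline, the weighted edges would feed the graph-statistics and community-detection steps described above, which this sketch leaves out.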
In terms of query processing, GraphRAG uses breadth-first search to ensure that the full breadth of the dataset is considered when answering a query, while LazyGraphRAG combines best-first and breadth-first search dynamics in an iterative-deepening manner. Text chunks are first ranked by similarity, and the query results are then gradually refined by dynamically selecting relevant communities.
This approach enables LazyGraphRAG to support both local and global queries while efficiently finding the best-matching text chunks.
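The query strategy can be sketched loosely in Python, under stated assumptions. This is not LazyGraphRAG's code: the function `budgeted_search`, its parameters, and the toy `is_relevant` check (standing in for an LLM-based relevance test) are all hypothetical. It only shows the shape of the idea: rank chunks by similarity (best-first), visit every community in rank order (breadth-first), and go deeper only into communities that keep yielding relevant chunks, stopping when a relevance-test budget is used up.

```python
from collections import defaultdict
from typing import Callable


def budgeted_search(
    scored_chunks: list[tuple[str, str, float]],  # (chunk_id, community_id, similarity)
    is_relevant: Callable[[str], bool],           # stand-in for an LLM relevance test
    budget: int,                                  # maximum number of relevance tests
    top_k: int = 2,                               # chunks tested per community per pass
) -> list[str]:
    """Return ids of chunks judged relevant, using at most `budget` tests."""
    # Best-first: order all chunks by similarity to the query, highest first.
    ranked = sorted(scored_chunks, key=lambda c: c[2], reverse=True)

    # Group chunks by community. Dicts keep insertion order, so communities
    # end up ordered by the similarity of their best chunk.
    by_community: dict[str, list[str]] = defaultdict(list)
    for chunk_id, community_id, _ in ranked:
        by_community[community_id].append(chunk_id)

    relevant: list[str] = []
    tests_used = 0
    active = list(by_community)
    # Iterative deepening: each pass visits every active community (breadth)
    # and tests its next top_k untested chunks (depth).
    while active and tests_used < budget:
        still_active = []
        for community_id in active:
            batch = by_community[community_id][:top_k]
            del by_community[community_id][:top_k]
            found = False
            for chunk_id in batch:
                if tests_used >= budget:
                    break
                tests_used += 1
                if is_relevant(chunk_id):
                    relevant.append(chunk_id)
                    found = True
            # Keep a community only if it produced a relevant chunk
            # and still has untested chunks left.
            if found and by_community[community_id]:
                still_active.append(community_id)
        active = still_active
    return relevant


scored_chunks = [
    ("c1", "A", 0.9), ("c2", "A", 0.8), ("c3", "B", 0.7),
    ("c4", "A", 0.6), ("c5", "B", 0.2), ("c6", "C", 0.1),
]
relevant_ids = {"c1", "c3", "c4"}


def is_relevant(chunk_id: str) -> bool:
    return chunk_id in relevant_ids


low = budgeted_search(scored_chunks, is_relevant, budget=4)
high = budgeted_search(scored_chunks, is_relevant, budget=10)
print(low)   # ['c1', 'c3']
print(high)  # ['c1', 'c3', 'c4']
```

With the smaller budget the search stops before it can return to community A for a second pass, so `c4` is missed; the larger budget lets it go one level deeper. This mirrors how a single budget parameter trades cost against answer quality.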
In terms of flexibility and scalability, GraphRAG can be used for a variety of purposes due to its rich summary information, but its high cost limits its use in one-time queries and exploratory analysis. LazyGraphRAG provides a unified query interface, supports local and global queries, is very flexible, and is suitable for one-time queries, exploratory analysis and streaming data usage scenarios.
In terms of application scenarios, GraphRAG is suitable for cases that require high-quality, comprehensive query results, such as enterprise-level knowledge management and complex data analysis. LazyGraphRAG is suitable for cost-sensitive cases that require efficient processing of global queries, such as content recommendation systems and project management tools. This makes it a good fit for small and medium-sized enterprises and individual developers with limited resources.
LazyGraphRAG test data
To evaluate the performance of LazyGraphRAG, Microsoft set three different budgets to observe its performance under different conditions.
At the lowest budget level of 100 relevance tests, and using a low-cost large model, LazyGraphRAG shows significant advantages on both local and global queries.
In local queries, LazyGraphRAG significantly surpasses methods such as C1, C2, C3_Dynamic, LS, DRIFT, SS_8K, SS_64K and RAPTOR. In global queries, GraphRAG global search sometimes performs better, but LazyGraphRAG still has the advantage in cost-effectiveness.
When the budget is raised to 500 relevance tests and a more advanced large model is used, LazyGraphRAG's advantages become clearer. It costs only 4% as much as C2, yet performs significantly better than all other conditions, including GraphRAG global search at the C2 level.
This shows that LazyGraphRAG not only has advantages in cost, but also performs well in query quality, providing higher quality answers both in local and global queries.
At the high budget of 1,500 relevance tests, LazyGraphRAG's advantages grow further. Its performance on local and global queries continues to improve, especially on global queries, where its win rate is significantly higher than that of the other methods.
Even under high budget conditions, LazyGraphRAG still maintains its dual advantages of cost-effectiveness and query quality.