Revolution in Natural Language Processing

A research team from the Netrix S.A. Research and Development Center, the WSEI Academy in Lublin, and the Lublin University of Technology has developed a GraphRAG tool that sets a new direction for extracting information from unstructured text. In research results published in October 2024, the authors presented an approach that combines large language models (LLMs) with knowledge graphs, allowing the system to give more precise, consistent, and comprehensive answers to user questions, even when the relevant data is spread across multiple sources.

*Comparison of knowledge graphs generated from 300-token and 1200-token fragments: the smaller fragments yield more entities and relations.*

One of the keys to GraphRAG's effectiveness is how the input text is split into smaller fragments. The analysis showed that processing text in 300-token chunks, rather than 1200-token chunks, substantially improves the identification of entities and relations, which translates directly into more accurate and detailed knowledge graphs. This finer granularity lets the system extract more of the key information in each document and organize it better within the graph structure.
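The chunking step itself is simple to express. The sketch below is a minimal illustration, assuming a whitespace tokenizer as a stand-in for the model tokenizer that real token counts would be based on; the function names `chunk_tokens` and `whitespace_tokenize` and the optional `overlap` parameter are hypothetical and are not taken from the publication.

```python
from typing import List


def whitespace_tokenize(text: str) -> List[str]:
    # Stand-in tokenizer (assumption): real pipelines count model tokens such
    # as BPE pieces, so 300 "words" here is not exactly 300 LLM tokens.
    return text.split()


def chunk_tokens(tokens: List[str], chunk_size: int, overlap: int = 0) -> List[List[str]]:
    """Split tokens into consecutive chunks of at most chunk_size tokens.

    With overlap > 0, each chunk repeats the last `overlap` tokens of the
    previous one (a hypothetical option; the article does not mention overlap).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    # Hypothetical 2,500-token document made of placeholder words.
    tokens = whitespace_tokenize(" ".join(f"w{i}" for i in range(2500)))
    for size in (300, 1200):
        chunks = chunk_tokens(tokens, size)
        print(size, len(chunks))  # 300 -> 9 chunks, 1200 -> 3 chunks
```

Assuming one LLM extraction call per chunk, the 300-token setting makes three times as many calls on this document as the 1200-token setting, which is the cost side of the finer granularity described above.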

A key element of the solution is the Leiden algorithm, known for its efficiency in detecting communities in large graphs. It automatically groups semantically related entities and organizes these groups into a hierarchical structure. Organizing information this way supports both local analysis and the synthesis of knowledge at a global scale, which makes GraphRAG a useful tool in decision-support processes.
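The community-detection step can be illustrated with the local-moving phase that Leiden inherits from the Louvain method: each entity is moved to the neighboring community that most increases modularity, until no move helps. This is a deliberately simplified stand-in, not the Leiden algorithm itself. Leiden adds a refinement phase that guarantees well-connected communities and an aggregation phase that repeats the process on a coarsened graph (which is what yields a hierarchy); in practice one would use an existing implementation such as the `leidenalg` package. The function name `local_moving_communities` and the toy entity graph are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (entity, entity, weight); no self-loops assumed

# Hypothetical entity graph: two tightly linked topics joined by one relation.
RELATIONS: List[Edge] = [
    ("contract", "invoice", 1.0), ("contract", "payment", 1.0), ("invoice", "payment", 1.0),
    ("release", "sprint", 1.0), ("release", "ticket", 1.0), ("sprint", "ticket", 1.0),
    ("payment", "release", 1.0),
]


def local_moving_communities(edges: List[Edge]) -> List[List[str]]:
    """Greedy modularity local moving, the first phase of Louvain and Leiden.

    Leiden's refinement and aggregation phases are intentionally omitted.
    """
    adj: Dict[str, Dict[str, float]] = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w
    m = sum(w for _, _, w in edges)                       # total edge weight
    degree = {n: sum(nbrs.values()) for n, nbrs in adj.items()}
    community = {n: n for n in adj}                       # singleton start
    tot = dict(degree)                                    # degree sum per community

    improved = True
    while improved:
        improved = False
        for node in sorted(adj):
            k_i = degree[node]
            current = community[node]
            tot[current] -= k_i                           # take node out
            links: Dict[str, float] = defaultdict(float)  # weight to each community
            for nbr, w in adj[node].items():
                links[community[nbr]] += w
            # Modularity gain of joining community c, up to a constant 1/m:
            #   k_{i,c} - tot_c * k_i / (2m)
            best = current
            best_gain = links.get(current, 0.0) - tot[current] * k_i / (2 * m)
            for c in sorted(links):
                gain = links[c] - tot[c] * k_i / (2 * m)
                if gain > best_gain + 1e-12:              # strict improvement only
                    best, best_gain = c, gain
            community[node] = best
            tot[best] += k_i
            if best != current:
                improved = True

    groups: Dict[str, List[str]] = defaultdict(list)
    for node, c in community.items():
        groups[c].append(node)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    print(local_moving_communities(RELATIONS))
    # [['contract', 'invoice', 'payment'], ['release', 'sprint', 'ticket']]
```

On this toy graph the single relation between `payment` and `release` is not enough to merge the two topics, so each becomes its own community; because every accepted move strictly increases modularity, the loop is guaranteed to terminate.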

GraphRAG addresses the limitations of traditional Retrieval-Augmented Generation (RAG) systems, which often produce inconsistent or incomplete answers when a query requires integrating information from multiple sources. Processing knowledge represented as a graph significantly improves the relevance and precision of responses, especially where capturing context, the relations between entities, and their hierarchical organization is critical.
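The difference from chunk-based retrieval can be sketched in a few lines: instead of returning isolated text fragments, the retriever starts from the entities named in the question and follows relations outward, so facts from different documents end up in the same context. This is a minimal sketch, not the published system's retrieval method; it assumes facts are already extracted as (subject, relation, object) triples, uses naive case-insensitive substring matching for entity linking, and the names `graph_context` and `TRIPLES` and all example facts are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

# Hypothetical facts, imagined as extracted from different documents.
TRIPLES: List[Triple] = [
    ("Project Atlas", "is managed by", "Anna"),    # status report
    ("Anna", "reports to", "Board"),               # org chart
    ("Project Atlas", "depends on", "Vendor X"),   # contract
    ("Vendor X", "delayed", "Delivery 3"),         # email log
    ("Board", "approved", "Budget 2025"),          # meeting minutes
]


def graph_context(question: str, triples: List[Triple], hops: int = 2) -> List[str]:
    """Collect facts within `hops` relations of entities named in the question."""
    # Index every triple under both of its entities.
    index: Dict[str, List[Triple]] = {}
    for t in triples:
        index.setdefault(t[0], []).append(t)
        index.setdefault(t[2], []).append(t)

    # Naive entity linking (assumption): case-insensitive substring match.
    q = question.lower()
    seeds = [e for e in index if e.lower() in q]

    # Breadth-first expansion from the seed entities.
    seen: Set[str] = set(seeds)
    frontier: Deque[Tuple[str, int]] = deque((e, 0) for e in seeds)
    used: Set[Triple] = set()
    facts: List[str] = []
    while frontier:
        entity, depth = frontier.popleft()
        if depth >= hops:
            continue
        for t in index[entity]:
            if t not in used:
                used.add(t)
                facts.append(" ".join(t))
            for neighbor in (t[0], t[2]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    frontier.append((neighbor, depth + 1))
    return facts


if __name__ == "__main__":
    for fact in graph_context("Why might Project Atlas slip?", TRIPLES):
        print(fact)
```

Here the question names only the project, yet the two-hop expansion also surfaces the vendor's delayed delivery, a connection that no single hypothetical source states on its own; these facts would then be placed in the LLM prompt.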

This approach has great practical potential in project management, where the analysis of documentation, reports, or communication logs requires quick and reliable synthesis of information. Thanks to the scalability of the solution and the ability to automatically generate summaries for individual communities in the knowledge graph, GraphRAG accelerates analytical processes and supports efficient decision-making in environments with large volumes of textual data.

Looking ahead, GraphRAG could be applied to automating legal, scientific, and business analyses, as well as to building intelligent language assistants and advanced customer service systems. Integrating language models with semantic knowledge representations is a step toward more interpretable, context-aware, and explainable artificial intelligence systems.

The full version of the publication is available at:
https://ersj.eu/journal/3497