Neo4j’s Stephen Chin believes that a graph is the most intuitive and efficient data structure for storing, searching, and querying data with complex relationships. In a conversation with OSFY’s Yashasvini Razdan, the VP of Developer Relations at Neo4j spoke about the applications of graph databases in AI and LLM-based solutions…
Q. What industry trends are driving the need for graph technology?
A. There is a major gap between the number of AI applications successfully put into production and those meeting the intended requirements. The primary challenge lies in whether large language models (LLMs) and applications deliver accurate results, avoid hallucinations, and effectively utilise enterprise data and systems. Knowledge graphs bridge this gap by providing additional context and information, avoiding the ‘black box’ problem, and offering a smooth development experience without struggling with models that aren’t well-suited to the task. Research studies have shown that combining graph databases and LLMs can improve the accuracy of LLM results by 50%. This is significant because one of the biggest hurdles to widespread adoption of LLM technology is ensuring accurate, explainable results that can be confidently used in business contexts, especially for mission-critical applications.
By using knowledge graphs, organisations can transition from AI theories and prototypes to building production-ready applications. These applications not only deliver reliable results but also ensure data meets the needs of both internal and external stakeholders with confidence.
Q. So where does open source come into play in all of this?
A. Neo4j has evolved into a highly successful commercial and enterprise database but remains firmly rooted in its open source origins. It is entirely built on open source with an open-sourced database. We have an open source Community Edition, which is free for anyone to use. Without the power of an open source ecosystem, graph technology would never have taken off.
Q. What is the advantage of using open source AI solutions compared to proprietary solutions?
A. I would say the ability to construct a best-of-breed AI system. You can deploy it entirely on-premises, host it in the cloud, or use base models like ChatGPT or similar systems. You can opt for open source models that you host yourself, entirely behind your firewall.
This flexibility allows you to select the best components for building your AI solution, tailoring it to meet your specific data privacy requirements, data storage locations, and business needs. An open source graph database enables the creation of an end-to-end application framework to achieve this.
One example of our collaboration in this space is with the Linux Foundation’s AI and Data Foundation. We are part of the OPEA project (Open Platform for Enterprise AI), an entirely open source framework for building generative AI applications powered by knowledge graphs.
Q. What are the open source software platforms and partnerships that are helping you expand through the community?
A. I’d say the open source projects I mentioned earlier, the open source foundations we’re collaborating with, and even our own graph RAG (retrieval-augmented generation) gathering—a collective of different companies and individuals driving innovation and research in graph technologies—are all helping to shape the future.
Together, they’re paving the way for an AI/ML platform and framework that will give companies a stronger foundation to base their enterprise LLMs on in the future.
Q. What kind of adoption trends are you seeing when it comes to India versus the globe?
A. In terms of adoption, India closely tracks the US, Europe, and other markets, as a major portion of India’s technology ecosystem is linked to global organisations, with developers collaborating across the world.
Speaking at Open Source India 2024, I have met many old colleagues and friends from San Francisco, the hub where companies like LangChain, Ollama, OpenAI, and others driving innovation first emerged. India feels like a very similar tech ecosystem because the same people and organisations are now operating globally.
Q. How does GraphRAG work?
A. The easiest way to think about GraphRAG is as a RAG (retrieval-augmented generation) architecture. When a question comes in, you want the LLM to draw not only on its base model but also on information from your internal enterprise systems, data that is not publicly available or included in the base model’s training.
This process involves taking your documents—structured or unstructured, such as JSON files, PDFs, or tables—and feeding them into a knowledge graph generator. This generator creates a knowledge graph and also generates vector embeddings.
When a user submits a query, the system queries the graph database, retrieving additional context. It extracts a relevant portion of the knowledge graph, which is then passed as context to the LLM. The LLM uses this additional context to provide a more informed answer.
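The retrieval flow described above can be sketched with a toy in-memory knowledge graph. The node data, the `retrieve_context` helper, and the prompt format below are all illustrative assumptions, not Neo4j’s actual API; in production the lookup would run as a Cypher query against the database.

```python
# Toy GraphRAG retrieval: match a query against graph nodes, expand their
# neighbours, and format the resulting subgraph as context for an LLM prompt.
# The graph contents, matching rule, and prompt shape are made-up examples.

GRAPH = {
    "Neo4j":    {"type": "Database", "links": ["Cypher", "GraphRAG"]},
    "Cypher":   {"type": "Language", "links": ["Neo4j"]},
    "GraphRAG": {"type": "Pattern",  "links": ["Neo4j", "LLM"]},
    "LLM":      {"type": "Model",    "links": ["GraphRAG"]},
}

def retrieve_context(query: str) -> list[str]:
    """Return facts for nodes mentioned in the query plus their neighbours."""
    seeds = [n for n in GRAPH if n.lower() in query.lower()]
    nodes = set(seeds)
    for seed in seeds:                      # one-hop expansion around each hit
        nodes.update(GRAPH[seed]["links"])
    return sorted(f"{n} ({GRAPH[n]['type']}) -> {GRAPH[n]['links']}" for n in nodes)

def build_prompt(query: str) -> str:
    """Pass the retrieved subgraph to the LLM as grounding context."""
    context = "\n".join(retrieve_context(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What is GraphRAG?"))
```

The key property is visible even in this sketch: the LLM answers from an explicit, inspectable subgraph rather than from its opaque training data alone.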
Q. Why not use purely generative models? Why use a combination of knowledge graphs and LLMs?
A. A purely generative AI solution excels at generating answers but often lacks specificity and accuracy. It may suffice for internal systems, such as tools to assist customer support representatives, but it is entirely unacceptable for customer-facing systems. By augmenting LLMs with a knowledge graph, you can raise the bar significantly—achieving much higher accuracy, along with explainable answers, showing exactly which portions of the knowledge graph were used to generate the results. In cases where the system fails to retrieve factual or relevant information, the query can be redirected to a human operator to ensure the question is properly addressed.
Q. How does a knowledge graph remain up to date with the latest information?
A. Knowledge graphs can be easily updated. You can run batch processes to add additional documents regularly or run queries to dynamically update the knowledge graph. There are many ways to quickly and efficiently keep the knowledge graph up to date with the latest information.
Knowledge graphs have been around for a long time, so it is well-established how to maintain them, build highly available graphs, and perform high-performance queries against them. This makes them a great complement to expert systems built on LLMs, as these systems require up-to-date information. In enterprise use cases, it’s essential that the knowledge graph is highly available—always operational and accessible to customers.
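A common way to do the batch updates mentioned above is Cypher’s `MERGE`, which creates a node only if it does not already exist and otherwise updates it. The sketch below mimics that create-or-update semantics in memory; the document data is a made-up example, and in practice the statement would run through a Neo4j driver session.

```python
# Sketch of MERGE-style upserts for keeping a knowledge graph current.
# In Cypher this would be a statement such as:
#   MERGE (d:Document {id: $id}) SET d.title = $title
# Here an in-memory dict stands in for the database.

graph: dict[str, dict] = {}

def upsert(node_id: str, **props) -> None:
    """Create the node if missing, otherwise update its properties."""
    graph.setdefault(node_id, {}).update(props)

# Batch process: brand-new and refreshed documents handled in one pass.
for doc in [{"id": "doc-1", "title": "Q1 report"},
            {"id": "doc-1", "title": "Q1 report (rev 2)"},
            {"id": "doc-2", "title": "Q2 report"}]:
    upsert(doc["id"], title=doc["title"])
```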
Q. What challenges do you encounter when integrating a knowledge graph with an LLM and how are you resolving them?
A. I think a knowledge graph does a reasonable job of creating nodes and relationships, and you can define the labels or schemas you want to use to generate it. This makes it a great starting point for organisations just beginning with this technology. Once people start building systems that combine knowledge graphs with LLMs, there are several approaches for integrating the two, depending on the use case. For example, you can use Text-to-Cypher, which translates the initial query into Cypher (the query language for knowledge graphs) to retrieve results from the knowledge graph.
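Text-to-Cypher typically works by handing the LLM the graph schema along with the user’s question and asking it to emit a Cypher query, which is then executed against the database. A minimal sketch of the prompt-building step follows; the schema string and prompt template are illustrative assumptions, and the call to an actual LLM is omitted.

```python
# Sketch of the Text-to-Cypher pattern: the graph schema plus the user's
# question go to an LLM, which replies with a Cypher query to execute.
# The schema and template below are made-up illustrations.

SCHEMA = "(:Customer)-[:PLACED]->(:Order)-[:CONTAINS]->(:Product)"

def text_to_cypher_prompt(question: str) -> str:
    return (
        "You translate questions into Cypher.\n"
        f"Graph schema: {SCHEMA}\n"
        f"Question: {question}\n"
        "Return only the Cypher query."
    )

# The LLM's reply would then be run against the database, e.g. something like:
#   MATCH (c:Customer)-[:PLACED]->(:Order)-[:CONTAINS]->(p:Product)
#   WHERE p.name = "X" RETURN c.name
prompt = text_to_cypher_prompt("Which customers bought product X?")
```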
Another approach that many organisations have found successful is using a vector database alongside the knowledge graph. Neo4j supports this, and you can link vector embeddings to the knowledge graph. When querying the vector database, relevant nodes and their related nodes are pulled back, enhancing the dataset passed to the LLM.
Some organisations use the knowledge graph as a prioritisation engine to improve the quality of the results and re-rank the results returned from the vector embedding side.
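The vector-plus-graph combination described above can be sketched end to end: embed the query, find the nearest nodes by cosine similarity, then pull in their graph neighbours so the LLM also sees related entities a pure vector match would miss. The toy embeddings and adjacency list below are illustrative assumptions, not real data.

```python
import math

# Toy vector search combined with graph expansion. The embeddings and the
# neighbour lists are made-up illustrations of the pattern.
EMBEDDINGS = {
    "fraud-case-17": [0.9, 0.1],
    "invoice-3402":  [0.8, 0.3],
    "blog-post-9":   [0.1, 0.9],
}
NEIGHBOURS = {
    "fraud-case-17": ["account-55"],
    "invoice-3402":  ["account-55", "vendor-7"],
    "blog-post-9":   [],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def retrieve(query_vec, k=2):
    """Top-k vector hits, expanded with their knowledge-graph neighbours."""
    hits = sorted(EMBEDDINGS, key=lambda n: cosine(query_vec, EMBEDDINGS[n]),
                  reverse=True)[:k]
    expanded = set(hits)
    for hit in hits:                 # graph hop: add linked entities
        expanded.update(NEIGHBOURS[hit])
    return hits, sorted(expanded)
```

Note how `account-55` and `vendor-7` reach the LLM only because of the graph hop; they have no embeddings at all.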
Q. What kind of industries or what kind of customers need this solution?
A. There are many different industries that can use or are already using this approach. For example, we have organisations in customer service applications using it very effectively. We’ve also seen it applied in research analysis—one of our customers uses it for research on supply chain issues and oil pipelines, leveraging an LLM to identify potential problems. Additionally, this approach is commonly deployed for fraud detection use cases.
Knowledge graphs are the industry standard for fraud detection, but the range of scenarios where knowledge graphs + LLMs can provide significant advantages is vast. This includes areas like cybersecurity, banking, manufacturing, and supply chain management.
Q. How is the market in India for your solution?
A. In India, we have numerous customers ranging from consulting firms and banks to technology companies. During the presentation I gave at Open Source India 2024, many attendees were familiar with Neo4j or were already using it for internal company projects. Most financial companies are already using Neo4j, particularly for fraud detection use cases.
Developers are increasingly adopting Neo4j for various application development use cases because it is much more intuitive to work with knowledge graphs and LLMs compared to other types of databases, which are not as well-suited for LLM use cases.
Q. How equipped are knowledge graphs to be scalable and flexible for different customer requirements?
A. I think knowledge graphs are very broad and scalable, making them suitable for a variety of use cases. At its core, a knowledge graph is simply a collection of nodes and relationships, with the ability to run queries. We also have data science extensions built on top of it. However, beyond that, you can create very powerful and specific applications for a wide range of industries.
It’s like how relational databases have become the core for most time-based or tabular data, which is organised in rows and columns. Graph databases, on the other hand, are a natural format for anything that involves networks or relationship-based document structures, which are closely aligned with what you need when building AI/ML applications and working with generative AI technology.
Q. How does the knowledge graph know that this is the relevant information it needs to extract?
A. There are many effective algorithms in knowledge graphs for evaluating closeness and the proximity of different nodes, as well as their relationships. With a well-constructed knowledge graph, you can efficiently perform multi-level hops around the graph to retrieve relevant information that is contextually useful for an LLM or an application to generate results. The same structure in tables and joins would be difficult to construct and would also require dozens of table joins to achieve the same results, leading to very slow performance. In contrast, this is a natural representation for knowledge graphs and graph-based applications.
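The multi-level hops described here amount to a bounded breadth-first traversal; in Cypher this is a variable-length pattern such as `MATCH (n {id: $start})-[*1..2]-(m) RETURN DISTINCT m`, whereas in SQL each hop costs another self-join. A toy equivalent over an adjacency list (the fraud-style graph data is a made-up example):

```python
from collections import deque

# Toy k-hop neighbourhood retrieval via breadth-first search.
# Cypher equivalent: MATCH (n {id: $start})-[*1..k]-(m) RETURN DISTINCT m
# In SQL the same query needs one self-join per hop.
EDGES = {
    "alice":     ["acct-1"],
    "acct-1":    ["alice", "payment-9"],
    "payment-9": ["acct-1", "acct-2"],
    "acct-2":    ["payment-9", "bob"],
    "bob":       ["acct-2"],
}

def k_hop(start: str, k: int) -> set[str]:
    """All nodes reachable from `start` within k hops (excluding start)."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue
        for nxt in EDGES.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen - {start}
```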
Q. Can GraphRAG be applied to other sector domains like data analytics and business intelligence, apart from GenAI?
A. Yes. Graph databases have a wide range of applications and use cases beyond GenAI and LLMs. For instance, Neo4j’s data science package enables users to gain insights and perform in-depth analysis of their graph data.
Many of our customers use graph databases for fraud prevention and supply chain optimisation, among other purposes. In fact, graph databases are often the optimal choice for these kinds of applications. Historically, graph databases were foundational for building recommendation engines, and they remain widely used for that purpose today.
Every time you search on Google or a major search engine, a graph database is working behind the scenes to deliver results. This is because algorithms like PageRank and other recommendation engine technologies were originally developed as graph science applications.
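PageRank itself is a short iterative computation over the link graph: each page repeatedly distributes its rank along its outgoing links. A minimal power-iteration sketch follows; the four-page link structure is a made-up example.

```python
# Minimal PageRank by power iteration over a toy link graph.
# Every page here has at least one out-link, so no dangling-node
# handling is needed in this sketch.
LINKS = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

def pagerank(links, damping=0.85, iters=50):
    n = len(links)
    rank = {p: 1.0 / n for p in links}
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in links}
        for page, outs in links.items():
            share = rank[page] / len(outs)   # split rank across out-links
            for out in outs:
                new[out] += damping * share
        rank = new
    return rank

ranks = pagerank(LINKS)
```

As expected, the most-linked-to page (`c`) ends up with the highest score, which is exactly the signal a search or recommendation engine exploits.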
Q. Are there any metrics to measure if the solution obtained with a combination of a knowledge graph and LLM is the best or the most effective solution?
A. Measuring the responses from LLMs can be challenging. Knowledge graphs, however, offer the advantage of providing specific nodes and portions of the graph that are passed into the model, making the process less of a black box. This allows you to analyse the results, identify the data that was passed in, and evaluate how effectively the model delivers accurate responses.
Recent research papers have demonstrated that the combination of knowledge graphs and LLMs offers a lot of benefits. This approach not only enhances research outcomes but also proves advantageous across various industries.
Q. What does the future look like for knowledge graphs?
A. This year has been marked by the rise of generative AI and RAG, but moving forward, GraphRAG and knowledge graphs will represent the next evolution, delivering improved accuracy and greater explainability in the results. Neo4j stands out as the best open source and commercial graph database for addressing these use cases.
Our goal is to empower the developer community to effectively leverage generative AI combined with LLMs and graph use cases. We are committed to providing valuable content, as well as offering more hands-on learning opportunities.