Generative AI depends on data to build responses to user queries. Training large language models (LLMs) consumes huge volumes of data. For example, OpenAI's GPT-3 used a filtered version of the Common Crawl data set, which stood at 570 gigabytes and roughly 400 billion tokens. But these data sets, while massive, are snapshots in time that cannot answer queries about events happening today. AI responses can also include hallucinations, information that seems plausible but is not real. According to Vectara's Hallucination Leaderboard, even the best-performing family of LLMs (currently OpenAI's) has a hallucination rate in the range of 1.5 to 1.9 percent.
Using LLMs on their own therefore presents two problems: the answers can be out of date, and they can be wrong. To overcome these problems, companies can use data streaming to feed new information into their data sets, and deploy retrieval-augmented generation (RAG) to encode business data in a form that can be used with generative AI.
RAG encodes business data as a set of vector embeddings that can be searched for semantic matches to a user query; those matches are then passed to the LLM for inclusion in the response. Because new or additional data can be added to the vector data set over time, relevant and timely information stays available for responses.
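To make the retrieval step concrete, here is a minimal sketch of it in Python. The embed() function is a stand-in, not a real API; a production system would call an embedding model (from OpenAI, Hugging Face, or similar) and a vector database rather than the in-memory list used here.

```python
# Minimal sketch of RAG retrieval: embed chunks, rank by similarity,
# and prepend the best matches to the prompt sent to the LLM.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder only: a real system would call an embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Business documents, chunked and embedded ahead of time.
chunks = [
    "The W-100 widget is available in red and blue, 10cm diameter.",
    "The W-200 widget ships in blue only, 15cm diameter.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine_similarity(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# The retrieved chunks ground the LLM's response in current business data.
context = "\n".join(retrieve("What sizes do widgets come in?"))
prompt = f"Answer using this context:\n{context}\n\nQuestion: What sizes do widgets come in?"
```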
RAG challenges
However, while RAG enables companies to use their own data with generative AI services, it is not perfect. One challenge that has surfaced in production deployments is that RAG does not handle searches well across many documents that contain similar or identical information. When those files are chunked and turned into vector embeddings, each chunk is available for searching, and when many chunks are nearly identical, finding the right data to match a query becomes harder. RAG can also struggle when the answer to a query is spread across a number of documents that cross-reference each other, because RAG is not aware of the relationships between those documents.
For example, imagine you have implemented a chatbot service that can call on your product data to answer customer queries. You have turned your catalog of widgets into vector data, but those widgets are all very similar. When a customer queries the chatbot, how can you make sure the response is accurate, even with RAG in play? What if your catalog contains links to other documents with additional context? Making a recommendation or serving up an answer that is inaccurate will sour that customer interaction.
The answer is to complement what RAG does well with a different knowledge management approach. Earlier this year, Microsoft Research published a report on using knowledge graphs and RAG together, a technique called GraphRAG.
Rather than storing data in rows and columns for traditional searches, or as embeddings for vector search, a knowledge graph represents data points as nodes and edges. A node is a distinct fact or characteristic, and edges connect all the nodes that have a relevant relationship to that fact. In a product catalog, the nodes might be the individual products and their characteristics, with edges linking each product to the characteristics it possesses, such as size or color.
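As a sketch of that structure, the following example builds a toy catalog graph with the networkx library (an assumption for illustration; an adjacency list or any graph store works the same way):

```python
# A toy product-catalog knowledge graph: nodes are facts, edges are
# the relationships that connect them.
import networkx as nx

graph = nx.Graph()

# Nodes are distinct facts: products and their characteristics.
graph.add_nodes_from(["W-100", "W-200"], kind="product")
graph.add_nodes_from(["blue", "red"], kind="color")
graph.add_nodes_from(["10cm", "15cm"], kind="size")

# Edges connect products to the characteristics they possess.
graph.add_edge("W-100", "red", relation="has_color")
graph.add_edge("W-100", "blue", relation="has_color")
graph.add_edge("W-200", "blue", relation="has_color")
graph.add_edge("W-100", "10cm", relation="has_size")
graph.add_edge("W-200", "15cm", relation="has_size")

# Both widgets connect to the single "blue" node, so the shared fact
# is stored once rather than duplicated across documents.
print(list(graph.neighbors("blue")))  # ['W-100', 'W-200']
```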
Sending a query to a knowledge graph involves finding all the entities relevant to that search and then creating a knowledge sub-graph that brings those entities together. The sub-graph gathers the relevant information for the query, which can then be returned to the LLM and used to build the response. This deals with the problem of having multiple similar data sources: rather than treating each source as distinct and retrieving the same data multiple times, the data is retrieved once.
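Continuing the graph sketch above, the query side can be as simple as collecting the matched entities plus their neighbors and serializing that sub-graph as context for the LLM. The entity list here is hard-coded; in practice it would come from entity extraction over the user's query.

```python
# Extract a sub-graph around the entities matched by a query and
# serialize it as context for the LLM.
def subgraph_context(graph: nx.Graph, entities: list[str]) -> str:
    # Matched nodes plus their direct neighbors: a one-hop sub-graph
    # is usually enough for RAG-style retrieval.
    nodes = set(entities)
    for entity in entities:
        nodes.update(graph.neighbors(entity))
    sub = graph.subgraph(nodes)
    # Each fact is emitted once, even if several source documents
    # originally repeated it.
    return "\n".join(
        f"{a} -[{data['relation']}]-> {b}" for a, b, data in sub.edges(data=True)
    )

print(subgraph_context(graph, ["blue"]))
```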
Using a knowledge graph with RAG
To use a knowledge graph with your RAG application, you can either use an existing knowledge graph with data that is tested and known to be correct in advance, or create your own. When you are using your own data—such as your product catalog—you will want to curate the data and check that it is accurate.
You can use your own generative AI approach to help you achieve this. LLMs are built to extract information from content and summarize it on demand. For a knowledge graph, this extraction can be automated to build the graph data in the right format, and to support updates and changes to the graph as you add more data over time. The popular LangChain framework offers multiple tools that can interrogate files and produce knowledge graphs, including LLMGraphTransformer and a Diffbot integration, while the relation extraction model REBEL is another option.
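As one example, here is a sketch of automated graph extraction using LangChain's LLMGraphTransformer. The interfaces in LangChain's experimental package change frequently, so treat the exact imports and signatures as a snapshot and check the current documentation.

```python
# Sketch: ask an LLM to extract nodes and relationships from raw text,
# producing GraphDocument objects ready to load into a graph store.
from langchain_core.documents import Document
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", temperature=0)  # any chat model with tool calling
transformer = LLMGraphTransformer(llm=llm)

docs = [Document(page_content="The W-100 widget comes in red and blue.")]

graph_documents = transformer.convert_to_graph_documents(docs)
print(graph_documents[0].nodes)
print(graph_documents[0].relationships)
```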
For dedicated graph analytics projects, you may want to adopt a full graph database that can run rich queries in graph query languages like Gremlin and Cypher. However, to support knowledge graph requests within RAG applications, you will only need to run small searches that cover two or three nodes at a time. This means your requests can normally be expressed as a few rounds of simple queries (one per hop) or a SQL join. Carrying out searches across larger data sets is not likely to return the right responses; in fact, it can lead to runaway queries that take too long to process without actually improving your overall responses.
Your knowledge graph data can therefore be stored in your existing database rather than in an additional graph database deployment. That also simplifies the operational side, as it reduces the number of data platforms you have to keep up to date as new data arrives.
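A minimal sketch of this pattern, assuming a hypothetical edge table kept in an ordinary relational database (SQLite here): a two-hop question becomes a simple self-join.

```python
# Keep graph edges in an existing relational database and answer a
# two-hop question with a self-join instead of a graph database.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE edges (src TEXT, relation TEXT, dst TEXT)")
con.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [
        ("W-100", "has_color", "blue"),
        ("W-200", "has_color", "blue"),
        ("W-200", "has_size", "15cm"),
    ],
)

# Two-hop query: which other products share a color with W-100?
rows = con.execute(
    """
    SELECT e2.src
    FROM edges e1
    JOIN edges e2 ON e1.dst = e2.dst AND e1.relation = e2.relation
    WHERE e1.src = 'W-100' AND e1.relation = 'has_color' AND e2.src != 'W-100'
    """
).fetchall()
print(rows)  # [('W-200',)]
```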
Combining knowledge graphs with RAG can improve the accuracy of your generative AI application when it responds to user queries. By combining different data management techniques, you can get the best of both worlds when it comes to data performance and semantic understanding in your requests.
Dom Couldwell is head of field engineering, EMEA, at DataStax.
—
Generative AI Insights provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss the challenges and opportunities of generative artificial intelligence. The selection is wide-ranging, from technology deep dives to case studies to expert opinion, but also subjective, based on our judgment of which topics and treatments will best serve InfoWorld’s technically sophisticated audience. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Contact doug_dineley@foundryco.com.