Hallucination-Free Zone: LLMs + Graph Databases got your back!

Photo by fabio on Unsplash
Jennifer Reif • @JMHReif • github.com/JMHReif • jmhreif.com • linkedin.com/in/jmhreif

Who is Jennifer Reif?
• Developer Advocate, Neo4j
• Continuous learner
• Conference speaker
• Tech blogger
• Other: geek

Database catalog

Relational databases (RDBMS)
Example: Oracle, 1979
• Pros:
  • Slices of data easily assembled with queries
  • Low data duplication (unique rows)
• Cons:
  • Relationships assembled with JOINs (normal forms)
  • Strict model

Document databases
Example: MongoDB, 2007
• Pros:
  • Handles varied (unstructured) data more easily
  • Groups related info in a single entity
• Cons:
  • Little flexibility (relationships pre-baked in the document)
  • Data duplication
{
  "customer": {
    "id": "123",
    "firstName": "Jane",
    "lastName": "Doe",
    "DOB": "03/12/1989",
    "department": "Engineering",
    "phoneNumbers": [
      { "type": "office", "number": "650-123-4567" },
      { "type": "cell", "number": "650-321-7654" }
    ],
    "title": "Director of QA"
  }
}

Vector databases
Example: Pinecone, 2019 (not the first)
• Pros:
  • Fast retrieval of high-dimensional data
  • Similarity searches based on vectors (not specific values)
• Cons:
  • Requires a lot of infrastructure and power
  • Cannot store much beyond vector + metadata

Graph databases
Example: Neo4j, 2007
• Pros:
  • Flexible, agile data model
  • Relationships stored with entities (JOIN operations made visible)
• Cons:
  • Storing relationships creates some write overhead
  • No benefit over relational for lightly connected data

What is a graph?

What is a graph?
Leonhard Euler: graph theory
• Started with a math problem
• Looking for a “better way” to handle certain problems
Figure: Seven Bridges of Königsberg problem, Leonhard Euler, 1735
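Euler's resolution of the Königsberg puzzle fits in a few lines: in a connected graph, a walk that crosses every edge exactly once exists only if zero or two nodes have an odd number of edges. The sketch below assumes illustrative labels A to D for the four land masses; the counting argument is Euler's, the code is only a minimal illustration of it.

```python
from collections import Counter

# Four land masses (labels A to D are illustrative) and the seven bridges,
# modelled as an undirected multigraph: one tuple per bridge.
BRIDGES = [
    ("A", "B"), ("A", "B"),  # island A to the north bank B (two bridges)
    ("A", "C"), ("A", "C"),  # island A to the south bank C (two bridges)
    ("A", "D"),              # island A to the eastern land D
    ("B", "D"),
    ("C", "D"),
]


def degrees(edges):
    """Count how many bridges touch each land mass."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def has_euler_walk(edges):
    """Euler's criterion for a connected graph: a walk using every edge
    exactly once exists iff 0 or 2 nodes have odd degree."""
    odd = [node for node, d in degrees(edges).items() if d % 2 == 1]
    return len(odd) in (0, 2)


print(dict(degrees(BRIDGES)))   # {'A': 5, 'B': 3, 'C': 3, 'D': 3}
print(has_euler_walk(BRIDGES))  # False: all four land masses have odd degree
```

Removing any one bridge leaves exactly two odd-degree land masses, so a walk becomes possible.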

What does it solve?
Data problems
• Documented path (not just data)
• Answering how and why
• Understanding/finding hidden connections
• Finding alternates, impacts, etc.
• Graphs add context + meaning

Use cases
• Recommendations
• Social
• Supply chain
• Fraud detection
• GenAI (grounding/RAG)
• Many more!

Data storage with relationships!
TL;DR
• Stores relationships with entities
• Produces faster reads for queries that would need JOINs in a relational database
• Easily connects multiple entities together
• Mimics real-world data organization
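To make "stores relationships with entities" concrete, here is a toy in-memory contrast, not how any particular database is implemented internally. The relational version matches keys through a join table on every call; the graph version keeps direct references on the node and just follows them. The table and class names are hypothetical; the data comes from the book example later in the deck.

```python
# Relational style: the relationship lives in a separate join table,
# so every lookup matches keys across tables at query time.
authors = [{"id": 1, "name": "George Lucas"}]
books = [{"id": 10, "title": "Star Wars"}]
authored = [{"author_id": 1, "book_id": 10}]  # join table


def books_by_author_join(author_id):
    """Scan the join table, then the book table, to assemble the answer."""
    book_ids = {row["book_id"] for row in authored if row["author_id"] == author_id}
    return [b["title"] for b in books if b["id"] in book_ids]


# Graph style: each node stores direct references to its neighbors,
# so the query only walks the relationships attached to one node.
class Node:
    def __init__(self, label, **properties):
        self.label = label
        self.properties = properties
        self.outgoing = []  # (relationship type, target node) pairs


def books_by_author_graph(author):
    """Follow the author's own AUTHORED relationships; no table scans."""
    return [n.properties["title"] for rel, n in author.outgoing if rel == "AUTHORED"]


lucas = Node("Author", name="George Lucas")
star_wars = Node("Book", title="Star Wars")
lucas.outgoing.append(("AUTHORED", star_wars))

print(books_by_author_join(1))       # ['Star Wars']
print(books_by_author_graph(lucas))  # ['Star Wars']
```

The join version's cost grows with the size of the tables it scans; the graph version's cost grows only with the number of relationships on the starting node.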

Let’s build one…

Book domain
• Find authors with reviews for multiple books
• Find similar users based on their reviews of books and related authors
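Both questions can be sketched over plain review tuples before any database is involved. The user, book, and author ids below are hypothetical, and "similar" is measured here with Jaccard overlap of reviewed books and authors, a technique chosen for illustration rather than the one the demo uses.

```python
from collections import defaultdict

# Hypothetical review data: (user, book, author of that book).
REVIEWS = [
    ("u1", "b1", "a1"), ("u1", "b2", "a1"), ("u1", "b3", "a2"),
    ("u2", "b1", "a1"), ("u2", "b2", "a1"),
    ("u3", "b4", "a3"),
]


def authors_with_reviews_for_multiple_books(reviews):
    """Authors whose reviews span at least two distinct books."""
    books_per_author = defaultdict(set)
    for _user, book, author in reviews:
        books_per_author[author].add(book)
    return sorted(a for a, bks in books_per_author.items() if len(bks) > 1)


def jaccard(a, b):
    """|a & b| / |a | b|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_users(reviews, user):
    """Rank other users by overlap in reviewed books and related authors."""
    items = defaultdict(set)
    for u, book, author in reviews:
        items[u].update({("book", book), ("author", author)})
    scores = {other: jaccard(items[user], items[other])
              for other in items if other != user}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


print(authors_with_reviews_for_multiple_books(REVIEWS))  # ['a1']
print(similar_users(REVIEWS, "u1"))                      # [('u2', 0.6), ('u3', 0.0)]
```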

Property graph
• Node (vertex)
• Relationship (edge)

Nodes
• Represent objects or entities
• Can be labeled
• May have properties
Figure: Book, Author, and Review nodes with example properties (title: “Star Wars”, isbn: 9756165498, name: “George Lucas”, avgRating: 4.72, rating: 4.2, reviewText: “Blah”, votes: 17)

Relationships
• Must have a type (label)
• Must have a direction
• May have properties
Figure: the same Book, Author, and Review nodes connected by AUTHORED and WRITTEN_FOR relationships, including the relationship property date_added: “Sun Jan 03”
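The property graph rules above map onto two small data types: nodes carry labels and optional properties, while relationships must have a type and a start and end node (the direction). This is a minimal sketch, not Neo4j's storage model. It reuses the slide's example values and assumes that date_added belongs to WRITTEN_FOR and that the relationships point Author to Book and Review to Book, since the figure does not make either clear.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                     # e.g. {"Book"}
    properties: dict = field(default_factory=dict)  # optional key/value pairs


@dataclass
class Relationship:
    type: str                                       # required, e.g. "AUTHORED"
    start: Node                                     # direction: start -> end
    end: Node
    properties: dict = field(default_factory=dict)

    def __post_init__(self):
        if not self.type:
            raise ValueError("a relationship must have a type")


def outgoing(node, rel_type, relationships):
    """Follow relationships of one type in their stored direction."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]


book = Node({"Book"}, {"title": "Star Wars", "isbn": 9756165498})
author = Node({"Author"}, {"name": "George Lucas"})
review = Node({"Review"}, {"rating": 4.2, "reviewText": "Blah", "votes": 17})

relationships = [
    Relationship("AUTHORED", author, book),
    # Assumption: date_added sits on WRITTEN_FOR (the slide does not say which).
    Relationship("WRITTEN_FOR", review, book, {"date_added": "Sun Jan 03"}),
]

print([b.properties["title"] for b in outgoing(author, "AUTHORED", relationships)])  # ['Star Wars']
```

Because direction is stored, asking for AUTHORED relationships starting at the book returns nothing.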

What is Vector Search?

What is a vector?
• Length
• Direction
• Components have meaning
Figure: a vector drawn with its horizontal and vertical components
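A two-component vector shows all three properties at once. The vector (3, 4) below is an arbitrary example: its length comes from the Pythagorean theorem, its direction is the unit vector (or the angle from the horizontal axis), and each component measures one axis.

```python
import math

v = (3.0, 4.0)  # components: 3 along the horizontal axis, 4 along the vertical

length = math.hypot(*v)                       # magnitude: sqrt(3^2 + 4^2)
direction = tuple(c / length for c in v)      # unit vector pointing the same way
angle = math.degrees(math.atan2(v[1], v[0]))  # direction as an angle from horizontal

print(length)           # 5.0
print(direction)        # (0.6, 0.8)
print(round(angle, 2))  # 53.13
```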

What makes things similar?
Figure: shapes compared by their length and width
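Treating each shape as a (length, width) vector, "similar" depends on the metric. In this sketch with hypothetical shapes, Euclidean distance says a small square is closest to a small rectangle (similar size), while cosine similarity says it matches a big square (same proportions, pointing the same direction). Vector search usually relies on cosine-style similarity for that reason.

```python
import math

# Hypothetical shapes described as (length, width) vectors.
small_square = (1.0, 1.0)
big_square = (5.0, 5.0)
small_rect = (2.0, 1.0)


def euclidean(a, b):
    """Straight-line distance: smaller means more similar."""
    return math.dist(a, b)


def cosine(a, b):
    """Cosine of the angle between vectors: closer to 1 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


print(round(euclidean(small_square, small_rect), 3),
      round(euclidean(small_square, big_square), 3))  # 1.0 5.657
print(round(cosine(small_square, small_rect), 3),
      round(cosine(small_square, big_square), 3))     # 0.949 1.0
```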

Vector arithmetic: c = a + b
Figure: vectors a and b added in three steps to give a + b
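Adding vectors works component by component, which is the same as placing b's tail at a's head and drawing the result from a's tail. The values of a and b here are arbitrary.

```python
def add(a, b):
    """Componentwise vector addition: c_i = a_i + b_i."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    return tuple(x + y for x, y in zip(a, b))


a = (2, 1)
b = (1, 3)
c = add(a, b)
print(c)  # (3, 4)
```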

Example: kings and queens
king − man + woman ≈ queen
Figure: three steps combining the king, man, and woman vectors, landing near queen
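The analogy can be reproduced with toy embeddings: compute king − man + woman, then return the vocabulary word whose vector is most similar (by cosine) to the result, excluding the three input words as is common practice. The three dimensions (royalty, male, female), the vocabulary, and every value are hypothetical and hand-picked; real embeddings learn their dimensions.

```python
import math

# Toy 3-dimensional embeddings; dimensions (royalty, male, female)
# and all values are hypothetical, chosen only to illustrate the idea.
EMBEDDINGS = {
    "king":  (0.9, 0.8, 0.1),
    "man":   (0.1, 0.9, 0.1),
    "woman": (0.1, 0.1, 0.9),
    "queen": (0.9, 0.1, 0.8),
    "apple": (0.0, 0.2, 0.2),
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def analogy(pos1, neg, pos2, vocab):
    """Return the word closest to pos1 - neg + pos2, excluding the inputs."""
    target = tuple(p - n + q for p, n, q in zip(vocab[pos1], vocab[neg], vocab[pos2]))
    candidates = (w for w in vocab if w not in {pos1, neg, pos2})
    return max(candidates, key=lambda w: cosine(vocab[w], target))


print(analogy("king", "man", "woman", EMBEDDINGS))  # queen
```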

What are vector embeddings?
Convert something to a point in space
• Same concepts, applied to data formats
• 100s or 1000s of dimensions
• Dimension = interesting feature/characteristic

Vector index
Why index?
• Queries become expensive
• Without an index, every vector must be compared to the query
• Indexes = speed
• Jump right to where you need to be (like the index in a book)
• Approximate nearest neighbor (k-ANN) search
• e.g. the 20 closest vectors to this one
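Without an index, finding "the 20 closest vectors" means scoring every stored vector against the query, as in this exact brute-force k-nearest-neighbor sketch. The random 8-dimensional data is a hypothetical stand-in for real embeddings; the point is the O(n × d) scan that an approximate index avoids.

```python
import heapq
import math
import random


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def brute_force_top_k(query, vectors, k):
    """Exact k-nearest-neighbor search: scores every stored vector,
    which is the cost a vector index is built to avoid."""
    scored = ((cosine(query, v), i) for i, v in enumerate(vectors))
    return [i for _score, i in heapq.nlargest(k, scored)]


random.seed(0)
vectors = [tuple(random.uniform(-1, 1) for _ in range(8)) for _ in range(1000)]
query = vectors[42]
top20 = brute_force_top_k(query, vectors, k=20)
print(top20[0])  # 42: a stored vector is its own nearest neighbor
```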

Vector index
Example: library
• Categorizing books by author or genre
• Embeddings can hold more complex information
• Further categories:
  • “gender of main character”
  • “main location of plot”
• Indexing can retrieve a smaller portion of all available vectors
• Reducing retrieval time!
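The library analogy can be turned into a crude approximate index: shelve each vector under its most similar "category" centroid, then answer a query by scanning only the few closest shelves. This bucket scheme (similar in spirit to an inverted-file index) is a simplification for illustration; production vector indexes commonly use graph-based structures such as HNSW instead. Centroids here are sampled vectors rather than trained ones, and the data is random and hypothetical.

```python
import math
import random


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


class BucketIndex:
    """Vectors are shelved in the bucket of their most similar centroid;
    a query scans only the n_probe closest buckets, not every vector."""

    def __init__(self, vectors, n_buckets, seed=0):
        rng = random.Random(seed)
        self.vectors = vectors
        # Simplification: centroids are sampled vectors, not trained (e.g. k-means).
        self.centroids = rng.sample(vectors, n_buckets)
        self.buckets = [[] for _ in range(n_buckets)]
        for i, v in enumerate(vectors):
            self.buckets[self._nearest_buckets(v, 1)[0]].append(i)

    def _nearest_buckets(self, v, n):
        order = sorted(range(len(self.centroids)),
                       key=lambda b: cosine(v, self.centroids[b]), reverse=True)
        return order[:n]

    def search(self, query, k, n_probe=2):
        """Return (top-k ids among scanned candidates, number scanned)."""
        candidates = [i for b in self._nearest_buckets(query, n_probe)
                      for i in self.buckets[b]]
        candidates.sort(key=lambda i: cosine(query, self.vectors[i]), reverse=True)
        return candidates[:k], len(candidates)


rng = random.Random(1)
vectors = [tuple(rng.uniform(-1, 1) for _ in range(8)) for _ in range(1000)]
index = BucketIndex(vectors, n_buckets=20)
hits, scanned = index.search(vectors[7], k=5)
print(hits[0])  # 7
print(scanned)  # a fraction of the 1000 stored vectors
```

The trade-off is the "approximate" in ANN: a true neighbor shelved in an unscanned bucket is missed, which is why n_probe is a tunable speed/recall knob.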

Neo4j Vector Search
What’s the value?
• Stores structured + unstructured data side by side
• Many other vector databases store only vectors plus metadata
• The power is in the entities connected to the vector search results
• Connected = extra, relevant context
Figure: Vector Index (similarity search: find similar documents) + Graph Structure (pattern matching: find related information) = Knowledge Graph (combine for more accurate results within a relevant context)
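The two halves of the picture can be sketched in memory: a similarity search finds the most relevant nodes, then relationships from those nodes supply the connected context. In Neo4j itself, both steps would run in one Cypher query (calling the `db.index.vector.queryNodes` procedure, then matching patterns from the returned nodes); the Python below only mimics that flow, and all ids, names, and vectors are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# Hypothetical mini knowledge graph (ids, names, and vectors are made up).
BOOKS = {
    "b1": {"title": "Book 1", "embedding": (0.9, 0.1, 0.0)},
    "b2": {"title": "Book 2", "embedding": (0.8, 0.3, 0.1)},
    "b3": {"title": "Book 3", "embedding": (0.0, 0.2, 0.9)},
}
AUTHORS = {"a1": "Author 1", "a2": "Author 2"}
AUTHORED = [("a1", "b1"), ("a1", "b2"), ("a2", "b3")]     # (author, book)
RATINGS = {"r1": 4.5, "r2": 3.0, "r3": 5.0}                # review -> rating
WRITTEN_FOR = [("r1", "b1"), ("r2", "b1"), ("r3", "b3")]  # (review, book)


def vector_search_with_context(query, k=2):
    # Step 1, vector index: the k book nodes most similar to the query vector.
    top = sorted(BOOKS, key=lambda b: cosine(query, BOOKS[b]["embedding"]),
                 reverse=True)[:k]
    # Step 2, graph structure: follow relationships from each hit for context.
    return [
        {
            "title": BOOKS[b]["title"],
            "authors": [AUTHORS[a] for a, book in AUTHORED if book == b],
            "ratings": [RATINGS[r] for r, book in WRITTEN_FOR if book == b],
        }
        for b in top
    ]


for hit in vector_search_with_context((1.0, 0.0, 0.0)):
    print(hit)
# {'title': 'Book 1', 'authors': ['Author 1'], 'ratings': [4.5, 3.0]}
# {'title': 'Book 2', 'authors': ['Author 1'], 'ratings': []}
```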

How to Add Vectors …to existing data

Data model Books

Data + Vectors
How do I get vectors for existing data?
• Generate some vector embeddings
• Happens externally; several models available
• Store each embedding as a property on a node
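The workflow is: read a text property from each node, send it to an embedding model, and write the returned vector back as a new node property (in Cypher, for example, with `SET`). In this sketch `toy_embed` is a hypothetical stand-in for an external model such as OpenAI's: it hashes words into a fixed number of buckets, so it reflects word overlap only, not meaning. The property names are also assumptions.

```python
import hashlib
import math

DIMENSIONS = 8  # real embedding models use hundreds or thousands of dimensions


def toy_embed(text):
    """Hypothetical stand-in for an external embedding model: hashes each
    word into one of DIMENSIONS buckets and L2-normalizes the counts."""
    vec = [0.0] * DIMENSIONS
    for word in text.lower().split():
        bucket = int(hashlib.sha256(word.encode()).hexdigest(), 16) % DIMENSIONS
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def add_embeddings(nodes, source_property, target_property="embedding"):
    """Embed each node's text and store the vector as a node property;
    nodes without the source text are left unchanged."""
    for node in nodes:
        text = node.get(source_property)
        if text:
            node[target_property] = toy_embed(text)
    return nodes


books = [{"title": "Star Wars", "isbn": 9756165498}, {"isbn": 1234}]
add_embeddings(books, "title")
print(len(books[0]["embedding"]))  # 8
```

Every stored vector must have the same number of dimensions as the vector index that will search it, so the same model should embed both the stored data and the queries.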

Examples OpenAI

Use LLM with Graph Vector Search

RAG (retrieval-augmented generation)
Pull data from an external data source
• Retrieval
  • Data retrieved from the database
• Augmented
  • Augments the response with facts
• Generation
  • Response in natural language
Figure: the user's prompt is used to search the database; the prompt plus the relevant results/documents go to the LLM through its chat API, and the response returns to the user
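The retrieval and augmentation steps can be sketched without any LLM: rank stored facts by similarity to the question's embedding, then build a prompt that instructs the model to answer only from those facts. The facts, their 2D embeddings, and the query embedding are hypothetical, and the generation step (sending the prompt to an LLM chat API) is deliberately left out because it is provider-specific.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# Hypothetical facts pulled from the graph, with made-up 2D embeddings.
DOCUMENTS = [
    {"text": "George Lucas AUTHORED Star Wars.", "embedding": (0.9, 0.1)},
    {"text": "A review of Star Wars has a rating of 4.2.", "embedding": (0.7, 0.3)},
    {"text": "An unrelated note.", "embedding": (0.0, 1.0)},
]


def retrieve(query_embedding, docs, k=2):
    """Retrieval: the k documents most similar to the question."""
    return sorted(docs, key=lambda d: cosine(query_embedding, d["embedding"]),
                  reverse=True)[:k]


def augment(question, docs):
    """Augmentation: ground the prompt in the retrieved facts."""
    facts = "\n".join(f"- {d['text']}" for d in docs)
    return (
        "Answer using only the facts below. If they are not enough, say you don't know.\n"
        f"Facts:\n{facts}\n\nQuestion: {question}"
    )


# Generation would send this prompt to an LLM chat API (provider-specific, omitted).
prompt = augment("Who wrote Star Wars?", retrieve((1.0, 0.0), DOCUMENTS))
print(prompt)
```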

How much value can RAG add?

Explainable AI
With RAG + LLM
• How did the LLM get this answer?
• Graphs:
  • Generate the LLM response from knowledge in the database
  • Set security rules in the graph (what’s viewable)
  • Retrieve extra data connected to similar entities
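Two of these points can be combined in a small sketch: apply a viewability rule stored with each node before anything reaches the LLM, and keep the node ids of the facts that were used, so an answer can be traced back to its sources. The `Fact` type, role names, and node ids are hypothetical; in a real graph the rule would live in node properties, relationships, or database security settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    node_id: str           # id of the graph node the fact came from
    text: str
    visible_to: frozenset  # roles allowed to see this node (security rule)


def viewable(facts, role):
    """Apply the graph's security rule before anything reaches the LLM."""
    return [f for f in facts if role in f.visible_to]


def context_with_sources(facts):
    """Build prompt context and keep node ids, so an answer can be traced
    back to the data it was generated from."""
    context = "\n".join(f"[{f.node_id}] {f.text}" for f in facts)
    return context, [f.node_id for f in facts]


FACTS = [
    Fact("book-1", "Star Wars was authored by George Lucas.", frozenset({"public", "staff"})),
    Fact("review-1", "Review rating 4.2 with 17 votes.", frozenset({"staff"})),
]

context, sources = context_with_sources(viewable(FACTS, "public"))
print(sources)  # ['book-1']
```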

Demo Time!

Resources
• GitHub repository (today’s code): github.com/JMHReif/springai-goodreads
• GraphAcademy LLM courses: graphacademy.neo4j.com/categories/llms/
• Docs for Spring AI: docs.spring.io/spring-ai/reference/api/vectordbs/neo4j.html
• Docs for OpenAI embeddings: platform.openai.com/docs/guides/embeddings
