Hallucination-Free Zone: LLMs + Graph Databases got your back!

Photo by fabio on Unsplash
Jennifer Reif, @JMHReif, github.com/JMHReif, jmhreif.com, linkedin.com/in/jmhreif

Who is Jennifer Reif?
• Developer Advocate, Neo4j
• Continuous learner
• Conference speaker
• Tech blogger
• Other: geek

What is a hallucination?

Generative Artificial Intelligence
Artificial intelligence capable of generating text, images, or other data using generative models, often in response to prompts. Generative AI models learn the patterns and structure of their input training data and then generate new data that has similar characteristics.
https://en.wikipedia.org/wiki/Generative_artificial_intelligence

How do hallucinations happen? LLM limitations:
• Lack the most recent data
• Not all data is natural language
• Language complexities: sarcasm, emotion
• No sources
• Hallucinations / temperature
• IP, bias, privacy
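Temperature is the sampling setting behind that bullet. A minimal sketch, assuming made-up scores for three candidate next tokens (the token names and numbers are illustrative, not from any real model): the scores (logits) are divided by the temperature before a softmax, so a low temperature concentrates probability on the top token and a high temperature spreads it toward unlikely tokens. That spread is one reason higher temperatures tend to produce more inventive, and less grounded, output.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw next-token scores (logits) into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the largest value before exponentiating to avoid overflow.
    largest = max(scaled)
    exps = [math.exp(x - largest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores a model might assign to three candidate next tokens.
tokens = ["Lucas", "Spielberg", "Tolkien"]
logits = [4.0, 2.0, 1.0]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    # Higher temperature: the top token loses probability to the long tail.
    print(t, {tok: round(p, 3) for tok, p in zip(tokens, probs)})
```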

“Simply add an LLM” doesn’t work…

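Strategies to improve LLM accuracy
• Custom model
• Fine-tuning / few-shot learning
• Retrieval Augmented Generation (RAG)
• All of these involve giving an LLM specific data, either by training on it (custom model, fine-tuning) or by supplying it in the prompt (few-shot learning, RAG)!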

RAG: pull data from external data sources
• Retrieval: data retrieved from a database
• Augmented: augments the prompt with retrieved facts
• Generation: response in natural language

Database options
• Vector database
• Relational (+ vectors)
• NoSQL (+ vectors)
• Graph (+ vectors)

Vector databases
Example: Pinecone, 2019 (not the first)
• Pros:
  • Fast retrieval of high-dimensional data
  • Similarity searches based on vectors (not specific values)
• Cons:
  • Requires a lot of infrastructure and power
  • Cannot store much beyond vectors + metadata

Graph databases
Example: Neo4j, 2007
• Pros:
  • Flexible, agile data model
  • Relationships stored with entities (JOIN operations become visual)
• Cons:
  • Storing relationships adds some write overhead
  • Little or no benefit for sparsely connected data

What is a graph?

Leonhard Euler: graph theory
• Started with a math problem
• Looking for a “better way” to handle certain problems
Seven Bridges of Königsberg problem. Leonhard Euler, 1735
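Euler's argument fits in a few lines of code. He observed that a walk crossing every bridge exactly once can exist in a connected graph only if zero or two land masses touch an odd number of bridges. A minimal sketch modeling the four land masses as nodes and the seven bridges as edges; the node labels ("island", "north", "south", "east") are names chosen here for illustration.

```python
from collections import Counter

# Seven bridges between four land masses (labels are illustrative names).
bridges = [
    ("island", "north"), ("island", "north"),
    ("island", "south"), ("island", "south"),
    ("island", "east"),
    ("north", "east"),
    ("south", "east"),
]

def degrees(edges):
    """Count how many bridges touch each land mass."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

def has_euler_walk(edges):
    """Euler's condition for a connected graph: a walk that uses every edge
    exactly once exists iff zero or two vertices have an odd degree."""
    odd = [node for node, d in degrees(edges).items() if d % 2 == 1]
    return len(odd) in (0, 2)

print(dict(degrees(bridges)))  # {'island': 5, 'north': 3, 'south': 3, 'east': 3}
print(has_euler_walk(bridges))  # False: all four land masses have odd degree
```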

What does it solve? Data problems
• Documented path (not just data)
• Answering how and why
• Understanding/finding hidden connections
• Finding alternates, impacts, etc.
• Graphs add context + meaning

Use cases
• Recommendations
• Social
• Supply chain
• Fraud detection
• GenAI (grounding/RAG)
• Many more!

Data storage with relationships! TL;DR
• Stores relationships with entities
• Produces faster read queries (for JOINs)
• Easily connects multiple entities together
• Mimics real-world data organization

Let’s build one…

Book domain
• Find authors with reviews for multiple books
• Find similar users based on reviews of books and related authors
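Before any graph is involved, a plain-Python sketch helps pin down what the two questions ask. Everything here is hypothetical: the authors, books, users, and ratings are made up, and Jaccard overlap is one possible similarity measure chosen for illustration, not necessarily what the demo uses. In a graph database, both questions become traversals over AUTHORED and review relationships instead of dictionary lookups.

```python
from collections import defaultdict

# Hypothetical sample data standing in for the book graph.
authored = [("Author A", "Book 1"), ("Author A", "Book 2"), ("Author B", "Book 3")]
reviews = [  # (user, book, rating)
    ("ana", "Book 1", 4.5),
    ("ana", "Book 3", 5.0),
    ("ben", "Book 1", 4.0),
    ("ben", "Book 2", 4.0),
    ("cam", "Book 3", 3.5),
]

author_of = {book: author for author, book in authored}

def authors_with_reviews_for_multiple_books():
    """Authors for whom at least two distinct books they wrote have reviews."""
    reviewed_books = defaultdict(set)
    for _user, book, _rating in reviews:
        reviewed_books[author_of[book]].add(book)
    return sorted(a for a, books in reviewed_books.items() if len(books) > 1)

def profile(user):
    """Books a user reviewed, plus the authors of those books."""
    books = {book for u, book, _ in reviews if u == user}
    return books | {author_of[b] for b in books}

def jaccard(a, b):
    """Shared items divided by all items; 1.0 means identical profiles."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_users(user):
    """Other users ranked by overlap in reviewed books and related authors."""
    others = {u for u, _, _ in reviews if u != user}
    scores = {o: jaccard(profile(user), profile(o)) for o in others}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

print(authors_with_reviews_for_multiple_books())
print(similar_users("ana"))
```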

Property graph
• Node (vertex)
• Relationship (edge)

Nodes
• Represent objects or entities
• Can be labeled
• May have properties
[Figure: example nodes Book (title: “Star Wars”, isbn: 9756165498), Author (name: “George Lucas”, avgRating: 4.72), and Review (rating: 4.2, reviewText: “Blah”, votes: 17)]

Relationships
• Must have a type (label)
• Must have a direction
• May have properties
[Figure: the same Book, Author, and Review nodes connected by AUTHORED and WRITTEN_FOR relationships, with a relationship property date_added: “Sun Jan 03”]
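The node and relationship rules can be sketched with two small Python classes. This is an illustrative in-memory model, not how Neo4j stores data. The property values come from the slide's example; the directions (Author to Book for AUTHORED, Review to Book for WRITTEN_FOR) and attaching date_added to WRITTEN_FOR are assumptions made for the sketch, and the helpers `outgoing` and `incoming` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    labels: set                                   # e.g. {"Book"}; nodes can be labeled
    properties: dict = field(default_factory=dict)

@dataclass
class Relationship:
    type: str                                     # every relationship must have a type
    start: Node                                   # direction: start -> end
    end: Node
    properties: dict = field(default_factory=dict)

    def __post_init__(self):
        if not self.type:
            raise ValueError("a relationship must have a type")

book = Node({"Book"}, {"title": "Star Wars", "isbn": 9756165498})
author = Node({"Author"}, {"name": "George Lucas", "avgRating": 4.72})
review = Node({"Review"}, {"rating": 4.2, "reviewText": "Blah", "votes": 17})

rels = [
    Relationship("AUTHORED", author, book),
    # Assumption: date_added lives on WRITTEN_FOR in this sketch.
    Relationship("WRITTEN_FOR", review, book, {"date_added": "Sun Jan 03"}),
]

def outgoing(node, rel_type):
    """Follow relationships of a given type in their stored direction."""
    return [r.end for r in rels if r.start is node and r.type == rel_type]

def incoming(node, rel_type):
    """Follow relationships of a given type against their stored direction."""
    return [r.start for r in rels if r.end is node and r.type == rel_type]

print([b.properties["title"] for b in outgoing(author, "AUTHORED")])
print([r.properties["rating"] for r in incoming(book, "WRITTEN_FOR")])
```

Because direction is stored, the same relationship can still be read either way; direction only records which node is the start.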

Applying RAG to an LLM

LLMs take text, not databases
• And they have a context window limit

What is a vector? Mathematical realm
• An arrow (directed line segment) in space
• Has length and direction
[Figure: a vector drawn against horizontal and vertical axes]
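Length and direction are both computable from a vector's components. A minimal sketch for 2-D vectors (the example vector is arbitrary): length is the Euclidean norm, and direction is the angle measured counter-clockwise from the horizontal axis.

```python
import math

def length(v):
    """Euclidean length (magnitude) of a 2-D vector."""
    x, y = v
    return math.hypot(x, y)

def direction_degrees(v):
    """Angle from the horizontal axis, counter-clockwise, in degrees."""
    x, y = v
    return math.degrees(math.atan2(y, x))

v = (3.0, 4.0)  # 3 units horizontal, 4 units vertical
print(length(v))             # 5.0
print(direction_degrees(v))  # about 53.13 degrees
```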

Vectors in the physical realm https://www.mathsisfun.com/algebra/vectors.html

Vector arithmetic: C = a + b
[Figure: three steps combining vectors a and b into their sum a + b]
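In components, vector addition is done one coordinate at a time, which is the algebraic version of drawing the two arrows end to end. The numbers in the second equation are an arbitrary worked example.

```latex
\vec{C} = \vec{a} + \vec{b}
  = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} + \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}
  = \begin{pmatrix} a_1 + b_1 \\ a_2 + b_2 \end{pmatrix},
\qquad
\begin{pmatrix} 1 \\ 2 \end{pmatrix} + \begin{pmatrix} 3 \\ 1 \end{pmatrix}
  = \begin{pmatrix} 4 \\ 3 \end{pmatrix}
```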

Vectors in the technical realm: kings and queens
• king − man + woman ≈ queen
[Figure: three steps plotting king, man, and woman, ending at “queen?”]
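The king/queen analogy is just vector arithmetic followed by a nearest-neighbor lookup. A minimal sketch, assuming hand-made 2-D "embeddings" where dimension 0 loosely means royalty and dimension 1 loosely means masculinity; real embedding models learn hundreds or thousands of dimensions, and these words and values are invented for illustration.

```python
import math

# Hypothetical 2-D "embeddings": dimension 0 ~ royalty, dimension 1 ~ masculinity.
vectors = {
    "king":   (0.9, 0.8),
    "queen":  (0.9, 0.2),
    "man":    (0.1, 0.8),
    "woman":  (0.1, 0.2),
    "prince": (0.8, 0.9),
    "apple":  (0.0, 0.5),
}

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = add(sub(vectors[a], vectors[b]), vectors[c])
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("king", "man", "woman"))  # queen
```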

Vectors to compare things
• What makes things similar?
[Figure: items compared by length and width]
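Once each item is described by the same features, "similar" can be measured as distance. A minimal sketch, assuming made-up items described only by (length, width) in centimeters: each item becomes a 2-D vector, and the item at the smallest Euclidean distance is the most similar on those two features.

```python
import math

# Hypothetical items described by two features: (length_cm, width_cm).
items = {
    "paperback": (18.0, 11.0),
    "hardcover": (22.0, 14.0),
    "magazine":  (28.0, 21.0),
    "bookmark":  (15.0, 4.0),
}

def distance(a, b):
    """Euclidean distance: smaller means more similar on these features."""
    return math.dist(a, b)

def most_similar(name):
    """The other item whose feature vector is closest to this one."""
    others = [n for n in items if n != name]
    return min(others, key=lambda n: distance(items[name], items[n]))

print(most_similar("paperback"))  # hardcover
```

Adding more features (weight, page count, genre scores) just adds dimensions; the distance calculation stays the same.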

Embeddings: convert data to a point in space
• Series of numbers
• 100s or 1000s of dimensions
• Dimension = interesting feature / characteristic
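To make "data becomes a series of numbers" concrete, here is a deliberately crude stand-in for an embedding model. It is not how real embeddings are produced (those come from trained models and their dimensions are learned, not named); the six-word vocabulary and the `toy_embed` function are hypothetical. Each dimension counts one vocabulary word, so every dimension corresponds to one feature of the text.

```python
import re

# Hypothetical feature vocabulary; a real embedding model learns its dimensions instead.
VOCAB = ["space", "jedi", "dragon", "magic", "robot", "love"]

def toy_embed(text):
    """Map text to a fixed-length vector: dimension i counts occurrences of VOCAB[i]."""
    words = re.findall(r"[a-z]+", text.lower())
    return [words.count(term) for term in VOCAB]

print(toy_embed("A Jedi and a robot travel through space"))  # [1, 1, 0, 0, 1, 0]
print(toy_embed("A dragon, some magic, and a little love"))  # [0, 0, 1, 1, 0, 1]
```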

LLMs take text, not databases
[Figure: database data represented as vectors]

How do we search the vectors? Similarity search
• Exact search is expensive (compares the query to every vector)
• Approximate nearest neighbor (k-ANN) search
• Example: library
  • Book classification: genre vs. location of plot
  • Smaller search set = shorter retrieval time!
Photo by Martin Adams on Unsplash
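The library analogy can be sketched as exact search versus searching one "shelf". This is a simplified bucketing scheme chosen for illustration, not the algorithm any particular database uses; production vector indexes typically rely on more sophisticated structures such as HNSW graphs. The book names, 2-D vectors, genre buckets, and centroids are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def exact_knn(query, vectors, k):
    """Brute force: score the query against every stored (name, vector) pair."""
    ranked = sorted(vectors, key=lambda item: cosine(query, item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

def bucketed_knn(query, buckets, centroids, k):
    """Approximate: search only the bucket whose centroid best matches the query,
    like walking straight to one genre's shelf. Fewer comparisons, but a good
    match sitting in another bucket can be missed."""
    best = max(centroids, key=lambda b: cosine(query, centroids[b]))
    return exact_knn(query, buckets[best], k)

# Hypothetical 2-D book vectors, pre-grouped into "genre" buckets.
buckets = {
    "sci-fi":  [("Book A", (0.9, 0.1)), ("Book B", (0.8, 0.3))],
    "fantasy": [("Book C", (0.1, 0.9)), ("Book D", (0.2, 0.8))],
}
centroids = {"sci-fi": (0.85, 0.2), "fantasy": (0.15, 0.85)}
all_vectors = [item for bucket in buckets.values() for item in bucket]

query = (0.95, 0.15)
print(exact_knn(query, all_vectors, 2))            # compares against 4 vectors
print(bucketed_knn(query, buckets, centroids, 2))  # compares against 2 vectors
```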

Provide prompt + context to LLM
[Figure: vectors mapped back to text]

RAG architecture
• Retrieval: data retrieved from a database
• Augmented: augments the prompt with retrieved facts
• Generation: response in natural language
[Figure: the user sends a prompt through a chat API; a database search returns relevant results/documents; the prompt plus the relevant information goes to the LLM API; the LLM's response returns to the user]
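The three RAG steps can be sketched end to end without a real database or model. Everything here is a hypothetical stand-in: `DOCUMENTS` replaces the database, word overlap replaces vector similarity search, and `call_llm` returns a canned string where a real application would call an LLM API (the demo uses Spring AI for this). The shape is what matters: retrieve relevant text, put it in the prompt, then generate.

```python
import re

# Hypothetical in-memory "database" of text chunks.
DOCUMENTS = [
    "Book 1 was written by Author A and has an average rating of 4.5.",
    "Book 3 was written by Author B.",
    "Author A lives in Springfield.",
]

def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, docs, k=2):
    """Retrieval: rank chunks by shared words (a stand-in for vector search)."""
    q = tokens(question)
    return sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)[:k]

def augment(question, context):
    """Augmentation: combine the user's prompt with the retrieved facts."""
    facts = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the facts below. If they do not contain the answer, "
        "say you don't know.\n"
        f"Facts:\n{facts}\n\nQuestion: {question}"
    )

def call_llm(prompt):
    """Generation: placeholder for a real LLM API call (hypothetical)."""
    return f"[LLM response to a {len(prompt)}-character prompt]"

def rag_answer(question):
    context = retrieve(question, DOCUMENTS)
    prompt = augment(question, context)
    return call_llm(prompt), prompt

answer, prompt = rag_answer("Who wrote Book 1?")
print(prompt)
print(answer)
```

Telling the model to answer only from the supplied facts, and to admit when they are missing, is a common prompt pattern for reducing hallucinations; it helps but does not guarantee grounded answers.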

Agentic workflow architecture
• Uses “agents”/tools
• LLM determines the next step
  • Which tool/external source should be called
• Uses the result from the tool as context
[Figure: the user sends a prompt through a chat API; the LLM calls a tool, which returns relevant results/documents from a source; the prompt plus the relevant information goes to the LLM API; the response returns to the user]
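The agentic loop can be sketched with a stand-in for the model's decision. Everything here is hypothetical: `fake_llm_decide` replaces a real LLM call with simple rules, and the two tools return canned strings where a real application would run a vector search or a graph query. The point is the control flow: the "LLM" picks a tool, the tool's result is added to the context, and the loop repeats until the model gives a final answer or a step limit is reached.

```python
def search_books(query):
    """Hypothetical tool: a real one would run a vector similarity search."""
    return f"Books similar to '{query}': Book 1, Book 2"

def count_reviews(book):
    """Hypothetical tool: a real one would run a graph query."""
    return f"{book} has 2 reviews"

TOOLS = {"search_books": search_books, "count_reviews": count_reviews}

def fake_llm_decide(question, context):
    """Stand-in for the LLM choosing the next step.
    Returns (tool_name, argument) or ("final", answer). A real agent would send
    the tool descriptions to the model and parse its choice."""
    if not context and "review" in question.lower():
        return ("count_reviews", "Book 1")  # hard-coded argument for the sketch
    if not context:
        return ("search_books", question)
    return ("final", "Answer based on: " + "; ".join(context))

def run_agent(question, max_steps=3):
    context = []
    for _ in range(max_steps):
        action, arg = fake_llm_decide(question, context)
        if action == "final":
            return arg
        # The tool's result becomes context for the next decision.
        context.append(TOOLS[action](arg))
    return "Stopped: step limit reached"

print(run_agent("How many reviews does Book 1 have?"))
print(run_agent("space opera"))
```

The step limit matters in practice: because the model decides what happens next, a loop with no cap can keep calling tools indefinitely.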

Nothing is a silver bullet
• The LLM has (sort of) a mind of its own
• Can’t guarantee a consistent answer
• Prompt engineering
• Context window limits

How much value can RAG add?

Explainable AI with RAG + LLM
• How did the LLM get this answer?
• Graphs:
  • Generate the LLM response from knowledge in the database
  • Set security rules in the graph (what’s viewable)
  • Retrieve extra data connected to similar entities
Photo by No Revisions on Unsplash
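The three graph bullets can be combined in one small sketch: start from the nodes a similarity search returned, follow relationships to connected entities, skip anything the caller's role may not see, and keep each node's id so the answer can cite its sources. The graph, node ids, role names, and `expand_context` helper are all hypothetical; storing visibility as a node property is one simple way to model "what's viewable", not a description of a specific database feature.

```python
# Hypothetical knowledge graph: node id -> properties, plus neighbor lists.
NODES = {
    "book1":   {"text": "Book 1 by Author A", "roles": {"public", "staff"}},
    "authorA": {"text": "Author A wrote Book 1 and Book 2", "roles": {"public", "staff"}},
    "book2":   {"text": "Book 2 by Author A", "roles": {"public", "staff"}},
    "review9": {"text": "Internal moderation note on Book 1", "roles": {"staff"}},
}
EDGES = {
    "book1": ["authorA", "review9"],
    "authorA": ["book1", "book2"],
    "book2": ["authorA"],
    "review9": ["book1"],
}

def expand_context(hit_ids, role, hops=1):
    """Expand similarity-search hits through connected nodes, never entering a node
    the role cannot see, and return (source_id, text) pairs for citation."""
    def visible(node):
        return role in NODES[node]["roles"]

    seen = {h for h in hit_ids if visible(h)}
    frontier = list(seen)
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for neighbor in EDGES.get(node, []):
                if neighbor not in seen and visible(neighbor):
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return sorted((n, NODES[n]["text"]) for n in seen)

for source_id, text in expand_context(["book1"], role="public", hops=2):
    print(f"[{source_id}] {text}")
```

Returning the source ids alongside the text is what makes the answer explainable: each fact passed to the LLM can be traced back to a node in the graph.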

Demo! Our data model

Resources
• GitHub repository (today’s code): github.com/JMHReif/springai-goodreads
• GraphAcademy LLM courses: graphacademy.neo4j.com/categories/llms/
• Docs for Spring AI: docs.spring.io/spring-ai/reference/api/vectordbs/neo4j.html
• NODES 2024: neo4j.com/nodes2024/agenda
