Slide 1

Slide 1 text

Hallucination-free zone: LLMs + Graph Databases got your back! Photo by fabio on Unsplash Jennifer Reif jennifer.reif@neo4j.com @JMHReif github.com/JMHReif jmhreif.com linkedin.com/in/jmhreif

Slide 2

Slide 2 text

Who is Jennifer Reif? Developer Advocate, Neo4j • Continuous learner • Conference speaker • Tech blogger • Other: geek

Slide 3

Slide 3 text

What is a hallucination?

Slide 4

Slide 4 text

Generative Artificial Intelligence Artificial intelligence capable of generating text, images, or other data using generative models, often in response to prompts. Generative AI models learn the patterns and structure of their input training data and then generate new data that has similar characteristics. https://en.wikipedia.org/wiki/Generative_artificial_intelligence

Slide 5

Slide 5 text

How do hallucinations happen? LLM limitations • Lacking most recent data • Not always natural language • Language complexities, sarcasm, emotion • No sources • Hallucinations / Temperature • IP, bias, privacy

Slide 6

Slide 6 text

“Simply add an LLM” doesn’t work…

Slide 7

Slide 7 text

Strategies to improve LLM accuracy • Custom model • Fine-tuning / Few-shot learning • Retrieval Augmented Generation (RAG) • All of these involve training an LLM on specific data!

Slide 8

Slide 8 text

RAG Pull data from external data sources • Retrieval • Data retrieved from database • Augmented • Augments response with facts • Generation • Response in natural language

Slide 9

Slide 9 text

Database options • Vector database • Relational (+ vectors) • NoSQL (+ vectors) • Graph (+ vectors)

Slide 10

Slide 10 text

Vector databases Example: Pinecone, 2019 (not the first) • Pros: • Fast retrieval of high-dimensional data • Similarity searches based on vectors (not specific values) • Cons: • Requires a lot of infrastructure and power • Cannot store much outside vector+metadata

Slide 11

Slide 11 text

Graph databases Example: Neo4j, 2007 • Pros: • Flexible, agile data model • Relationships stored with entities (JOIN operations visual) • Cons: • Storing relationships creates some write overhead • Little or no benefit for sparsely connected data

Slide 12

Slide 12 text

What is a graph?

Slide 13

Slide 13 text

What is a graph? Leonhard Euler - graph theory • Started with a math problem • Looking for a “better way” to handle certain problems Seven Bridges of Königsberg problem. Leonhard Euler, 1735

Slide 14

Slide 14 text

What does it solve? Data problems • Documented path (not just data) • Answering how and why • Understand/find hidden connections • Find alternates, impacts, etc. • Graphs add context + meaning

Slide 15

Slide 15 text

Use cases • Recommendations • Social • Supply chain • Fraud detection • GenAI (grounding/RAG) • Many more!

Slide 16

Slide 16 text

Data storage with relationships! TL;DR • Stores relationships with entities • Produces faster read queries (for JOINs) • Easily connect multiple entities together • Mimic real-world data organization

Slide 17

Slide 17 text

Let’s build one…

Slide 18

Slide 18 text

Book domain • Find authors with reviews for multiple books • Find similar users based on reviews of books and related authors

Slide 19

Slide 19 text

Property graph • Node (vertex) • Relationship (edge)

Slide 20

Slide 20 text

Nodes • Represent objects or entities • Can be labeled • May have properties (Diagram: Book {title: “Star Wars”, isbn: 9756165498}; Author {name: “George Lucas”, avgRating: 4.72}; Review {rating: 4.2, reviewText: “Blah”, votes: 17})

Slide 21

Slide 21 text

Relationships • Must have a type (label) • Must have a direction • May have properties (Diagram: Book {title: “Star Wars”, isbn: 9756165498}; Author {name: “George Lucas”, avgRating: 4.72}; Review {rating: 4.2, reviewText: “Blah”, votes: 17}; relationships: AUTHORED, WRITTEN_FOR {date_added: “Sun Jan 03”})
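
A minimal sketch of the property graph model from the last three slides, in plain Python rather than a graph database. The `Node` and `Relationship` classes, the second book and review, and the helper `authors_with_reviews_for_multiple_books` are hypothetical and only for illustration. The relationship directions (Author to Book, Review to Book) are an assumption. The helper answers the first question from the book domain slide.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                   # e.g. "Book", "Author", "Review"
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    rel_type: str                                # every relationship has a type
    start: Node                                  # ...and a direction: start -> end
    end: Node
    properties: dict = field(default_factory=dict)


# Sample data; the second book and review are made up for the sketch.
author = Node("Author", {"name": "George Lucas"})
book_a = Node("Book", {"title": "Star Wars"})
book_b = Node("Book", {"title": "Another Book"})
review_1 = Node("Review", {"rating": 4.2, "reviewText": "Blah", "votes": 17})
review_2 = Node("Review", {"rating": 3.9, "reviewText": "Fine", "votes": 2})

relationships = [
    Relationship("AUTHORED", author, book_a),
    Relationship("AUTHORED", author, book_b),
    Relationship("WRITTEN_FOR", review_1, book_a, {"date_added": "Sun Jan 03"}),
    Relationship("WRITTEN_FOR", review_2, book_b),
]


def authors_with_reviews_for_multiple_books(rels):
    """Return names of authors who wrote more than one reviewed book."""
    # Books that are the target of at least one WRITTEN_FOR relationship.
    reviewed = {id(r.end) for r in rels if r.rel_type == "WRITTEN_FOR"}
    counts = {}
    for r in rels:
        if r.rel_type == "AUTHORED" and id(r.end) in reviewed:
            name = r.start.properties["name"]
            counts[name] = counts.get(name, 0) + 1
    return [name for name, n in counts.items() if n > 1]


print(authors_with_reviews_for_multiple_books(relationships))
```

In a graph database the same question is a pattern match over stored relationships, so no list scan like the one above is needed.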

Slide 22

Slide 22 text

Applying RAG to an LLM

Slide 23

Slide 23 text

LLMs take text, not databases. And context windows are limited.

Slide 24

Slide 24 text

What is a vector? Mathematical realm • Line in space • Has length and direction (Diagram axes: horizontal, vertical)

Slide 25

Slide 25 text

Vectors in the physical realm https://www.mathsisfun.com/algebra/vectors.html

Slide 26

Slide 26 text

Vector arithmetic c = a + b (Diagram in three panels: 1. a and b; 2. a and b; 3. a + b)
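
A small worked example of the addition on this slide, using only the Python standard library. The vectors `a` and `b` and the helper names `add` and `length` are chosen for illustration.

```python
import math


def add(a, b):
    """Add two vectors component by component."""
    return tuple(x + y for x, y in zip(a, b))


def length(v):
    """Euclidean length (magnitude) of a vector."""
    return math.sqrt(sum(x * x for x in v))


a = (3.0, 0.0)   # points along the horizontal axis
b = (0.0, 4.0)   # points along the vertical axis
c = add(a, b)

print(c)          # (3.0, 4.0)
print(length(c))  # 5.0
```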

Slide 27

Slide 27 text

Vectors in the technical realm Kings and Queens king − man + woman ≈ queen (Diagram in three panels: 1. king, man, woman; 2. king, man, woman; 3. queen?)
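
A toy sketch of the king/queen arithmetic. The two-dimensional vectors below are hand-picked assumptions (first dimension "royalty", second "masculinity"), not output from a real embedding model, and `nearest_word` is a hypothetical helper.

```python
import math

# Hand-picked toy vectors: (royalty, masculinity). Not real embeddings.
words = {
    "king": (0.9, 0.9),
    "man": (0.1, 0.9),
    "woman": (0.1, 0.1),
    "queen": (0.9, 0.1),
    "apple": (0.0, 0.5),
}


def nearest_word(target, vocabulary, exclude):
    """Return the word whose vector is closest to target, skipping excluded words."""
    candidates = {w: v for w, v in vocabulary.items() if w not in exclude}
    return min(candidates, key=lambda w: math.dist(target, candidates[w]))


# king - man + woman, computed per dimension.
target = tuple(
    k - m + w for k, m, w in zip(words["king"], words["man"], words["woman"])
)

print(nearest_word(target, words, exclude={"king", "man", "woman"}))  # queen
```

With real embeddings the result is only approximately "queen", which is why the slide uses ≈.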

Slide 28

Slide 28 text

Vectors to compare things What makes things similar? (Diagram axes: Length, Width)

Slide 29

Slide 29 text

Embeddings Convert data to a point in space • Series of numbers • 100s or 1000s of dimensions • Dimension = interesting feature / characteristic
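
One common way to compare two embeddings is cosine similarity, sketched here with the standard library. The three-dimensional vectors and their item names are made up for illustration; real embeddings come from a model and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings" (real ones have 100s or 1000s of dimensions).
embeddings = {
    "space opera novel": [0.9, 0.1, 0.2],
    "sci-fi adventure": [0.8, 0.2, 0.3],
    "cookbook": [0.1, 0.9, 0.1],
}

query = embeddings["space opera novel"]
for name, vector in embeddings.items():
    print(name, round(cosine_similarity(query, vector), 3))
```

The two science fiction items score close to 1.0 against each other, while the cookbook scores much lower.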

Slide 30

Slide 30 text

LLMs take text, not databases Vectors

Slide 31

Slide 31 text

How do we search the vectors? Similarity search • Expensive queries (compare to every vector) • Approximate nearest neighbor (k-ANN) • Example: Library • Book classification - genre vs location of plot • Smaller search set = smaller retrieval time! Photo by Martin Adams on Unsplash
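
A rough sketch of the library idea: exact search compares the query to every vector, while an approximate search first picks the closest "shelf" and only searches there. This is a simple bucketing illustration under made-up data, not the index a real vector store uses. The shelf names, vectors, and function names are all hypothetical.

```python
import math

# Hypothetical library: shelf (genre) -> list of (title, vector).
shelves = {
    "sci-fi": [("Book A", (0.9, 0.1)), ("Book B", (0.8, 0.3))],
    "cooking": [("Book C", (0.1, 0.9)), ("Book D", (0.2, 0.8))],
}


def top_k(query, items, k):
    """Exact search: rank every (title, vector) pair by distance to the query."""
    ranked = sorted(items, key=lambda item: math.dist(query, item[1]))
    return [title for title, _ in ranked[:k]]


def centroid(items):
    """Mean vector of a shelf, used as its representative point."""
    vectors = [vector for _, vector in items]
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))


def approximate_top_k(query, shelves, k):
    """Approximate search: pick the nearest shelf, then search only that shelf."""
    nearest = min(shelves, key=lambda name: math.dist(query, centroid(shelves[name])))
    return top_k(query, shelves[nearest], k)


query = (0.9, 0.15)
all_items = [item for items in shelves.values() for item in items]

print(top_k(query, all_items, 2))             # compares against every book
print(approximate_top_k(query, shelves, 2))   # compares against one shelf only
```

With four books the saving is negligible. It grows with the size of the collection, at the cost of possibly missing a close match that sits on another shelf.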

Slide 32

Slide 32 text

Provide prompt + context to LLM Vectors Text

Slide 33

Slide 33 text

RAG architecture • Retrieval • Data retrieved from database • Augmented • Augments response with facts • Generation • Response in natural language (Diagram labels: User, Chat API, Prompt, Database Search, Database, Relevant Results / Documents, Prompt + Relevant Information, LLM API, LLM, Response; steps 1, 2, 3)
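
The three RAG steps can be sketched end to end as below. This is an illustration under stated assumptions, not the demo code: retrieval uses word-count vectors instead of model embeddings, the documents are made up, and `call_llm` is a placeholder that stands in for a real LLM API call.

```python
import math
import re
from collections import Counter

# Made-up documents standing in for data stored in a database.
DOCUMENTS = [
    "Star Wars was written by George Lucas.",
    "Reviewers rated Star Wars 4.2 on average.",
    "Graph databases store relationships with entities.",
]


def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def similarity(a, b):
    """Cosine similarity of word-count vectors (a stand-in for embeddings)."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, documents, k=2):
    """Step 1, Retrieval: the k documents most similar to the question."""
    ranked = sorted(documents, key=lambda d: similarity(question, d), reverse=True)
    return ranked[:k]


def build_prompt(question, context):
    """Step 2, Augmentation: combine the question with the retrieved facts."""
    facts = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{facts}\n\nQuestion: {question}"


def call_llm(prompt):
    """Step 3, Generation: placeholder for a real LLM API call."""
    return f"[LLM response to a prompt of {len(prompt)} characters]"


question = "Who wrote Star Wars?"
context = retrieve(question, DOCUMENTS)
prompt = build_prompt(question, context)
print(prompt)
print(call_llm(prompt))
```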

Slide 34

Slide 34 text

Agentic Workflow Architecture • Uses “agents”/tools • LLM determines next step • Which tool/external source should be called • Uses result from tool as context (Diagram labels: User, Chat API, Prompt, Tool, Source info, Relevant Results / Documents, Prompt + Relevant Information, LLM API, LLM, Response; steps 1, 2, 3)
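
A minimal sketch of the agentic flow, with the model's decision replaced by a keyword rule so the example runs without an LLM. The tools, their return values, and the functions `choose_tool`, `call_llm`, and `run_agent` are hypothetical. In a real agent the LLM itself selects the tool.

```python
def search_books(query):
    """Hypothetical tool: look up book and author data."""
    return "Star Wars (Book), authored by George Lucas"


def search_reviews(query):
    """Hypothetical tool: look up review data."""
    return "Review: rating 4.2, 17 votes"


TOOLS = {"books": search_books, "reviews": search_reviews}


def choose_tool(prompt):
    """Stand-in for the LLM's decision about which tool to call next."""
    return "reviews" if "review" in prompt.lower() else "books"


def call_llm(prompt):
    """Placeholder for a real LLM API call."""
    return f"[LLM response based on: {prompt}]"


def run_agent(user_prompt):
    tool_name = choose_tool(user_prompt)             # LLM determines the next step
    result = TOOLS[tool_name](user_prompt)           # call the selected tool/source
    augmented = f"{user_prompt}\nContext: {result}"  # tool result becomes context
    return tool_name, call_llm(augmented)


print(run_agent("Who wrote Star Wars?"))
print(run_agent("What do reviews say about Star Wars?"))
```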

Slide 35

Slide 35 text

Nothing is a silver bullet An LLM has (of sorts) a mind of its own • Can’t guarantee a consistent answer • Prompt engineering • Context window limits

Slide 36

Slide 36 text

How much value can RAG add?

Slide 37

Slide 37 text

Explainable AI With RAG + LLM • How did the LLM get this answer? • Graphs: • Generate LLM response from knowledge in database • Set security rules in graph (what’s viewable) • Retrieve extra data connected to similar entities Photo by No Revisions on Unsplash

Slide 38

Slide 38 text

Demo! Our data model

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

No content

Slide 41

Slide 41 text

Resources • Github repository (today’s code): github.com/JMHReif/springai-goodreads • GraphAcademy LLM courses: graphacademy.neo4j.com/categories/llms/ • Docs for Spring AI: docs.spring.io/spring-ai/reference/api/vectordbs/neo4j.html • NODES 2024: neo4j.com/nodes2024/agenda