Slide 1

Slide 1 text

Hallucination-free zone: LLMs + Graph Databases got your back! Photo by fabio on Unsplash Jennifer Reif @JMHReif github.com/JMHReif jmhreif.com linkedin.com/in/jmhreif

Slide 2

Slide 2 text

Who is Jennifer Reif? Developer Advocate, Neo4j • Continuous learner • Conference speaker • Tech blogger • Other: geek

Slide 3

Slide 3 text

Database catalog

Slide 4

Slide 4 text

Relational databases (RDBMS) Example: Oracle, 1979 • Pros: • Slices of data easily assembled with queries • Low data duplication (unique rows) • Cons: • Relationships assembled with JOINs (normal forms, NF) • Strict model

Slide 5

Slide 5 text

Document databases Example: MongoDB, 2007 • Pros: • Handles varied (unstructured) data more easily • Groups related info in a single entity • Cons: • Little flexibility (relationships pre-baked in doc) • Data duplication
"customer" : {
  "id" : "123",
  "firstName" : "Jane",
  "lastName" : "Doe",
  "DOB" : "03/12/1989",
  "department" : "Engineering",
  "phoneNumbers" : [
    { "type" : "office", "number" : "650-123-4567" },
    { "type" : "cell", "number" : "650-321-7654" }
  ],
  "title" : "Director of QA"
}

Slide 6

Slide 6 text

Vector databases Example: Pinecone, 2019 (not the first) • Pros: • Fast retrieval of high-dimensional data • Similarity searches based on vectors (not specific values) • Cons: • Requires a lot of infrastructure and power • Cannot store much outside vector+metadata

Slide 7

Slide 7 text

Graph databases Example: Neo4j, 2007 • Pros: • Flexible, agile data model • Relationships stored with entities (JOINs become visible relationships) • Cons: • Storing relationships creates some write overhead • No benefit over relational for loosely connected data

Slide 8

Slide 8 text

What is a graph?

Slide 9

Slide 9 text

What is a graph? Leonhard Euler - graph theory • Started with a math problem • Looking for a “better way” to handle certain problems • Seven Bridges of Königsberg problem (Leonhard Euler, 1735)
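
Euler's reasoning about the bridges can be checked in a few lines. The sketch below is an illustration, not part of the original deck: the land-mass labels A-D are names chosen here, the bridge layout is the standard one for the historical problem, and the graph is assumed to be connected rather than checked.

```python
from collections import Counter

# The seven bridges as edges between four land masses.
# Labels are assumptions for this sketch: A = island, B and C = river banks, D = eastern land mass.
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"),
    ("B", "D"),
    ("C", "D"),
]


def degrees(edges):
    """Count how many bridge ends touch each land mass."""
    counts = Counter()
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return counts


def has_euler_walk(edges):
    """A connected graph has a walk using every edge exactly once
    only if it has 0 or 2 nodes of odd degree."""
    odd = [node for node, degree in degrees(edges).items() if degree % 2 == 1]
    return len(odd) in (0, 2)


print(dict(degrees(BRIDGES)))
print(has_euler_walk(BRIDGES))
```

All four land masses end up with an odd number of bridges, so no walk crosses every bridge exactly once.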

Slide 10

Slide 10 text

What does it solve? Data problems • Documented path (not just data) • Answering how and why • Understand/find hidden connections • Find alternates, impacts, etc. • Graphs add context + meaning

Slide 11

Slide 11 text

Use cases • Recommendations • Social • Supply chain • Fraud detection • GenAI (grounding/RAG) • Many more!

Slide 12

Slide 12 text

Data storage with relationships! TL;DR • Stores relationships with entities • Produces faster read queries (for JOINs) • Easily connects multiple entities together • Mimics real-world data organization

Slide 13

Slide 13 text

Let’s build one…

Slide 14

Slide 14 text

Book domain • Find authors with reviews for multiple books • Find similar users based on reviews of books and related authors
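
The first question can be sketched without a database, just to show what is being asked of the data. Everything below is an assumption for illustration: the sample authors, books, and reviews are hypothetical, and the relationship names AUTHORED and WRITTEN_FOR are borrowed from the data model shown later in the deck. It is plain Python, not a Cypher query.

```python
from collections import defaultdict

# Hypothetical sample data: (start, end) pairs per relationship type.
AUTHORED = [
    ("Author A", "Book 1"), ("Author A", "Book 2"),
    ("Author B", "Book 3"),
    ("Author C", "Book 4"), ("Author C", "Book 5"),
]
WRITTEN_FOR = [
    ("Review 1", "Book 1"), ("Review 2", "Book 2"),
    ("Review 3", "Book 3"), ("Review 4", "Book 4"),
]


def authors_with_reviews_for_multiple_books(authored, written_for):
    """Return authors who have more than one book with at least one review."""
    reviewed_books = {book for _, book in written_for}
    reviewed_by_author = defaultdict(set)
    for author, book in authored:
        if book in reviewed_books:
            reviewed_by_author[author].add(book)
    return sorted(a for a, books in reviewed_by_author.items() if len(books) > 1)


print(authors_with_reviews_for_multiple_books(AUTHORED, WRITTEN_FOR))
```

In a graph database the same question becomes a pattern over Author, Book, and Review nodes, with no lookup tables to assemble by hand.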

Slide 15

Slide 15 text

Property graph • Node (vertex) • Relationship (edge)

Slide 16

Slide 16 text

Nodes • Represent objects or entities • Can be labeled • May have properties • Diagram: Book (title: “Star Wars”, isbn: 9756165498) • Author (name: “George Lucas”, avgRating: 4.72) • Review (rating: 4.2, reviewText: “Blah”, votes: 17)

Slide 17

Slide 17 text

Relationships • Must have a type (label) • Must have a direction • May have properties • Diagram: Book (title: “Star Wars”, isbn: 9756165498) • Author (name: “George Lucas”, avgRating: 4.72) • Review (rating: 4.2, reviewText: “Blah”, votes: 17) • Relationships: AUTHORED, WRITTEN_FOR (date_added: “Sun Jan 03”)
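
The node and relationship rules above map onto two small data structures. This is a minimal in-memory sketch, not how Neo4j stores data. The class and function names are hypothetical, and three details are assumptions because the slide text does not state them: that avgRating belongs to the Author, that AUTHORED points from Author to Book, and that WRITTEN_FOR points from Review to Book and carries date_added.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                      # nodes can be labeled
    properties: dict = field(default_factory=dict)  # nodes may have properties


@dataclass
class Relationship:
    type: str    # relationships must have a type
    start: Node  # start and end together give the required direction
    end: Node
    properties: dict = field(default_factory=dict)  # relationships may have properties


book = Node("Book", {"title": "Star Wars", "isbn": 9756165498})
author = Node("Author", {"name": "George Lucas", "avgRating": 4.72})
review = Node("Review", {"rating": 4.2, "reviewText": "Blah", "votes": 17})

relationships = [
    Relationship("AUTHORED", author, book),  # direction is an assumption
    Relationship("WRITTEN_FOR", review, book, {"date_added": "Sun Jan 03"}),
]


def neighbors(node, rels, rel_type):
    """Follow outgoing relationships of one type from a node."""
    return [r.end for r in rels if r.start is node and r.type == rel_type]


print(neighbors(author, relationships, "AUTHORED")[0].properties["title"])
```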

Slide 18

Slide 18 text

What is Vector Search?

Slide 19

Slide 19 text

What is a vector? • Length • Direction • Components have meaning (e.g. horizontal, vertical)

Slide 20

Slide 20 text

What makes things similar? Example: shapes compared by length and width

Slide 21

Slide 21 text

Vector arithmetic c = a + b (diagram in three steps showing a, b, and a + b)
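
A small worked example of length, direction, and c = a + b for two-dimensional vectors. The values of a and b are chosen here for illustration, and the helper names are hypothetical.

```python
import math


def add(a, b):
    """Component-wise sum: c = a + b."""
    return tuple(x + y for x, y in zip(a, b))


def length(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def direction_degrees(v):
    """Angle of a 2-D vector measured from the horizontal axis."""
    return math.degrees(math.atan2(v[1], v[0]))


a = (3.0, 0.0)  # purely horizontal
b = (0.0, 4.0)  # purely vertical
c = add(a, b)

print(c, length(c), round(direction_degrees(c), 2))
```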

Slide 22

Slide 22 text

Example: Kings and Queens • king − man + woman ≈ queen (diagram in three steps: vectors for king, man, and woman; the result lands near queen)
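
The king/queen arithmetic can be reproduced with toy numbers. The two-dimensional vectors below are invented for this sketch (roughly "royalty" and "gender" axes) and do not come from any real embedding model. Cosine similarity is used here as one common way to measure closeness.

```python
import math

# Toy 2-D "embeddings" made up for illustration only.
WORDS = {
    "king": (0.95, 0.90),
    "man": (0.05, 0.85),
    "woman": (0.10, -0.80),
    "queen": (0.90, -0.85),
}


def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, -1.0 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def combine(plus, minus):
    """Add the 'plus' vectors and subtract the 'minus' vectors."""
    dims = len(next(iter(WORDS.values())))
    result = [0.0] * dims
    for word in plus:
        result = [r + x for r, x in zip(result, WORDS[word])]
    for word in minus:
        result = [r - x for r, x in zip(result, WORDS[word])]
    return tuple(result)


target = combine(plus=["king", "woman"], minus=["man"])
nearest = max(WORDS, key=lambda word: cosine(target, WORDS[word]))
print(nearest)
```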

Slide 23

Slide 23 text

What are vector embeddings? Convert something to a point in space • Same concepts, applied to data formats • 100s or 1000s of dimensions • Dimension = interesting feature/characteristic

Slide 24

Slide 24 text

Vector index Why index? • Queries become expensive • Need to compare every vector to query • Indexes = speed • Jump right to where you need (like index in a book) • Approximate nearest neighbor (k-ANN) • e.g. 20 closest vectors to this one
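
To see why unindexed queries get expensive, here is the brute-force version: every stored vector is compared to the query. The data and function names are hypothetical, and Euclidean distance is just one possible measure.

```python
import heapq
import math


def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(query, vectors, k):
    """Exact k nearest neighbors: one distance computation per stored vector."""
    return heapq.nsmallest(k, vectors, key=lambda vid: euclidean(query, vectors[vid]))


# Hypothetical stored vectors, keyed by id.
vectors = {
    "a": (0.0, 0.0),
    "b": (1.0, 1.0),
    "c": (5.0, 5.0),
    "d": (0.9, 1.2),
}

closest = k_nearest((1.0, 1.0), vectors, k=2)
print(closest)
```

The work grows with the number of stored vectors, which is the cost an approximate index trades a little accuracy to avoid.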

Slide 25

Slide 25 text

Vector index Example: Library • Categorizing books by author or genre • Embeddings can hold more complex information • Further categories: • “gender of main character” • “main location of plot” • Indexing can retrieve a smaller portion of all available vectors • Reducing retrieval time!
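
The library analogy can be sketched as a bucket index: each vector is filed under its nearest category, and a query only looks inside one bucket. This is a simplified illustration of the idea, not the algorithm Neo4j or any particular vector database uses. The class name, category centers, and book vectors are all hypothetical.

```python
import math


class BucketIndex:
    """File vectors under the nearest category center; search one bucket only."""

    def __init__(self, centers):
        self.centers = centers  # category name -> center vector
        self.buckets = {name: {} for name in centers}

    def _nearest_center(self, vector):
        return min(self.centers, key=lambda name: math.dist(vector, self.centers[name]))

    def add(self, item_id, vector):
        self.buckets[self._nearest_center(vector)][item_id] = vector

    def search(self, query, k):
        # Approximate: neighbors filed in other buckets are never examined.
        bucket = self.buckets[self._nearest_center(query)]
        ranked = sorted(bucket, key=lambda item_id: math.dist(query, bucket[item_id]))
        return ranked[:k]


index = BucketIndex({"sci-fi": (0.0, 0.0), "romance": (10.0, 10.0)})
index.add("book-1", (0.5, 0.4))
index.add("book-2", (1.0, 1.5))
index.add("book-3", (9.0, 9.5))
index.add("book-4", (10.5, 9.0))

print(index.search((0.6, 0.6), k=2))
```

Only the "sci-fi" bucket is scanned for this query, so half of the stored vectors are skipped.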

Slide 26

Slide 26 text

Neo4j Vector Search What’s the value? • Lets you store structured + unstructured data side-by-side • Dedicated vector dbs store little beyond vectors + metadata • The power is in the entities connected to the vector search results • Connected = extra, relevant context • Diagram: Similarity Search (Vector Index): find similar documents • Pattern Matching (Graph Structure): find related information • Knowledge Graph: combine for more accurate results within a relevant context
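
The two steps can be sketched in memory: a similarity search picks the closest reviews, then a hop along relationships adds the connected book and author. This is an illustration only, not Neo4j's API. The embeddings are toy values, and apart from "Star Wars" and "George Lucas" the sample data is hypothetical.

```python
import math

# Hypothetical Review nodes with toy 2-D embeddings.
REVIEWS = {
    "r1": {"text": "Great space adventure", "embedding": (0.9, 0.1)},
    "r2": {"text": "Slow romance", "embedding": (0.1, 0.9)},
    "r3": {"text": "Epic battles among the stars", "embedding": (0.8, 0.3)},
}
WRITTEN_FOR = {"r1": "b1", "r2": "b2", "r3": "b1"}  # review -> book
BOOKS = {"b1": {"title": "Star Wars"}, "b2": {"title": "A Love Story"}}
AUTHORED = {"b1": "George Lucas", "b2": "Jane Doe"}  # book -> author name


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def similar_reviews(query_embedding, k):
    """Step 1, similarity search: rank reviews by closeness to the query."""
    ranked = sorted(
        REVIEWS,
        key=lambda rid: cosine(query_embedding, REVIEWS[rid]["embedding"]),
        reverse=True,
    )
    return ranked[:k]


def with_context(review_ids):
    """Step 2, graph hop: attach the connected book and author to each hit."""
    results = []
    for rid in review_ids:
        book_id = WRITTEN_FOR[rid]
        results.append({
            "review": REVIEWS[rid]["text"],
            "book": BOOKS[book_id]["title"],
            "author": AUTHORED[book_id],
        })
    return results


context = with_context(similar_reviews((1.0, 0.2), k=2))
print(context)
```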

Slide 27

Slide 27 text

How to Add Vectors …to existing data

Slide 28

Slide 28 text

Data model Books

Slide 29

Slide 29 text

Data + Vectors How do I get vectors for existing data? • Generate some vector embeddings • Happens externally, several models available • Store embedding as property on a node
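
The three steps can be sketched as a loop over existing nodes. The `toy_embed` function is a stand-in invented for this sketch: it only counts characters and carries no meaning, whereas a real setup would call an external embedding model at that point. The node shape and property names are also assumptions.

```python
def toy_embed(text, dimensions=8):
    """Placeholder for an external embedding model. NOT semantically meaningful."""
    vector = [0.0] * dimensions
    for ch in text.lower():
        vector[ord(ch) % dimensions] += 1.0
    total = sum(vector) or 1.0
    return [value / total for value in vector]


def add_embeddings(nodes, text_property, embed=toy_embed):
    """Generate an embedding per node and store it as a property on that node."""
    for node in nodes:
        node["embedding"] = embed(node[text_property])
    return nodes


# Hypothetical existing data: review nodes as plain dictionaries.
reviews = [
    {"reviewText": "Blah", "rating": 4.2},
    {"reviewText": "Loved every chapter", "rating": 5.0},
]

add_embeddings(reviews, "reviewText")
print(len(reviews[0]["embedding"]))
```

Swapping in a real model means replacing only the `embed` argument; the rest of the data stays as it was.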

Slide 30

Slide 30 text

Examples OpenAI

Slide 31

Slide 31 text

Use LLM with Graph Vector Search

Slide 32

Slide 32 text

RAG Pull data from external data source • Retrieval • Data retrieved from database • Augmented • Augments response with facts • Generation • Response in natural language • Diagram (3 numbered steps): user prompt goes to the chat API; a database search returns relevant results/documents; prompt + relevant information is sent to the LLM API, which produces the response
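
The retrieval, augmentation, and generation steps can be sketched end to end with stand-ins. Everything here is an assumption for illustration: retrieval is a toy word-overlap ranking rather than a vector search, the sample facts are hypothetical, and `echo_llm` simply returns the prompt where a real LLM API call would go.

```python
import re


def words(text):
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question, documents, k=2):
    """Retrieval: toy ranking by number of words shared with the question."""
    question_words = words(question)
    ranked = sorted(documents, key=lambda doc: len(question_words & words(doc)), reverse=True)
    return ranked[:k]


def build_prompt(question, facts):
    """Augmentation: put the retrieved facts in front of the question."""
    context = "\n".join(f"- {fact}" for fact in facts)
    return (
        "Answer using only the facts below. If they are not enough, say so.\n"
        f"Facts:\n{context}\n"
        f"Question: {question}"
    )


def answer(question, documents, llm):
    """Generation: hand the augmented prompt to the model."""
    return llm(build_prompt(question, retrieve(question, documents)))


def echo_llm(prompt):
    """Stand-in for a real LLM API call: returns the prompt it was given."""
    return prompt


DOCUMENTS = [
    "George Lucas authored Star Wars",
    "A review of Star Wars has rating 4.2",
    "Jane Doe works in Engineering",
]

print(answer("Who authored Star Wars?", DOCUMENTS, echo_llm))
```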

Slide 33

Slide 33 text

How much value can RAG add?

Slide 34

Slide 34 text

Explainable AI With RAG + LLM • How did the LLM get this answer? • Graphs: • Generate LLM response from knowledge in database • Set security rules in graph (what’s viewable) • Retrieve extra data connected to similar entities

Slide 35

Slide 35 text

Demo Time!

Slide 36

Slide 36 text

Resources • GitHub repository (today’s code): github.com/JMHReif/springai-goodreads • GraphAcademy LLM courses: graphacademy.neo4j.com/categories/llms/ • Docs for Spring AI: docs.spring.io/spring-ai/reference/api/vectordbs/neo4j.html • Docs for OpenAI embeddings: platform.openai.com/docs/guides/embeddings