Slide 1

Slide 1 text

Pass or Play: What does GenAI mean for the Java developer? Jennifer Reif [email protected] @JMHReif github.com/JMHReif jmhreif.com linkedin.com/in/jmhreif

Slide 2

Slide 2 text

Who is Jennifer Reif? Developer Advocate, Neo4j • Continuous learner • Conference speaker • Tech blogger • Other: geek

Slide 3

Slide 3 text

Photo by Matt Walsh on Unsplash AI Vector RAG LLM Algorithm Chaining DS Entity resolution Knowledge graph ML NLP GenAI Hallucination Embedding k-ANN Cosine similarity Euclidean distance Fine-tune Few-shot Grounding Model Prompt Semantic search Similarity Temperature Tokens Natural language ChatGPT Chatbot Context window

Slide 4

Slide 4 text

Generative Artificial Intelligence Artificial intelligence capable of generating text, images, or other data using generative models, often in response to prompts. Generative AI models learn the patterns and structure of their input training data and then generate new data that has similar characteristics. https://en.wikipedia.org/wiki/Generative_artificial_intelligence

Slide 5

Slide 5 text

Large Language Model (LLM)

Slide 6

Slide 6 text

General info… About LLMs • Lots of data • Answers on probabilities • Training takes tons of hardware, money, time • Models • Different providers / companies train their own

Slide 7

Slide 7 text

And here enters…

Slide 8

Slide 8 text

ChatGPT Public interface to LLM • Chat Generative Pre-trained Transformer • OpenAI, Nov 2022 • Natural language response • Predict next word • Feedback / reward to rank responses • Use cases: professional, personal, everything!

Slide 9

Slide 9 text

Is it all that? Worth the hype? • General information • Public domain knowledge • Historical data • Creative / arts • Human assistant • Task delegation Photo by Igor Omilaev on Unsplash

Slide 10

Slide 10 text

LLM issues • Lacking most recent data • Not always natural language • Language complexities, sarcasm, emotion • No sources • Hallucinations / Temperature • IP, bias, privacy
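
To make the "Temperature" bullet concrete, here is a minimal sketch of how temperature is commonly applied: the model's raw scores for candidate next words are divided by the temperature before being turned into probabilities. The scores and the class name `TemperatureDemo` are made up for illustration; real models work over vocabularies of many thousands of tokens.

```java
import java.util.Arrays;

// Illustrative sketch only: TemperatureDemo and the scores below are assumptions, not part of the talk's demo.
class TemperatureDemo {

    /** Turns raw scores (logits) into probabilities, dividing by the temperature first. */
    static double[] softmax(double[] logits, double temperature) {
        if (temperature <= 0) {
            throw new IllegalArgumentException("temperature must be > 0");
        }
        // Subtracting the max keeps Math.exp from overflowing; it does not change the result.
        double max = Arrays.stream(logits).max().getAsDouble();
        double[] probs = new double[logits.length];
        double sum = 0.0;
        for (int i = 0; i < logits.length; i++) {
            probs[i] = Math.exp((logits[i] - max) / temperature);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }

    public static void main(String[] args) {
        double[] logits = {2.0, 1.0, 0.1}; // made-up scores for three candidate next words
        System.out.println("T=0.2: " + Arrays.toString(softmax(logits, 0.2)));
        System.out.println("T=1.0: " + Arrays.toString(softmax(logits, 1.0)));
        System.out.println("T=2.0: " + Arrays.toString(softmax(logits, 2.0)));
    }
}
```

A low temperature concentrates probability on the top-scoring word (more repeatable answers); a high temperature flattens the distribution, so less likely words get picked more often.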

Slide 11

Slide 11 text

Improving LLM accuracy

Slide 12

Slide 12 text

Approaches • Custom model • Fine-tuning / Few-shot learning • Retrieval Augmented Generation (RAG)
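
Of the approaches above, few-shot learning needs no training at all: a handful of worked examples is placed in the prompt ahead of the real input. The sketch below shows one way to assemble such a prompt; the `Input:` / `Output:` layout and the class name `FewShotPrompt` are assumptions for illustration, not a format any particular provider requires.

```java
// Illustrative sketch only: the prompt layout is an assumption, not a provider-mandated format.
class FewShotPrompt {

    /** Builds a prompt from an instruction, example (input, output) pairs, and the new input. */
    static String build(String instruction, String[][] examples, String input) {
        StringBuilder sb = new StringBuilder(instruction).append("\n\n");
        for (String[] example : examples) {
            sb.append("Input: ").append(example[0]).append("\n")
              .append("Output: ").append(example[1]).append("\n\n");
        }
        // Leave the final output empty so the model completes it.
        sb.append("Input: ").append(input).append("\nOutput:");
        return sb.toString();
    }

    public static void main(String[] args) {
        String[][] examples = {
            {"Great book, could not put it down", "positive"},
            {"Dull plot and flat characters", "negative"}
        };
        System.out.println(build("Classify the sentiment of each review.", examples, "A slow start but a strong ending"));
    }
}
```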

Slide 13

Slide 13 text

RAG Pull data from external data sources • Retrieval • Data retrieved from database • Augmented • Augments response with facts • Generation • Response in natural language
[Diagram: User, Chat API, Database, LLM API, LLM. Labels: Prompt, Database Search, Relevant Results / Documents, Prompt + Relevant Information, Response. Steps numbered 1 to 3.]
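
The three RAG steps can be sketched in plain Java as follows. This is a toy under stated assumptions: retrieval is a crude keyword overlap over an in-memory list (standing in for a database search), the LLM is passed in as a `Function<String, String>` (standing in for a real LLM API call), and all names such as `RagSketch` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

// Illustrative sketch only: RagSketch and its methods are hypothetical, not the Spring AI API.
class RagSketch {

    /** Retrieval: rank documents by how many question words they contain. */
    static List<String> retrieve(String question, List<String> documents, int topK) {
        String[] words = question.toLowerCase().split("\\W+");
        List<String> ranked = new ArrayList<>(documents);
        // Negated score so that the highest overlap sorts first.
        ranked.sort(Comparator.comparingInt((String doc) -> -overlap(words, doc)));
        return new ArrayList<>(ranked.subList(0, Math.min(topK, ranked.size())));
    }

    static int overlap(String[] words, String doc) {
        String lower = doc.toLowerCase();
        int count = 0;
        for (String word : words) {
            if (!word.isEmpty() && lower.contains(word)) {
                count++;
            }
        }
        return count;
    }

    /** Augmentation: put the retrieved facts into the prompt next to the question. */
    static String buildPrompt(String question, List<String> context) {
        return "Answer using only the context below.\n"
                + "Context:\n- " + String.join("\n- ", context) + "\n"
                + "Question: " + question;
    }

    /** Generation: hand the augmented prompt to the LLM (here, any function from prompt to text). */
    static String answer(String question, List<String> documents, Function<String, String> llm) {
        List<String> context = retrieve(question, documents, 2); // keep the two best matches
        return llm.apply(buildPrompt(question, context));
    }
}
```

Passing the model in as a function keeps the sketch testable: `RagSketch.answer(question, docs, prompt -> prompt)` simply echoes the augmented prompt, which is a convenient way to inspect what the LLM would receive.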

Slide 14

Slide 14 text

Explainable AI With RAG + LLM • How did the LLM get this answer? • Grounding LLM answer with verified data Photo by No Revisions on Unsplash

Slide 15

Slide 15 text

Vectors and Databases

Slide 16

Slide 16 text

What is a vector? • Length • Direction • Components have meaning horizontal vertical
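
For a two-dimensional vector, length and direction follow directly from the horizontal and vertical components. A small sketch, with the class name `Vector2D` chosen only for illustration:

```java
// Illustrative sketch only: Vector2D is a hypothetical helper, not a library class.
class Vector2D {

    /** Length (magnitude) from the horizontal (x) and vertical (y) components. */
    static double length(double x, double y) {
        return Math.hypot(x, y); // sqrt(x*x + y*y)
    }

    /** Direction as an angle in degrees, measured counter-clockwise from the horizontal axis. */
    static double directionDegrees(double x, double y) {
        return Math.toDegrees(Math.atan2(y, x));
    }

    public static void main(String[] args) {
        System.out.println("length(3, 4) = " + length(3, 4));
        System.out.println("direction(1, 1) = " + directionDegrees(1, 1) + " degrees");
    }
}
```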

Slide 17

Slide 17 text

How do we use vectors? https://www.mathsisfun.com/algebra/vectors.html

Slide 18

Slide 18 text

Vector arithmetic C = a + b [Diagram in three panels: 1) a, b; 2) a, b; 3) a + b]
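
In code, adding two vectors means adding them component by component. A minimal sketch, with `VectorAdd` as a hypothetical helper name:

```java
import java.util.Arrays;

// Illustrative sketch only: VectorAdd is a hypothetical helper, not a library class.
class VectorAdd {

    /** Returns a + b, computed component by component. */
    static double[] add(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same number of components");
        }
        double[] sum = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            sum[i] = a[i] + b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] a = {1, 2};
        double[] b = {3, 1};
        System.out.println(Arrays.toString(add(a, b)));
    }
}
```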

Slide 19

Slide 19 text

Example Kings and Queens king − man + woman ≈ queen [Diagram in three panels: 1) king, man, woman; 2) king, man, woman; 3) queen?]
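
The analogy can be reproduced with a toy vocabulary. The two-dimensional vectors below are hand-made for illustration (first component loosely "royalty", second loosely "masculinity"); they are not real word embeddings, which have hundreds of dimensions and are learned from data.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: the vectors are invented so that the arithmetic is easy to follow.
class WordAnalogy {

    static Map<String, double[]> toyVocabulary() {
        Map<String, double[]> vocab = new LinkedHashMap<>();
        vocab.put("king", new double[]{0.9, 0.9});
        vocab.put("man", new double[]{0.1, 0.9});
        vocab.put("woman", new double[]{0.1, 0.1});
        vocab.put("queen", new double[]{0.9, 0.1});
        vocab.put("apple", new double[]{0.05, 0.5});
        return vocab;
    }

    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Computes a - b + c and returns the closest remaining word by cosine similarity. */
    static String analogy(Map<String, double[]> vocab, String a, String b, String c) {
        double[] va = vocab.get(a), vb = vocab.get(b), vc = vocab.get(c);
        double[] target = new double[va.length];
        for (int i = 0; i < target.length; i++) {
            target[i] = va[i] - vb[i] + vc[i];
        }
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> entry : vocab.entrySet()) {
            String word = entry.getKey();
            if (word.equals(a) || word.equals(b) || word.equals(c)) {
                continue; // skip the input words themselves
            }
            double score = cosine(target, entry.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = word;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(analogy(toyVocabulary(), "king", "man", "woman"));
    }
}
```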

Slide 20

Slide 20 text

Vectors to compare things What makes things similar? Length Width
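
Two common ways to put a number on "similar" are Euclidean distance (how far apart two points are) and cosine similarity (how closely two vectors point in the same direction). The sketch below implements both for plain `double[]` vectors; the `Similarity` class name is an assumption for illustration.

```java
// Illustrative sketch only: Similarity is a hypothetical helper, not a library class.
class Similarity {

    /** Straight-line distance between two points; smaller means more similar. */
    static double euclideanDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Cosine of the angle between two vectors; 1 means same direction, 0 means perpendicular. */
    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] small = {1, 2};  // {length, width} of one object
        double[] large = {2, 4};  // same proportions, twice the size
        System.out.println("distance = " + euclideanDistance(small, large));
        System.out.println("cosine   = " + cosineSimilarity(small, large));
    }
}
```

The two measures can disagree: the objects in `main` have identical proportions, so they point in the same direction even though they sit at different points in space.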

Slide 21

Slide 21 text

Embeddings Convert data to a point in space • Series of numbers • 100s or 1000s of dimensions • Dimension = interesting feature / characteristic

Slide 22

Slide 22 text

Similarity search Vector indexes • Expensive queries (compare to every vector) • Approximate nearest neighbor (k-ANN) • Example: Library • Book classification - author vs location of plot • Smaller search set = smaller retrieval time! Photo by Martin Adams on Unsplash
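
The "expensive query" is the exact, brute-force version of the search: measure the distance from the query to every stored vector, then keep the k closest. The sketch below does exactly that, as a baseline; it is not an approximate index, and `BruteForceSearch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only: exact search over every vector, the cost a vector index tries to avoid.
class BruteForceSearch {

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Returns the indexes of the k stored vectors closest to the query, nearest first. */
    static List<Integer> nearest(double[] query, List<double[]> vectors, int k) {
        List<Integer> indexes = new ArrayList<>();
        for (int i = 0; i < vectors.size(); i++) {
            indexes.add(i);
        }
        // Every stored vector is compared with the query: work grows with the size of the data set.
        indexes.sort(Comparator.comparingDouble((Integer i) -> squaredDistance(query, vectors.get(i))));
        return new ArrayList<>(indexes.subList(0, Math.min(k, indexes.size())));
    }
}
```

An approximate nearest neighbor index trades a little accuracy for speed by only examining a subset of the vectors, much like walking to one section of the library instead of checking every shelf.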

Slide 23

Slide 23 text

Choosing a database

Slide 24

Slide 24 text

Vector databases Example: Pinecone, 2019 (not first) • Pros: • Fast retrieval of high-dimensional data • Similarity searches based on vectors (not specific values) • Cons: • Requires a lot of infrastructure and power • Cannot store much outside vector+metadata

Slide 25

Slide 25 text

Graph databases Example: Neo4j, 2007 • Pros: • Flexible, agile data model • Relationships stored with entities (visual JOINs) • Cons: • Write overhead to store relationships • No benefit for low-connected data (relational)

Slide 26

Slide 26 text

(Knowledge) Graphs in GenAI Benefits • Generate LLM response from knowledge in database • Security rules in database (what’s viewable) • Structured + unstructured data side-by-side • Vector dbs only store unstructured • Retrieve connected entities • Connected = extra, relevant context
[Diagram: Knowledge Graph = Vector Index (similarity search: find similar documents) + Graph Structure (pattern matching: find related information). Combine for more accurate results within a relevant context.]
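
The combination of similarity search and connected entities can be sketched with two in-memory maps: one standing in for the vector index, one for the graph's relationships. Node ids such as `review:1` and the class name `GraphRagSketch` are hypothetical; a real setup would run both steps inside the database.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: plain maps stand in for a vector index and a graph database.
class GraphRagSketch {

    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Step 1 (vector index stand-in): id of the node whose embedding is most similar to the query. */
    static String mostSimilar(double[] query, Map<String, double[]> embeddings) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> entry : embeddings.entrySet()) {
            double score = cosine(query, entry.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = entry.getKey();
            }
        }
        return best;
    }

    /** Step 2 (graph traversal stand-in): the matched node plus everything directly connected to it. */
    static List<String> withNeighbors(String nodeId, Map<String, List<String>> relationships) {
        List<String> context = new ArrayList<>();
        context.add(nodeId);
        context.addAll(relationships.getOrDefault(nodeId, Collections.<String>emptyList()));
        return context;
    }
}
```

The list returned by `withNeighbors` is what would be handed to the LLM as context: the similar document itself, plus the entities the graph says belong with it.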

Slide 27

Slide 27 text

Nothing is a silver bullet An LLM has (of sorts) a mind of its own • Can’t guarantee a consistent answer • Prompt engineering • Context window limits
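
One practical consequence of context window limits is that retrieved text has to be trimmed to a budget before it goes into the prompt. The sketch below keeps the best-ranked snippets that fit. It approximates tokens by counting whitespace-separated words, which is an assumption made for simplicity; real tokenizers count differently, and `ContextBudget` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: word count is a rough stand-in for a real tokenizer.
class ContextBudget {

    /** Rough token estimate: the number of whitespace-separated words. */
    static int approxTokens(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Keeps snippets in ranked order until the next one would exceed the budget. */
    static List<String> fit(List<String> rankedSnippets, int maxTokens) {
        List<String> kept = new ArrayList<>();
        int used = 0;
        for (String snippet : rankedSnippets) {
            int cost = approxTokens(snippet);
            if (used + cost > maxTokens) {
                break; // stop at the first snippet that does not fit
            }
            kept.add(snippet);
            used += cost;
        }
        return kept;
    }
}
```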

Slide 28

Slide 28 text

Demo! Our data model

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

Resources • GitHub repository (today’s code): github.com/JMHReif/springai-goodreads • GraphAcademy LLM courses: graphacademy.neo4j.com/categories/llms/ • Docs for Spring AI: docs.spring.io/spring-ai/reference/api/vectordbs/neo4j.html • NODES 2024: dev.neo4j.com/nodes24