Hallucination-Free Zone: LLMs + Graph Databases got your back!

A hallucination is output that is contextually plausible but incorrect or fabricated. Large Language Models (LLMs) can give realistic-sounding answers to almost any question, even when those answers are entirely made up. With a graph database, you can anchor an LLM in reality and reduce the risk of it generating false information or exposing sensitive data to unauthorized users. Grounding the model this way makes its responses more reliable and more secure.
This presentation will show you the benefits of graph databases over regular databases and how to use AI tools to eliminate LLM hallucinations, enforce security, and improve accuracy. We will also discuss why a vector index can provide better, smarter, faster results than a pure vector database.
Code: https://github.com/JMHReif/springai-goodreads
Event: https://www.meetup.com/javasig/events/303951313/?eventOrigin=group_past_events

Jennifer Reif

October 23, 2024

Transcript

  1. Hallucination-free zone: LLMs + Graph Databases got your back!
    Photo by fabio on Unsplash
    Jennifer Reif [email protected] @JMHReif github.com/JMHReif jmhreif.com linkedin.com/in/jmhreif
  2. Who is Jennifer Reif?
    Developer Advocate, Neo4j
    • Continuous learner
    • Conference speaker
    • Tech blogger
    • Other: geek
  3. Generative Artificial Intelligence
    Artificial intelligence capable of generating text, images, or other data using generative models, often in response to prompts. Generative AI models learn the patterns and structure of their input training data and then generate new data that has similar characteristics.
    https://en.wikipedia.org/wiki/Generative_artificial_intelligence
  4. How do hallucinations happen?
    LLM limitations
    • Lacking most recent data
    • Not always natural language
    • Language complexities, sarcasm, emotion
    • No sources
    • Hallucinations / Temperature
    • IP, bias, privacy
  5. Strategies to improve LLM accuracy
    • Custom model
    • Fine-tuning / Few-shot learning
    • Retrieval Augmented Generation (RAG)
    • All of these involve providing an LLM with specific data!
  6. RAG
    Pull data from external data sources
    • Retrieval: data retrieved from database
    • Augmented: augments response with facts
    • Generation: response in natural language
  7. Vector databases
    Example: Pinecone, 2019 (not the first)
    • Pros:
      • Fast retrieval of high-dimensional data
      • Similarity searches based on vectors (not specific values)
    • Cons:
      • Requires a lot of infrastructure and power
      • Cannot store much outside vector + metadata
  8. Graph databases
    Example: Neo4j, 2007
    • Pros:
      • Flexible, agile data model
      • Relationships stored with entities (JOIN operations visual)
    • Cons:
      • Storing relationships creates some write overhead
      • Little or no benefit for low-connected data
  9. What is a graph?
    Leonhard Euler - graph theory
    • Started with a math problem
    • Looking for a “better way” to handle certain problems
    Seven Bridges of Königsberg problem. Leonhard Euler, 1735
  10. What does it solve?
    Data problems
    • Documented path (not just data)
    • Answering how and why
    • Understand/find hidden connections
    • Find alternates, impacts, etc.
    • Graphs add context + meaning
  11. Use cases
    • Recommendations
    • Social
    • Supply chain
    • Fraud detection
    • GenAI (grounding/RAG)
    • Many more!
  12. Data storage with relationships!
    TL;DR
    • Stores relationships with entities
    • Produces faster read queries (for JOINs)
    • Easily connect multiple entities together
    • Mimic real-world data organization
  13. Book domain
    • Find authors with reviews for multiple books
    • Find similar users based on reviews of books and related authors
  14. Nodes
    • Represent objects or entities
    • Can be labeled
    • May have properties
    [Diagram: Book and Author nodes (title: “Star Wars”, isbn: 9756165498, name: “George Lucas”, avgRating: 4.72) and a Review node (rating: 4.2, reviewText: “Blah”, votes: 17)]
  15. Relationships
    • Must have a type (label)
    • Must have a direction
    • May have properties
    [Diagram: the Book, Author, and Review nodes from the previous slide, connected by AUTHORED and WRITTEN_FOR relationships (date_added: “Sun Jan 03”)]
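The node and relationship rules on slides 14 and 15 can be sketched as a small in-memory property graph. This is only an illustration in plain Python, not how Neo4j stores data. The `Node` and `Relationship` classes and the `reviews_for_author` helper are hypothetical names, and the relationship directions (Author to Book, Review to Book) and the placement of `date_added` on WRITTEN_FOR are assumptions, since the transcript does not show them.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity with a label and optional properties."""
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed connection that may carry its own properties."""
    start: Node
    type: str
    end: Node
    properties: dict = field(default_factory=dict)


# Nodes with the property values shown on the slide.
book = Node("Book", {"title": "Star Wars", "isbn": 9756165498})
author = Node("Author", {"name": "George Lucas"})
review = Node("Review", {"rating": 4.2, "reviewText": "Blah", "votes": 17})

# Assumed directions: (Author)-[:AUTHORED]->(Book), (Review)-[:WRITTEN_FOR]->(Book).
relationships = [
    Relationship(author, "AUTHORED", book),
    Relationship(review, "WRITTEN_FOR", book, {"date_added": "Sun Jan 03"}),
]


def reviews_for_author(name, rels):
    """Walk Author -> Book <- Review and return the reviews that were reached."""
    books = [r.end for r in rels
             if r.type == "AUTHORED" and r.start.properties.get("name") == name]
    return [r.start for r in rels
            if r.type == "WRITTEN_FOR" and any(r.end is b for b in books)]


found = reviews_for_author("George Lucas", relationships)
print([r.properties["rating"] for r in found])
```

Because each relationship holds direct references to its two nodes, the traversal follows connections instead of matching keys across tables, which is the idea behind "relationships stored with entities".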
  16. What is a vector?
    Mathematical realm
    • Line in space
    • Has length and direction
    [Diagram: a vector drawn against horizontal and vertical axes]
  17. Vectors in the technical realm
    Kings and Queens
    king − man + woman ≈ queen
    [Diagram: three numbered panels plotting the king, man, and woman vectors, ending with the candidate “queen?” vector]
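The king and queen arithmetic can be tried on a toy example. The vectors below are hand-made, three-dimensional stand-ins (royalty, masculine, feminine), not real embeddings from any model, so the result only illustrates the mechanics.

```python
import math

# Hypothetical toy vectors: (royalty, masculine, feminine).
words = {
    "king": [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# king - man + woman, computed dimension by dimension.
target = [k - m + w
          for k, m, w in zip(words["king"], words["man"], words["woman"])]

# Pick the word whose vector is most similar to the result.
nearest = max(words, key=lambda word: cosine(target, words[word]))
print(nearest)  # prints: queen
```

With real embeddings the result is only approximately equal to the "queen" vector, which is why the slide writes ≈ and why a similarity measure is used instead of an exact match.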
  18. Embeddings
    Convert data to a point in space
    • Series of numbers
    • 100s or 1000s of dimensions
    • Dimension = interesting feature / characteristic
  19. How do we search the vectors?
    Similarity search
    • Expensive queries (compare to every vector)
    • Approximate nearest neighbor (k-ANN)
    • Example: Library
      • Book classification - genre vs location of plot
    • Smaller search set = smaller retrieval time!
    Photo by Martin Adams on Unsplash
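The library analogy can be sketched with a brute-force search that compares the query to every vector, next to the same search restricted to one genre. Note that this is exact search over a smaller candidate set, not an approximate nearest neighbor index; it only shows why shrinking the search set reduces work. The book titles, genres, and two-dimensional vectors are invented for illustration.

```python
import math

# Hypothetical catalogue: each book has a genre and a made-up 2-D embedding.
books = [
    {"title": "Book A", "genre": "sci-fi", "vector": [0.9, 0.1]},
    {"title": "Book B", "genre": "sci-fi", "vector": [0.8, 0.3]},
    {"title": "Book C", "genre": "romance", "vector": [0.1, 0.9]},
    {"title": "Book D", "genre": "romance", "vector": [0.2, 0.8]},
    {"title": "Book E", "genre": "history", "vector": [0.5, 0.5]},
    {"title": "Book F", "genre": "sci-fi", "vector": [0.7, 0.6]},
]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))


def search(query, candidates, k):
    """Return the k most similar titles and how many vectors were compared."""
    ranked = sorted(candidates,
                    key=lambda b: cosine(query, b["vector"]),
                    reverse=True)
    return [b["title"] for b in ranked[:k]], len(candidates)


query = [1.0, 0.2]

# Brute force: every vector in the catalogue is compared.
full = search(query, books, k=2)

# Narrow the search set first (the "genre shelf"), then compare.
shelf = [b for b in books if b["genre"] == "sci-fi"]
filtered = search(query, shelf, k=2)

print(full)
print(filtered)
```

Both searches return the same two titles here, but the filtered one compares three vectors instead of six. A real vector index aims for a similar saving without needing a hand-picked filter, at the cost of approximate results.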
  20. RAG architecture
    • Retrieval: data retrieved from database
    • Augmented: augments response with facts
    • Generation: response in natural language
    [Diagram, three numbered steps. Labels: User, Prompt, LLM Chat API, Database Search, Database, Relevant Results / Documents, Prompt + Relevant Information, LLM API, Response]
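The three RAG stages can be sketched end to end with stand-ins for the real components. Retrieval here is a simple shared-word score in place of a database or vector search, and `fake_llm` is a hypothetical stub in place of a real LLM API call; the document texts are short statements taken from earlier slides.

```python
import re

# A tiny stand-in "database" of facts.
documents = [
    "Graph databases store relationships with entities.",
    "Vector databases support similarity searches based on vectors.",
    "Leonhard Euler studied the Seven Bridges problem in 1735.",
]


def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, docs, k=1):
    """Retrieval: rank documents by shared words (stand-in for a real search)."""
    scored = [(len(tokens(question) & tokens(d)), d) for d in docs]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


def build_prompt(question, context):
    """Augmentation: put the retrieved facts in front of the question."""
    facts = "\n".join(f"- {c}" for c in context)
    return f"Answer using only these facts:\n{facts}\n\nQuestion: {question}"


def fake_llm(prompt):
    """Generation: hypothetical stub; a real system would call an LLM API here."""
    return f"(stub response for a prompt of {len(prompt)} characters)"


question = "What do graph databases store?"
context = retrieve(question, documents)
prompt = build_prompt(question, context)
response = fake_llm(prompt)
print(prompt)
```

Because the prompt carries the retrieved text, the source of an answer can be inspected, which is what makes the later "explainable AI" slide possible.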
  21. Agentic Workflow Architecture
    • Uses “agents”/tools
    • LLM determines next step
      • Which tool/external source should be called
    • Uses result from tool as context
    [Diagram, three numbered steps. Labels: User, Prompt, LLM Chat API, Tool, Source info, Relevant Results / Documents, Prompt + Relevant Information, LLM API, Response]
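A minimal sketch of the tool-selection step follows. In a real agentic workflow the LLM decides which tool to call; here a keyword rule in `choose_tool` stands in for that decision so the example runs without a model. The tool names and lookup data are hypothetical, reusing the values from the book slides.

```python
# Hypothetical tools: each takes a book title and returns a piece of context.
def author_lookup(title):
    return {"Star Wars": "George Lucas"}.get(title, "unknown")


def rating_lookup(title):
    return {"Star Wars": 4.2}.get(title, "unknown")


TOOLS = {"author_lookup": author_lookup, "rating_lookup": rating_lookup}


def choose_tool(question):
    """Stand-in for the LLM's decision about the next step."""
    q = question.lower()
    if "who wrote" in q:
        return "author_lookup"
    if "rating" in q:
        return "rating_lookup"
    return None  # no tool needed


def run_agent(question, title):
    """Pick a tool, call it, and feed the result back in as context."""
    tool_name = choose_tool(question)
    if tool_name is None:
        return None, f"Question: {question}"
    result = TOOLS[tool_name](title)
    return tool_name, f"Context: {result}\nQuestion: {question}"


tool, prompt = run_agent("Who wrote this book?", "Star Wars")
print(tool)
print(prompt)
```

The returned prompt is what would be sent to the LLM for the final response, with the tool result playing the role that retrieved documents play in RAG.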
  22. Nothing is a silver bullet
    LLM has (of sorts) a mind of its own
    • Can’t guarantee a consistent answer
    • Prompt engineering
    • Context window limits
  23. Explainable AI
    With RAG + LLM
    • How did the LLM get this answer?
    • Graphs:
      • Generate LLM response from knowledge in database
      • Set security rules in graph (what’s viewable)
      • Retrieve extra data connected to similar entities
    Photo by No Revisions on Unsplash
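The graph bullets can be illustrated with a toy traversal that expands from a matched entity to its neighbors, skips anything the caller's role may not view, and records the path taken so the context is traceable. The per-node `roles` sets are an assumption made for this sketch and do not represent Neo4j's actual access-control model; the node ids and role names are hypothetical.

```python
# Hypothetical graph: node id -> properties, plus the roles allowed to view it.
nodes = {
    "book:1": {"title": "Star Wars", "roles": {"reader", "admin"}},
    "author:1": {"name": "George Lucas", "roles": {"reader", "admin"}},
    "review:1": {"reviewText": "Blah", "roles": {"admin"}},
}

# Directed relationships as (start, type, end); directions are assumed.
edges = [
    ("author:1", "AUTHORED", "book:1"),
    ("review:1", "WRITTEN_FOR", "book:1"),
]


def connected_context(start_id, role):
    """Return (path, neighbor id) pairs for neighbors this role may view."""
    results = []
    for src, rel, dst in edges:
        if start_id not in (src, dst):
            continue
        other = dst if src == start_id else src
        if role in nodes[other]["roles"]:
            results.append((f"({src})-[:{rel}]->({dst})", other))
    return results


reader_view = connected_context("book:1", "reader")
admin_view = connected_context("book:1", "admin")
print(reader_view)
print(admin_view)
```

The reader role sees only the author, while the admin role also reaches the review. Each result carries the path that produced it, which is one way to answer "how did the LLM get this answer?" when the context is passed into a prompt.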
  24. Resources
    • GitHub repository (today’s code): github.com/JMHReif/springai-goodreads
    • GraphAcademy LLM courses: graphacademy.neo4j.com/categories/llms/
    • Docs for Spring AI: docs.spring.io/spring-ai/reference/api/vectordbs/neo4j.html
    • NODES 2024: neo4j.com/nodes2024/agenda