
Q&A Session: Graphing IATI Activity Data for LLM-Powered Chatbots


Interested in setting up a Neo4j graph database to power your chatbot? This Q&A session explains how XML data can be loaded into a graph database and stored there for use by conversational AI applications powered by Large Language Models (LLMs).
Jennifer Reif, a Developer Advocate at Neo4j, will answer questions from students and others interested in loading aid activity information into a Neo4j graph database. The data is reported in compliance with IATI, an XML standard widely used across the humanitarian community by aid organizations and donors. The session will touch on setting up, formatting, and searching the database, and will cover optimizing it for traversal by LLMs.

Jennifer Reif

November 28, 2023

  1. Jennifer Reif
    Neo4j Q&A Session:
    Graphing IATI Aid Activity Data for LLM-Powered Chatbots
    Photo by Igor Omilaev on Unsplash

  2. Who Am I?
    Neo4j Developer Advocate
    • Java/JVM technologies

    • Conference speaker

    • Technical blog writer

    • All-around geek

  3. Using LLMs
    BEWARE: hallucinations!
    • Send user question to LLM

    • Positives:

    • Natural language response

    • Broad knowledge (trained on crawled internet data)

    • Negatives:

    • Might not have latest data

    • Can hallucinate when unsure

  4. IATI
    Format, options, etc.
    • Need subscription (free tier available)

    • Single data sets very flat, non-graph

    • API provides most value

    • API has learning curve (query params)

    • XML, JSON, or CSV responses
    Sample activity from an XML response (excerpt):
    Attributes: xml:lang="en" default-currency="USD" humanitarian="0" hierarchy="1"

    Title: Music high schools Sw-BiH

    Description (English): Cooperation between the Music Academy in Sarajevo and the Royal Music High School of Sthlm, the support concerns a visit with the aim to plan the cooperation during 1999.

    Description (Swedish, translated): … between Kungliga Musikhögskolan i Stockholm and Musikakademini Sarajevo; the contribution mainly concerns a planning trip for the cooperation during …
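
A minimal sketch, using Python's standard-library ElementTree, of pulling the fields the import needs out of activity XML like the excerpt above. It assumes the IATI 2.x element layout (`iati-identifier`, `title` and `description` with `narrative` children, `participating-org`, `sector`). The sample's identifier, organization ref and sector code are hypothetical; the title and narrative text are adapted from the slide.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample in the IATI 2.x layout. The identifier, org ref and
# sector code are invented; title and narrative text are adapted from the slide.
SAMPLE = """<iati-activities version="2.03">
  <iati-activity xml:lang="en" default-currency="USD" humanitarian="0" hierarchy="1">
    <iati-identifier>XX-EXAMPLE-1</iati-identifier>
    <title><narrative>Music high schools Sw-BiH</narrative></title>
    <description>
      <narrative>Cooperation between the Music Academy in Sarajevo and the Royal Music High School of Sthlm.</narrative>
      <narrative xml:lang="sv">mellan Kungliga Musikhögskolan i Stockholm och Musikakademini Sarajevo</narrative>
    </description>
    <participating-org ref="XX-ORG-1" role="1"><narrative>Example donor</narrative></participating-org>
    <sector code="16061" vocabulary="1"/>
  </iati-activity>
</iati-activities>"""


def narrative(elem, lang="en", default_lang="en"):
    """Text of the first <narrative> in `lang`; a narrative without xml:lang
    is treated as being in the activity's default language."""
    if elem is None:
        return None
    for n in elem.findall("narrative"):
        if n.get(XML_LANG, default_lang) == lang:
            return (n.text or "").strip()
    return None


def parse_activities(xml_text):
    """Flatten each <iati-activity> into the fields the graph import uses."""
    root = ET.fromstring(xml_text)
    activities = []
    for act in root.iter("iati-activity"):
        lang = act.get(XML_LANG, "en")  # activity-level default language
        activities.append({
            "id": (act.findtext("iati-identifier") or "").strip(),
            "title": narrative(act.find("title"), "en", lang),
            "description": narrative(act.find("description"), "en", lang),
            "orgs": [{"ref": o.get("ref"), "name": narrative(o, "en", lang)}
                     for o in act.findall("participating-org")],
            "sectors": [s.get("code") for s in act.findall("sector")],
        })
    return activities


for activity in parse_activities(SAMPLE):
    print(activity["id"], "|", activity["title"])
```

Selecting narratives by `xml:lang` matters here because, as in the sample record, one activity can carry the same description in several languages.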


  5. Neo4j
    • Schema-flexible

    • Makes refactoring easier and faster

    • Queries with Cypher

    • APOC utility library, for the win!

    • Construct Cypher statements to create
    data according to your model
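
One way to construct those statements from the model is to generate parameterized Cypher text. Labels, relationship types and property names are written into the query, while values travel as `$parameters`. These are hypothetical helpers, not part of APOC or the Neo4j drivers; the `ON CREATE SET` clause mirrors the on-create property map that `apoc.merge.node` takes.

```python
import re

# Labels, relationship types and property names are written into the query
# text by these (hypothetical) helpers, so only plain identifiers are accepted.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check(*names):
    for name in names:
        if not _IDENT.fullmatch(name):
            raise ValueError(f"not a plain Cypher identifier: {name!r}")


def merge_node_cypher(label, key, props=()):
    """MERGE one node on its key; other properties are set only on create."""
    _check(label, key, *props)
    query = f"MERGE (n:{label} {{{key}: ${key}}})"
    if props:
        query += "\nON CREATE SET " + ", ".join(f"n.{p} = ${p}" for p in props)
    return query


def merge_rel_cypher(start_label, start_key, rel_type, end_label, end_key):
    """MERGE a directed relationship between two nodes looked up by key."""
    _check(start_label, start_key, rel_type, end_label, end_key)
    return (
        f"MATCH (a:{start_label} {{{start_key}: $start}}), "
        f"(b:{end_label} {{{end_key}: $end}})\n"
        f"MERGE (a)-[:{rel_type}]->(b)"
    )


print(merge_node_cypher("Activity", "id", ["title", "description"]))
print(merge_rel_cypher("Organization", "ref", "PARTICIPATES_IN", "Activity", "id"))
```

Each generated string would be sent together with a parameter map, for example through the official Neo4j Python driver's `session.run(query, parameters)`. Keeping values out of the query text avoids Cypher injection and lets the server reuse the query plan.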

  6. Import
    Headers = 🤕
    • Cloud DBaaS (Aura) blocks procedures that accept headers

    • Local databases or alternate hosting required (not Aura)

    • APOC = 🛟

    • apoc.load.xml(url, '/', {headers: {abc: blah, def: blah2}})

    • apoc.load.jsonParams(url, {abc: blah, def: blah2}, null)
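
A minimal sketch of building the URL and headers on the client. The endpoint, the Solr-style `q`/`rows`/`wt` parameters and the `Ocp-Apim-Subscription-Key` header name are assumptions to check against the IATI API documentation. The same pair can be passed to `apoc.load.xml` as `$url` and `$headers`. On a host that blocks header-accepting procedures, the request can instead be made from the client and the parsed rows sent to Neo4j as a parameter.

```python
from urllib.parse import urlencode
from urllib.request import Request

# ASSUMPTIONS (check against the IATI API docs): the Datastore endpoint,
# the Solr-style parameters q/rows/wt, and the subscription-key header name.
BASE_URL = "https://api.iatistandard.org/datastore/activity/select"
KEY_HEADER = "Ocp-Apim-Subscription-Key"


def build_request(api_key, query, rows=10, fmt="xml"):
    """Return (url, headers) for one Datastore query."""
    url = f"{BASE_URL}?{urlencode({'q': query, 'rows': rows, 'wt': fmt})}"
    return url, {KEY_HEADER: api_key}


def apoc_params(api_key, query, rows=10):
    """Parameter map for: CALL apoc.load.xml($url, '/', {headers: $headers})"""
    url, headers = build_request(api_key, query, rows)
    return {"url": url, "headers": headers}


def client_request(api_key, query, rows=10):
    """Client-side alternative for hosts such as Aura: build the request here
    (send it with urllib.request.urlopen) and pass parsed rows to Neo4j."""
    url, headers = build_request(api_key, query, rows)
    return Request(url, headers=headers)


print(build_request("YOUR-KEY", "music", rows=5)[0])
```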

  7. Data Import
    Draft the data model
    • Nodes: Activity, Organization, Sector, …?

    • Relationships - this is the value!
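
To see why the relationships carry the value, here are two questions answered by walking them: which organizations work in a sector, and which organizations share activities. The edges are hypothetical toy data in the draft model's directions. This sketch scans Python lists, whereas Neo4j follows the stored relationships directly.

```python
# Hypothetical edges in the draft model's directions:
# (Organization)-[:PARTICIPATES_IN]->(Activity)-[:OCCURS_IN]->(Sector)
PARTICIPATES_IN = [("ORG-A", "ACT-1"), ("ORG-B", "ACT-1"),
                   ("ORG-B", "ACT-2"), ("ORG-C", "ACT-3")]
OCCURS_IN = [("ACT-1", "EDU"), ("ACT-2", "HEALTH"), ("ACT-3", "HEALTH")]


def orgs_in_sector(sector):
    """Two hops back from a sector, roughly the Cypher pattern
    (o:Organization)-[:PARTICIPATES_IN]->(:Activity)-[:OCCURS_IN]->(:Sector {code: $code})"""
    activities = {act for act, sec in OCCURS_IN if sec == sector}
    return sorted({org for org, act in PARTICIPATES_IN if act in activities})


def partners(org):
    """Organizations that share at least one activity with `org`."""
    mine = {act for o, act in PARTICIPATES_IN if o == org}
    return sorted({o for o, act in PARTICIPATES_IN if act in mine and o != org})


print(orgs_in_sector("HEALTH"), partners("ORG-B"))
```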

  8. Import statement outline
    • Construct the URL and headers

    • UNWIND list of activities (and related properties)

    • Create (MERGE) each activity node

    • UNWIND list of organizations

    • Create (MERGE) each org node

    • Create relationship: Activity<-Organization

    • UNWIND list of sectors

    • Create (MERGE) each sector node

    • Create relationship: Activity->Sector
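
The outline can be mirrored in plain Python to show why MERGE matters. Nodes are keyed by their unique property (activity `id`, organization `ref`, sector `code`) and relationships are stored as a set, so re-running the import adds nothing new. The input dict shape (`id`, `title`, `description`, `orgs`, `sectors`) is a hypothetical pre-parsed form, not the API's response format.

```python
def import_activities(activities, graph=None):
    """Follow the outline with MERGE-like semantics: dict keys and a set of
    relationship tuples make repeated runs idempotent."""
    if graph is None:
        graph = {"Activity": {}, "Organization": {}, "Sector": {}, "rels": set()}
    for act in activities:                                   # UNWIND activities
        node = graph["Activity"].setdefault(act["id"], {})   # MERGE on id
        node.setdefault("title", act.get("title"))           # set only on create
        node.setdefault("description", act.get("description"))
        for org in act.get("orgs", []):                      # UNWIND orgs
            if not org.get("ref"):
                continue  # MERGE cannot match on a null key, so skip it
            graph["Organization"].setdefault(org["ref"], {"name": org.get("name")})
            graph["rels"].add((org["ref"], "PARTICIPATES_IN", act["id"]))  # Activity<-Organization
        for code in act.get("sectors", []):                  # UNWIND sectors
            graph["Sector"].setdefault(code, {})
            graph["rels"].add((act["id"], "OCCURS_IN", code))  # Activity->Sector
    return graph
```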

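  9. // Assumes $url and $headers are passed as parameters and the XML root is <iati-activities>
    WITH $url AS url
    CALL apoc.load.xml(url, '/', {headers: $headers}) YIELD value
    UNWIND [c IN value._children WHERE c._type = 'iati-activity'] AS activity
    WITH activity._children AS details
    WITH details,
      head([d IN details WHERE d._type = 'iati-identifier']) AS ident,
      head([d IN details WHERE d._type = 'title']) AS title,
      head([d IN details WHERE d._type = 'description']) AS descr
    WHERE ident IS NOT NULL
    // head(x._children) picks the first <narrative>, which may not be the English one
    CALL apoc.merge.node(['Activity'], {id: ident._text},
      {title: head(title._children)._text, description: head(descr._children)._text}) YIELD node AS actNode
    WITH details, actNode
    UNWIND [d IN details WHERE d._type = 'participating-org' AND d.ref IS NOT NULL] AS org
    CALL apoc.merge.node(['Organization'], {ref: org.ref}, {name: head(org._children)._text}) YIELD node AS partOrg
    MERGE (actNode)<-[:PARTICIPATES_IN]-(partOrg)
    // Back to one row per activity; activities with no org ref yield no rows above, so their sectors are skipped
    WITH DISTINCT details, actNode
    UNWIND [d IN details WHERE d._type = 'sector' AND d.code IS NOT NULL] AS sector
    CALL apoc.merge.node(['Sector'], {code: sector.code}, {name: head(sector._children)._text}) YIELD node AS secNode
    MERGE (actNode)-[:OCCURS_IN]->(secNode)
    RETURN count(DISTINCT actNode) AS activities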
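
`apoc.load.xml` hands each element to Cypher as a nested map: `_type` is the element name, attributes become keys, `_text` holds the text and `_children` lists child elements. The sketch below approximates that shape in Python so the `_type`/`_children`/`_text` lookups in the statement above can be tried offline. It is not APOC's own code, and whitespace and namespaced-attribute handling may differ.

```python
import xml.etree.ElementTree as ET


def to_apoc_map(elem):
    """Approximate the default (non-simple) map shape from apoc.load.xml."""
    m = {"_type": elem.tag, **elem.attrib}
    text = (elem.text or "").strip()
    if text:
        m["_text"] = text
    children = [to_apoc_map(child) for child in elem]
    if children:
        m["_children"] = children
    return m


def first_of_type(children, type_):
    """Python twin of head([d IN details WHERE d._type = type_])."""
    return next((c for c in children if c["_type"] == type_), None)


value = to_apoc_map(ET.fromstring(
    "<iati-activities><iati-activity>"
    "<iati-identifier>XX-1</iati-identifier>"
    "<title><narrative>T</narrative></title>"
    '<sector code="111"/>'
    "</iati-activity></iati-activities>"
))
details = value["_children"][0]["_children"]
print(first_of_type(details, "iati-identifier"))
```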

  10. RAG
    Retrieval Augmented Generation
    • Can incorporate recent data

    • Reduce hallucinations

    • Use LLM to style results as natural language

    • Option: use Neo4j as a vector database

    • Vector indexes available

    • Possible chunking for large amounts of text

    • Retrieve entities based on similarity search

    • Prompt engineering may or may not help further
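
A minimal sketch of the augmentation step. It assumes the retriever (vector search or a Cypher query) has already returned activity records as dicts with hypothetical `id`, `title` and `description` keys. Putting those records into the prompt and telling the model to stay within them is what lets RAG use recent data and reduce hallucinations. The exact instruction wording is an assumption, not a tested prompt.

```python
def build_rag_prompt(question, records, max_records=5):
    """Ground the LLM in retrieved graph data by placing it in the prompt."""
    context_lines = [
        f"- {r['title']} (id: {r['id']}): {r['description']}"
        for r in records[:max_records]
    ]
    context = "\n".join(context_lines) if context_lines else "- (no matching records)"
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


print(build_rag_prompt(
    "Which activities involve music schools?",
    [{"id": "XX-1", "title": "Music high schools Sw-BiH",
      "description": "Cooperation between the Music Academy in Sarajevo "
                     "and the Royal Music High School of Sthlm."}],
))
```

Capping the number of records keeps the prompt within the model's context window; the "say you don't know" instruction gives the model a way out other than guessing.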

  11. Chatbot architecture
    Courtesy: Tomaz Bratanic

  12. Chatbot architecture
    Courtesy: Tomaz Bratanic

  13. RAG steps (with vector)
    • Put data into Neo4j

    • Create embeddings (OpenAI or other model)

    • Save those embeddings as vectors in Neo4j

    • Same embedding model creates a vector for the user question

    • Use Neo4j vector index to find similar documents

    • Returns similar documents to user
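
The similarity search step can be illustrated with brute-force cosine similarity over toy vectors. A Neo4j vector index serves the same purpose at scale through approximate nearest-neighbor search (queried with `db.index.vector.queryNodes` in recent Neo4j 5 releases), and its reported scores may be scaled differently from raw cosine values. The document IDs and three-dimensional vectors in the test are hypothetical. Real embeddings have hundreds or thousands of dimensions, and the question must be embedded with the same model as the documents.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, docs, k=3):
    """docs: list of (doc_id, embedding). Returns [(doc_id, score)], best first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Brute force compares the question against every stored vector, which is fine for a demo; the index exists so that the same lookup stays fast as the number of activities grows.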

  14. Resources
    • Neo4j GraphAcademy: self-paced courses (2 new LLM courses!)

    • IATI: playground

    • Neo4j APOC: load xml

    • Embeddings: OpenAI docs

    • Neo4j Vectors: search index docs

    • Chatbot examples:

    • Blog post: Knowledge-graph based chatbot

    • Blog post: Context-aware chatbot

    • Blog post: Educational chatbot
    Jennifer Reif