Q&A Session: Graphing IATI Activity Data for LLM-Powered Chatbots

Interested in setting up a Neo4j graph database to power your chatbot? This Q&A session will provide insight into how XML data can be loaded and stored in a graph database for use by conversational AI applications powered by Large Language Models (LLMs).
Jennifer Reif, a Developer Advocate at Neo4j, will answer questions from students and others interested in loading aid activity data into a Neo4j graph database. The data is reported in compliance with IATI (International Aid Transparency Initiative), an XML standard widely used across the humanitarian community by aid organizations and donors. The session will cover setting up, formatting, and searching the database, as well as optimizing it for traversal by LLMs.

Jennifer Reif

November 28, 2023

Transcript

  1. Neo4j Q&A Session: Graphing IATI Aid Activity Data for LLM-Powered Chatbots
    Jennifer Reif • @JMHReif • github.com/JMHReif • jmhreif.com • linkedin.com/in/jmhreif
    Photo by Igor Omilaev on Unsplash
  2. Who Am I?
    • Neo4j Developer Advocate
    • Java/JVM technologies
    • Conference speaker
    • Technical blog writer
    • All-around geek
  3. Using LLMs (BEWARE: hallucinations!)
    • Send the user's question to the LLM
    • Positives:
      • Natural-language response
      • Broad knowledge (trained on data crawled from the internet)
    • Negatives:
      • Might not have the latest data
      • Can hallucinate when unsure
  4. IATI: format, options, etc.
    • Need a subscription (free tier available)
    • Single datasets are very flat (non-graph)
    • The API provides the most value
    • The API has a learning curve (query params)
    • XML, JSON, or CSV responses

      <iati-activity last-updated-datetime="2023-11-20T07:19:47+02:00" xml:lang="en" default-currency="USD" humanitarian="0" hierarchy="1">
        <iati-identifier>SE-0-SE-6-7600051401-BIH-16061</iati-identifier>
        <reporting-org ref="SE-0" type="10" secondary-reporter="0">
          <narrative>Sweden</narrative>
        </reporting-org>
        <title>
          <narrative>Music high schools Sw-BiH</narrative>
          <narrative xml:lang="sv">Musikhögskolor Sw-BiH</narrative>
        </title>
        <description type="1">
          <narrative>Cooperation between the Music Academy in Sarajevo and the Royal Music High School of Sthlm, the support concerns a visit with the aim to plan the cooperation during 1999.</narrative>
          <narrative xml:lang="sv">Samarbete mellan Kungliga Musikhögskolan i Stockholm och Musikakademini Sarajevo, bidraget avser främst en planeringsresa för samarbetet under 1999.</narrative>
        </description>

    https://developer.iatistandard.org/api-details#api=datastore&operation=query
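To make the structure of that response concrete, here is a minimal sketch that parses a trimmed copy of the sample with Python's standard-library `xml.etree.ElementTree`, assuming the XML has already been downloaded. The helper names (`parse_activity`, `narratives_by_lang`) and the output dictionary keys are hypothetical, and only the identifier, reporting organization, and titles are handled.

```python
import xml.etree.ElementTree as ET

# xml:lang lives in the reserved XML namespace, so ElementTree exposes it
# under this fully qualified attribute name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Trimmed copy of the sample activity shown on the slide.
SAMPLE = """<iati-activity xml:lang="en" default-currency="USD" hierarchy="1">
  <iati-identifier>SE-0-SE-6-7600051401-BIH-16061</iati-identifier>
  <reporting-org ref="SE-0" type="10" secondary-reporter="0">
    <narrative>Sweden</narrative>
  </reporting-org>
  <title>
    <narrative>Music high schools Sw-BiH</narrative>
    <narrative xml:lang="sv">Musikhögskolor Sw-BiH</narrative>
  </title>
</iati-activity>"""


def narratives_by_lang(parent, default_lang):
    """Map each <narrative> child to its language, falling back to the activity default."""
    if parent is None:
        return {}
    return {
        n.get(XML_LANG, default_lang): (n.text or "").strip()
        for n in parent.findall("narrative")
    }


def parse_activity(xml_text):
    """Pull a few fields out of one <iati-activity> element (hypothetical helper)."""
    activity = ET.fromstring(xml_text)
    default_lang = activity.get(XML_LANG, "en")
    org = activity.find("reporting-org")
    return {
        "id": activity.findtext("iati-identifier", default="").strip(),
        "reporting_org_ref": org.get("ref") if org is not None else None,
        "reporting_org_names": narratives_by_lang(org, default_lang),
        "titles": narratives_by_lang(activity.find("title"), default_lang),
    }
```

Grouping narratives by `xml:lang` keeps the English and Swedish titles side by side, which is one way to decide which language ends up as a node property.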
  5. Neo4j
    • Schema-free
    • Schema-flexible
    • Makes refactoring easier and faster
    • Queries with Cypher
    • APOC utility library, for the win!
    • Construct Cypher statements to create data according to your model
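When statements are constructed programmatically, values can travel as query parameters, but node labels and property keys are part of the query text itself (at least in the Cypher versions current when this deck was published, they could not be parameterized). That is one reason procedures such as `apoc.merge.node` accept labels as a list. A minimal sketch, assuming a hypothetical helper `merge_node_statement` that validates identifiers before splicing them in and uses `ON CREATE SET` to mirror the on-create properties argument of `apoc.merge.node`:

```python
import re

# Plain identifiers only: letters, digits, underscore, not starting with a digit.
# Labels and keys are spliced into the query text, so they are validated here;
# values are always passed separately as parameters.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def merge_node_statement(label, key, key_value, props=None):
    """Build (query, params) for a parameterized MERGE of one node (hypothetical helper)."""
    props = props or {}
    if key in props:
        raise ValueError("the merge key must not also appear in props")
    for name in [label, key, *props]:
        if not IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    query = f"MERGE (n:{label} {{{key}: ${key}}})"
    if props:
        # Like apoc.merge.node's onCreateProps: only applied when the node is new.
        assignments = ", ".join(f"n.{p} = ${p}" for p in sorted(props))
        query += f" ON CREATE SET {assignments}"
    return query, {key: key_value, **props}
```

For example, `merge_node_statement("Activity", "id", "SE-0-SE-6-7600051401-BIH-16061", {"title": "Music high schools Sw-BiH"})` yields the query `MERGE (n:Activity {id: $id}) ON CREATE SET n.title = $title` plus a matching parameter map, ready to hand to a Neo4j driver.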
  6. Import headers = 🤕
    • The cloud DBaaS (Aura) blocks procedures that accept request headers
    • Local databases or alternate hosting required (not Aura)
    • APOC = 🛟
      • apoc.load.xml(url, '', {headers: {abc: blah, def: blah2}})
      • apoc.load.jsonParams(url, {abc: blah, def: blah2}, null)
    https://neo4j.com/docs/apoc/current/overview/apoc.load/
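Before moving a header-authenticated call into `apoc.load.xml`, it can help to check the URL and headers from a local script. The sketch below only builds the request with Python's `urllib` and does not send it. The endpoint path, the `Ocp-Apim-Subscription-Key` header name, and the `q`/`wt`/`rows` parameters are assumptions about the IATI Datastore API that should be checked against the developer portal linked earlier; the example query string is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# ASSUMPTION: endpoint and key header as we understand the IATI Datastore API;
# verify both against the IATI developer portal before relying on them.
DATASTORE_SELECT = "https://api.iatistandard.org/datastore/activity/select"
KEY_HEADER = "Ocp-Apim-Subscription-Key"


def build_datastore_request(query, subscription_key, fmt="xml", rows=10):
    """Build (but do not send) a Datastore query carrying the key as a header."""
    params = urlencode({"q": query, "wt": fmt, "rows": rows})
    return Request(
        f"{DATASTORE_SELECT}?{params}",
        headers={KEY_HEADER: subscription_key},
    )
```

Passing the request to `urllib.request.urlopen` would send it. The same key/value pair is what goes into the `headers` map of the `apoc.load.xml` config on a self-hosted database.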
  7. Data Import
    • Draft the data model
    • Nodes: Activity, Organization, Sector, …?
    • Relationships: this is the value!
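One lightweight way to draft the model is to write it down as data and check proposed relationships against it. In this sketch the merge keys (`id`, `ref`, `code`) and the relationship types and directions are taken from the import statement and outline in this deck; the `NODE_KEYS` and `RELATIONSHIPS` tables, the `Rel` class, and `check_rel` are hypothetical names.

```python
from dataclasses import dataclass

# Draft model: node label -> property used as the MERGE key.
NODE_KEYS = {"Activity": "id", "Organization": "ref", "Sector": "code"}

# Relationship type -> (start label, end label), i.e. its direction.
RELATIONSHIPS = {
    "PARTICIPATES_IN": ("Organization", "Activity"),
    "OCCURS_IN": ("Activity", "Sector"),
}


@dataclass(frozen=True)
class Rel:
    rel_type: str
    start_label: str
    end_label: str


def check_rel(rel):
    """Return a list of problems with a proposed relationship (empty if it fits the model)."""
    problems = []
    expected = RELATIONSHIPS.get(rel.rel_type)
    if expected is None:
        problems.append(f"unknown relationship type {rel.rel_type}")
    elif (rel.start_label, rel.end_label) != expected:
        problems.append(
            f"{rel.rel_type} should run {expected[0]} -> {expected[1]}, "
            f"got {rel.start_label} -> {rel.end_label}"
        )
    for label in (rel.start_label, rel.end_label):
        if label not in NODE_KEYS:
            problems.append(f"unknown label {label}")
    return problems
```

Keeping the model in one place like this makes refactoring (adding a node type, flipping a direction) a single edit that the import code can then be checked against.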
  8. Import statement outline
    • Construct the URL and headers
    • UNWIND the list of activities (and related properties)
    • Create (MERGE) each activity node
    • UNWIND the list of organizations
    • Create (MERGE) each org node
    • Create relationship: Activity<-Organization
    • UNWIND the list of sectors
    • Create (MERGE) each sector node
    • Create relationship: Activity->Sector
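The outline can be mirrored in plain Python to show why MERGE matters: dictionaries stand in for the graph, and feeding the same activity in twice adds no duplicate nodes or relationships. This is an illustrative sketch rather than the deck's import, and the input dictionary shape (`id`, `title`, `orgs`, `sectors`) is an assumption.

```python
def import_activities(activities):
    """Mimic the MERGE-based import outline in memory (illustrative sketch).

    Each activity is a dict with hypothetical keys: id, title,
    orgs (list of {"ref", "name"}), sectors (list of {"code", "name"}).
    """
    nodes = {}   # (label, key value) -> properties
    rels = set()  # (start node, relationship type, end node)

    def merge_node(label, key, props):
        # Like MERGE / apoc.merge.node: create once, keep the on-create properties.
        node_id = (label, key)
        nodes.setdefault(node_id, dict(props))
        return node_id

    for activity in activities:  # UNWIND activities
        act = merge_node("Activity", activity["id"], {"title": activity.get("title")})
        for org in activity.get("orgs", []):  # UNWIND organizations
            o = merge_node("Organization", org["ref"], {"name": org.get("name")})
            rels.add((o, "PARTICIPATES_IN", act))  # Activity<-Organization
        for sector in activity.get("sectors", []):  # UNWIND sectors
            s = merge_node("Sector", sector["code"], {"name": sector.get("name")})
            rels.add((act, "OCCURS_IN", s))  # Activity->Sector
    return nodes, rels
```

Because both the node map and the relationship set are keyed on identity, re-running an import over overlapping API pages is safe, which is the same property MERGE gives the Cypher version.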
  9. Import statement in Cypher

      WITH url
      CALL apoc.load.xml(url, "", {headers: {<headers>}}) YIELD value
      …
      UNWIND activities AS activity
      WITH activity, activity._children AS details
      …
      UNWIND titles, descriptions …
      CALL apoc.merge.node(["Activity"], {id: id._text}, {title: title._text, description: descr._text}) YIELD node AS actNode
      WITH details, actNode
      UNWIND orgs AS org
      …
      CALL apoc.merge.node(["Organization"], {ref: org.ref}, {name: name._text}) YIELD node AS partOrg
      …
      MERGE (actNode)<-[r:PARTICIPATES_IN]-(partOrg)
      …
      UNWIND sectors AS sector
      …
      CALL apoc.merge.node(["Sector"], {code: sector.code}, {name: name._text}) YIELD node AS secNode
      …
      MERGE (actNode)-[r2:OCCURS_IN]->(secNode)
      RETURN *

    https://neo4j.com/labs/apoc/4.1/overview/apoc.merge/apoc.merge.node/
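The `_children` and `_text` keys in this statement come from the nested maps that `apoc.load.xml` yields. The sketch below approximates that shape in Python (element name under `_type`, attributes as plain keys, text under `_text`, child elements under `_children`). It is based on our reading of the APOC documentation rather than on actual APOC output, so details such as namespaced attributes and mixed content may differ; `to_apoc_map` and `child` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET


def to_apoc_map(elem):
    """Approximate (assumption) the nested map apoc.load.xml produces for one element."""
    result = {"_type": elem.tag, **elem.attrib}
    text = (elem.text or "").strip()
    if text:
        result["_text"] = text
    children = [to_apoc_map(c) for c in elem]
    if children:
        result["_children"] = children
    return result


def child(node, type_name):
    """Return the first child map with the given _type, or None (hypothetical helper)."""
    return next(
        (c for c in node.get("_children", []) if c["_type"] == type_name), None
    )
```

Read this way, `id._text` in the Cypher is the text of the `<iati-identifier>` child map, while `org.ref` is an attribute read straight off an organization element's map.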
  10. RAG (Retrieval-Augmented Generation)
    • Can incorporate recent data
    • Reduces hallucinations
    • Uses the LLM to style results as natural language
    • Option: use Neo4j as a vector database
    • Vector indexes available
    • Possible chunking for large amounts of text
    • Retrieve entities based on similarity search
    • Prompt engineering may or may not help further
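Chunking can be as simple as fixed-size windows with some overlap, so a sentence cut at one boundary has a better chance of appearing intact in the neighboring chunk. This sketch splits on whitespace-separated words for readability; real pipelines often count model tokens instead, and the default window sizes are arbitrary, not recommendations from the deck.

```python
def chunk_text(text, max_words=200, overlap=40):
    """Split text into overlapping windows of at most max_words words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

Each chunk would then get its own embedding; how chunks relate back to their source Activity in the graph is a modeling choice the deck leaves open.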
  11. RAG steps (with vector): User -> LLM -> Neo4j -> LLM -> User
    • Put data into Neo4j
    • Create embeddings (OpenAI or another model)
    • Save those embeddings as vectors in Neo4j
    • Create a vector for the user's question (with the same embedding model)
    • Use the Neo4j vector index to find similar documents
    • Return the similar documents to the LLM, which answers the user
    https://neo4j.com/developer-blog/building-educational-chatbot-neo4j/
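The retrieval half of these steps can be sketched without any external service. Here `embed` is a hypothetical stand-in for a real embedding model (a bag-of-words count vector, not an OpenAI call), the in-memory dictionary stands in for vectors stored in Neo4j, and cosine similarity ranks documents the way a vector index would; in Neo4j 5.x the corresponding lookup is the `db.index.vector.queryNodes` procedure. `build_prompt` shows one possible way to hand retrieved documents to the LLM, not a tested prompt.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector (hypothetical)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(question, docs, k=2):
    """Rank documents by similarity to the question and return the k best ids."""
    index = {doc_id: embed(text) for doc_id, text in docs.items()}  # stored vectors
    q = embed(question)  # vector for the user's question, same "model"
    ranked = sorted(index, key=lambda d: cosine(q, index[d]), reverse=True)
    return ranked[:k]


def build_prompt(question, context_docs):
    """Assemble retrieved documents and the question into one LLM prompt."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In a real pipeline, embedding and storing documents happens once at import time, while embedding the question, querying the index, and prompting the LLM happen per question.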
  12. Resources
    • Neo4j GraphAcademy: self-paced courses (2 new LLM courses!)
    • IATI: playground
    • Neo4j APOC: load xml
    • Embeddings: OpenAI docs
    • Neo4j Vectors: search index docs
    • Chatbot examples:
      • Blog post: Knowledge-graph based chatbot
      • Blog post: Context-aware chatbot
      • Blog post: Educational chatbot