Get To Know The Real World: Discovering Connected Data with a Graph Database

Find out what a graph database is and how it can transform your applications and data! We will explore creating, querying, and displaying data and learn how to use simple tools to interact with the database. We will cover the whiteboard-friendly model and the basics of the Cypher query language. Learn how graph databases can improve the data world!

Jennifer Reif

March 13, 2019

Transcript

  1. Get to Know the Real World… Discovering Connected Data with a Graph Database
    Jennifer Reif, Neo4j, @JMHReif
  2. Who Am I?
    • Developer Relations Engineer for Neo4j
    • Continuous learner
    • Conference speaker
    • Blogger
    • Hobbies: cats, coffee, traveling
    Email: [email protected]
    Twitter: @JMHReif
  3. Database - specifically graph
    • Database: a structured set of data held in a computer, especially one that is accessible in various ways.
    • Relational? NoSQL? Graph?
    • Graph database: uses graph structures for semantic queries, with nodes, edges, and properties to represent and store data.
  4. The world is a graph – everything is connected
    • people, places, events
    • companies, markets
    • countries, history, politics
    • sciences, art, teaching
    • technology, networks, machines, applications, users
    • software, code, dependencies, architecture, deployments
    • criminals, fraudsters, and their behavior
  5. What is it used to accomplish?
    Internal Applications
    • Master Data Management
    • Network and IT Operations
    • Fraud Detection
    Customer-Facing Applications
    • Real-Time Recommendations
    • Graph-Based Search
    • Identity and Access Management
  6. What is it used to accomplish?
    Use Cases
    • Social networks
    • Impact analysis
    • Logistics and routing
    • Recommendations
    • Access control
    • Fraud analysis
    • …and many, many more!
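  7. Whiteboard friendliness
    [Diagram: nodes The Matrix, Cloud Atlas, Tom Hanks, Lana Wachowski, and Hugo Weaving. Tom Hanks has one ACTED_IN relationship, Hugo Weaving has two ACTED_IN relationships, and Lana Wachowski has two DIRECTED relationships.]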
  8. Whiteboard friendliness
    [Diagram: the same graph, now with labels and properties]
    • Movie: title: Cloud Atlas, released: 2012
    • Movie: title: The Matrix, released: 1999
    • Person, Actor: name: Tom Hanks, born: 1956
    • Person, Director: name: Lana Wachowski, born: 1965
    • Person, Actor: name: Hugo Weaving, born: 1960
    • ACTED_IN (Tom Hanks): roles: Zachry
    • ACTED_IN (Hugo Weaving): roles: Bill Smoke
    • ACTED_IN (Hugo Weaving): roles: Agent Smith
    • DIRECTED (Lana Wachowski), twice
  9. Property Graph Data Model
    • 2 Main Components:
      • Nodes
      • Relationships
    • Additional Components:
      • Labels
      • Properties
  10. Property Graph Data Model
    • Nodes:
      • Represent the objects in the graph
      • Can be categorized using Labels
    [Diagram: a Car node and two Person nodes]
  11. Property Graph Data Model
    • Nodes:
      • Represent the objects in the graph
      • Can be categorized using Labels
    • Relationships:
      • Relate nodes by type and direction
    [Diagram: a Car node and two Person nodes connected by DRIVES, LOVES (twice), LIVES WITH, and OWNS relationships]
  12. Property Graph Data Model
    • Nodes:
      • Represent the objects in the graph
      • Can be categorized using Labels
    • Relationships:
      • Relate nodes by type and direction
    • Properties:
      • Name-value pairs that can be applied to nodes or relationships
    [Diagram: a Car node and two Person nodes connected by DRIVES, LOVES (twice), LIVES WITH, and OWNS relationships. Properties shown: name: "Dan", born: May 29, 1970, twitter: "@dan"; name: "Ann", born: Dec 5, 1975; since: Jan 10, 2011; brand: "Volvo", model: "V70"]
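To make the four components concrete, here is a minimal in-memory sketch in plain Python. It is not Neo4j's API: the `Node` and `Relationship` classes and the `outgoing` helper are hypothetical names for illustration. The flattened diagram text does not say which relationship carries `since` or who drives and who owns the car, so those assignments are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An object in the graph, categorized by one or more labels."""
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed connection from a start node to an end node."""
    start: Node
    type: str
    end: Node
    properties: dict = field(default_factory=dict)


# Nodes with labels and name-value properties, using the values on the slide.
dan = Node({"Person"}, {"name": "Dan", "born": "May 29, 1970", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann", "born": "Dec 5, 1975"})
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})

# Relationships have a type and a direction, and may carry properties too.
# Assumption: Dan drives the car (since Jan 10, 2011) and Ann owns it.
relationships = [
    Relationship(dan, "LOVES", ann),
    Relationship(ann, "LOVES", dan),
    Relationship(dan, "LIVES_WITH", ann),
    Relationship(dan, "DRIVES", car, {"since": "Jan 10, 2011"}),
    Relationship(ann, "OWNS", car),
]


def outgoing(node, rel_type):
    """Follow relationships of one type that start at the given node."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]


print(outgoing(dan, "LOVES")[0].properties["name"])   # prints Ann
print(outgoing(ann, "OWNS")[0].properties["brand"])   # prints Volvo
```

Direction matters in this sketch: `outgoing(car, "DRIVES")` is empty because the DRIVES relationship points at the car, not away from it.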
  13. Cypher: Powerful and Expressive
    CREATE (:Person {name: "Dan"})-[:LOVES]->(:Person {name: "Ann"})
    [Diagram: Dan LOVES Ann, with the NODE, LABEL, and PROPERTY parts of each pattern annotated]
  14. Cypher: Powerful and Expressive
    MATCH (:Person {name: "Dan"})-[:LOVES]->(whom)
    RETURN whom
    [Diagram: Dan LOVES Ann]
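One way to read the MATCH pattern is as a filter over directed relationships: find a Person named Dan, follow outgoing LOVES relationships, and bind whatever is at the other end to `whom`. The plain-Python sketch below mimics that reading on a toy data set. The dict and tuple layout and the `match_loves` function are hypothetical, and this is not how Neo4j evaluates queries internally.

```python
# Each node is a dict holding a label and properties; values mirror the slide.
dan = {"label": "Person", "name": "Dan"}
ann = {"label": "Person", "name": "Ann"}

# Relationships as (start, type, end) triples; direction runs start -> end.
relationships = [(dan, "LOVES", ann)]


def match_loves(name):
    """Return every node `whom` such that (:Person {name})-[:LOVES]->(whom)."""
    return [
        end
        for start, rel_type, end in relationships
        if start["label"] == "Person"
        and start["name"] == name
        and rel_type == "LOVES"
    ]


whom = match_loves("Dan")
print([node["name"] for node in whom])  # prints ['Ann']
```

Because the pattern is directed, `match_loves("Ann")` returns an empty list on this data: no LOVES relationship starts at Ann.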
  15. Options for Importing Data
    • Cypher statements / script: write individual statements to load data manually.
    • LOAD CSV: imports local or online CSV files into the graph; suited to small and medium data sets.
    • ETL Tool: imports from a relational database and maps the relational data model to a graph.
    • APOC: standard library that includes several import procedures for different data formats.
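As a rough illustration of what a row-by-row CSV import does, the plain-Python sketch below turns each row into two nodes and one relationship, creating a node only the first time its key is seen. It is not LOAD CSV itself: the column names and CSV content are hypothetical, and the dictionaries stand in for the graph.

```python
import csv
import io

# Hypothetical CSV content; a real import would read a local or online file.
CSV_TEXT = """name,movie,role
Tom Hanks,Cloud Atlas,Zachry
Hugo Weaving,Cloud Atlas,Bill Smoke
Hugo Weaving,The Matrix,Agent Smith
"""

people = {}    # name -> properties of a Person node
movies = {}    # title -> properties of a Movie node
acted_in = []  # (person name, movie title, relationship properties)

for row in csv.DictReader(io.StringIO(CSV_TEXT)):
    # setdefault creates the entry only if the key is new, so repeated
    # names or titles do not produce duplicate nodes.
    people.setdefault(row["name"], {"name": row["name"]})
    movies.setdefault(row["movie"], {"title": row["movie"]})
    # Every row contributes one ACTED_IN relationship with a roles property.
    acted_in.append((row["name"], row["movie"], {"roles": row["role"]}))

print(len(people), len(movies), len(acted_in))  # prints 2 2 3
```

Three rows yield two people, two movies, and three relationships, which is the shape of result an import of connected data aims for.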
  16. // Load Movie objects that are wanted
    WITH 'https://api.themoviedb.org/3/search/movie?api_key=' + $apiKey + '&query=Lord%20of%20the%20Rings' AS url
    CALL apoc.load.json(url) YIELD value
    UNWIND value.results AS results
    WITH results
    MERGE (m:Movie {movieId: results.id})
      ON CREATE SET m.title = results.title,
                    m.desc = results.overview,
                    m.poster = results.poster_path,
                    m.reviewStars = results.vote_average,
                    m.reviews = results.vote_count
    WITH results, m
    CALL apoc.do.when(results.release_date = "",
      'SET m.releaseDate = null',
      'SET m.releaseDate = date(results.release_date)',
      {m: m, results: results}) YIELD value
    RETURN m
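Two ideas in this query are easy to miss: MERGE creates a node only when no match exists, with ON CREATE SET applying just on creation, and the conditional call stores a null release date when the API returns an empty string. The plain-Python sketch below mimics both on a dictionary. The `merge_movie` function and the sample results are hypothetical, and only a subset of the properties is copied.

```python
from datetime import date

movies = {}  # movieId -> property dict, standing in for (:Movie) nodes


def merge_movie(result):
    """Mimic MERGE ... ON CREATE SET, then the release-date conditional."""
    movie_id = result["id"]
    if movie_id not in movies:
        # ON CREATE SET: these properties are written only for a new node.
        movies[movie_id] = {
            "movieId": movie_id,
            "title": result["title"],
            "desc": result["overview"],
        }
    m = movies[movie_id]
    # Conditional branch: an empty release date becomes null (None here);
    # anything else is parsed as a date. This runs on every call.
    raw = result["release_date"]
    m["releaseDate"] = None if raw == "" else date.fromisoformat(raw)
    return m


# Hypothetical API results; the id, overview text, and date are illustrative.
first = merge_movie({"id": 1, "title": "The Lord of the Rings: The Two Towers",
                     "overview": "Part two.", "release_date": "2002-12-18"})
again = merge_movie({"id": 1, "title": "Renamed", "overview": "",
                     "release_date": ""})

print(again["title"])        # prints The Lord of the Rings: The Two Towers
print(again["releaseDate"])  # prints None
print(len(movies))           # prints 1
```

The second call finds the existing entry, so the title is left alone, while the release date is overwritten because that step sits outside the creation branch.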
  17. // For Movie objects just loaded, pick out trilogy and retrieve cast of those movies
    WITH 'https://api.themoviedb.org/3/movie/' AS prefix,
         '/credits?api_key=' + $apiKey AS suffix,
         ["The Lord of the Rings: The Fellowship of the Ring",
          "The Lord of the Rings: The Two Towers",
          "The Lord of the Rings: The Return of the King"] AS movies
    CALL apoc.periodic.iterate(
      'MATCH (m:Movie) WHERE m.title IN $movies RETURN m',
      'WITH m
       CALL apoc.load.json($prefix + m.movieId + $suffix) YIELD value
       UNWIND value.cast AS cast
       MERGE (c:Cast {id: cast.id})
         ON CREATE SET c.name = cast.name
       MERGE (ch:Character {name: cast.character})
       MERGE (ch)-[r:APPEARS_IN]->(m)
       MERGE (c)-[r1:PLAYED]->(ch)',
      {batchSize: 1, iterateList: false, params: {movies: movies, prefix: prefix, suffix: suffix}});
  18. Resources
    • Neo4j download: https://neo4j.com/download/
    • Neo4j sandbox: https://neo4j.com/sandbox-v2/
    • Neo4j guides: https://neo4j.com/developer/get-started
    • Cypher: https://neo4j.com/developer/cypher/
    • LOAD CSV: https://neo4j.com/developer/guide-import-csv/
    • APOC: https://neo4j-contrib.github.io/neo4j-apoc-procedures/
    • Neo4j Certification: https://neo4j.com/graphacademy/neo4j-certification/
    @JMHReif [email protected]