
There and Back Again: A Developer's Tale

Every developer has had to take a new technology from first hearing about it, through research, to a working solution. Getting started is not always easy, and neither are the decisions about import, access, integration, and implementation. Graph databases offer capabilities that many developers have not used yet but want to apply to a need in their companies. Because graph differs from other data storage solutions in use today, how can we learn about it and evaluate the decisions needed to apply it in the right way?
In this presentation, we see how to go from no knowledge to an end-to-end solution with Neo4j. Using a data set of the Lord of the Rings movie network, you can learn how to apply each step of the process and what the result looks like. From learning the model and structure, choosing which environment to install, and discovering the options for importing data, to actually loading, querying, and presenting the value to others, we show how to make decisions along the way and which resources to use in making them. Go from 0 to 60 in building your next solution with a graph database!

Jennifer Reif

June 25, 2019

Transcript

  1. Who Am I? • Developer Relations Engineer for Neo4j •

    Continuous learner, developer, blogger • Conference speaker • Survivor of financial industry development • Twitter: @JMHReif
  2. We want to know… • What actors played which characters

    in Lord of the Rings movies • Other scenarios: • What employees have which skills for job openings in the company • What customers purchased which products and the suppliers • What patient was prescribed which medications from which doctors • What customers bought which vehicles from what dealerships/people
  3. Existing solutions are painful • Thousands of actors, employees, customers,

    patients, doctors, dealerships, skills, etc • Relational: • Great for reports and simple JOINs, but too many JOINs to go across 3 core tables and lookup tables with endless rows each • Document: • Great for pulling information about individual components, but linking properties across substructures is complicated • Key-value: • Great for bits of information very quickly, but aggregating and compiling lots of related data is arduous
  4. Database - specifically graph • Database: a structured set of

    data held in a computer, especially one that is accessible in various ways. • Relational? NoSQL? Graph? • Graph database: uses graph structures for semantic queries with nodes, edges, and properties to represent and store data.
  5. –Wikipedia, “Graph Database”, Performance section “Execution of queries within a

    graph database is localized to a portion of the graph. It does not search through irrelevant data, making it advantageous for real-time big data analytical queries. Consequently, graph database performance is proportional to the size of the data needed to be traversed, staying relatively constant despite the growth of data stored.”
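The locality claim in the quote above can be illustrated with a minimal sketch. It assumes a plain in-memory adjacency dictionary (not how Neo4j stores data) and hypothetical node names; the point is only that a bounded traversal from one node touches the same small neighborhood however many unrelated nodes exist.

```python
from collections import deque


def neighbors_within(adjacency, start, max_hops):
    """Breadth-first traversal that only visits nodes reachable from start."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops + 1))
    return seen


def build_graph(extra_nodes):
    """Hypothetical data: a small neighborhood plus an unrelated chain of nodes."""
    adjacency = {"Frodo": ["Sam", "Gandalf"], "Sam": ["Rosie"], "Gandalf": [], "Rosie": []}
    for i in range(extra_nodes):
        adjacency[f"other{i}"] = [f"other{i + 1}"]
    return adjacency


# The traversal result is identical for a small and a much larger graph.
small = neighbors_within(build_graph(10), "Frodo", 2)
large = neighbors_within(build_graph(100_000), "Frodo", 2)
```

The traversal never iterates over the `other…` nodes, so its cost depends on the neighborhood it walks, not on the total size of the dictionary.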
  6. What is it used to accomplish? Use Cases • Social

    networks • Impact analysis • Logistics and routing • Recommendations • Access control • Fraud analysis • …and many, many more!
  7. Neo4j is a database • Fast • Reliable • No size limit

    • Binary & HTTP protocol • ACID transactions • 2-4 M ops/s per core • Clustering scale & availability • Official Drivers
  8. Neo4j is a graph database Neo4j Property Graph Model Native

    GraphDB Schema Free Graph Storage Cypher Query Language Developer Workbench Extensible Procedures & Functions Graph Visualization
  9. Property Graph Data Model • 2 Main Components: • Nodes

    • Relationships • Additional Components: • Labels • Properties
  10. Property Graph Data Model • Nodes: • Represent the objects

    in the graph • Can be categorized using Labels Car Person Person
  11. Property Graph Data Model • Nodes: • Represent the objects

    in the graph • Can be categorized using Labels • Relationships: • Relate nodes by type and direction Car DRIVES LOVES LOVES LIVES WITH OWNS Person Person
  12. Property Graph Data Model • Nodes: • Represent the objects

    in the graph • Can be categorized using Labels • Relationships: • Relate nodes by type and direction • Properties: • Name-value pairs that can be applied to nodes or relationships Car DRIVES LOVES LOVES LIVES WITH OWNS Person Person name: “Dan” born: May 29, 1970 twitter: “@dan” name: “Ann” born: Dec 5, 1975 since: Jan 10, 2011 brand: “Volvo” model: “V70”
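As a rough illustration of the four components named on these slides (nodes, relationships, labels, properties), here is a minimal Python sketch. The `Node` and `Relationship` classes are hypothetical and are not part of any Neo4j API. The transcript does not preserve which person each relationship in the diagram attaches to, so the `DRIVES` relationship below is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                      # e.g. {"Person"}
    properties: dict = field(default_factory=dict)   # name-value pairs


@dataclass
class Relationship:
    start: Node          # relationships have a direction: start -> end
    rel_type: str        # and exactly one type, e.g. "LOVES"
    end: Node
    properties: dict = field(default_factory=dict)   # relationships can carry properties too


dan = Node({"Person"}, {"name": "Dan", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann"})
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})

relationships = [
    Relationship(dan, "LOVES", ann),
    Relationship(ann, "LOVES", dan),
    Relationship(dan, "DRIVES", car),  # assumption: the driver is not recoverable from the transcript
]


def outgoing(node, rel_type):
    """Return the end nodes of relationships of rel_type that start at node."""
    return [r.end for r in relationships if r.start is node and r.rel_type == rel_type]


loved_by_dan = [n.properties["name"] for n in outgoing(dan, "LOVES")]
```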
  13. Tools for data modeling… • Arrows tool: • http://www.apcjones.com/arrows/ •

    Developer guides: • https://neo4j.com/developer/data-modeling/ • GraphGists: • https://neo4j.com/graphgists/ • Community Site: • https://community.neo4j.com/ • Training - Data Modeling course: • https://neo4j.com/graphacademy/
  14. Whiteboard friendliness • Movie: title: The Lord of the Rings…, released: 2003

    • Cast: name: Elijah Wood, name: Orlando Bloom, name: Viggo Mortensen • Character: name: Frodo Baggins, name: Legolas, name: Aragorn • Relationships: (Cast)-[PLAYED]->(Character), (Character)-[APPEARS_IN]->(Movie)
  15. Options for Importing Data • Cypher statements / script: create

    individual statements to load data manually • LOAD CSV: for small and medium data sets; imports local or online CSV files into the graph • ETL Tool: imports from a relational database and maps the relational data model to a graph • Kettle: can import massive amounts of data from a variety of sources • APOC: standard library that includes several import procedures for different data formats • Neo4j-admin import tool: command-line interface for large amounts of data • Import programmatically from drivers: interact via your preferred programming language
  16. Tools for Cypher… • Cypher quick-reference: • https://neo4j.com/docs/cypher-refcard/current/ • Developer

    guides: • https://neo4j.com/developer/cypher/ • Cypher manual: • https://neo4j.com/docs/cypher-manual/current/ • Community Site: • https://community.neo4j.com/ • Resources list: • https://neo4j.com/developer/cypher-resources/
  17. Cypher: Powerful and Expressive

    CREATE (:Person { name:"Dan"}) -[:LOVES]-> (:Person { name:"Ann"})
    Diagram: Dan -LOVES-> Ann; each (:Person { name: … }) is a NODE with a LABEL and a PROPERTY
  18. Cypher: Powerful and Expressive

    MATCH (:Person { name:"Dan"} ) -[:LOVES]-> ( whom )
    RETURN whom
    Diagram: Dan -LOVES-> Ann
  19. Cypher in 20 sec… • Nodes look like this: •

    (var:Label) OR (var:Label { propKey: propValue }) • Relationships look like this: • -[var:REL_TYPE]-> or -[var:REL_TYPE { propKey: propValue }]- • Using Cypher is just looking for particular patterns of those nodes/rels • (var1:Label)-[var2:REL_TYPE]->(var3:Label)
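To make the idea of "looking for particular patterns" concrete, here is a small sketch that evaluates a pattern of the shape `(:Label {props})-[:REL_TYPE]->(end)` against an in-memory list of relationships. The data layout and the `match_pattern` helper are hypothetical illustrations, not how Cypher is actually executed.

```python
# Hypothetical in-memory data: node id -> labels and properties.
nodes = {
    1: {"labels": {"Person"}, "props": {"name": "Dan"}},
    2: {"labels": {"Person"}, "props": {"name": "Ann"}},
}
# Relationships as (start id, type, end id) triples.
rels = [(1, "LOVES", 2), (2, "LOVES", 1)]


def node_matches(node, label=None, props=None):
    """True if the node carries the label (when given) and all given properties."""
    if label is not None and label not in node["labels"]:
        return False
    return all(node["props"].get(k) == v for k, v in (props or {}).items())


def match_pattern(start_label, start_props, rel_type, end_label=None):
    """Rough analogue of MATCH (:start_label {start_props})-[:rel_type]->(end) RETURN end."""
    found = []
    for start_id, r_type, end_id in rels:
        if r_type != rel_type:
            continue
        if node_matches(nodes[start_id], start_label, start_props) and node_matches(
            nodes[end_id], end_label
        ):
            found.append(nodes[end_id]["props"])
    return found


# Analogue of: MATCH (:Person {name:"Dan"})-[:LOVES]->(whom) RETURN whom
whom = match_pattern("Person", {"name": "Dan"}, "LOVES")
```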
  20. Cypher statements/script

    MERGE (m:Movie {id: 100})
    ON CREATE SET m.title = "The Lord of the Rings: The Fellowship of the Ring", m.releaseDate = date('2001-12-19')…
    MERGE (c:Character {id: 300})
    ON CREATE SET c.name = "Legolas"…
    MERGE (c)-[:APPEARED_IN]->(m)
    ….
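The MERGE with ON CREATE SET idiom on this slide behaves like a get-or-create: the node is looked up by its identifying property, and the extra properties are set only when the node is newly created. A minimal Python sketch of that behavior, using a hypothetical `merge_node` helper and a plain dictionary as the store:

```python
def merge_node(store, label, key, value, on_create=None):
    """Get-or-create a node identified by (label, key, value).

    on_create properties are applied only when the node did not exist yet,
    mirroring MERGE ... ON CREATE SET. Returns (node, created).
    """
    identity = (label, key, value)
    if identity in store:
        return store[identity], False
    node = {key: value, **(on_create or {})}
    store[identity] = node
    return node, True


store = {}
first, created_first = merge_node(store, "Character", "id", 300, {"name": "Legolas"})
# A second MERGE on the same id finds the existing node and leaves its name untouched.
second, created_second = merge_node(store, "Character", "id", 300, {"name": "Someone else"})
```

This is why MERGE makes a load script safe to re-run: repeating it does not create duplicates or overwrite values set on the first run.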
  21. LOAD CSV

    LOAD CSV WITH HEADERS FROM "file:///movies.csv" as row
    MERGE (m:Movie {id: row.movieId})
    ON CREATE SET m.title = row.title, m.releaseDate = date(row.released)…
    ….
    LOAD CSV WITH HEADERS FROM "file:///movieCharacters.csv" as row
    MATCH (m:Movie {id: row.movieId})
    WITH m, row
    MERGE (c:Character {id: row.id})
    ON CREATE SET c.name = row.name …
    MERGE (c)-[:APPEARED_IN]->(m)
    ….
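The same CSV data could also be imported programmatically, the last option on the import list. The sketch below only prepares the data: it parses CSV text with the standard library and builds a list of row dictionaries to pass as a `$rows` parameter to an `UNWIND` statement. The sample rows, the ids, and the query text are assumptions for illustration; sending the query requires a driver and a running database, so that step is shown only as a comment.

```python
import csv
import io

# Hypothetical sample data using the column names from the slide.
MOVIES_CSV = """movieId,title,released
100,The Lord of the Rings: The Fellowship of the Ring,2001-12-19
101,The Lord of the Rings: The Two Towers,2002-12-18
"""

# Assumed query: one parameterized statement applied to every row in the batch.
IMPORT_QUERY = """
UNWIND $rows AS row
MERGE (m:Movie {id: row.movieId})
ON CREATE SET m.title = row.title, m.releaseDate = date(row.released)
"""


def read_rows(csv_text):
    """Parse CSV text into dictionaries, converting the id column to an integer."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["movieId"] = int(row["movieId"])  # CSV values arrive as strings
        rows.append(row)
    return rows


rows = read_rows(MOVIES_CSV)

# With the official Python driver, the batch could then be sent roughly as:
#   session.run(IMPORT_QUERY, rows=rows)
```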
  22. APOC

    WITH "https://bestmovies.com/" as url
    CALL apoc.load.json(url) YIELD value
    UNWIND value.results AS results
    WITH results
    MERGE (m:Movie {id: results.id})
    ON CREATE SET m.title = results.title, m.releaseDate = date(results.released)…
    ….
  23. APOC fave procs • apoc.load.json(url) / apoc.load.csv(file) / apoc.load.xml(file) /

    apoc.load.jdbc(url) • Procedures to load various kinds of data • Can handle flat files or URL paths (local or remote) • Excellent when you need transformations during the data load • apoc.periodic.iterate('cypher1', 'cypher2', {params}) • For each result of the cypher1 statement, runs the cypher2 statement on it • Helpful for selecting a segment for update • apoc.do.when(condition, query, else, {params}) • Handles transformation for substituting values • Used for a variety of functions, but here is good for cleaning data • apoc.date.format(dateType, "precision", 'format') • Can output a date in a variety of formats for display or querying • Very helpful when pulling or pushing date/time values into or out of Neo4j
  24. Tools for APOC… • Docs: • https://neo4j-contrib.github.io/neo4j-apoc-procedures/ • Developer

    guides: • https://neo4j.com/developer/neo4j-apoc/ • Community Site: • https://community.neo4j.com/ • YouTube videos: • https://www.youtube.com/watch?v=V1DTBjetIfk&list=PL9Hl4pk2FsvXEww23lDX_owoKoqqBQpdq
  25. Free Tools for Running Neo4j… • Sandbox: • https://neo4j.com/sandbox-v2/ •

    Neo4j Desktop (local instance): • https://neo4j.com/download/ • Server install (open source): • https://neo4j.com/download-center/#community • In the Cloud: • https://neo4j.com/developer/guide-cloud-deployment/ • Docker: • https://hub.docker.com/_/neo4j
  26. //Load Movie objects that are wanted

    WITH 'https://api.themoviedb.org/3/search/movie?api_key='+$apiKey+'&query=Lord%20of%20the%20Rings' as url
    CALL apoc.load.json(url) YIELD value
    UNWIND value.results AS results
    WITH results
    MERGE (m:Movie {movieId: results.id})
    ON CREATE SET m.title = results.title, m.desc = results.overview, m.poster = results.poster_path, m.reviewStars = results.vote_average, m.reviews = results.vote_count
    WITH results, m
    CALL apoc.do.when(results.release_date = "", 'SET m.releaseDate = null', 'SET m.releaseDate = date(results.release_date)', {m:m, results:results}) YIELD value
    RETURN m
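The `apoc.do.when` call on this slide is a data-cleaning branch: an empty `release_date` string becomes a null, and anything else is converted to a date. The same decision can be sketched in plain Python; the `clean_release_date` helper and the sample records are hypothetical.

```python
from datetime import date


def clean_release_date(raw):
    """Return None for an empty release_date string, otherwise a date object."""
    if raw == "":
        return None  # mirrors 'SET m.releaseDate = null'
    return date.fromisoformat(raw)  # mirrors 'SET m.releaseDate = date(...)'


# Hypothetical API results: one entry with a date, one without.
results = [
    {"id": 1, "title": "Dated entry", "release_date": "2001-12-19"},
    {"id": 2, "title": "Undated entry", "release_date": ""},
]

cleaned = {r["id"]: clean_release_date(r["release_date"]) for r in results}
```

Without the branch, converting the empty string would raise an error and abort the load, which is the reason the slide routes the value through a conditional first.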
  27. //For Movie objects just loaded, pick out trilogy and retrieve cast of those movies

    WITH 'https://api.themoviedb.org/3/movie/' as prefix, '/credits?api_key='+$apiKey as suffix,
      ["The Lord of the Rings: The Fellowship of the Ring", "The Lord of the Rings: The Two Towers", "The Lord of the Rings: The Return of the King"] as movies
    CALL apoc.periodic.iterate(
      'MATCH (m:Movie) WHERE m.title IN $movies RETURN m',
      'WITH m CALL apoc.load.json($prefix+m.movieId+$suffix) YIELD value UNWIND value.cast AS cast MERGE (c:Cast {id: cast.id}) ON CREATE SET c.name = cast.name MERGE (ch:Character {name: cast.character}) MERGE (ch)-[r:APPEARS_IN]->(m) MERGE (c)-[r1:PLAYED]->(ch)',
      {batchSize: 1, iterateList:false, params:{movies:movies, prefix:prefix, suffix:suffix}});
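The shape of this statement is "select items with one query, then run a second action on each item in batches". A simplified Python sketch of that control flow follows. The `periodic_iterate` and `credits_url` helpers, the placeholder movie ids, and the `KEY` value are hypothetical, and the sketch does not model the transactions or error reporting of the real procedure.

```python
def periodic_iterate(items, action, batch_size=1):
    """Apply action to every item, working through the items batch by batch."""
    batches = 0
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        batches += 1
        for item in batch:
            action(item)
    return {"batches": batches, "total": len(items)}


def credits_url(movie_id, api_key):
    """Build the per-movie URL the same way the slide concatenates prefix, id, and suffix."""
    prefix = "https://api.themoviedb.org/3/movie/"
    suffix = "/credits?api_key=" + api_key
    return prefix + str(movie_id) + suffix


# Placeholder movie ids stand in for the result of the first (selecting) query.
urls = []
summary = periodic_iterate([1, 2, 3], lambda mid: urls.append(credits_url(mid, "KEY")), batch_size=1)
```

With `batch_size=1`, as on the slide, each movie is processed on its own, so one slow or failing credits request is isolated from the others.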
  28. Other ways to query and explore • Make calls from

    an application • Neo4j drivers for almost any programming language • Java, Python, JavaScript, Go, Ruby, PHP • Visualization tools • Open source and proprietary • Neovis, Browser, Bloom, 3d-force-graph, Kineviz, yWorks
  29. Will it play nice? • Integrations, integrations, integrations! • Out-of-the-box

    plugins (APOC, GraphQL, graph algorithms) • Custom extensions possible • Tons of options for feeding data to existing tools/systems • Tableau, Kettle, Kafka, Elasticsearch, other DBs, Spark, and many more
  30. What can I show to others? • Neo4j Bloom (or

    partner/open source visualization tools) • Exploration tool for business users to query with natural language • Basic reports and query performance • Build according to specs and compare solutions, just as you would with any technology evaluation • Use cases and success stories • https://neo4j.com/resources • Possible integrations and minimal interruption of existing systems • What tools are you using today? Does our integration fit neatly? • Community and support network! • Support agreement or fabulous expert community answers to questions