Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Connect the Tables: How to Graph Relational Dat...

Connect the Tables: How to Graph Relational Databases

You might have heard about graphs, but what if you already have a relational database? How do you incorporate something new? This session will show you how to translate part of all data from a relational format into a graph.

We will cover the basics of migrating the model and practical steps for actually implementing it. We will walk through naming and case conventions for graphs, as well as some principles for a successful graph model. At each step, we will show live code for making this work in the real world. Come to this session to get concrete steps for graphing relational databases.

Code: https://github.com/JMHReif/connect-the-tables

Jennifer Reif

August 15, 2024
Tweet

More Decks by Jennifer Reif

Other Decks in Technology

Transcript

  1. Photo by Shannon Potter on Unsplash Jennifer Reif Developer Advocate,

    Neo4j Connect the Tables: How to Graph Relational Databases [email protected] @JMHReif github.com/JMHReif jmhreif.com linkedin.com/in/jmhreif
  2. Who is Jennifer Reif? Developer Advocate, Neo4j • Continuous learner

    • Conference speaker • Creator: Tech blogger + podcaster • Other: geek Jennifer Reif [email protected] @JMHReif github.com/JMHReif jmhreif.com linkedin.com/in/jmhreif
  3. Kaggle - public data Co ff ee Shop https://www.kaggle.com/datasets/ylchang/co ff

    ee-shop-sample-data-1113 https://github.com/JMHReif/graphing-relational-dbs-data-import/tree/main Cleaned —>>
  4. Demo 1 - Look at CSVs Live! • customer.csv •

    order.csv • product.csv • sales_outlet.csv • sta ff .csv • product_type.csv
  5. Data Normalization 3rd Normal Form (3NF) • Unique row, low

    data duplication • Primary key or composite key (where necessary) • Foreign keys to link tables • Many-to-many relationships -> JOIN tables • JOINs on query-time, from expert knowledge https://dzone.com/articles/do-we-need-data-normalization-anymore
  6. What is a graph? Connected data • Store relationships ->

    separate entities • JOINs on write, not read • Flexible schema • Query language: Cypher
  7. Still get core features No FOMO here :) • Physical

    storage • Transactional + analytical workloads • ACID transactions • Indexes / constraints • Security features • Cloud hosting • Free -> enterprise solutions
  8. Nodes Graph components • Represent objects or entities • Can

    be labeled • May have properties Order Product Employee orderId: 162468 orderDate: 2019-04-01 productId: 08746589 name: “Ipoh Coffee” id: 247924 startDate: 2016-05-04 position: “Barista”
  9. Relationships Graph components • Must have a type • Must

    have a direction • May have properties • Nodes can have multiple Order Product Employee orderId: 162468 orderDate: 2019-04-01 productId: 08746589 name: “Ipoh Coffee” id: 247924 startDate: 2016-05-04 position: “Barista” CONTAINS date: 2022-08-16 tip: 1.00 SOLD CONTACT CREATED
  10. Label Graph components • A group of nodes • Like

    a category Person Employee Customer
  11. Property Graph components • Key-value pair • Associated with node

    or relationship • Like a variable name and value Order Employee orderId: 162468 id: 247924 SOLD orderDate: 2019-04-01 date: 2022-08-16 tip: 1.00 startDate: 2016-05-04 position: “Barista”
  12. Naming conventions Relational vs Graph • Recommendations vary! • Lowercase

    • Separator: _ • Avoid keywords • Capitalize SQL keywords • Might change with GQL standard • camelCase • UPPERCASE • Separator: _ • Avoid keywords • Capitalize Cypher keywords https://brainstation.io/learn/sql/naming-conventions https://neo4j.com/docs/cypher-manual/current/syntax/naming/ SQL Cypher
  13. Relational -> Graph Data model • Tables -> Labels •

    Row = Node • FK = Relationship • JOIN table = Relationship
  14. • Tables = Labels • Row = Node • FK

    = Relationship • JOIN table = Relationship https://arrows.app
  15. • Tables = Labels • Row = Node • FK

    = Relationship • JOIN table = Relationship
  16. What to do… When things are not cut and dry?

    • Property on node or relationship? • Promo • Property or node? • Line item • More than 1 label? • Seasonal drink
  17. Tips and Tricks General architecture • De fi ne use

    case -> • Know what problem you’re solving • Outline key queries -> • Optimize model to answer those • Limitations are di ff erent • More entities <> bad • Model can change as-needed
  18. Tips and Tricks General practices • Frequent fi lter ->

    • Opt for separate node/rel • Props with few/unchanging values -> • Consider label • Dense nodes (hypernodes) -> • Consider more relationship types to fi lter
  19. Other options for both Export + Import + Sync •

    Driver • Messaging platform • Scheduled task
  20. Setup Load components • Neo4j Docker container • Shell script

    -> cypher-shell • Cypher script • LOAD CSV
  21. Polyglot Graphy data -> graph db • Store data with

    complicated JOINs in a graph • E.g. Customer -> Order -> Product • Eval: • JOINs • Frequency • Performance criticality
  22. Cypher • Functional and visual • Based on ASCII-art •

    Declarative query language • Focuses on what to retrieve • Not how A B LIKES MATCH (A)-[:LIKES]->(B)
  23. Cypher: powerful and expressive Jennifer Neo4j WORKS_FOR CREATE (:Person {

    name: ‘Jennifer’}) -[:WORKS_FOR]-> (:Company { name: ‘Neo4j’}) NODE PROPERTY NODE PROPERTY LABEL LABEL
  24. Translating queries • SELECT • Property list • FROM <table>

    • JOIN <table> ON <property> • WHERE • GROUP BY • HAVING • MATCH • RETURN entities/props • Pattern (node)-[rel]-(node2) • -[var:REL_TYPE]-> • WHERE • Implicit in RETURN • WHERE on aggregations SQL Cypher
  25. Ex 1 - Find products SQL Cypher MATCH (p:Product) RETURN

    p LIMIT 10; product Product SELECT * FROM product LIMIT 10;
  26. Ex 2 - Find staff working in each building SQL

    Cypher MATCH (b:Building)<-[r]-(s:Staff) RETURN *; building Building SELECT b.building_id, b.building_type, b.street_address, b.city, s.staff_id, s.first_name, s.last_name, s.position FROM building b, staff s WHERE CAST(s.building_id AS INTEGER) = b.building_id; staff Staff
  27. Ex 3 - Find staff (NOT managers) for each building

    SQL Cypher MATCH (b:Building)<-[r:WORKS_IN]-(s:Staff) RETURN *; building Building SELECT b.building_id, b.building_type, b.street_address, b.city, s.staff_id, s.first_name, s.last_name, s.position FROM building b, staff s WHERE CAST(s.building_id AS INTEGER) = b.building_id AND s.staff_id <> b.manager_staff_id; staff Staff
  28. Ex 4 - Find orders a staff member sold SQL

    Cypher MATCH (s:Staff {staffId: 13})-[r:SOLD]->(o:Order) RETURN * LIMIT 10; SELECT s.staff_id, s.first_name, s.last_name, o.transaction_id, o.transaction_date, o.transaction_time FROM staff s, coffee_order o WHERE s.staff_id = 13 AND o.staff_id = s.staff_id LIMIT 10; staff Staff coffee_order Order
  29. Ex 5 - Find customers purchased >1 latte SQL Cypher

    MATCH (c:Customer)-[r:BOUGHT]->(o:Order)-[r2:CONTAINS]->(p:Product {productName: “Latte”}) WITH c, COUNT(p) as numberOrdered WHERE numberOrdered > 1 RETURN c.customerId, c.name, numberOrdered ORDER BY numberOrdered DESC; coffee_order product order_line_item Order Product SELECT c.customer_id, c.customer_name, COUNT(p.product_name) AS number_ordered FROM customer c INNER JOIN coffee_order o ON c.customer_id = o.customer_id INNER JOIN order_line_item ol ON o.transaction_id = ol.transaction_id AND o.transaction_date = ol.transaction_date AND o.transaction_time = ol.transaction_time INNER JOIN product p ON p.product_id = ol.product_id WHERE p.product_name = “Latte” GROUP BY c.customer_id HAVING number_ordered > 1 ORDER BY number_ordered DESC; customer Customer
  30. Relational vs Graph Designed di ff erently • DDL /

    DML • Strict schema + data types • Structure-enforced • Data integrity • Storage/query optimization • Changes = dependency impacts • Schema- fl exible • Restrictions set as-needed • Governance-enforced • Data integrity • Storage/query optimization • Changes = less cascading impacts Relational Graph
  31. Tools For refactoring • Cypher + APOC • Neo4j Migrations

    (https://neo4j.com/labs/neo4j-migrations/) • 3rd party tools (Kettle, etc)
  32. Resources Questions? • Github repo: github.com/JMHReif/connect-the-tables • GraphAcademy: graphacademy.neo4j.com/ •

    NODES 2024: https://dev.neo4j.com/nodes24 Jennifer Reif jmhreif.com github.com/JMHReif [email protected] @JMHReif linkedin.com/in/jmhreif