Slide 1

Slide 1 text

Graph databases and analysis using Neo4j
Akash Tandon
Data Engineering @ SocialCops
Email: [email protected]
Twitter: analyticalmonk

Slide 2

Slide 2 text

Goal
- Learn why graph databases are relevant
- Understand the advantages and challenges of working with graph databases
- Get an introduction to the Neo4j graph database, the Cypher query language, and the Py2Neo Python package

Slide 3

Slide 3 text

History
The Königsberg bridge problem
Solved by Leonhard Euler
Laid the foundation of graph theory
Ref: Wikipedia

Slide 4

Slide 4 text

Why and what?
Ref: Cambridge Semantics blog

Slide 5

Slide 5 text

Use-cases
- Fraud detection
- Knowledge graphs
- Recommendation systems
- Investigative journalism (Panama Papers)
- Social media and network graphs
- Analytics
- … and so on!

Slide 6

Slide 6 text

Use-case: Panama Papers
https://neo4j.com/blog/analyzing-panama-papers-neo4j/

Slide 7

Slide 7 text

Neo4j
- Most popular graph DBMS and market leader
- Property graph database
- Graph storage and processing engine
- Open source, with strong community support
- Visualization tool, browser, and integrations with multiple languages (Python, Java, etc.)

Slide 8

Slide 8 text

Cypher
- Declarative graph query language
- Allows for expressive and efficient querying and updating of a property graph
- SQL-ish

Slide 9

Slide 9 text

Cypher
MATCH (a:Artist), (b:Album)
WHERE a.Name = "Pink Floyd" AND b.Name = "Dark side of the moon"
CREATE (a)-[r:RELEASED]->(b)
RETURN r
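To make the query's semantics concrete, here is a minimal pure-Python sketch of what MATCH, WHERE, and CREATE do on a property graph. It is an illustration only and says nothing about how Neo4j stores or executes anything: the `nodes` list, the `match` and `create_relationship` helpers, and the extra "Kind of Blue" album are all hypothetical names and data introduced for this example.

```python
# In-memory stand-in for a property graph (assumption: illustrative only).
# Each node has an id, a set of labels, and a dict of properties.
nodes = [
    {"id": 1, "labels": {"Artist"}, "props": {"Name": "Pink Floyd"}},
    {"id": 2, "labels": {"Album"}, "props": {"Name": "Dark side of the moon"}},
    {"id": 3, "labels": {"Album"}, "props": {"Name": "Kind of Blue"}},
]
relationships = []


def match(label, **props):
    """Return nodes that carry `label` and whose properties equal `props`."""
    return [
        n for n in nodes
        if label in n["labels"]
        and all(n["props"].get(key) == value for key, value in props.items())
    ]


def create_relationship(start, rel_type, end):
    """Store and return a directed, typed relationship from start to end."""
    rel = {"start": start["id"], "type": rel_type, "end": end["id"], "props": {}}
    relationships.append(rel)
    return rel


# Two comma-separated MATCH patterns yield every (a, b) pair that passes
# the WHERE filter; CREATE then runs once per surviving pair.
created = [
    create_relationship(a, "RELEASED", b)
    for a in match("Artist", Name="Pink Floyd")
    for b in match("Album", Name="Dark side of the moon")
]
print(created)
```

Because exactly one artist and one album pass the filter, a single RELEASED relationship is created; the second album is matched by label but rejected by the property test.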

Slide 10

Slide 10 text

Py2Neo
- Py2neo is a client library and toolkit for working with Neo4j from within Python applications and from the command line.
- GitHub repo: https://github.com/technige/py2neo

Slide 11

Slide 11 text

Py2Neo
>>> from py2neo.data import Node, Relationship
>>> a = Node("Person", name="Alice")
>>> b = Node("Person", name="Bob")
>>> ab = Relationship(a, "KNOWS", b)
>>> ab
(Alice)-[:KNOWS]->(Bob)

Slide 12

Slide 12 text

Graph algorithms
- Centralities (PageRank, Betweenness, Closeness)
- Community detection (Louvain, Label propagation)
- Path finding (Shortest path: A*, Dijkstra)
- Similarity (Jaccard, Cosine)
Neo4j-supported graph algorithms: https://neo4j.com/docs/graph-algorithms/current/introduction/#introduction-algorithms
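For a feel of what three of these algorithms compute, the sketch below implements Dijkstra shortest paths, Jaccard similarity, and a basic power-iteration PageRank in plain Python. These are textbook versions written for this example, not the Neo4j graph algorithms library; the `roads` graph, the function names, and the default damping factor of 0.85 are assumptions chosen for illustration.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distance from source to each reachable node (non-negative weights)."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for neighbour, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return dist


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank that ignores edge weights."""
    nodes = set(graph) | {n for targets in graph.values() for n in targets}
    rank = {n: 1 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if not graph.get(n))
        new_rank = {n: (1 - damping + damping * dangling) / len(nodes) for n in nodes}
        for n in nodes:
            targets = graph.get(n, {})
            for target in targets:
                new_rank[target] += damping * rank[n] / len(targets)
        rank = new_rank
    return rank


# Hypothetical directed, weighted graph: node -> {neighbour: edge weight}.
roads = {"A": {"B": 4, "C": 1}, "B": {"D": 1}, "C": {"B": 2, "D": 5}, "D": {}}

distances = dijkstra(roads, "A")
similarity = jaccard({"A", "B", "C"}, {"B", "C", "D"})
ranks = pagerank(roads)
print(distances)
print(similarity)
print(max(ranks, key=ranks.get))
```

On this graph the cheapest route from A to D goes through C and B (cost 4) rather than directly through C (cost 6), and the two neighbour sets share two of four distinct members.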

Slide 13

Slide 13 text

Challenges
- Scalability
- Integration of data silos

Slide 14

Slide 14 text

Resources
- https://neo4j.com/
- https://neo4j.com/docs/operations-manual/current/installation/
- https://github.com/jbmusso/awesome-graph
- https://tinkerpop.apache.org/
- https://www.analyticsvidhya.com/blog/2018/04/introduction-to-graph-theory-network-analysis-python-codes

Slide 15

Slide 15 text

Questions?

Slide 16

Slide 16 text

Thank you!