Slide 1

Slide 1 text

Traversing the land of graph computing and databases
Akash Tandon
Data engineer and aficionado
PyCon Italy (@pyconit)

Slide 2

Slide 2 text

About me

Slide 3

Slide 3 text

Why are you here?
- Understand graphs as an elegant representation of data
- Graph theory: standing on the shoulders of giants
- The recent rise of graph tech

Slide 4

Slide 4 text

What exactly are graphs and graph tech?

Slide 5

Slide 5 text

Königsberg bridge problem
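Euler's resolution of the Königsberg problem comes down to counting degrees. The sketch below assumes the usual textbook model: the four land masses are vertices (labelled A to D here purely for illustration) and the seven bridges are edges of a multigraph. It checks Euler's criterion that a connected multigraph has a walk crossing every edge exactly once if and only if zero or two vertices have odd degree.

```python
from collections import Counter

# Four land masses: the island Kneiphof (A), the north bank (B), the south
# bank (C) and the eastern land between the two river branches (D).
# Each tuple is one bridge; repeated pairs are parallel bridges.
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]

def degrees(edges):
    """Count how many bridge ends touch each land mass."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def has_euler_walk(edges):
    """Euler's criterion for a connected multigraph: a walk using every edge
    exactly once exists iff zero or two vertices have odd degree."""
    odd = [node for node, d in degrees(edges).items() if d % 2 == 1]
    return len(odd) in (0, 2)

print(dict(degrees(BRIDGES)))   # {'A': 5, 'B': 3, 'C': 3, 'D': 3}
print(has_euler_walk(BRIDGES))  # False
```

All four land masses have odd degree, so no such walk exists. Removing any single bridge, for example with `BRIDGES[:-1]`, leaves exactly two odd-degree land masses, and the check then returns True.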

Slide 6

Slide 6 text

Comparison with tabular format
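To make the contrast concrete, here is a small sketch with hypothetical data: the same "follows" relationships stored as rows of a join table and as an adjacency list. With the table, finding one person's neighbors means filtering rows (or relying on an index), and every extra hop is another self-join; with the adjacency list, each hop is a direct lookup on the vertex. The function names are illustrative and do not come from any library.

```python
from collections import defaultdict

# Tabular view: a hypothetical "follows" join table, one row per relationship.
FOLLOWS_ROWS = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "carol"),
    ("carol", "dave"),
]

def followees_from_table(rows, person):
    """Tabular lookup: filter every row (a full scan when there is no index)."""
    return sorted(dst for src, dst in rows if src == person)

def to_adjacency(rows):
    """Graph view: each vertex keeps a direct list of its outgoing edges."""
    adj = defaultdict(list)
    for src, dst in rows:
        adj[src].append(dst)
    return adj

def followees_from_graph(adj, person):
    """Graph lookup: jump straight to the vertex's neighbor list."""
    return sorted(adj.get(person, []))

def two_hops(adj, person):
    """Followees of followees; in SQL this would be a self-join of the table."""
    return sorted({fof for f in adj.get(person, []) for fof in adj.get(f, [])})

adj = to_adjacency(FOLLOWS_ROWS)
print(followees_from_table(FOLLOWS_ROWS, "alice"))  # ['bob', 'carol']
print(followees_from_graph(adj, "alice"))           # ['bob', 'carol']
print(two_hops(adj, "alice"))                       # ['carol', 'dave']
```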

Slide 7

Slide 7 text

Working with GTFS data
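GTFS (General Transit Feed Specification) feeds are plain CSV files, so one common way to get a graph out of them is to treat stops as vertices and link consecutive stops of each trip. The sketch below assumes that modelling choice and reads only the `stop_times.txt` columns `trip_id`, `stop_id` and `stop_sequence`; the feed excerpt is made up, and the talk's demo may build its graph differently.

```python
import csv
import io
from collections import defaultdict

# A tiny, made-up excerpt of a GTFS stop_times.txt file. The column names come
# from the GTFS Schedule reference; the trips and stops are hypothetical.
STOP_TIMES = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
T1,08:00:00,08:00:00,S1,1
T1,08:05:00,08:06:00,S2,2
T1,08:12:00,08:12:00,S3,3
T2,09:00:00,09:00:00,S2,1
T2,09:07:00,09:07:00,S4,2
"""

def stop_graph(stop_times_file):
    """Return directed edges (stop, next stop) mapped to the trips serving them."""
    trips = defaultdict(list)
    for row in csv.DictReader(stop_times_file):
        trips[row["trip_id"]].append((int(row["stop_sequence"]), row["stop_id"]))

    edges = defaultdict(set)
    for trip_id, stops in trips.items():
        stops.sort()  # stop_sequence gives the order of stops within a trip
        for (_, here), (_, there) in zip(stops, stops[1:]):
            edges[(here, there)].add(trip_id)
    return edges

edges = stop_graph(io.StringIO(STOP_TIMES))
for (src, dst), trip_ids in sorted(edges.items()):
    print(f"{src} -> {dst} via {sorted(trip_ids)}")
# S1 -> S2 via ['T1']
# S2 -> S3 via ['T1']
# S2 -> S4 via ['T2']
```

Stop S2 ends up with two outgoing edges, which is exactly the kind of transfer point that is awkward to query in the flat CSV but natural to traverse in the graph.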

Slide 8

Slide 8 text

Cypher versus SQL (Martin Kleppmann, Designing Data-Intensive Applications, 2017)

Cypher:

MATCH
  (person)-[:BORN_IN]->()-[:WITHIN*0..]->(us:Location {name:'United States'}),
  (person)-[:LIVES_IN]->()-[:WITHIN*0..]->(eu:Location {name:'Europe'})
RETURN person.name

SQL:

WITH RECURSIVE
  -- in_usa is the set of vertex IDs of all locations within the United States
  in_usa(vertex_id) AS (
      SELECT vertex_id FROM vertices WHERE properties->>'name' = 'United States'
    UNION
      SELECT edges.tail_vertex FROM edges
        JOIN in_usa ON edges.head_vertex = in_usa.vertex_id
        WHERE edges.label = 'within'
  ),
  -- in_europe is the set of vertex IDs of all locations within Europe
  in_europe(vertex_id) AS (
      SELECT vertex_id FROM vertices WHERE properties->>'name' = 'Europe'
    UNION
      SELECT edges.tail_vertex FROM edges
        JOIN in_europe ON edges.head_vertex = in_europe.vertex_id
        WHERE edges.label = 'within'
  ),
  -- born_in_usa is the set of vertex IDs of all people born in the US
  born_in_usa(vertex_id) AS (
    SELECT edges.tail_vertex FROM edges
      JOIN in_usa ON edges.head_vertex = in_usa.vertex_id
      WHERE edges.label = 'born_in'
  ),
  -- lives_in_europe is the set of vertex IDs of all people living in Europe
  lives_in_europe(vertex_id) AS (
    SELECT edges.tail_vertex FROM edges
      JOIN in_europe ON edges.head_vertex = in_europe.vertex_id
      WHERE edges.label = 'lives_in'
  )
SELECT vertices.properties->>'name'
FROM vertices
-- join to find those people who were both born in the US *and* live in Europe
JOIN born_in_usa     ON vertices.vertex_id = born_in_usa.vertex_id
JOIN lives_in_europe ON vertices.vertex_id = lives_in_europe.vertex_id;
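What both queries on this slide compute can be spelled out in plain Python. This is a sketch over a made-up dataset that uses the same `(tail_vertex, label, head_vertex)` edge shape as the SQL schema. The `within` function plays the role of the recursive `in_usa` / `in_europe` expressions and of Cypher's `-[:WITHIN*0..]->`, which also matches zero hops, so the starting location is included; the final set intersection plays the role of the two joins.

```python
from collections import defaultdict, deque

# Hypothetical vertices and edges; each edge is (tail_vertex, label, head_vertex).
NAMES = {
    1: "United States", 2: "Idaho", 3: "Europe", 4: "United Kingdom",
    5: "London", 6: "Lucy", 7: "Ben",
}
EDGES = [
    (2, "within", 1), (4, "within", 3), (5, "within", 4),
    (6, "born_in", 2), (6, "lives_in", 5),
    (7, "born_in", 5), (7, "lives_in", 5),
]

def incoming(edges, label):
    """Index edges with the given label by head vertex: head -> [tails]."""
    index = defaultdict(list)
    for tail, lbl, head in edges:
        if lbl == label:
            index[head].append(tail)
    return index

def within(edges, root):
    """Root plus every vertex that reaches it via one or more 'within' edges
    (breadth-first search over the reversed edges)."""
    contained_by = incoming(edges, "within")
    seen, queue = {root}, deque([root])
    while queue:
        for tail in contained_by[queue.popleft()]:
            if tail not in seen:
                seen.add(tail)
                queue.append(tail)
    return seen

def people(edges, label, places):
    """Tails of `label` edges whose head is one of the given places."""
    return {tail for tail, lbl, head in edges if lbl == label and head in places}

in_usa, in_europe = within(EDGES, 1), within(EDGES, 3)
result = people(EDGES, "born_in", in_usa) & people(EDGES, "lives_in", in_europe)
print(sorted(NAMES[v] for v in result))  # ['Lucy']
```

The Cypher version states this pattern in two lines; the SQL version has to spell out every intermediate set, which is roughly what the Python does by hand.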

Slide 9

Slide 9 text

Use cases across domains
- Knowledge graphs
- Recommendation engines
- Social networks
- Privacy and compliance
- Data integration and master data management
Source: Graph database use-cases

Slide 10

Slide 10 text

Data democratization at Airbnb

Slide 11

Slide 11 text

Rise of graph tech

Slide 12

Slide 12 text

- Semantic Web
- Graph tech ecosystem
- Managed (cloud) services

Slide 13

Slide 13 text

When to use graphs?
- Relationships are first-class citizens.
- Many-to-many relationships exist.

Slide 14

Slide 14 text

When not to use graphs?
It depends on the use case, but some situations include:
- Disconnected data; relationships don’t matter
- The data model is consistent and fixed
- Queries are bulk scans rather than traversals starting from a point
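The last point is about access patterns. A graph store shines when a query starts at a vertex and walks outwards, because it only touches that vertex's neighborhood; a bulk aggregate has to read everything regardless of how the data is stored. The sketch below contrasts the two on a hypothetical in-memory adjacency dict; it illustrates the access pattern only and makes no claim about any particular database's performance.

```python
from collections import deque

def k_hop(adj, start, k):
    """Vertices within k hops of start: a breadth-first traversal that only
    touches the start vertex's neighborhood, however large the graph is."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == k:
            continue  # do not expand beyond k hops
        for nbr in adj.get(node, ()):
            if nbr not in depth:
                depth[nbr] = depth[node] + 1
                queue.append(nbr)
    return set(depth) - {start}

def mean_out_degree(adj):
    """A bulk aggregate: every vertex must be read, so traversal cannot help."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

# Hypothetical chain graph 0 -> 1 -> ... -> 99_999.
adj = {i: [i + 1] for i in range(99_999)}
adj[99_999] = []

print(k_hop(adj, 0, 2))      # {1, 2}
print(mean_out_degree(adj))  # 0.99999
```

Here `k_hop(adj, 0, 2)` reads three vertices out of 100,000, whereas `mean_out_degree` has to visit all of them; for workloads dominated by the second kind of query, a graph model adds little.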

Slide 15

Slide 15 text

Challenges
- Preprocessing pipelines and incremental updates
- Developing ecosystem and disparity in available options

Slide 16

Slide 16 text

Demo

Slide 17

Slide 17 text

Resources
- Neo4j
- Py2neo tutorial
- Apache TinkerPop
- NetworkX
- Awesome-graph list (GitHub)
- WTF is a knowledge graph?

Slide 18

Slide 18 text

CONTACT ME @analyticalmonk @AkashTandon