Go is a Graph - Speaker Deck

Slide 1

Slide 1 text

Go is a Graph @francesc

Slide 2

Slide 2 text

About me

Francesc Campoy
VP of Product at Dgraph Labs
@francesc campoy

You might know me from:
- justforfunc.com
- Google Cloud Platform Podcast

Slide 3

Slide 3 text

Agenda
- A Graph
- A Graph Database in Go
- Graphs in Go
- Graphs in Go in a Graph Database in Go
- …

Slide 4

Slide 4 text

We are hiring! dgraph.io/careers

Slide 5

Slide 5 text

Graphs

Slide 6

Slide 6 text

A movie graph

[Figure: a graph whose nodes are Steven Spielberg, Jaws, Jurassic Park, Comedy, Thriller, and Science Fiction, connected by "directed" and "genre" relationships. Legend: node, relationship.]

Slide 7

Slide 7 text

A movie graph with properties

[Figure: the same graph, now with properties on each node: "name: Steven Spielberg"; "name: Jaws, year: 1975"; "name: Jurassic Park, year: 1993"; "name: Comedy"; "name: Thriller"; "name: Science Fiction". Relationships: "directed" and "genre". Legend: node, relationship.]
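A graph with properties like the one on this slide could be modeled in memory roughly as follows. This is only a minimal sketch: the `Node`, `Edge`, and `Graph` types and their methods are hypothetical names, not part of any library, and the "genre" edges are left out because the transcript does not preserve which movie points to which genre.

```go
package main

import "fmt"

// Node is a vertex carrying arbitrary properties, such as name or year.
type Node struct {
	Props map[string]any
}

// Edge is a labeled, directed relationship between two nodes.
type Edge struct {
	From, To *Node
	Label    string
}

// Graph is a plain collection of nodes and edges (hypothetical type).
type Graph struct {
	Nodes []*Node
	Edges []Edge
}

// AddNode creates a node with the given properties and returns it.
func (g *Graph) AddNode(props map[string]any) *Node {
	n := &Node{Props: props}
	g.Nodes = append(g.Nodes, n)
	return n
}

// Connect adds a directed edge from -> to with the given label.
func (g *Graph) Connect(from *Node, label string, to *Node) {
	g.Edges = append(g.Edges, Edge{From: from, To: to, Label: label})
}

// Out returns the nodes reachable from n through edges with the given label.
func (g *Graph) Out(n *Node, label string) []*Node {
	var out []*Node
	for _, e := range g.Edges {
		if e.From == n && e.Label == label {
			out = append(out, e.To)
		}
	}
	return out
}

// movieGraph builds the "directed" part of the slide's graph.
func movieGraph() (*Graph, *Node) {
	g := &Graph{}
	spielberg := g.AddNode(map[string]any{"name": "Steven Spielberg"})
	jaws := g.AddNode(map[string]any{"name": "Jaws", "year": 1975})
	jurassic := g.AddNode(map[string]any{"name": "Jurassic Park", "year": 1993})
	g.Connect(spielberg, "directed", jaws)
	g.Connect(spielberg, "directed", jurassic)
	return g, spielberg
}

func main() {
	g, spielberg := movieGraph()
	for _, m := range g.Out(spielberg, "directed") {
		fmt.Println(m.Props["name"], m.Props["year"])
	}
}
```

Scanning every edge in `Out` is deliberately naive; it keeps the sketch short and says nothing about how a real graph database stores adjacency.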

Slide 8

Slide 8 text

Graphs in Databases

Slide 9

Slide 9 text

How would you store this in your database?

[Figure: the movie graph with properties from slide 7.]

Slide 10

Slide 10 text

A graph in a relational database

The mapping process can be complex:
- one-to-one relationships become foreign keys
- one-to-many relationships become foreign keys (repeated foreign keys if reversed)
- many-to-many relationships become rows in a new table with multiple foreign keys

Traversals require joins, which quickly become very expensive.

Slide 11

Slide 11 text

The “Movie - Director” on a relational DB

Movie
  ID          int
  Name        string
  Year        int
  …
  DirectorID  int (FK)

Director
  ID    int
  Name  string

Note: Fetching all the movies directed by a director requires an index on DirectorID for performance.

Slide 12

Slide 12 text

You thought a movie had one director?

Twilight Zone: The Movie
Directed by:
- Steven Spielberg
- John Landis
- Joe Dante
- George Miller

Good luck migrating the schema!

Slide 13

Slide 13 text

The “Movie - Director” on a relational DB

Movie
  ID    int
  Name  string
  Year  int
  …

MovieDirector
  MovieID     int (FK)
  DirectorID  int (FK)

Director
  ID    int
  Name  string

What is a MovieDirector? We had to modify our logical model to fit the technology, bringing in unnecessary complexity.

Slide 14

Slide 14 text

A graph in a non-relational database

No foreign keys:
- You will need to keep many copies of your information.
- You will need to keep them all up to date.
- At that point, why do you even have a database?

Traversals require multiple queries: get element, find property, get next element, etc.

Slide 15

Slide 15 text

The “Movie - Director” on a no-SQL DB (A)

Director document
{ "_id": 111, "name": "Steven Spielberg", "movies": [ 123, 234, 345 ] }

Movie documents
{ "_id": 123, "name": "Jaws", "year": ... }
{ "_id": 234, "name": "E.T.", "year": ... }
{ "_id": 345, "name": "Jurassic Park", "year": ... }

Fetching all the names of the movies directed by a director requires n+1 queries: one for the director document, plus one per movie.
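The n+1 pattern can be made concrete with a small in-memory stand-in for a document store. This is a sketch under stated assumptions: `Store`, `GetDirector`, `GetMovie`, and `MovieNames` are hypothetical names invented for illustration, and the query counter simply counts lookups rather than real network round trips.

```go
package main

import "fmt"

// Director mirrors the director document: it stores only the IDs of its movies.
type Director struct {
	ID     int
	Name   string
	Movies []int
}

// Movie mirrors a movie document.
type Movie struct {
	ID   int
	Name string
}

// Store is a hypothetical in-memory document store that counts every lookup.
type Store struct {
	Directors map[int]Director
	Movies    map[int]Movie
	Queries   int
}

// GetDirector fetches one director document by ID (one query).
func (s *Store) GetDirector(id int) (Director, bool) {
	s.Queries++
	d, ok := s.Directors[id]
	return d, ok
}

// GetMovie fetches one movie document by ID (one query).
func (s *Store) GetMovie(id int) (Movie, bool) {
	s.Queries++
	m, ok := s.Movies[id]
	return m, ok
}

// MovieNames returns the names of all movies by a director.
func MovieNames(s *Store, directorID int) []string {
	d, ok := s.GetDirector(directorID) // 1 query
	if !ok {
		return nil
	}
	names := make([]string, 0, len(d.Movies))
	for _, id := range d.Movies { // n more queries, one per movie
		if m, ok := s.GetMovie(id); ok {
			names = append(names, m.Name)
		}
	}
	return names
}

// sampleStore holds the documents shown on the slide.
func sampleStore() *Store {
	return &Store{
		Directors: map[int]Director{
			111: {ID: 111, Name: "Steven Spielberg", Movies: []int{123, 234, 345}},
		},
		Movies: map[int]Movie{
			123: {ID: 123, Name: "Jaws"},
			234: {ID: 234, Name: "E.T."},
			345: {ID: 345, Name: "Jurassic Park"},
		},
	}
}

func main() {
	s := sampleStore()
	names := MovieNames(s, 111)
	fmt.Println(names, "queries:", s.Queries)
}
```

With three movies the counter ends at four: one lookup for the director and three for the movies.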

Slide 16

Slide 16 text

The “Movie - Director” on a no-SQL DB (B)

Director document
{
  "_id": 111,
  "name": "Steven Spielberg",
  "movies": [
    { "_id": 123, "name": "Jaws", "year": … },
    { "_id": 234, "name": "E.T.", "year": … },
    { "_id": 345, "name": "Jurassic Park", "year": … }
  ]
}

So … what if a movie has multiple directors?

Slide 17

Slide 17 text

The “Movie - Director” on a no-SQL DB (C)

Director document
{
  "_id": 111,
  "name": "Steven Spielberg",
  "movies": [
    { "_id": 123, "name": "Jaws", "year": … },
    { "_id": 234, "name": "E.T.", "year": … },
    { "_id": 345, "name": "Jurassic Park", "year": … }
  ]
}

Movie documents
{ "_id": 123, "name": "Jaws", "year": ... }
{ "_id": 234, "name": "E.T.", "year": ... }
{ "_id": 345, "name": "Jurassic Park", "year": ... }

Now we can fetch all movies for a director in a single query … but every movie is stored twice, so we might easily lose consistency.

Slide 18

Slide 18 text

Graph Databases

Slide 19

Slide 19 text

Graph Databases

No need for mapping: the “whiteboard” model is your model.

No need for “joins”:
- traversals are fast since relationships point directly to nodes, not keys
- “index-free adjacency”
- deep traversals are possible (and efficient)

Slide 20

Slide 20 text

Dgraph Data Modeling

Slide 21

Slide 21 text

Subject-Predicate-Value

Subject-Predicate-Value:
  Subject   Predicate   Value
  Jaws      year        1975

Subject-Predicate-Object:
  Subject   Predicate         Object
  Jaws      was directed by   Steven Spielberg

[Figure: node "name: Jaws" with property "year: 1975", linked by "was directed by" to node "name: Steven Spielberg".]

Slide 22

Slide 22 text

Dgraph data modeling

The previous slide is not 100% accurate, as nodes have their own identifiers.

So instead of using strings as identifiers:
  “Jaws” <year> 1975
  “Jaws” <directed_by> “Steven Spielberg”

We have Universal Identifiers (UIDs):
  0x1 <name> “Jaws”
  0x1 <year> 1975
  0x1 <directed_by> 0x2
  0x2 <name> “Steven Spielberg”

Slide 23

Slide 23 text

Dgraph data modeling

Given the data from before:
  0x1 <name> “Jaws”
  0x1 <year> 1975
  0x1 <directed_by> 0x2
  0x2 <name> “Steven Spielberg”

- 0x1 and 0x2 are UIDs (Universal IDentifiers).
- <name>, <year>, etc. are predicates.
- “Jaws”, “Steven Spielberg”, and 1975 are values.

Slide 24

Slide 24 text

Dgraph data modeling

Predicates are always attached to UIDs. We associate values and objects with keys composed of UID + predicate:

  Keys                 Values
  0x1:<name>           “Jaws”
  0x1:<year>           1975
  0x1:<directed_by>    0x2
  0x2:<name>           “Steven Spielberg”

Sometimes a value can be an array of UIDs or values.
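The "UID + predicate" keying can be sketched with a Go map. This is an illustration only, not Dgraph's actual storage layer: the `Key`, `Posting`, and `Store` types are hypothetical, and the predicate names `name`, `year`, and `directed_by` are assumptions, since the transcript does not preserve the predicate names shown on the slide.

```go
package main

import "fmt"

// Key identifies one entry: a node UID plus a predicate name.
type Key struct {
	UID       uint64
	Predicate string
}

// Posting holds what is stored under a key: scalar values, UIDs of other
// nodes, or (as the slide notes) several of either.
type Posting struct {
	Values []any
	UIDs   []uint64
}

// Store is a hypothetical key-value view of the graph.
type Store map[Key]Posting

// sampleStore encodes the four facts from the slide.
// The predicate names are assumed for illustration.
func sampleStore() Store {
	return Store{
		{UID: 0x1, Predicate: "name"}:        {Values: []any{"Jaws"}},
		{UID: 0x1, Predicate: "year"}:        {Values: []any{1975}},
		{UID: 0x1, Predicate: "directed_by"}: {UIDs: []uint64{0x2}},
		{UID: 0x2, Predicate: "name"}:        {Values: []any{"Steven Spielberg"}},
	}
}

func main() {
	s := sampleStore()

	// Follow the edge from the movie to its director(s), then read the name.
	edge := s[Key{UID: 0x1, Predicate: "directed_by"}]
	for _, uid := range edge.UIDs {
		name := s[Key{UID: uid, Predicate: "name"}]
		fmt.Printf("0x%x directed by 0x%x %v\n", 0x1, uid, name.Values)
	}
}
```

Keeping `UIDs` as a slice is what would let a movie such as Twilight Zone: The Movie point at several directors without any schema change.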

Slide 25

Slide 25 text

Life of a query

1. Find the starting nodes for the traversal.
2. Append the predicate name to the UIDs we have so far.
3. Find the values associated with the UID:Predicate pairs.
4. Repeat from (2) until the associated values are not UIDs or the query is done.

Benefit: values are not involved, keeping memory requirements low.

Slide 26

Slide 26 text

Life of a query

Example: give me the name of the friends of 0x1234.

  0x1234 <friend> _ <name> X

1. Find the node with UID 0x1234.
2. Append <friend>: 0x1234:<friend>
3. Retrieve values from 0x1234:<friend>: [0xABCD, 0xBCDE]
   a. 0xABCD:<name> “Diggy” | 0xBCDE:<name> “Augie”
4. Return [“Diggy”, “Augie”]
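The walk through this example can be sketched as a loop over UID:predicate keys. It is a toy model, not Dgraph's query engine: `Store`, `Traverse`, and `Scalars` are hypothetical names, and the predicate names `friend` and `name` are inferred from the wording of the example ("the name of the friends").

```go
package main

import "fmt"

// Key is a UID plus a predicate name.
type Key struct {
	UID       uint64
	Predicate string
}

// Store separates UID-valued predicates (edges) from scalar ones (values),
// so a traversal can run on UIDs alone.
type Store struct {
	Edges  map[Key][]uint64
	Values map[Key]string
}

// Traverse follows each predicate in path, starting from the given UIDs,
// and returns the UIDs reached at the end. No scalar value is touched.
func (s *Store) Traverse(start []uint64, path ...string) []uint64 {
	current := start
	for _, pred := range path {
		var next []uint64
		for _, uid := range current {
			k := Key{UID: uid, Predicate: pred}
			next = append(next, s.Edges[k]...)
		}
		current = next
	}
	return current
}

// Scalars reads one scalar predicate for each UID, skipping UIDs without it.
func (s *Store) Scalars(uids []uint64, pred string) []string {
	var out []string
	for _, uid := range uids {
		k := Key{UID: uid, Predicate: pred}
		if v, ok := s.Values[k]; ok {
			out = append(out, v)
		}
	}
	return out
}

// sampleStore holds the data from the slide's example.
func sampleStore() *Store {
	return &Store{
		Edges: map[Key][]uint64{
			{UID: 0x1234, Predicate: "friend"}: {0xABCD, 0xBCDE},
		},
		Values: map[Key]string{
			{UID: 0xABCD, Predicate: "name"}: "Diggy",
			{UID: 0xBCDE, Predicate: "name"}: "Augie",
		},
	}
}

func main() {
	s := sampleStore()
	friends := s.Traverse([]uint64{0x1234}, "friend")
	fmt.Println(s.Scalars(friends, "name"))
}
```

The sketch does not deduplicate UIDs reached through more than one path, which a real engine would need to handle.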

Slide 27

Slide 27 text

How do we find the first nodes?

We don’t always have the UID of the first node of our traversal.
We can find the starting nodes by the value of one of their predicates!
- The node with name “Augie”.
- All nodes with a predicate value larger than 18.
- All nodes with a predicate value within 20mi of SLC.

These searches could be very expensive, so we use indices.

Slide 28

Slide 28 text

Indexing in Dgraph

Dgraph provides indices on:
- Strings: hash, exact, term, fulltext, trigram.
- DateTime: year, month, day, hour.
- Int, Float, Bool: default value index.
- Geo properties: default value index.
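The idea behind a value index is an inverted map from (part of) a value back to the UIDs that hold it. The sketch below illustrates that idea for a term-style index on strings; it is not Dgraph's tokenizer or index format, and `TermIndex`, `Add`, and `Lookup` are hypothetical names. Tokenization here is a naive split on whitespace.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// TermIndex maps a lowercase term to the UIDs whose value contains that term.
type TermIndex map[string][]uint64

// Add indexes every whitespace-separated term of value under uid.
func (idx TermIndex) Add(uid uint64, value string) {
	seen := map[string]bool{}
	for _, term := range strings.Fields(strings.ToLower(value)) {
		if seen[term] {
			continue // index each term once per value
		}
		seen[term] = true
		idx[term] = append(idx[term], uid)
	}
}

// Lookup returns the sorted UIDs indexed under term, ignoring case.
func (idx TermIndex) Lookup(term string) []uint64 {
	uids := append([]uint64(nil), idx[strings.ToLower(term)]...)
	sort.Slice(uids, func(i, j int) bool { return uids[i] < uids[j] })
	return uids
}

// sampleIndex indexes three movie names under made-up UIDs.
func sampleIndex() TermIndex {
	idx := TermIndex{}
	idx.Add(1, "Jaws")
	idx.Add(2, "Jurassic Park")
	idx.Add(3, "Twilight Zone: The Movie")
	return idx
}

func main() {
	idx := sampleIndex()
	fmt.Println(idx.Lookup("park"))  // UIDs whose name contains "park"
	fmt.Println(idx.Lookup("movie")) // UIDs whose name contains "movie"
}
```

Finding the starting nodes of a traversal then becomes a map lookup instead of a scan over every value, at the cost of the extra space the index takes.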

Slide 29

Slide 29 text

Dgraph schemas and indices

- Schemas are not required in general.
- But indices can only be defined on schema fields.
- Be aware of the space requirements of the indices.

Example, an indexed name predicate:
  name: string @index(fulltext, hash, term, trigram) .

Slide 30

Slide 30 text

Updated life of a query

1. Find the starting nodes for the traversal using UIDs or indexes.
2. Append the predicate ID (int) to the UIDs we have so far.
3. Find the values associated with the UID:Predicate pairs.
4. Repeat from (2) until the associated values are not UIDs or the query is done.

Benefit: values are not involved, keeping memory requirements low.

Slide 31

Slide 31 text

So … how is this related to Go?

Slide 32

Slide 32 text

What is Go?
- A programming language.
- A community.
but really ...
- A bunch of zeros and ones!

Slide 33

Slide 33 text

What is Go?

package main

import "fmt"

func main() {
	fmt.Println("Hello, Gophers")
}

'112', '97', '99', '107', '97', '103', '101', '32', '109', '97', '105', '110', '10', '10', '105', '109', '112', '111', '114', '116', '32', '40', '10', '9', '34', '102', '109', '116', '34', '10', '41', '10', '10', '102', '117', '110', '99', '32', '109', '97', '105', '110', '40', '41', '32', '123', '10', '9', '102', '109', '116', '46', '80', '114', '105', '110', '116', '108', '110', '40', '34', '72', '101', '108', '108', '111', '44', '32', '112', '108', '97', '121', '103', '114', '111', '117', '110', '100', '34', '41', '10', '125', '10'
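The byte view of a program is easy to reproduce: converting the source text to a `[]byte` yields the same kind of number sequence. A minimal sketch, where the constant `src` is simply the slide's program written as a Go string:

```go
package main

import "fmt"

// src is the slide's program as plain text.
const src = "package main\n\nimport \"fmt\"\n\nfunc main() {\n\tfmt.Println(\"Hello, Gophers\")\n}\n"

func main() {
	b := []byte(src)

	// The first seven bytes are the keyword "package".
	fmt.Println(b[:7])         // prints [112 97 99 107 97 103 101]
	fmt.Println(string(b[:7])) // prints package

	// Every remaining byte is just a number too.
	fmt.Println(len(b), "bytes in total")
}
```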

Slide 34

Slide 34 text

What is Go?

package main

import "fmt"

func main() {
	fmt.Println("Hello, Gophers")
}

package package   IDENT main   ;
import import   STRING "fmt"   ;
func func   IDENT main   (   )   {
IDENT fmt   .   IDENT Println   (   STRING "Hello, Gophers"   )   ;
}   ;
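A token listing like this one can be produced with the standard library's `go/scanner` package. The sketch below is one way to do it, not necessarily how the slide was generated; the `Tokens` helper is a hypothetical name.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
	"strings"
)

// src is the slide's program as plain text.
const src = "package main\n\nimport \"fmt\"\n\nfunc main() {\n\tfmt.Println(\"Hello, Gophers\")\n}\n"

// Tokens scans src and returns one "TOKEN literal" string per token.
func Tokens(src string) []string {
	fset := token.NewFileSet()
	file := fset.AddFile("main.go", fset.Base(), len(src))

	var s scanner.Scanner
	s.Init(file, []byte(src), nil, 0) // no error handler, default mode

	var out []string
	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		// Operators have an empty literal, and automatically inserted
		// semicolons carry "\n", so trim the surrounding whitespace.
		out = append(out, strings.TrimSpace(tok.String()+" "+lit))
	}
	return out
}

func main() {
	for _, t := range Tokens(src) {
		fmt.Println(t)
	}
}
```

The semicolons in the output are the ones the scanner inserts automatically at line ends; they do not appear in the source text.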

Slide 35

Slide 35 text

What is Go?

package main

import "fmt"

func main() {
	fmt.Println("Hello, Gophers")
}
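The transcript does not preserve what this slide displayed next to the code. If, as the progression from bytes to tokens suggests, the next step is the syntax tree, then the standard library's `go/parser` and `go/ast` packages can show it. The following is a sketch under that assumption; `NodeTypes` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// src is the slide's program as plain text.
const src = "package main\n\nimport \"fmt\"\n\nfunc main() {\n\tfmt.Println(\"Hello, Gophers\")\n}\n"

// NodeTypes parses src and returns the Go type of every AST node,
// in the depth-first order used by ast.Inspect.
func NodeTypes(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "main.go", src, 0)
	if err != nil {
		return nil, err
	}
	var out []string
	ast.Inspect(f, func(n ast.Node) bool {
		if n != nil { // Inspect calls back with nil after each subtree
			out = append(out, fmt.Sprintf("%T", n))
		}
		return true
	})
	return out, nil
}

func main() {
	types, err := NodeTypes(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, t := range types {
		fmt.Println(t)
	}
}
```

A tree is already a graph: each node type printed here could become a node, and each parent-child link an edge.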

Slide 36

Slide 36 text

I HEARD YOU LIKE GO SO WE WROTE SOME GO PROGRAMS TO PUT SOME GO GRAPHS IN A GRAPH DATABASE WRITTEN IN GO

Slide 37

Slide 37 text

The import graph

Package a imports b, b imports c ...

Main pieces:
- golang.org/x/tools/go/packages
- go list
- github.com/dgraph-io/dgo

Slide 38

Slide 38 text

The import graph

Schema
  name: string @index(exact) .
  imports: [uid] @reverse .

Sample
  _:a <name> “main” .
  _:b <name> “fmt” .
  _:a <imports> _:b .
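Triples of this shape can be generated from Go source with the standard library alone. The sketch below is not the talk's actual tool (which, per the previous slide, builds on `golang.org/x/tools/go/packages`, `go list`, and `dgo`); it parses a single file with `go/parser` and only prints the triples. The predicate names `name` and `imports` are assumptions, since the transcript lost the predicate names, and `ImportQuads` and `blank` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// src is the slide's hello-world program as plain text.
const src = "package main\n\nimport \"fmt\"\n\nfunc main() {\n\tfmt.Println(\"Hello, Gophers\")\n}\n"

// labelReplacer turns an import path into something usable as a blank node label.
var labelReplacer = strings.NewReplacer("/", "_", ".", "_", "-", "_")

// blank returns a blank node identifier for a package path.
func blank(path string) string {
	return "_:" + labelReplacer.Replace(path)
}

// ImportQuads parses one Go file and returns RDF-style triples describing
// the package and the packages it imports. Predicate names are assumed.
func ImportQuads(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "src.go", src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	pkg := f.Name.Name
	quads := []string{fmt.Sprintf("%s <name> %q .", blank(pkg), pkg)}
	for _, imp := range f.Imports {
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		quads = append(quads,
			fmt.Sprintf("%s <name> %q .", blank(path), path),
			fmt.Sprintf("%s <imports> %s .", blank(pkg), blank(path)),
		)
	}
	return quads, nil
}

func main() {
	quads, err := ImportQuads(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, q := range quads {
		fmt.Println(q)
	}
}
```

Using the package name as the node label is a shortcut that only works for a single file; a real tool would key nodes by full import path.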

Slide 39

Slide 39 text

Demo time!

Slide 40

Slide 40 text

The import graph

Slide 41

Slide 41 text

Control-Flow Graph

package main

import "fmt"

func main() {
	if greeting() {
		fmt.Print("Hello")
	} else {
		fmt.Print("Bye")
	}
	fmt.Println(", Gophers")
}

Slide 42

Slide 42 text

Control-Flow Graph

After running x, you might run y.

Main pieces:
- golang.org/x/tools/go/cfg
- github.com/dgraph-io/dgo
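"After running x, you might run y" describes edges between basic blocks. The sketch below builds the control-flow graph of the if/else example from the previous slide by hand, rather than deriving it with the cfg package listed above, and then enumerates the possible execution paths. The `Block` type, the block names, and the `Paths` helper are hypothetical and only valid for graphs without loops.

```go
package main

import (
	"fmt"
	"strings"
)

// Block is a basic block: statements that always run together,
// plus the blocks that may run next.
type Block struct {
	Name  string
	Stmts []string
	Succs []*Block
}

// exampleCFG hand-builds the graph for:
//
//	if greeting() { fmt.Print("Hello") } else { fmt.Print("Bye") }
//	fmt.Println(", Gophers")
func exampleCFG() *Block {
	done := &Block{Name: "if.done", Stmts: []string{`fmt.Println(", Gophers")`}}
	then := &Block{Name: "if.then", Stmts: []string{`fmt.Print("Hello")`}, Succs: []*Block{done}}
	els := &Block{Name: "if.else", Stmts: []string{`fmt.Print("Bye")`}, Succs: []*Block{done}}
	return &Block{Name: "entry", Stmts: []string{"greeting()"}, Succs: []*Block{then, els}}
}

// Paths returns every path of block names from b to a block without
// successors. It assumes the graph is acyclic.
func Paths(b *Block) [][]string {
	if len(b.Succs) == 0 {
		return [][]string{{b.Name}}
	}
	var out [][]string
	for _, s := range b.Succs {
		for _, p := range Paths(s) {
			out = append(out, append([]string{b.Name}, p...))
		}
	}
	return out
}

func main() {
	for _, p := range Paths(exampleCFG()) {
		fmt.Println(strings.Join(p, " -> "))
	}
}
```

Each `Block` maps naturally to a graph node and each entry in `Succs` to an edge, which is the shape a graph database can store directly.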

Slide 43

Slide 43 text

Control-flow graph

Schema
  : string @index(term) .
  : [uid] .
  : [uid] @reverse .
  : string @index(term) .

Sample
  _:a “block 0 (start)” .
  _:a _:a1 (number=0) .
  _:a1 “fmt.Println(“hello”)” .
  _:a _:b .

Slide 44

Slide 44 text

Demo time!

Slide 45

Slide 45 text

Control-Flow Graph

Slide 46

Slide 46 text

Some ideas
- A Dgraph-powered godoc instance?
- Merging all of the graphs!
- Type graphs
- Graphs as the source for code analysis

Slide 47

Slide 47 text

Thanks!
- bit.ly/go-is-a-graph
- github.com/campoy/code-as-graphs
- @francesc
- [email protected] (we’re hiring)