Go is a Graph

Go is a Graph @francesc

About me
Francesc Campoy, VP of Product at Dgraph Labs (@francesc, campoy)
You might know me from: justforfunc.com, Google Cloud Platform Podcast

Agenda
- A Graph
- A Graph Database in Go
- Graphs in Go
- Graphs in Go in a Graph Database in Go
- …

We are hiring! dgraph.io/careers

Graphs

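[Figure: A movie graph. Nodes: Steven Spielberg, Jaws, Jurassic Park, Comedy, Thriller, Science Fiction. "directed" relationships go from Steven Spielberg to Jaws and Jurassic Park; "genre" relationships go from the movies to the genres. Legend: node, relationship.]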

[Figure: A movie graph with properties. Nodes carry properties (name: Steven Spielberg; name: Jaws, year: 1975; name: Jurassic Park, year: 1993; name: Comedy; name: Thriller; name: Science Fiction) and are linked by "directed" and "genre" relationships. Legend: node, relationship.]

Graphs in Databases

[Figure: the same movie graph with properties.]
How would you store this in your database?

A graph in a relational database
The mapping process can be complex:
- one-to-one relationships become foreign keys
- one-to-many relationships become foreign keys (repeated foreign keys if reversed)
- many-to-many relationships become rows in a new table with multiple foreign keys
Traversals require joins, which quickly become very expensive.

The "Movie - Director" on a relational DB
Director: ID int, Name string
Movie: ID int, Name string, Year int, …, DirectorID int (FK → Director)
Note: fetching all the movies directed by a director requires an index on Movie.DirectorID for performance.

You thought a movie had one director?
Twilight Zone: The Movie. Directed by:
- Steven Spielberg
- John Landis
- Joe Dante
- George Miller
Good luck migrating the schema!

The "Movie - Director" on a relational DB
Movie: ID int, Name string, Year int, …
Director: ID int, Name string
MovieDirector: MovieID int (FK → Movie), DirectorID int (FK → Director)
What is a MovieDirector? We had to modify our logical model to fit the technology, bringing in unnecessary complexity.
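To make the many-to-many case concrete, here is a minimal in-memory sketch of the three tables as Go slices. The types, the sample IDs, and the `directorsOf` helper are hypothetical; the loop over `movieDirectors` stands in for the join a relational database would run (with an index, in practice).

```go
package main

import "fmt"

// Movie and Director mirror the two entity tables.
type Movie struct {
	ID   int
	Name string
	Year int
}

type Director struct {
	ID   int
	Name string
}

// MovieDirector is the join table: one row per (movie, director) pair.
type MovieDirector struct {
	MovieID    int
	DirectorID int
}

var directors = []Director{
	{1, "Steven Spielberg"}, {2, "John Landis"}, {3, "Joe Dante"}, {4, "George Miller"},
}

var movies = []Movie{{10, "Twilight Zone: The Movie", 1983}}

// Four rows for a single movie: one per director.
var movieDirectors = []MovieDirector{{10, 1}, {10, 2}, {10, 3}, {10, 4}}

// directorsOf "joins" movieDirectors with directors for one movie ID.
func directorsOf(movieID int) []string {
	byID := make(map[int]string, len(directors))
	for _, d := range directors {
		byID[d.ID] = d.Name
	}
	var names []string
	for _, md := range movieDirectors {
		if md.MovieID == movieID {
			names = append(names, byID[md.DirectorID])
		}
	}
	return names
}

func main() {
	fmt.Println(movies[0].Name, "was directed by", directorsOf(movies[0].ID))
}
```

Even this toy version needs an extra table and a join just to answer "who directed this movie?".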

A graph in a non-relational database
No foreign keys:
- You will need to keep many copies of your information.
- You will need to keep them all up to date.
- At that point, why do you even have a database?
Traversals require multiple queries: get element, find property, get next element, etc.

The "Movie - Director" on a no-SQL DB (A)
Director document: { "_id": 111, "name": "Steven Spielberg", "movies": [ 123, 234, 345 ] }
Movie document: { "_id": 123, "name": "Jaws", "year": ... }
Movie document: { "_id": 234, "name": "E.T.", "year": ... }
Movie document: { "_id": 345, "name": "Jurassic Park", "year": ... }
Fetching all the names of the movies directed by a director requires n+1 queries: one for the director document, plus one for each of its n movies.
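The n+1 pattern is easy to see in code. In this sketch a Go map stands in for the document store and a counter records every lookup; `store`, `get`, and the document shapes are hypothetical, not a real database client.

```go
package main

import "fmt"

// queries counts round trips to the (simulated) document store.
var queries int

// store simulates a document database keyed by _id.
var store = map[int]map[string]any{
	111: {"name": "Steven Spielberg", "movies": []int{123, 234, 345}},
	123: {"name": "Jaws"},
	234: {"name": "E.T."},
	345: {"name": "Jurassic Park"},
}

// get simulates one query against the store.
func get(id int) map[string]any {
	queries++
	return store[id]
}

// movieNames fetches the director, then each movie separately: 1 + n queries.
func movieNames(directorID int) []string {
	director := get(directorID)
	var names []string
	for _, id := range director["movies"].([]int) {
		names = append(names, get(id)["name"].(string))
	}
	return names
}

func main() {
	names := movieNames(111)
	fmt.Println(names, "in", queries, "queries")
}
```

With three movies this costs four queries, and the count grows linearly with the number of movies.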

The "Movie - Director" on a no-SQL DB (B)
Director document:
{ "_id": 111, "name": "Steven Spielberg", "movies": [
    { "_id": 123, "name": "Jaws", "year": … },
    { "_id": 234, "name": "E.T.", "year": … },
    { "_id": 345, "name": "Jurassic Park", "year": … } ] }
So … what if a movie has multiple directors?

The "Movie - Director" on a no-SQL DB (C)
Director document:
{ "_id": 111, "name": "Steven Spielberg", "movies": [
    { "_id": 123, "name": "Jaws", "year": … },
    { "_id": 234, "name": "E.T.", "year": … },
    { "_id": 345, "name": "Jurassic Park", "year": … } ] }
Movie document: { "_id": 123, "name": "Jaws", "year": ... }
Movie document: { "_id": 234, "name": "E.T.", "year": ... }
Movie document: { "_id": 345, "name": "Jurassic Park", "year": ... }
Now we can fetch all the movies for a director in a single query … but we can easily lose consistency, since every movie is now stored more than once.
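Here is how that consistency loss happens, sketched with plain Go structs rather than a real document database. The types and the `renameStandalone` helper are hypothetical: the director document embeds a copy of a movie, and an update that touches only the standalone movie document leaves the embedded copy stale.

```go
package main

import "fmt"

// MovieDoc is a standalone movie document.
type MovieDoc struct {
	ID   int
	Name string
}

// DirectorDoc embeds copies of its movies so one query returns everything.
type DirectorDoc struct {
	ID     int
	Name   string
	Movies []MovieDoc // copies, not references
}

// renameStandalone updates only the standalone movie document, which is
// what happens when an application forgets one of the copies.
func renameStandalone(m *MovieDoc, name string) { m.Name = name }

// demo returns the movie name as seen through each copy after the update.
func demo() (standalone, embedded string) {
	et := MovieDoc{ID: 234, Name: "E.T."}
	spielberg := DirectorDoc{ID: 111, Name: "Steven Spielberg", Movies: []MovieDoc{et}}
	renameStandalone(&et, "E.T. the Extra-Terrestrial")
	return et.Name, spielberg.Movies[0].Name
}

func main() {
	s, e := demo()
	fmt.Println("standalone:", s, "| embedded:", e)
}
```

Keeping the copies in sync means rewriting every director document that embeds the movie, in every code path that updates it.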

Graph Databases

- No need for mapping: the "whiteboard" model is your model.
- No need for "joins":
  - traversals are fast since relationships point directly to nodes, not keys ("index-free adjacency")
  - deep traversals are possible (and efficient)
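Index-free adjacency means each node holds direct references to its neighbors, so following an edge is a pointer dereference rather than a key lookup or a join. The following is a conceptual Go sketch with a hypothetical `Node` type, not Dgraph's storage layout.

```go
package main

import "fmt"

// Node stores its properties and direct pointers to related nodes.
type Node struct {
	Props map[string]any
	Edges map[string][]*Node // predicate -> neighbors
}

func NewNode(props map[string]any) *Node {
	return &Node{Props: props, Edges: map[string][]*Node{}}
}

// Link adds a directed edge labeled pred from n to m.
func (n *Node) Link(pred string, m *Node) {
	n.Edges[pred] = append(n.Edges[pred], m)
}

// buildGraph returns the Steven Spielberg node of the movie graph.
func buildGraph() *Node {
	spielberg := NewNode(map[string]any{"name": "Steven Spielberg"})
	jaws := NewNode(map[string]any{"name": "Jaws", "year": 1975})
	jp := NewNode(map[string]any{"name": "Jurassic Park", "year": 1993})
	spielberg.Link("directed", jaws)
	spielberg.Link("directed", jp)
	return spielberg
}

func main() {
	// Traversal: no join and no index lookup; just follow the pointers.
	for _, m := range buildGraph().Edges["directed"] {
		fmt.Println(m.Props["name"], m.Props["year"])
	}
}
```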

Dgraph Data Modeling

Subject-Predicate-Value:
Subject: Jaws | Predicate: <was recorded in the year> | Value: 1975
Subject-Predicate-Object:
Subject: Jaws | Predicate: <was directed by> | Object: Steven Spielberg
[Figure: a Jaws node (name: Jaws, year: 1975) linked by <was directed by> to a Steven Spielberg node (name: Steven Spielberg).]

Dgraph data modeling
The previous slide is not 100% accurate, as nodes have their own identifiers. So instead of using strings as identifiers:
"Jaws" <was recorded in the year> 1975
"Jaws" <was directed by> "Steven Spielberg"
we have Universal Identifiers (UIDs):
0x1 <has name> "Jaws"
0x1 <was recorded in the year> 1975
0x1 <was directed by> 0x2
0x2 <has name> "Steven Spielberg"

Dgraph data modeling
Given the data from before:
0x1 <has name> "Jaws"
0x1 <was recorded in the year> 1975
0x1 <was directed by> 0x2
0x2 <has name> "Steven Spielberg"
- 0x1 and 0x2 are UIDs (Universal IDentifiers).
- <has name>, <was recorded in the year>, etc. are predicates.
- "Jaws", "Steven Spielberg", and 1975 are values.

Dgraph data modeling
Predicates are always attached to UIDs. We associate values and objects with keys composed of UID + predicate:
Key → Value
0x1:<has name> → "Jaws"
0x1:<was recorded in the year> → 1975
0x1:<was directed by> → 0x2
0x2:<has name> → "Steven Spielberg"
Sometimes a value can be an array of UIDs or values.
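The UID + predicate keying scheme can be sketched with a Go map. The `Key` struct and the choice to store every entry as a `[]any` (so one key can hold a list of UIDs or values) are illustrative assumptions, not Dgraph's internal representation.

```go
package main

import "fmt"

// UID identifies a node.
type UID uint64

// Key combines a subject UID with a predicate name.
type Key struct {
	Subject   UID
	Predicate string
}

// Store maps each key to its values: either literal values or UIDs.
type Store map[Key][]any

// Add records one triple (subject, predicate, object).
func (s Store) Add(subj UID, pred string, obj any) {
	k := Key{subj, pred}
	s[k] = append(s[k], obj)
}

// movieStore loads the four triples from the slide.
func movieStore() Store {
	s := Store{}
	s.Add(0x1, "has name", "Jaws")
	s.Add(0x1, "was recorded in the year", 1975)
	s.Add(0x1, "was directed by", UID(0x2))
	s.Add(0x2, "has name", "Steven Spielberg")
	return s
}

func main() {
	s := movieStore()
	// Follow the edge: the value at 0x1:<was directed by> is a UID.
	director := s[Key{0x1, "was directed by"}][0].(UID)
	fmt.Println(s[Key{director, "has name"}][0])
}
```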

Life of a query
1. Find the starting nodes for the traversal.
2. Append the predicate name to the UIDs we have so far.
3. Find the values associated with the UID:Predicate pairs.
4. Repeat (2) until the associated values are not UIDs or the query is done.
Benefit: values are not involved, keeping memory requirements low.

Life of a query
Example: give me the names of the friends of 0x1234.
0x1234 <is_friends_with> _ <has_name> X
1. Find the node with UID 0x1234.
2. Append <is_friends_with>: 0x1234:<is_friends_with>
3. Retrieve the values from 0x1234:<is_friends_with>: [0xABCD, 0xBCDE]
   a. 0xABCD <has_name> "Diggy" | 0xBCDE <has_name> "Augie"
4. Return ["Diggy", "Augie"]
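The life of that query can be sketched in Go. The two maps, the predicate spellings, and the `traverse` helper are hypothetical; the point is that each hop only looks up `UID:predicate` keys to get more UIDs, and literal values are read only at the last step.

```go
package main

import "fmt"

type UID uint64

type Key struct {
	Subject   UID
	Predicate string
}

// uids holds UID-valued edges; vals holds literal values.
var uids = map[Key][]UID{
	{0x1234, "is_friends_with"}: {0xABCD, 0xBCDE},
}

var vals = map[Key]string{
	{0xABCD, "has_name"}: "Diggy",
	{0xBCDE, "has_name"}: "Augie",
}

// traverse starts from the given UIDs, follows each edge predicate in path,
// then reads the literal value of predicate last for every node reached.
func traverse(start []UID, path []string, last string) []string {
	frontier := start // step 1: starting nodes
	for _, pred := range path {
		var next []UID
		for _, u := range frontier {
			// Steps 2 and 3: build the UID:predicate key, fetch its UIDs.
			next = append(next, uids[Key{u, pred}]...)
		}
		frontier = next // step 4: repeat with the new UIDs
	}
	var out []string
	for _, u := range frontier {
		if v, ok := vals[Key{u, last}]; ok {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	fmt.Println(traverse([]UID{0x1234}, []string{"is_friends_with"}, "has_name"))
}
```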

How do we find the first nodes?
We don't always have the UID of the first node of our traversal. We can find nodes by the value of one of their predicates!
- Node with name "Augie".
- All nodes with a predicate <has_age> larger than 18.
- All nodes with a predicate <location> within 20mi of SLC.
These searches could be very expensive, so we use indices.

Indexing in Dgraph
Dgraph provides indices on:
- Strings: hash, exact, term, fulltext, trigram.
- DateTime: year, month, day, hour.
- Int, Float, Bool: default value index.
- Geo properties: default value index.
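To see what an index buys, here is a sketch of a term-style index: each string value is split into lowercase terms, and each term points at the UIDs whose value contains it, so a lookup avoids scanning every node. The tokenizer (`strings.ToLower` plus `strings.Fields`) and the `AnyOfTerms` helper are simplifications of our own; Dgraph's real term tokenizer and storage differ.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

type UID uint64

// TermIndex maps a lowercase term to the UIDs whose value contains it.
type TermIndex map[string][]UID

// Add indexes the string value stored at uid.
func (idx TermIndex) Add(uid UID, value string) {
	seen := map[string]bool{}
	for _, t := range strings.Fields(strings.ToLower(value)) {
		if !seen[t] { // index each term at most once per node
			seen[t] = true
			idx[t] = append(idx[t], uid)
		}
	}
}

// AnyOfTerms returns the sorted UIDs matching at least one term of query.
func (idx TermIndex) AnyOfTerms(query string) []UID {
	set := map[UID]bool{}
	for _, t := range strings.Fields(strings.ToLower(query)) {
		for _, u := range idx[t] {
			set[u] = true
		}
	}
	out := make([]UID, 0, len(set))
	for u := range set {
		out = append(out, u)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	idx := TermIndex{}
	idx.Add(0x1, "Jaws")
	idx.Add(0x2, "Steven Spielberg")
	idx.Add(0x3, "Jurassic Park")
	fmt.Printf("%#x\n", idx.AnyOfTerms("park jaws"))
}
```

The trade-off mentioned on the next slide shows up here too: every term of every indexed value costs an extra entry.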

Dgraph schemas and indices
- Schemas are not required in general.
- But indices can only be defined on schema fields.
- Be aware of the space requirements of the indices.
Example, an indexed name predicate:
<name>: string @index(fulltext, hash, term, trigram) .

Updated life of a query
1. Find the starting nodes for the traversal using UIDs or indexes.
2. Append the predicate ID (int) to the UIDs we have so far.
3. Find the values associated with the UID:Predicate pairs.
4. Repeat (2) until the associated values are not UIDs or the query is done.
Benefit: values are not involved, keeping memory requirements low.
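Appending an integer predicate ID instead of the predicate name keeps keys small and fixed-size. As an illustration only (this is a hypothetical encoding, not Dgraph's actual key format), the sketch below packs a 64-bit UID and a 32-bit predicate ID into 12 bytes with `encoding/binary`; the `predicateIDs` table is also an assumption.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// predicateIDs assigns a small integer to each predicate name (hypothetical).
var predicateIDs = map[string]uint32{
	"has_name":        1,
	"is_friends_with": 2,
}

// encodeKey packs uid and the predicate ID big-endian, so in an ordered
// key-value store all keys for the same UID sort next to each other.
func encodeKey(uid uint64, pred string) []byte {
	key := make([]byte, 12)
	binary.BigEndian.PutUint64(key[:8], uid)
	binary.BigEndian.PutUint32(key[8:], predicateIDs[pred])
	return key
}

// decodeKey reverses encodeKey.
func decodeKey(key []byte) (uid uint64, predID uint32) {
	return binary.BigEndian.Uint64(key[:8]), binary.BigEndian.Uint32(key[8:])
}

func main() {
	k := encodeKey(0x1234, "is_friends_with")
	uid, pid := decodeKey(k)
	fmt.Printf("% x -> uid=%#x pred=%d\n", k, uid, pid)
}
```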

So … how is this related to Go?

What is Go?
- A programming language.
- A community.
but really ...
- A bunch of zeros and ones!

What is Go?
'112', '97', '99', '107', '97', '103', '101', '32', '109', '97', '105', '110', '10', '10', '105', '109', '112', '111', '114', '116', '32', '40', '10', '9', '34', '102', '109', '116', '34', '10', '41', '10', '10', '102', '117', '110', '99', '32', '109', '97', '105', '110', '40', '41', '32', '123', '10', '9', '102', '109', '116', '46', '80', '114', '105', '110', '116', '108', '110', '40', '34', '72', '101', '108', '108', '111', '44', '32', '112', '108', '97', '121', '103', '114', '111', '117', '110', '100', '34', '41', '10', '125', '10'

package main

import "fmt"

func main() {
	fmt.Println("Hello, Gophers")
}

What is Go?
package package IDENT main ; import import STRING "fmt" ; func func IDENT main ( ) { IDENT fmt . IDENT Println ( STRING "Hello, Denver" ) ; } ;

package main

import "fmt"

func main() {
	fmt.Println("Hello, Gophers")
}
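A token stream like the one above can be produced with the standard library's `go/scanner` package. This sketch scans the hello-world program and prints each token followed by its literal (if any). The `;` tokens are not in the source: the scanner inserts them automatically at the end of lines, which is why they show up in the stream.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

const src = `package main

import "fmt"

func main() {
	fmt.Println("Hello, Gophers")
}
`

// tokens returns one "TOKEN literal" string per token in src.
func tokens(src string) []string {
	fset := token.NewFileSet()
	file := fset.AddFile("main.go", fset.Base(), len(src))
	var s scanner.Scanner
	s.Init(file, []byte(src), nil, 0) // no error handler, skip comments
	var out []string
	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		// Operators have no literal; inserted semicolons have literal "\n".
		if lit == "" || tok == token.SEMICOLON {
			out = append(out, tok.String())
		} else {
			out = append(out, tok.String()+" "+lit)
		}
	}
	return out
}

func main() {
	for _, t := range tokens(src) {
		fmt.Println(t)
	}
}
```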

What is Go?

package main

import "fmt"

func main() {
	fmt.Println("Hello, Gophers")
}

I HEARD YOU LIKE GO SO WE WROTE SOME GO
PROGRAMS TO PUT SOME GO GRAPHS IN A GRAPH DATABASE WRITTEN IN GO

The import graph
Package a imports b, b imports c ...
Main pieces:
- golang.org/x/tools/go/packages
- go list
- github.com/dgraph-io/dgo
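The talk builds the import graph with `golang.org/x/tools/go/packages` and `go list`. As a dependency-free stand-in (a swapped-in technique, not the talk's code), this sketch uses the standard library's `go/parser` in `parser.ImportsOnly` mode to extract the import edges of a single file. It records the package name as the source of each edge; a real import graph would use full import paths for both ends.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
)

// importEdges returns {package, imported path} pairs for one Go source file.
func importEdges(filename, src string) ([][2]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var edges [][2]string
	for _, imp := range f.Imports {
		path, err := strconv.Unquote(imp.Path.Value) // Value includes the quotes
		if err != nil {
			return nil, err
		}
		edges = append(edges, [2]string{f.Name.Name, path})
	}
	return edges, nil
}

const sample = `package main

import (
	"fmt"
	"os"
)

func main() { fmt.Fprintln(os.Stdout, "hi") }
`

func main() {
	edges, err := importEdges("main.go", sample)
	if err != nil {
		panic(err)
	}
	for _, e := range edges {
		fmt.Printf("%s imports %s\n", e[0], e[1])
	}
}
```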

The import graph
Schema:
<id>: string @index(exact) .
<imports>: [uid] @reverse .
Sample:
_:a <id> "main" .
_:b <id> "fmt" .
_:a <imports> _:b .

Demo time!

The import graph

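Control-Flow Graph

package main

import "fmt"

func main() {
	if greeting() {
		fmt.Print("Hello")
	} else {
		fmt.Print("Bye")
	}
	fmt.Println(", Gophers")
}

// greeting is not shown on the slide; this stub is a hypothetical placeholder.
func greeting() bool { return true }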

Control-Flow Graph
After running x, you might run y.
Main pieces:
- golang.org/x/tools/go/cfg
- github.com/dgraph-io/dgo

Control-flow graph
Schema:
<block>: string @index(term) .
<node>: [uid] .
<succ>: [uid] @reverse .
<body>: string @index(term) .
Sample:
_:a <block> "block 0 (start)" .
_:a <node> _:a1 (number=0) .
_:a1 <body> "fmt.Println(\"hello\")" .
_:a <succ> _:b .

Demo time!

Control-Flow Graph

Some ideas
- A Dgraph-powered godoc instance?
- Merging all of the graphs!
- Type graphs
- Graphs as the source for code analysis

Thanks!
- bit.ly/go-is-a-graph
- github.com/campoy/code-as-graphs
- @francesc
- [email protected] (we're hiring)

