Go is a Graph

In this talk, Francesc will explore how source code, and specifically Go source code, can be understood as a graph formed by a syntax tree plus other relationships such as "defined in" or "calls".

Then we will explore how that data can be stored in Dgraph (a graph DB written purely in Go) using a program written in Go to analyze some Go code (#meta), and what kinds of questions we can then answer.

Francesc Campoy Flores

November 08, 2019

Transcript

  1. Go is a Graph
    @francesc

  2. Francesc Campoy
    VP of Product at Dgraph Labs
    @francesc
    campoy
    You might know me from:
    justforfunc.com
    Google Cloud Platform Podcast
    About me

  3. Agenda
    - A Graph
    - A Graph Database in Go
    - Graphs in Go
    - Graphs in Go in a Graph Database in Go
    - …

  4. We are hiring!
    dgraph.io/careers

  5. Graphs

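  6. A movie graph
    [Figure: nodes Steven Spielberg, Jaws, Jurassic Park, Comedy, Thriller, and Science Fiction. Steven Spielberg is linked to Jaws and to Jurassic Park by "directed" relationships; the movies are linked to genres by four "genre" relationships.]
    Legend: node, relationship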

  7. A movie graph with properties
    [Figure: a movie graph whose nodes carry properties. Nodes: (name: Steven Spielberg), (name: Jaws, year: 1975), (name: Jurassic Park, year: 1993), (name: Comedy), (name: Thriller), (name: Science Fiction). Relationships: two "directed" and four "genre".]
    Legend: node, relationship

  8. Graphs in Databases

  9. How would you store this in your database?
    [Figure: a movie graph whose nodes carry properties. Nodes: (name: Steven Spielberg), (name: Jaws, year: 1975), (name: Jurassic Park, year: 1993), (name: Comedy), (name: Thriller), (name: Science Fiction). Relationships: two "directed" and four "genre".]

  10. The mapping process can be complex:
    - one-to-one relationships become foreign keys
    - one-to-many relationships become foreign keys (repeated foreign keys if
    reversed)
    - many-to-many become rows in a new table with multiple foreign keys
    Traversals require joins, which become very expensive quickly.
    A graph in a relational database

  11. The “Movie - Director” on a relational DB
    Director
    - ID int
    - Name string
    Movie
    - ID int
    - Name string
    - Year int
    - DirectorID int FK
    Note: Fetching all the movies directed by a director requires an index for
    performance.

  12. Twilight Zone: The Movie
    Directed by:
    - Steven Spielberg
    - John Landis
    - Joe Dante
    - George Miller
    Good luck migrating the schema!
    You thought a movie had one director?

  13. The “Movie - Director” on a relational DB
    Movie
    - ID int
    - Name string
    - Year int
    Director
    - ID int
    - Name string
    MovieDirector
    - MovieID int FK
    - DirectorID int FK
    What is a MovieDirector? We had to modify our logical model to fit the
    technology, bringing in unnecessary complexity.

  14. No foreign keys
    - You will need to keep many copies of your information.
    - You will need to keep them all up to date.
    - At that point, why do you even have a database?
    Traversals require multiple queries: get element, find property, get next element,
    etc.
    A graph in a non-relational database

  15. Fetching all the names of the movies directed by a director requires n+1
    queries.
    The “Movie - Director” on a no-SQL DB (A)
    Director document
    {
      "_id": 111,
      "name": "Steven Spielberg",
      "movies": [
        123,
        234,
        345
      ]
    }
    Movie document
    {
      "_id": 123,
      "name": "Jaws", "year": ...
    }
    Movie document
    {
      "_id": 234,
      "name": "E.T.", "year": ...
    }
    Movie document
    {
      "_id": 345,
      "name": "Jurassic Park", "year": ...
    }
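
The n+1 cost of layout (A) can be sketched in Go with in-memory maps standing in for a document store. This is only an illustration: the `store` type, its lookup methods, and `movieNames` are hypothetical names, and the `year` field is left out because the slide elides its values.

```go
package main

import "fmt"

// director and movie mirror the documents on the slide (year omitted).
type director struct {
	ID     int
	Name   string
	Movies []int // movie IDs only, as in layout (A)
}

type movie struct {
	ID   int
	Name string
}

// store is a hypothetical document store that counts every lookup.
type store struct {
	directors map[int]director
	movies    map[int]movie
	queries   int
}

func (s *store) getDirector(id int) director { s.queries++; return s.directors[id] }
func (s *store) getMovie(id int) movie       { s.queries++; return s.movies[id] }

// movieNames issues one lookup for the director plus one per movie.
func movieNames(s *store, directorID int) []string {
	d := s.getDirector(directorID)
	names := make([]string, 0, len(d.Movies))
	for _, id := range d.Movies {
		names = append(names, s.getMovie(id).Name)
	}
	return names
}

// newStore loads the four documents shown on the slide.
func newStore() *store {
	return &store{
		directors: map[int]director{
			111: {ID: 111, Name: "Steven Spielberg", Movies: []int{123, 234, 345}},
		},
		movies: map[int]movie{
			123: {ID: 123, Name: "Jaws"},
			234: {ID: 234, Name: "E.T."},
			345: {ID: 345, Name: "Jurassic Park"},
		},
	}
}

func main() {
	s := newStore()
	fmt.Println(movieNames(s, 111), "lookups:", s.queries)
}
```

With three movies the counter ends at four: one lookup for the director and one for each movie.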

  16. So … what if a movie has multiple directors?
    The “Movie - Director” on a no-SQL DB (B)
    Director document
    {
      "_id": 111,
      "name": "Steven Spielberg",
      "movies": [
        {
          "_id": 123, "name": "Jaws", "year": …
        },
        {
          "_id": 234, "name": "E.T.", "year": …
        },
        {
          "_id": 345, "name": "Jurassic Park", "year": …
        }
      ]
    }

  17. Director document
    {
      "_id": 111,
      "name": "Steven Spielberg",
      "movies": [
        {
          "_id": 123, "name": "Jaws", "year": …
        },
        {
          "_id": 234, "name": "E.T.", "year": …
        },
        {
          "_id": 345, "name": "Jurassic Park", "year": …
        }
      ]
    }
    The “Movie - Director” on a no-SQL DB (C)
    Movie document
    {
      "_id": 123,
      "name": "Jaws", "year": ...
    }
    Movie document
    {
      "_id": 234,
      "name": "E.T.", "year": ...
    }
    Movie document
    {
      "_id": 345,
      "name": "Jurassic Park", "year": ...
    }
    Now we can fetch all movies for a director in a query … but we might
    easily lose consistency.

  18. Graph Databases

  19. No need for mapping: the “whiteboard” model is your model.
    No need for “joins”:
    - traversals are fast since relationships point directly to nodes, not keys
    - “Index-free adjacency”
    - deep traversals are possible (and efficient)
    Graph Databases
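
The idea that relationships point directly to nodes can be illustrated with a small in-memory Go sketch. It is not how Dgraph stores data; the `node` type, the `traverse` helper, and the choice of which genre belongs to which movie are assumptions made for the example.

```go
package main

import "fmt"

// node keeps direct pointers to its neighbours, grouped by relationship name.
type node struct {
	name  string
	edges map[string][]*node
}

func newNode(name string) *node {
	return &node{name: name, edges: map[string][]*node{}}
}

func (n *node) link(rel string, to *node) {
	n.edges[rel] = append(n.edges[rel], to)
}

// traverse follows a path of relationship names from start and returns the
// names of the nodes reached. Each hop follows pointers stored on the
// current nodes; no shared index or join table is consulted.
func traverse(start *node, path ...string) []string {
	frontier := []*node{start}
	for _, rel := range path {
		var next []*node
		for _, n := range frontier {
			next = append(next, n.edges[rel]...)
		}
		frontier = next
	}
	names := make([]string, 0, len(frontier))
	for _, n := range frontier {
		names = append(names, n.name)
	}
	return names
}

// movieGraph builds part of the movie graph from the earlier slides.
func movieGraph() *node {
	spielberg := newNode("Steven Spielberg")
	jaws := newNode("Jaws")
	jurassic := newNode("Jurassic Park")
	spielberg.link("directed", jaws)
	spielberg.link("directed", jurassic)
	jaws.link("genre", newNode("Thriller"))
	jurassic.link("genre", newNode("Science Fiction"))
	return spielberg
}

func main() {
	fmt.Println(traverse(movieGraph(), "directed", "genre"))
}
```

A two-hop question ("which genres has this director worked in?") is then a loop over pointers rather than a pair of joins.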

  20. Dgraph Data Modeling

  21. Subject-Predicate-Value:
    Subject Predicate Value
    Jaws 1975
    Subject-Predicate-Object:
    Subject Predicate Object
    Jaws Steven Spielberg
    Subject-Predicate-Value
    name:
    Steven
    Spielberg
    name:
    Jaws
    was directed by
    name: Jaws
    year: 1975

  22. The previous slide is not 100% accurate, as nodes have their own
    identifiers.
    So instead of using strings as identifiers:
    “Jaws” 1975
    “Jaws” “Steven Spielberg”
    We have Universal Identifiers (UIDs):
    0x1 “Jaws”
    0x1 1975
    0x1 0x2
    0x2 “Steven Spielberg”
    Dgraph data modeling

  23. Given the data from before:
    0x1 “Jaws”
    0x1 1975
    0x1 0x2
    0x2 “Steven Spielberg”
    - 0x1 and 0x2 are UIDs (Universal IDentifiers).
    - <name>, <year>, etc. are predicates.
    - “Jaws”, “Steven Spielberg”, and 1975 are values.
    Dgraph data modeling

  24. Predicates are always attached to UIDs.
    We associate values and objects with keys composed of UID + predicate.
    Keys Values
    0x1: “Jaws”
    0x1: 1975
    0x1: 0x2
    0x2: “Steven Spielberg”
    Sometimes a value can be an array of UIDs or values.
    Dgraph data modeling
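
A minimal Go sketch of this key-value view, assuming a map keyed by (UID, predicate). The predicate names `name`, `year`, and `directed_by` are assumptions, since the transcript does not preserve them, and the `key` and `value` types are hypothetical.

```go
package main

import "fmt"

// key is the UID + predicate pair described on the slide.
type key struct {
	UID  uint64
	Pred string
}

// value holds either literal values or the UIDs of other nodes.
// Both are slices because a value can be an array.
type value struct {
	Literals []string
	UIDs     []uint64
}

// triples stores the four facts from the slide.
// The predicate names are assumed for illustration.
func triples() map[key]value {
	return map[key]value{
		{0x1, "name"}:        {Literals: []string{"Jaws"}},
		{0x1, "year"}:        {Literals: []string{"1975"}},
		{0x1, "directed_by"}: {UIDs: []uint64{0x2}},
		{0x2, "name"}:        {Literals: []string{"Steven Spielberg"}},
	}
}

func main() {
	kv := triples()
	// Follow the edge from the movie to its director, then read the name.
	director := kv[key{0x1, "directed_by"}].UIDs[0]
	fmt.Println(kv[key{director, "name"}].Literals[0])
}
```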

  25. 1. Find the starting nodes for the traversal.
    2. Append the predicate name to the UIDs we have so far.
    3. Find the values associated with the UID:Predicate pairs.
    4. Repeat (2) until the associated values are non-UID or the query is done.
    Benefit: values are not involved, keeping memory requirements low.
    Life of a query

  26. Life of a query
    Example: give me the name of the friends of 0x1234.
    0x1234 _ X
    1. Find the node with UID 0x1234
    2. Append 0x1234
    3. Retrieve values from 0x1234: [0xABCD, 0xBCDE]
    a. 0xABCD “Diggy” | 0xBCDE “Augie”
    4. Return [“Diggy”, “Augie”]
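
The steps above can be sketched in Go over an in-memory map. This is an illustration under stated assumptions rather than Dgraph's implementation: the `store` type and `query` function are hypothetical, and the predicate names `friend` and `name` are assumed from the wording of the example.

```go
package main

import "fmt"

// key is the UID + predicate pair used to address data.
type key struct {
	UID  uint64
	Pred string
}

// store separates edges (UID -> UIDs) from literal values.
type store struct {
	edges  map[key][]uint64
	values map[key]string
}

// query starts from the given UIDs, follows each predicate in path in turn,
// and only at the end reads the literal stored under the leaf predicate.
func query(s store, start []uint64, path []string, leaf string) []string {
	uids := start
	for _, pred := range path {
		var next []uint64
		for _, uid := range uids {
			next = append(next, s.edges[key{uid, pred}]...)
		}
		uids = next
	}
	var out []string
	for _, uid := range uids {
		if v, ok := s.values[key{uid, leaf}]; ok {
			out = append(out, v)
		}
	}
	return out
}

// sample holds the data from the example on the slide.
func sample() store {
	return store{
		edges: map[key][]uint64{
			{0x1234, "friend"}: {0xABCD, 0xBCDE},
		},
		values: map[key]string{
			{0xABCD, "name"}: "Diggy",
			{0xBCDE, "name"}: "Augie",
		},
	}
}

func main() {
	fmt.Println(query(sample(), []uint64{0x1234}, []string{"friend"}, "name"))
}
```

Until the last step only UIDs move through the loop, which is the memory benefit the talk points out.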

  27. How do we find the first nodes?
    We don’t always have the UID of the first node of our traversal.
    We can find them by the value of one of its predicates!
    - Node with name “Augie”.
    - All nodes with a predicate whose value is larger than 18.
    - All nodes with a predicate whose value is within 20mi of SLC.
    These searches could be very expensive, so we use indices.
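
As a rough illustration of why an index helps, the Go sketch below maps values back to UIDs so that starting nodes can be found without scanning every node. The `index` type, the `age` predicate, and the sample ages are made up for the example and do not reflect how Dgraph builds its indices.

```go
package main

import (
	"fmt"
	"sort"
)

// entry pairs an integer value with the UID of the node that holds it.
type entry struct {
	Value int
	UID   uint64
}

// index maps values back to UIDs.
type index struct {
	exact  map[string][]uint64 // name -> UIDs
	sorted []entry             // (value, UID) pairs ordered by value
}

// buildIndex creates an exact-match index on names and a sorted index on ages.
func buildIndex(names map[uint64]string, ages map[uint64]int) index {
	idx := index{exact: map[string][]uint64{}}
	for uid, n := range names {
		idx.exact[n] = append(idx.exact[n], uid)
	}
	for uid, a := range ages {
		idx.sorted = append(idx.sorted, entry{a, uid})
	}
	sort.Slice(idx.sorted, func(i, j int) bool {
		if idx.sorted[i].Value != idx.sorted[j].Value {
			return idx.sorted[i].Value < idx.sorted[j].Value
		}
		return idx.sorted[i].UID < idx.sorted[j].UID
	})
	return idx
}

// greaterThan returns the UIDs whose value is strictly larger than limit,
// using binary search instead of a full scan.
func (idx index) greaterThan(limit int) []uint64 {
	i := sort.Search(len(idx.sorted), func(i int) bool {
		return idx.sorted[i].Value > limit
	})
	var out []uint64
	for _, e := range idx.sorted[i:] {
		out = append(out, e.UID)
	}
	return out
}

// sampleIndex uses invented ages for the two nodes named in the talk.
func sampleIndex() index {
	names := map[uint64]string{0xABCD: "Diggy", 0xBCDE: "Augie"}
	ages := map[uint64]int{0xABCD: 17, 0xBCDE: 21}
	return buildIndex(names, ages)
}

func main() {
	idx := sampleIndex()
	fmt.Printf("%#x\n", idx.exact["Augie"])
	fmt.Printf("%#x\n", idx.greaterThan(18))
}
```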

  28. Indexing in Dgraph
    Dgraph provides indices on:
    - Strings: hash, exact, term, fulltext, trigram.
    - DateTime: year, month, day, hour.
    - Int, Float, Bool: default value index.
    - Geo properties : default value index.

  29. - Schemas are not required in general.
    - But indices can only be defined on schema fields.
    - Be aware of the space requirements of the indices.
    Example, an indexed name predicate:
    <name>: string @index(fulltext, hash, term, trigram) .
    Dgraph schemas and indices

  30. 1. Find the starting nodes for the traversal using UIDs or indexes.
    2. Append the predicate ID (int) to the UIDs we have so far.
    3. Find the values associated with the UID:Predicate pairs.
    4. Repeat (2) until the associated values are non-UID or the query is done.
    Benefit: values are not involved, keeping memory requirements low.
    Updated life of a query

  31. So … how is this related to Go?

  32. What is Go?
    - A programming language.
    - A community.
    but really ...
    - A bunch of zeros and ones!

  33. What is Go?
    '112', '97', '99', '107', '97', '103',
    '101', '32', '109', '97', '105', '110',
    '10', '10', '105', '109', '112', '111',
    '114', '116', '32', '40', '10', '9',
    '34', '102', '109', '116', '34', '10',
    '41', '10', '10', '102', '117', '110',
    '99', '32', '109', '97', '105', '110',
    '40', '41', '32', '123', '10', '9',
    '102', '109', '116', '46', '80', '114',
    '105', '110', '116', '108', '110', '40',
    '34', '72', '101', '108', '108', '111',
    '44', '32', '112', '108', '97', '121',
    '103', '114', '111', '117', '110', '100',
    '34', '41', '10', '125', '10'
    package main
    import "fmt"
    func main() {
        fmt.Println("Hello, Gophers")
    }

  34. package package
    IDENT main
    ;
    import import
    STRING "fmt"
    ;
    func func
    IDENT main
    (
    )
    What is Go?
    {
    IDENT fmt
    .
    IDENT Println
    (
    STRING "Hello, Gophers"
    )
    ;
    }
    ;
    package main
    import "fmt"
    func main() {
        fmt.Println("Hello, Gophers")
    }
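
A token listing like the one on this slide can be produced with the standard library's `go/scanner` package. The sketch below is one way to do it; the `tokenize` helper and the file name `hello.go` are hypothetical.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

const src = `package main

import "fmt"

func main() {
	fmt.Println("Hello, Gophers")
}
`

// tokenize returns one string per token: the token kind, followed by the
// literal text for identifiers and basic literals.
func tokenize(src string) []string {
	fset := token.NewFileSet()
	file := fset.AddFile("hello.go", fset.Base(), len(src))

	var s scanner.Scanner
	s.Init(file, []byte(src), nil, 0) // nil error handler, comments skipped

	var out []string
	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		if tok.IsLiteral() {
			out = append(out, tok.String()+" "+lit)
		} else {
			out = append(out, tok.String())
		}
	}
	return out
}

func main() {
	for _, t := range tokenize(src) {
		fmt.Println(t)
	}
}
```

The `;` tokens in the listing come from the scanner's automatic semicolon insertion at line ends; they do not appear in the source text.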

  35. What is Go?
    package main
    import "fmt"
    func main() {
        fmt.Println("Hello, Gophers")
    }
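
One step beyond tokens is the syntax tree, which is already a graph. As a sketch, the program below parses the same source with `go/parser` and lists the parent-to-child edges of the tree, labelling each node with its Go type. The `astEdges` helper and the file name `hello.go` are hypothetical.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

const src = `package main

import "fmt"

func main() {
	fmt.Println("Hello, Gophers")
}
`

// astEdges parses src and returns one "parent -> child" string per edge of
// the syntax tree, using the Go type of each node as its label.
func astEdges(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "hello.go", src, 0)
	if err != nil {
		return nil, err
	}

	var edges []string
	var stack []ast.Node // path from the root to the current node
	ast.Inspect(file, func(n ast.Node) bool {
		if n == nil {
			// Inspect signals the end of a node's children with nil.
			stack = stack[:len(stack)-1]
			return true
		}
		if len(stack) > 0 {
			parent := stack[len(stack)-1]
			edges = append(edges, fmt.Sprintf("%T -> %T", parent, n))
		}
		stack = append(stack, n)
		return true
	})
	return edges, nil
}

func main() {
	edges, err := astEdges(src)
	if err != nil {
		panic(err)
	}
	for _, e := range edges {
		fmt.Println(e)
	}
}
```

Each printed edge could become a relationship between two nodes in a graph database, alongside other relationships such as "calls".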

  36. I HEARD YOU LIKE GO
    SO WE WROTE SOME GO PROGRAMS
    TO PUT SOME GO GRAPHS
    IN A GRAPH DATABASE WRITTEN IN GO

  37. The import graph
    Package a imports b, b imports c ...
    Main pieces:
    - golang.org/x/tools/go/packages
    - go list
    - github.com/dgraph-io/dgo

  38. Schema
    : string @index(exact) .
    : [uid] @reverse .
    Sample
    _:a “main” .
    _:b “fmt” .
    _:a _:b .
    The import graph
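
A stdlib-only sketch of how triples like the sample could be generated. It does not use `golang.org/x/tools/go/packages` or `dgo` as the talk's program does; instead it parses import declarations from in-memory source with `go/parser`. The predicate names `<name>` and `<imports>`, the `pkg` type, and the `importTriples` helper are assumptions.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
)

// pkg is a hypothetical input: an import path and one source file.
type pkg struct {
	Path string
	Src  string
}

// importTriples parses only the import declarations of each package and
// returns RDF-style triples with blank nodes. The predicate names <name>
// and <imports> are assumed for illustration.
func importTriples(pkgs []pkg) ([]string, error) {
	fset := token.NewFileSet()
	ids := map[string]string{} // import path -> blank node label
	var triples []string

	// node returns the blank node for a path, creating it on first use.
	node := func(path string) string {
		if id, ok := ids[path]; ok {
			return id
		}
		id := "_:p" + strconv.Itoa(len(ids))
		ids[path] = id
		triples = append(triples, fmt.Sprintf("%s <name> %q .", id, path))
		return id
	}

	for _, p := range pkgs {
		file, err := parser.ParseFile(fset, "", p.Src, parser.ImportsOnly)
		if err != nil {
			return nil, err
		}
		from := node(p.Path)
		for _, imp := range file.Imports {
			path, err := strconv.Unquote(imp.Path.Value)
			if err != nil {
				return nil, err
			}
			to := node(path)
			triples = append(triples, fmt.Sprintf("%s <imports> %s .", from, to))
		}
	}
	return triples, nil
}

func main() {
	src := "package main\n\nimport \"fmt\"\n\nfunc main() { fmt.Println() }\n"
	triples, err := importTriples([]pkg{{Path: "main", Src: src}})
	if err != nil {
		panic(err)
	}
	for _, t := range triples {
		fmt.Println(t)
	}
}
```

With a reverse edge on the imports predicate, as the schema's `@reverse` directive suggests, the same data could also answer "who imports fmt?".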

  39. Demo time!

  40. The import graph

  41. Control-Flow Graph
    package main
    import "fmt"
    func main() {
        if greeting() {
            fmt.Print("Hello")
        } else {
            fmt.Print("Bye")
        }
        fmt.Println(", Gophers")
    }

  42. Control-Flow Graph
    After running x, you might run y.
    Main pieces:
    - golang.org/x/tools/go/cfg
    - github.com/dgraph-io/dgo
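
To make "after running x, you might run y" concrete, here is a hand-built control-flow graph for the `main` function from the previous slide. It does not use the `cfg` package, so the `block` type and the block names are hypothetical, and the `paths` helper only works for graphs without loops.

```go
package main

import "fmt"

// block is a basic block: statements that run in sequence, plus the blocks
// that may run next.
type block struct {
	name  string
	stmts []string
	succs []*block
}

// exampleCFG builds, by hand, the graph for the if/else example:
// the entry block branches to one of two blocks, which both rejoin.
func exampleCFG() *block {
	done := &block{name: "if.done", stmts: []string{`fmt.Println(", Gophers")`}}
	then := &block{name: "if.then", stmts: []string{`fmt.Print("Hello")`}, succs: []*block{done}}
	els := &block{name: "if.else", stmts: []string{`fmt.Print("Bye")`}, succs: []*block{done}}
	return &block{name: "entry", stmts: []string{"greeting()"}, succs: []*block{then, els}}
}

// paths lists every path from b to a block with no successors.
// It assumes the graph is acyclic.
func paths(b *block) [][]string {
	if len(b.succs) == 0 {
		return [][]string{{b.name}}
	}
	var out [][]string
	for _, s := range b.succs {
		for _, p := range paths(s) {
			out = append(out, append([]string{b.name}, p...))
		}
	}
	return out
}

func main() {
	for _, p := range paths(exampleCFG()) {
		fmt.Println(p)
	}
}
```

Each successor pointer corresponds to one "might run next" edge that could be stored in the graph database.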

  43. Schema
    : string @index(term) .
    : [uid] .
    : [uid] @reverse .
    : string @index(term) .
    Sample
    _:a “block 0 (start)” .
    _:a _:a1 (number=0) .
    _:a1 “fmt.Println(“hello”)” .
    _:a _:b .
    Control-flow graph

  44. Demo time!

  45. Control-Flow Graph

  46. Some ideas
    - A Dgraph powered godoc instance?
    - Merging all of the graphs!
    - Types graphs
    - Graphs as the source for code analysis

  47. - bit.ly/go-is-a-graph
    - github.com/campoy/code-as-graphs
    - @francesc
    - [email protected] (we’re hiring)
    Thanks!
