
Dgraph: the Graph Database written in Go

In this talk, Francesc - VP of Product at Dgraph Labs - gives an introduction to Dgraph and Graph Databases in general. The talk includes a live demo and Q&A session covering many aspects of the query language and fun facts about Pokémon.

Dgraph is an open-source, fast, feature-rich and horizontally scalable graph database. It's designed from the ground up to run at web scale and to achieve high throughput and low latency for arbitrarily complex queries.

Written entirely in Go, it embraces simplicity and robustness. It provides a very readable and powerful query language derived from GraphQL: https://docs.dgraph.io/query-language/.

For more, check out the Dgraph website: https://dgraph.io.

Francesc Campoy Flores

September 07, 2019

Transcript

  1. Francesc Campoy VP of Product at Dgraph Labs @francesc Campoy

    You might know me from: justforfunc.com Google Cloud Platform Podcast About me
  2. - Graphs and databases - What’s in a Graph? -

    Graph Databases - Dgraph - Dgraph Architecture - The Dgraph Query Language - Live Demo! - Q&A Agenda
  3. Steven Spielberg Jaws Jurassic Park genre directed Comedy Thriller Science

    Fiction directed genre genre genre A movie graph node relationship Legend
  4. name: Steven Spielberg name: Jaws year: 1975 name: Jurassic Park

    year: 1993 genre directed name: Comedy name: Thriller name: Science Fiction directed genre genre genre A movie graph with properties node relationship Legend
  5. name: Steven Spielberg name: Jaws year: 1975 name: Jurassic Park

    year: 1993 genre directed name: Comedy name: Thriller name: Science Fiction directed genre genre genre How would you store this in your database?
  6. The mapping process can be complex: - one-to-one relationships become

    foreign keys - one-to-many relationships become foreign keys (repeated foreign keys if reversed) - many-to-many become rows in a new table with multiple foreign keys Traversals require joins which become very expensive quickly. A graph in a relational database
  7. ID int Name string The “Movie - Director” on a

    relational DB Movie ID int Name string Year int … DirectorID int FK Director Note: Fetching all the movies directed by a director requires an index for performance.
  8. Twilight Zone: The Movie Directed by: - Steven Spielberg -

    John Landis - Joe Dante - George Miller Good luck migrating the schema! You thought a movie had one director?
  9. What is a MovieDirector? We had to modify our logical

    model to fit the technology, bringing in unnecessary complexity. MovieID int FK DirectorID int FK ID int Name string Year int … ID int Name string The “Movie - Director” on a relational DB Movie Director MovieDirector
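The join-table model above can be tried out with a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table and column names (movie, director, movie_director) and the helper movies_directed_by are illustrative assumptions, not something shown in the talk. The point is only that a single hop in the graph already costs two joins.

```python
import sqlite3

# In-memory database; the schema is a hypothetical rendering of the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (id INTEGER PRIMARY KEY, name TEXT, year INTEGER);
    CREATE TABLE director (id INTEGER PRIMARY KEY, name TEXT);
    -- This table exists only to model the many-to-many relationship.
    CREATE TABLE movie_director (
        movie_id INTEGER REFERENCES movie(id),
        director_id INTEGER REFERENCES director(id)
    );
""")
conn.executemany("INSERT INTO movie VALUES (?, ?, ?)", [
    (1, "Jaws", 1975),
    (2, "Jurassic Park", 1993),
    (3, "Twilight Zone: The Movie", 1983),
])
conn.executemany("INSERT INTO director VALUES (?, ?)", [
    (1, "Steven Spielberg"),
    (2, "John Landis"),
])
conn.executemany("INSERT INTO movie_director VALUES (?, ?)", [
    (1, 1), (2, 1), (3, 1), (3, 2),
])

def movies_directed_by(name):
    """Return movie names for a director, oldest first (two joins per hop)."""
    rows = conn.execute(
        """
        SELECT m.name
        FROM director d
        JOIN movie_director md ON md.director_id = d.id
        JOIN movie m ON m.id = md.movie_id
        WHERE d.name = ?
        ORDER BY m.year
        """,
        (name,),
    )
    return [row[0] for row in rows]

print(movies_directed_by("Steven Spielberg"))
# ['Jaws', 'Twilight Zone: The Movie', 'Jurassic Park']
```

Every additional hop (for example, from movies on to genres) would add further joins to the same query.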
  10. No foreign keys - You will need to keep many

    copies of your information. - You will need to keep them all up to date. - At that point, why do you even have a database? Traversals require multiple queries: get element, find property, get next element, etc. A graph in a non-relational database
  11. Fetching all the names of the movies directed by a

    director requires n+1 queries. The “Movie - Director” on a no-SQL DB (A) Director document { “_id”: 111, “name”: “Steven Spielberg”, “movies”: [ 123, 234, 345 ] } Movie document { “_id”: 123, “name”: “Jaws”, “year”: ... } Movie document { “_id”: 234, “name”: “E.T.”, “year”: ... } Movie document { “_id”: 345, “name”: “Jurassic Park”, “year”: ... }
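To make the n+1 cost concrete, here is a small sketch that counts lookups against a toy document store. The DocumentStore class and the movie_names helper are hypothetical stand-ins for a real document database client; only the documents themselves come from the slide.

```python
class DocumentStore:
    """A dict of documents keyed by _id that counts every lookup."""

    def __init__(self, documents):
        self.documents = documents
        self.queries = 0

    def fetch(self, doc_id):
        # Each call stands for one round trip to the database.
        self.queries += 1
        return self.documents[doc_id]


def movie_names(store, director_id):
    director = store.fetch(director_id)  # 1 query for the director
    # n further queries, one per referenced movie
    return [store.fetch(movie_id)["name"] for movie_id in director["movies"]]


store = DocumentStore({
    111: {"_id": 111, "name": "Steven Spielberg", "movies": [123, 234, 345]},
    123: {"_id": 123, "name": "Jaws"},
    234: {"_id": 234, "name": "E.T."},
    345: {"_id": 345, "name": "Jurassic Park"},
})
print(movie_names(store, 111), store.queries)
# ['Jaws', 'E.T.', 'Jurassic Park'] 4
```

With three movies the director costs four lookups; the count grows linearly with the number of referenced documents.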
  12. So … what if a movie has multiple directors? The

    “Movie - Director” on a no-SQL DB (B) Director document { “_id”: 111, “name”: “Steven Spielberg”, “movies”: [ { “_id”: 123, “name”: “Jaws”, “year”: … }, { “_id”: 234, “name”: “E.T.”, “year”: … }, { “_id”: 345, “name”: “Jurassic Park”, “year”: … } ] }
  13. Director document { “_id”: 111, “name”: “Steven Spielberg”, “movies”: [

    { “_id”: 123, “name”: “Jaws”, “year”: … }, { “_id”: 234, “name”: “E.T.”, “year”: … }, { “_id”: 345, “name”: “Jurassic Park”, “year”: … } ] } The “Movie - Director” on a no-SQL DB (C) Movie document { “_id”: 123, “name”: “Jaws”, “year”: ... } Movie document { “_id”: 234, “name”: “E.T.”, “year”: ... } Movie document { “_id”: 345, “name”: “Jurassic Park”, “year”: ... } Now we can fetch all movies for a director in a query … but we might easily lose consistency.
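The consistency risk of variant (C) can be sketched in a few lines. The helper names rename_movie and is_consistent are made up for this illustration, and plain dicts stand in for stored documents.

```python
import copy

# The same movie is stored twice: stand-alone and embedded in the director.
movie = {"_id": 234, "name": "E.T."}
director = {
    "_id": 111,
    "name": "Steven Spielberg",
    "movies": [copy.deepcopy(movie)],
}


def rename_movie(movie_doc, new_name):
    """Update only the stand-alone document, which is the easy mistake."""
    movie_doc["name"] = new_name


def is_consistent(director_doc, movie_doc):
    """Check that the embedded copy still matches the stand-alone document."""
    embedded = next(
        m for m in director_doc["movies"] if m["_id"] == movie_doc["_id"]
    )
    return embedded == movie_doc


print(is_consistent(director, movie))  # True
rename_movie(movie, "E.T. the Extra-Terrestrial")
print(is_consistent(director, movie))  # False
```

Nothing in the store forces both copies to change together, so keeping them in sync becomes the application's job.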
  14. No need for mapping, the “whiteboard” model is your model.

    No need for “joins”: - traversals are fast since relationships point directly to nodes, not keys - “Index-free adjacency” - deep traversals are possible (and efficient) Graph Databases
  15. Subject-Predicate-Value: Subject Predicate Value Jaws <was recorded in the year>

    1975 Subject-Predicate-Object: Subject Predicate Object Jaws <was directed by> Steven Spielberg Subject-Predicate-Value name: Steven Spielberg name: Jaws was directed by name: Jaws year: 1975
  16. The previous slide is not 100% accurate, as nodes have

    their own identifiers. So instead of using strings as identifiers: “Jaws” <was recorded in the year> 1975 “Jaws” <was directed by> “Steven Spielberg” We have Universal Identifiers (UIDs): 0x1 <has name> “Jaws” 0x1 <was recorded in the year> 1975 0x1 <was directed by> 0x2 0x2 <has name> “Steven Spielberg” Dgraph data modeling
  17. Given the data from before: 0x1 <has name> “Jaws” 0x1

    <was recorded in the year> 1975 0x1 <was directed by> 0x2 0x2 <has name> “Steven Spielberg” - 0x1 and 0x2 are UIDs (Universal IDentifiers). - <has name>, <was recorded in the year>, etc. are predicates. - “Jaws”, “Steven Spielberg”, and 1975 are values. Dgraph data modeling
  18. Predicates are always attached to UIDs. We associate values and

    objects to keys composed of a UID and a predicate. Keys Values 0x1:<has name> “Jaws” 0x1:<was recorded in the year> 1975 0x1:<was directed by> 0x2 0x2:<has name> “Steven Spielberg” Sometimes a value can be an array of UIDs or values. Dgraph data modeling
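A rough way to picture this key layout is a dictionary keyed by (UID, predicate) pairs. This is only a sketch: a Python dict stands in for the store and says nothing about Dgraph's actual storage engine, and the shortened predicate names (name, year, directed_by) are assumptions made for readability.

```python
# Keys are (UID, predicate) pairs; values are literals or lists of UIDs.
store = {
    (0x1, "name"): "Jaws",
    (0x1, "year"): 1975,
    (0x1, "directed_by"): [0x2],
    (0x2, "name"): "Steven Spielberg",
}


def get(uid, predicate):
    """Return the value stored under UID + predicate, or None if absent."""
    return store.get((uid, predicate))


def director_names(movie_uid):
    """Follow directed_by from a movie, then read each director's name."""
    return [get(uid, "name") for uid in get(movie_uid, "directed_by") or []]


print(get(0x1, "year"))     # 1975
print(director_names(0x1))  # ['Steven Spielberg']
```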
  19. 1. Find the starting nodes for the traversal. 2. Append

    the predicate name to the UIDs we have so far. 3. Find the values associated to the UID:Predicate pairs. 4. Repeat (2) until the associated values are non UID or query is done. Benefit: values are not involved, keeping memory requirements low. Life of a query
  20. Life of a query Example: give me the name of

    the friends of 0x1234. 0x1234 <is_friends_with> _ <has_name> X 1. Find the node with UID 0x1234 2. Append <is_friends_with> 0x1234 3. Retrieve values from 0x1234:<is_friends_with> [0xABCD, 0xBCDE] a. 0xABCD <has_name> “Diggy” | 0xBCDE <has_name> “Augie” 4. Return [“Diggy”, “Augie”]
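The steps of this example can be sketched as a loop over a list of predicates. The traverse function and the dict-based store are assumptions for illustration; they mirror the steps on the slide and are not Dgraph's implementation.

```python
def traverse(store, start_uids, path):
    """Follow `path` (a list of predicates) starting from `start_uids`.

    Intermediate hops only pass UIDs along; literal values are read
    at the last hop.
    """
    frontier = list(start_uids)
    for predicate in path:
        next_frontier = []
        for uid in frontier:
            found = store.get((uid, predicate), [])
            if not isinstance(found, list):
                found = [found]
            next_frontier.extend(found)
        frontier = next_frontier
    return frontier


store = {
    (0x1234, "is_friends_with"): [0xABCD, 0xBCDE],
    (0xABCD, "has_name"): "Diggy",
    (0xBCDE, "has_name"): "Augie",
}
print(traverse(store, [0x1234], ["is_friends_with", "has_name"]))
# ['Diggy', 'Augie']
```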
  21. How do we find the first nodes? We don’t always

    have the UID of the first node of our traversal. We can find them by the value of one of its predicates! - Node with name “Augie”. - All nodes with a predicate <has_age> larger than 18. - All nodes with a predicate <location> 20mi around SLC. These searches could be very expensive, so we use indices.
  22. Indexing in Dgraph Dgraph provides indices on: - Strings: hash,

    exact, term, fulltext, trigram. - DateTime: year, month, day, hour. - Int, Float, Bool: default value index. - Geo properties : default value index.
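To illustrate why indices help in finding starting nodes, the sketch below builds two toy indices: a term-style inverted index on names and a sorted index on ages. The sample data and the helper names are invented for this example, and the data structures are a generic illustration rather than a description of how Dgraph builds its indices.

```python
import bisect
from collections import defaultdict

# Made-up sample data: UID -> value for two predicates.
names = {0x1: "Augie Doggie", 0x2: "Diggy", 0x3: "Doggie Daddy"}
ages = {0x1: 12, 0x2: 25, 0x3: 40}

# Term-style index: each lowercased word maps to the UIDs containing it.
term_index = defaultdict(set)
for uid, name in names.items():
    for term in name.lower().split():
        term_index[term].add(uid)

# Sorted index on an integer predicate, so range searches avoid a full scan.
age_index = sorted((age, uid) for uid, age in ages.items())


def uids_with_term(term):
    return sorted(term_index.get(term.lower(), set()))


def uids_older_than(limit):
    # Position of the first entry whose age is strictly greater than `limit`.
    start = bisect.bisect_right(age_index, (limit, float("inf")))
    return [uid for _, uid in age_index[start:]]


print(uids_with_term("doggie"))  # [1, 3]
print(uids_older_than(18))       # [2, 3]
```

Both lookups return UIDs, which is exactly what the traversal needs as its starting set.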
  23. - Schemas are not required in general. - But indices

    can only be defined on schema fields. - Be aware of the space requirements of the indices. Example, an indexed name predicate: <name>: string @index(fulltext, hash, term, trigram) . Dgraph schemas and indices
  24. 1. Find the starting nodes for the traversal using UIDs

    or indexes. 2. Append the predicate ID (int) to the UIDs we have so far. 3. Find the values associated to the UID:Predicate pairs. 4. Repeat (2) until the associated values are non UID or query is done. Benefit: values are not involved, keeping memory requirements low. Updated life of a query
  25. Dgraph Architecture at a glance Alpha Alpha Alpha Zero Zero

    Zero Ratel Zero: Cluster Management Alpha: Data Storage Ratel: Web UI
  26. Dgraph Zero • Called Zero, because it used to be

    group zero. • Handles cluster membership, transactions and ID generation. • Each Dgraph cluster must have at least one Zero. • Participates in a Raft group for HA.
  27. Dgraph Alpha • Called Alpha, because it runs group 1

    and higher. • Stores data, serves queries. • Each Dgraph cluster must have at least one Alpha. • Participates in a Raft group for HA.
  28. • Not part of the cluster, but useful for exploration.

    • Connects to Alphas to respond to queries. • Completely optional, but definitely useful. Dgraph Ratel
  29. Dgraph Architecture at a glance Alpha Alpha Alpha Zero Zero

    Zero Ratel Zero: Cluster Management Alpha: Data Storage Ratel: Web UI
  30. - Graph Databases make storing and retrieving graphs efficient. -

    Dgraph provides Subject-Predicate-(Object/Value) - Traversals are very efficient thanks to optimizing for low-latency disk seeks and minimizing network calls. - Dgraph architecture provides high availability and horizontal scalability. Conclusion
  31. - Heavily inspired by GraphQL. - Modified to serve as

    a better DB language. - Pure GraphQL native support is coming up soon™ - Try it on the playground instance of Ratel: play.dgraph.io GraphQL+-
  32. Create a new UID and associate it to value “Alice”

    with predicate <name>. { set { _:alice <name> "Alice" . } } GraphQL+- mutations
  33. Output: { "data": { "code": "Success", "message": "Done", "uids": {

    "alice": "0x1" } }, "extensions": {...} } GraphQL+- mutations
  34. Create a new UID, associate it to value “Bob” with predicate <name>,

    and link it to 0x1 (Alice) with predicate <knows>. { set { _:bob <name> "Bob" . _:bob <knows> <0x1> . } } GraphQL+- mutations
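The way blank nodes such as _:alice and _:bob turn into UIDs can be sketched with a toy function. Everything here is an assumption for illustration: apply_set is a made-up name, triples are passed as Python tuples instead of being parsed from the mutation text, and a simple counter stands in for the ID generation that Zero handles in a real cluster.

```python
def apply_set(store, triples, next_uid):
    """Apply (subject, predicate, object) triples to `store`.

    Terms starting with "_:" are blank nodes: each distinct label gets a
    fresh UID. Returns the label-to-UID map and the next free UID.
    """
    assigned = {}

    def resolve(term):
        nonlocal next_uid
        if isinstance(term, str) and term.startswith("_:"):
            label = term[2:]
            if label not in assigned:
                assigned[label] = next_uid
                next_uid += 1
            return assigned[label]
        return term

    for subject, predicate, obj in triples:
        key = (resolve(subject), predicate)
        store.setdefault(key, []).append(resolve(obj))
    return {label: hex(uid) for label, uid in assigned.items()}, next_uid


store = {}
uids, next_uid = apply_set(store, [("_:alice", "name", "Alice")], next_uid=1)
print(uids)  # {'alice': '0x1'}
uids, next_uid = apply_set(
    store, [("_:bob", "name", "Bob"), ("_:bob", "knows", 0x1)], next_uid
)
print(uids)  # {'bob': '0x2'}
```

The same label used twice within one mutation resolves to the same UID, which is what lets the second mutation attach both <name> and <knows> to Bob.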
  35. Output: { "data": { "code": "Success", "message": "Done", "uids": {

    "bob": "0x2" } }, "extensions": {...} } GraphQL+- mutations
  36. Let’s fetch people’s names Response: { "data": { "q": [

    { "uid": "0x1", "name": "Alice" }, { "uid": "0x2", "name": "Bob" }] }, "extensions": {...} } 0x1 0x2 Alice Bob name name
  37. Response: {"data": {"q": [{ "uid": "0x2", "name": "Bob", "knows": [{

    "uid": "0x1", "name": "Alice" }] }]}, "extensions": {...} } Who does Bob know? 0x2 Bob name 0x1 Alice name knows
  38. Query: { q(func: eq(name, "Bob")) { uid name knows {

    uid name } } } Who does “Bob” know?
  39. Response: Error Name: t Message: : Attribute name is not

    indexed. URL: http://localhost:8080/query Who does “Bob” know?
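The behaviour behind this error can be imitated with a toy store that refuses value lookups on predicates without an index. The Store class, its method names and NotIndexedError are hypothetical; the sketch only shows the idea of mapping values back to UIDs, as in Alice → 0x1 and Bob → 0x2.

```python
class NotIndexedError(Exception):
    pass


class Store:
    def __init__(self):
        self.values = {}   # (uid, predicate) -> value
        self.indexes = {}  # predicate -> {value: set of UIDs}

    def set(self, uid, predicate, value):
        key = (uid, predicate)
        index = self.indexes.get(predicate)
        if index is not None:
            if key in self.values:
                # Drop the stale entry before indexing the new value.
                index[self.values[key]].discard(uid)
            index.setdefault(value, set()).add(uid)
        self.values[key] = value

    def add_index(self, predicate):
        """Build a value -> UIDs index from the data already stored."""
        index = {}
        for (uid, pred), value in self.values.items():
            if pred == predicate:
                index.setdefault(value, set()).add(uid)
        self.indexes[predicate] = index

    def eq(self, predicate, value):
        if predicate not in self.indexes:
            raise NotIndexedError(f"Attribute {predicate} is not indexed.")
        return sorted(self.indexes[predicate].get(value, set()))


store = Store()
store.set(0x1, "name", "Alice")
store.set(0x2, "name", "Bob")
try:
    store.eq("name", "Bob")
except NotIndexedError as err:
    print(err)  # Attribute name is not indexed.
store.add_index("name")
print(store.eq("name", "Bob"))  # [2]
```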
  40. Response: { "data": {"q": [ { "uid": "0x2", "name": "Bob",

    "knows": [{ "uid": "0x1", "name": "Alice" }] }] }, "extensions": {...} } Who does “Bob” know? 0x2 Bob name 0x1 Alice name knows Index on <name> - Alice → 0x1 - Bob → 0x2
  41. Types We heard the community and released the first features

    for our type system with Dgraph v1.1.0. - A node can be of zero or more types. - The relationship is stored in the dgraph.type predicate. - Types can be used to find nodes with the type function. - Types are also used by expand.
  42. A type Person First, we defined a type: type Person

    { name: string knows: [uid] } Then, we can use it in our queries: { q(func: type(Person)) { expand(_all_) } }
  43. Updating values with dgraph.type? We would like to say all

    the nodes with name are of type Person. How would you do it? a) You know all the UIDs already, then just send a mutation. b) You fetch the UIDs in a query, then send a mutation. c) You use … UPSERT!
  44. Upsert provides a simple way to query the database and

    use the resulting data in a mutation. upsert { query { people as var(func: has(name)) {} } mutation { set { uid(people) <dgraph.type> "Person" . } } } Query + Mutation = Upsert
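As a rough model of what this upsert does, the sketch below runs the query part and the mutation part in one function over a dict keyed by (UID, predicate). The function name upsert_type and the dict layout are assumptions, and the sketch ignores transactions entirely.

```python
def upsert_type(store, has_predicate, type_name):
    """Tag every node that has `has_predicate` with `type_name`."""
    # Query part: collect the UIDs that have the predicate, like has(name).
    matched = sorted({uid for (uid, pred) in store if pred == has_predicate})
    # Mutation part: reuse those UIDs as subjects, like uid(people).
    for uid in matched:
        types = store.setdefault((uid, "dgraph.type"), [])
        if type_name not in types:
            types.append(type_name)
    return matched


store = {
    (0x1, "name"): "Alice",
    (0x2, "name"): "Bob",
    (0x3, "title"): "Jaws",
}
print(upsert_type(store, "name", "Person"))  # [1, 2]
print(store[(0x1, "dgraph.type")])           # ['Person']
```

Running it a second time changes nothing, since the type is only appended when it is missing.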
  45. Much much more to see Query language: - Filter -

    Cascade - Normalize - Recurse Schema: - Indexes - Reversed edges ...