Relational is the new Big Data by Miguel Ángel Fajardo and Daniel Domínguez at Big Data Spain 2017

Relational databases were the persistence system of choice for decades, until Web 2.0 in the 2000s required processing volumes of data so large that they needed distributed systems running in parallel. A new type of database (NoSQL) was adopted to solve this problem in different ways.

https://www.bigdataspain.org/2017/talk/relational-is-the-new-big-data

Big Data Spain 2017
16th - 17th November Kinépolis Madrid

Big Data Spain

November 22, 2017

Transcript

  1. Daniel Domínguez, Head of Data (previously: CIEMAT / CERN), @danieluchi01
     Miguel Ángel Fajardo, CTO (previously: EA Games, Gilt, Shutterstock), @ma_bits
  2. 1960s Information Management System (IMS) by IBM • Built for Saturn V moon rocket • Hierarchical, tree structure
  3. 1980s-90s Development of RDBMS • Widely adopted • Models easy to define • ACID transactions • Clients for all stacks • ORMs
  4. Problems scaling Relational DBs • Sharding is hard • Maintaining ACID transactions is hard • Two-phase commit is hard • Parallelizing is hard
  5. Key-value stores ◦ User session data ◦ Component configuration ◦ Cached data, fast access ◦ Complex queries ◦ Interconnected data
  6. Column-oriented DB ◦ Real time analytics ◦ Facebook Messenger ◦ Queries against few rows ◦ Flexible data schemas ◦ Incremental data loads/deletes
  7. Document-oriented DB ◦ Records with different fields ◦ Models with many layers ◦ Joins ◦ Flexible queries
  8. Graph DB ◦ Routing ◦ Social networks ◦ Disease spreading ◦ Hard to do aggregates ◦ Analytics
  9. No one magic database to rule them all • Each of them fits a small number of use cases • Often hard, complex and expensive to maintain • Specific query languages CQL
  10. 2010s Relational strikes back • Less structured data formats • Partitioning • Parallel execution • Sharding • C, A and P?
  11. • ACID for queries going to a single shard • Open Source, DAAS • PostgreSQL extension • Interactive analytics • Multi-tenant • Fully ACID • Open Source • PostgreSQL fork • Scaling intensive • Multi-tenant
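
The slides mention sharding, multi-tenancy and "ACID for queries going to a single shard" without showing how a query ends up on one shard. The sketch below is one possible illustration, not the speakers' implementation or any product's API: it assumes hash-based routing on a tenant identifier. The names `NUM_SHARDS`, `shard_for` and `shards_for_query` are hypothetical and do not come from the talk.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count, chosen only for illustration


def shard_for(tenant_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a tenant to a shard using a stable hash.

    hashlib is used instead of the built-in hash(), which is salted
    per process for strings and would not give stable routing.
    """
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shards_for_query(tenant_ids, num_shards: int = NUM_SHARDS):
    """Return the sorted list of distinct shards a query has to touch."""
    return sorted({shard_for(t, num_shards) for t in tenant_ids})


# A query scoped to one tenant always lands on exactly one shard.
single = shards_for_query(["tenant-42"])

# A query spanning many tenants may fan out to several shards.
fan_out = shards_for_query([f"tenant-{i}" for i in range(100)])

print("single-tenant query touches shards:", single)
print("cross-tenant query touches shards:", fan_out)
```

Under this assumption, a transaction that filters on a single tenant can be handled by one node with ordinary local ACID guarantees, while a cross-tenant query may need coordination across shards, which is where the two-phase commit difficulty from slide 4 comes in.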