Relational is the new Big Data by Miguel Ángel Fajardo and Daniel Dominguez at Big Data Spain 2017

Relational databases were the persistence system of choice for decades, until Web 2.0 in the 2000s required processing volumes of data so large that they needed distributed systems running in parallel. A new type of database (NoSQL) was adopted to solve this problem in different ways.

https://www.bigdataspain.org/2017/talk/relational-is-the-new-big-data

Big Data Spain 2017
16th - 17th November Kinépolis Madrid

Big Data Spain

November 22, 2017

Transcript

  1. None
  2. RELATIONAL is the new BIG DATA

  3. Daniel Domínguez, Head of Data • Previously: CIEMAT / CERN • @danieluchi01

    Miguel Ángel Fajardo, CTO • Previously: EA Games, Gilt, Shutterstock • @ma_bits
  4. None
  5. Distributed Processing Not Only SQL

  6. None
  7. None
  8. None
  9. A long time ago in a galaxy far, far away...

  10. 1960s Information Management System (IMS) by IBM • Built for Saturn V moon rocket • Hierarchical, tree structure
  11. None
  12. 1970 Relational Model Paper • Base for IBM DB1 and DB2
  13. None
  14. 1980s-90s Development of RDBMS • Widely adopted • Models easy to define • ACID transactions • Clients for all stacks • ORMs
  15. 2000s Web 2.0 • Large volume (petabytes) • Faster networks and devices • Systems must scale
  16. Problems scaling Relational DBs • Sharding is hard • Keeping transactions ACID is hard • Two-phase commit is hard • Parallelizing is hard
  17. None
  18. Distributed Processing Not Only SQL

  19. Relational databases The CAP theorem

  20. Key-value stores ◦ User session data ◦ Component configuration ◦ Cached data, fast access ◦ Complex queries ◦ Interconnected data
  21. Column-oriented DB ◦ Real time analytics ◦ Facebook Messenger ◦ Queries against few rows ◦ Flexible data schemas ◦ Incremental data loads/deletes
  22. Document-oriented DB ◦ Records with different fields ◦ Models with many layers ◦ Joins ◦ Flexible queries
  23. Graph DB ◦ Routing ◦ Social networks ◦ Disease spreading ◦ Hard to do aggregates ◦ Analytics
  24. No one magic database to rule them all • Each of them fits a small number of use cases • Often hard, complex and expensive to maintain • Specific query languages (CQL)
  25. MEANWHILE IN THE RELATIONAL BATCAVE

  26. 2010s Relational strikes back • Less structured data formats • Partitioning • Parallel execution • Sharding • C, A and P?
  27. None
  28. • ACID for queries going to a single shard • Open Source, DAAS • PostgreSQL extension • Interactive analytics • Multi-tenant • Fully ACID • Open Source • PostgreSQL fork • Scaling intensive • Multi-tenant
  29. None
  30. None
  31. None
  32. None
  33. WHEN YOU HAVE A HAMMER...

  34. Questions? tech.geoblink.com