Upgrade to Pro — share decks privately, control downloads, hide ads and more …

SQL to NoSQL to NewSQL and the rise of polyglot persistence

SQL to NoSQL to NewSQL and the rise of polyglot persistence

A quick walk through database history up to NewSQL and new database models. Presented at GORUCO 2017.

Paul Dix

June 24, 2017
Tweet

More Decks by Paul Dix

Other Decks in Technology

Transcript

  1. SQL to NoSQL to NewSQL and the rise of polyglot

    persistence Paul Dix @pauldix paul@influxdb.com
  2. 1970 - Edgar F. Codd’s “A Relational Model of Data

    for Large Shared Data Banks” 1970 1986
  3. 1970’s - IBM System R 1970 - Edgar F. Codd’s

    “A Relational Model of Data for Large Shared Data Banks” 1970 1986
  4. 1970’s - IBM System R 1970 - Edgar F. Codd’s

    “A Relational Model of Data for Large Shared Data Banks” 1979 - Relational Software Oracle V2 1970 1986
  5. 1970’s - IBM System R 1970 - Edgar F. Codd’s

    “A Relational Model of Data for Large Shared Data Banks” 1979 - Relational Software Oracle V2 1970 1986
  6. SQL

  7. 1970’s - IBM System R 1970 - Edgar F. Codd’s

    “A Relational Model of Data for Large Shared Data Banks” 1979 - Relational Software Oracle V2 1970 1986
  8. 1970’s - IBM System R 1970 - Edgar F. Codd’s

    “A Relational Model of Data for Large Shared Data Banks” 1979 - IBM System/38 1979 - Relational Software Oracle V2 1970 1986
  9. 1970’s - IBM System R 1970 - Edgar F. Codd’s

    “A Relational Model of Data for Large Shared Data Banks” 1979 - IBM System/38 1979 - Relational Software Oracle V2 1970 1986
  10. 1970’s - IBM System R 1970 - Edgar F. Codd’s

    “A Relational Model of Data for Large Shared Data Banks” 1981 - IBM SQL/DS 1979 - IBM System/38 1979 - Relational Software Oracle V2 1970 1986
  11. 1970’s - IBM System R 1970 - Edgar F. Codd’s

    “A Relational Model of Data for Large Shared Data Banks” 1982 - IBM DB2 1981 - IBM SQL/DS 1979 - IBM System/38 1979 - Relational Software Oracle V2 1970 1986
  12. 1986 - SQL-86 1970’s - IBM System R 1970 -

    Edgar F. Codd’s “A Relational Model of Data for Large Shared Data Banks” 1982 - IBM DB2 1981 - IBM SQL/DS 1979 - IBM System/38 1979 - Relational Software Oracle V2 1970 1986
  13. QUEL range of E is EMPLOYEE retrieve into W (COMP

    = E.Salary / (E.Age - 18)) where E.Name = "Jones"
  14. QUEL range of E is EMPLOYEE retrieve into W (COMP

    = E.Salary / (E.Age - 18)) where E.Name = "Jones" select (e.salary / (e.age - 18)) as comp from employee as e where e.name = "Jones" SQL
  15. SQL isn’t fixed! • 1986 - first ANSI standard •

    1989 - minor revision, integrity constraints • 1992 - major revision • 1999 - regexes, triggers, procedural statements, arrays • 2003 - XML
  16. SQL isn’t fixed! • 1986 - first ANSI standard •

    1989 - minor revision, integrity constraints • 1992 - major revision • 1999 - regexes, triggers, procedural statements, arrays • 2003 - XML WTF?!!
  17. SQL isn’t fixed! • 1986 - first ANSI standard •

    1989 - minor revision, integrity constraints • 1992 - major revision • 1999 - regexes, triggers, procedural statements, arrays • 2003 - XML • 2006 - moar XML, XQuery • 2008 - ORDER BY outside of cursor, INSTEAD OF triggers, TRUNCATE, FETCH • 2011 - Temporal data • 2016 - JSON, more
  18. NoSQL • 2006 - “Bigtable: A Distributed Storage System for

    Structured Data” • 2007 - “Dynamo: Amazon’s Highly Available Key-value Store” • 2008 - Cassandra Open Sourced, paper in 2010 • 2008 - Basho, creators of Riak, founded • 2008 - HBase started out of Powerset • 2009 - first “NoSQL” event organized by Johan Oskarsson
  19. NoSQL • 2006 - “Bigtable: A Distributed Storage System for

    Structured Data” • 2007 - “Dynamo: Amazon’s Highly Available Key-value Store” • 2007 - 10Gen founded, starting MongoDB • 2008 - Cassandra Open Sourced, paper in 2010 • 2008 - Basho, creators of Riak, founded • 2008 - HBase started out of Powerset
  20. NoSQL • 2006 - “Bigtable: A Distributed Storage System for

    Structured Data” • 2007 - “Dynamo: Amazon’s Highly Available Key-value Store” • 2007 - 10Gen founded, starting MongoDB • 2008 - Cassandra Open Sourced, paper in 2010 • 2008 - Basho, creators of Riak, founded • 2008 - HBase started out of Powerset • 2009 - Redis started
  21. NewSQL history • 2008 - NuoDB founded • 2009 -

    VoltDB spun out of Vertica • 2010 - “Dremel, Interactive Analysis of Web-Scale Datasets” • 2010 - CitusData founded • 2011 - NewSQL coined by 451 analyst Matthew Assets • 2012 - Cloudera releases Impala • 2012 - “Spanner: Google’s Globally-Distributed Database” • 2014 - Cockroach Labs founded
  22. –Henry Ford? “If I had asked people what they wanted,

    they would have said faster horses”
  23. –Michael Fassbender as Steve Jobs “Whoever said the customer is

    always right was, I promise you, a customer.”
  24. Incremental Innovation • 1986 - first ANSI standard • 1989

    - minor revision, integrity constraints • 1992 - major revision • 1999 - regexes, triggers, procedural statements, arrays • 2003 - XML?! • 2006 - moar XML, XQuery • 2008 - ORDER BY outside of cursor, INSTEAD OF triggers, TRUNCATE, FETCH • 2011 - Temporal data • 2016 - JSON, more
  25. Sorted Set (redis) redis> zadd pset 5 "foo" (integer) 1

    redis> zadd pset 3 "bar" (integer) 1
  26. Sorted Set (redis) redis> zadd pset 5 "foo" (integer) 1

    redis> zadd pset 3 "bar" (integer) 1 redis> zadd pset 6 "asdf" (integer) 1
  27. Sorted Set (redis) redis> zadd pset 5 "foo" (integer) 1

    redis> zadd pset 3 "bar" (integer) 1 redis> zadd pset 6 "asdf" (integer) 1 redis> zrank pset “bar" (integer) 0
  28. Sorted Set (redis) redis> zadd pset 5 "foo" (integer) 1

    redis> zadd pset 3 "bar" (integer) 1 redis> zadd pset 6 "asdf" (integer) 1 redis> zrank pset “bar" (integer) 0 redis> zrank pset "asdf" (integer) 2
  29. Sorted Set (redis) redis> zadd pset 5 "foo" (integer) 1

    redis> zadd pset 3 "bar" (integer) 1 redis> zadd pset 6 "asdf" (integer) 1 redis> zrank pset “bar" (integer) 0 redis> zrank pset "asdf" (integer) 2 redis> zincrby pset -2 "asdf" "4"
  30. Sorted Set (redis) redis> zadd pset 5 "foo" (integer) 1

    redis> zadd pset 3 "bar" (integer) 1 redis> zadd pset 6 "asdf" (integer) 1 redis> zrank pset “bar" (integer) 0 redis> zrank pset "asdf" (integer) 2 redis> zincrby pset -2 "asdf" "4" redis> zrank pset "asdf" (integer) 1 redis>
  31. Sorted Set (PostgreSQL) CREATE TABLE ssets ( name varchar(255), key

    varchar(255), score int, PRIMARY KEY (name, key) );
  32. Sorted Set ZADD(PostgreSQL) INSERT INTO ssets (name, key, score) VALUES

    ("pset", "foo", 5) ON CONFLICT (name, key) DO UPDATE SET score = excluded.score;
  33. Sorted Set (PostgreSQL) SELECT ranked.* FROM ( SELECT name, key,

    score, rank() over (ORDER BY pub_date ASC) AS rank from ssets where name = 'pset' ) as ranked where key = 'asdf';
  34. Sorted Set ZINCRBY(PostgreSQL) INSERT INTO ssets (name, key, score) VALUES

    ("pset", "asdf", -2) ON CONFLICT (name, key) DO UPDATE SET score = score + excluded.score;
  35. InfluxDB data T1 T2 T3 T4 T5 cpu, host=A, usage_user

    1.2 1.3 1.1 1.1 1.0 cpu, host=B, usage_user 2.1 1.8 1.9 2.0 2.2 cpu, host=C, usage_user 4.5 4.8 4.9 5.0 5.0
  36. InfluxDB data T1 T2 T3 T4 T5 cpu, host=A, usage_user

    1.2 1.3 1.1 1.1 1.0 cpu, host=B, usage_user 2.1 1.8 1.9 2.0 2.2 cpu, host=C, usage_user 4.5 4.8 4.9 5.0 5.0 Time Series
  37. InfluxQL 2.0? select(where: { host = 'serverA' AND metric =

    'usage_user' AND system = 'cpu' }) .range(start:-24h) .window(every:10m) .mean()
  38. fill nulls? T1 T2 T3 T4 T5 cpu, host=A, usage_user

    1.2 null 1.1 1.1 1.0 cpu, host=B, usage_user 2.1 1.8 1.9 2.0 null cpu, host=C, usage_user 4.5 4.8 null 5.0 5.0
  39. select(where: { host = 'serverA' AND metric = 'usage_user' AND

    system = 'cpu' }) .range(start:-5m) .fill(f:mean($))
  40. interpolate? T1 T1.5 T3 T3.2 T3.6 cpu, host=A, usage_user 1.2

    1.1 cpu, host=B, usage_user 1.8 1.9 cpu, host=C, usage_user 4.5 5.0
  41. interpolate? T1 T1.5 T3 T3.2 T3.6 cpu, host=A, usage_user 1.2

    1.1 cpu, host=B, usage_user 1.8 1.9 cpu, host=C, usage_user 4.5 5.0 T1 T3 cpu, host=A, usage_user 1.2 1.1 cpu, host=B, usage_user 1.8 1.9 cpu, host=C, usage_user 4.5 5.0
  42. select(where: { host = 'serverA' AND metric = 'usage_user' AND

    system = 'cpu' }) .range(start:-5m) .interpolate()
  43. Innovation in waves Relational Revolution 1970 2020 2004 2005 NoSQL

    2009 2008 NewSQL 2016 Polyglot Persistence