SQL to NoSQL to NewSQL and the rise of polyglot persistence

SQL to NoSQL to NewSQL and the rise of polyglot persistence

A quick walk through database history up to NewSQL and new database models. Presented at GORUCO 2017.

39b7a68b6cbc43ec7683ad0bcc4c9570?s=128

Paul Dix

June 24, 2017
Tweet

Transcript

  1. SQL to NoSQL to NewSQL and the rise of polyglot

    persistence Paul Dix @pauldix paul@influxdb.com
  2. None
  3. None
  4. A bit about me…

  5. CTO & founder of… makers of

  6. Author

  7. I am of your tribe

  8. Nervous at GORUCO 2007

  9. To the talk!

  10. SQL’s complete dominance is over

  11. SQL is a DSL

  12. An API for…

  13. Working with

  14. Multi-paradigm is here to stay

  15. many programming languages

  16. many query languages

  17. SQL in the beginning, the Lord Ellison created SQL and

    it was good
  18. NoSQL because SQL can’t scale, yo

  19. Not only SQL because SQL isn’t the ONLY thing

  20. NewSQL because SQL can scale, and it’s the one true

    ring… err language
  21. Programmers obsessed with SQL

  22. Polyglot Persistence

  23. SQL is not the end state

  24. NoSQL is about programmer productivity!

  25. Query languages are APIs for working with data

  26. This talk is about database history, query languages, and APIs

    history, hand-wavy arguments, examples
  27. Beginning of SQL

  28. 1970 1986

  29. 1970 - Edgar F. Codd’s “A Relational Model of Data

    for Large Shared Data Banks” 1970 1986
  30. 1970’s - IBM System R 1970 - Edgar F. Codd’s

    “A Relational Model of Data for Large Shared Data Banks” 1970 1986
  31. 1970’s - IBM System R 1970 - Edgar F. Codd’s

    “A Relational Model of Data for Large Shared Data Banks” 1979 - Relational Software Oracle V2 1970 1986
  32. 1970’s - IBM System R 1970 - Edgar F. Codd’s

    “A Relational Model of Data for Large Shared Data Banks” 1979 - Relational Software Oracle V2 1970 1986
  33. Oracle

  34. San Carlos Airport

  35. None
  36. SQL

  37. 1970’s - IBM System R 1970 - Edgar F. Codd’s

    “A Relational Model of Data for Large Shared Data Banks” 1979 - Relational Software Oracle V2 1970 1986
  38. 1970’s - IBM System R 1970 - Edgar F. Codd’s

    “A Relational Model of Data for Large Shared Data Banks” 1979 - IBM System/38 1979 - Relational Software Oracle V2 1970 1986
  39. 1970’s - IBM System R 1970 - Edgar F. Codd’s

    “A Relational Model of Data for Large Shared Data Banks” 1979 - IBM System/38 1979 - Relational Software Oracle V2 1970 1986
  40. 1970’s - IBM System R 1970 - Edgar F. Codd’s

    “A Relational Model of Data for Large Shared Data Banks” 1981 - IBM SQL/DS 1979 - IBM System/38 1979 - Relational Software Oracle V2 1970 1986
  41. 1970’s - IBM System R 1970 - Edgar F. Codd’s

    “A Relational Model of Data for Large Shared Data Banks” 1982 - IBM DB2 1981 - IBM SQL/DS 1979 - IBM System/38 1979 - Relational Software Oracle V2 1970 1986
  42. 1986 - SQL-86 1970’s - IBM System R 1970 -

    Edgar F. Codd’s “A Relational Model of Data for Large Shared Data Banks” 1982 - IBM DB2 1981 - IBM SQL/DS 1979 - IBM System/38 1979 - Relational Software Oracle V2 1970 1986
  43. SQL dominance took time!

  44. QUEL

  45. QUEL range of E is EMPLOYEE retrieve into W (COMP

    = E.Salary / (E.Age - 18)) where E.Name = "Jones"
  46. QUEL range of E is EMPLOYEE retrieve into W (COMP

    = E.Salary / (E.Age - 18)) where E.Name = "Jones" select (e.salary / (e.age - 18)) as comp from employee as e where e.name = "Jones" SQL
  47. Berkeley -> Ingress -> POSTGRES

  48. POSTGRESQUEL

  49. PostgreSQL

  50. SQL isn’t fixed! • 1986 - first ANSI standard •

    1989 - minor revision, integrity constraints • 1992 - major revision • 1999 - regexes, triggers, procedural statements, arrays • 2003 - XML
  51. SQL isn’t fixed! • 1986 - first ANSI standard •

    1989 - minor revision, integrity constraints • 1992 - major revision • 1999 - regexes, triggers, procedural statements, arrays • 2003 - XML WTF?!!
  52. None
  53. SQL isn’t fixed! • 1986 - first ANSI standard •

    1989 - minor revision, integrity constraints • 1992 - major revision • 1999 - regexes, triggers, procedural statements, arrays • 2003 - XML • 2006 - moar XML, XQuery • 2008 - ORDER BY outside of cursor, INSTEAD OF triggers, TRUNCATE, FETCH • 2011 - Temporal data • 2016 - JSON, more
  54. SQL isn’t standard! • MySQL • PostgreSQL • Microsoft SQL

    Server • Oracle • DB2 • Informix
  55. ActiveRecord!

  56. Let’s talk web scale

  57. NoSQL • 2006 - “Bigtable: A Distributed Storage System for

    Structured Data” • 2007 - “Dynamo: Amazon’s Highly Available Key-value Store” • 2008 - Cassandra Open Sourced, paper in 2010 • 2008 - Basho, creators of Riak, founded • 2008 - HBase started out of Powerset • 2009 - first “NoSQL” event organized by Johan Oskarsson
  58. not only SQL!

  59. NoSQL • 2006 - “Bigtable: A Distributed Storage System for

    Structured Data” • 2007 - “Dynamo: Amazon’s Highly Available Key-value Store” • 2007 - 10Gen founded, starting MongoDB • 2008 - Cassandra Open Sourced, paper in 2010 • 2008 - Basho, creators of Riak, founded • 2008 - HBase started out of Powerset
  60. NoSQL • 2006 - “Bigtable: A Distributed Storage System for

    Structured Data” • 2007 - “Dynamo: Amazon’s Highly Available Key-value Store” • 2007 - 10Gen founded, starting MongoDB • 2008 - Cassandra Open Sourced, paper in 2010 • 2008 - Basho, creators of Riak, founded • 2008 - HBase started out of Powerset • 2009 - Redis started
  61. MongoDB & Redis are really what NoSQL is about…

  62. NewSQL history • 2008 - NuoDB founded • 2009 -

    VoltDB spun out of Vertica • 2010 - “Dremel, Interactive Analysis of Web-Scale Datasets” • 2010 - CitusData founded • 2011 - NewSQL coined by 451 analyst Matthew Assets • 2012 - Cloudera releases Impala • 2012 - “Spanner: Google’s Globally-Distributed Database” • 2014 - Cockroach Labs founded
  63. NewSQL: Scale with familiarity!

  64. InfluxQL select mean(usage_user) from cpu where time > now() -

    1d group by time(10m), host
  65. Familiarity != best option

  66. –Henry Ford? “If I had asked people what they wanted,

    they would have said faster horses”
  67. –Michael Fassbender as Steve Jobs “Whoever said the customer is

    always right was, I promise you, a customer.”
  68. Innovation can happen either incrementally or with a significant shift.

  69. Incremental Innovation • 1986 - first ANSI standard • 1989

    - minor revision, integrity constraints • 1992 - major revision • 1999 - regexes, triggers, procedural statements, arrays • 2003 - XML?! • 2006 - moar XML, XQuery • 2008 - ORDER BY outside of cursor, INSTEAD OF triggers, TRUNCATE, FETCH • 2011 - Temporal data • 2016 - JSON, more
  70. SQL is best for all data tasks

  71. SQL is best for all data tasks

  72. Breaking Innovation

  73. Example: sorted set

  74. Sorted Set (redis) redis> zadd pset 5 "foo" (integer) 1

  75. Sorted Set (redis) redis> zadd pset 5 "foo" (integer) 1

    redis> zadd pset 3 "bar" (integer) 1
  76. Sorted Set (redis) redis> zadd pset 5 "foo" (integer) 1

    redis> zadd pset 3 "bar" (integer) 1 redis> zadd pset 6 "asdf" (integer) 1
  77. Sorted Set (redis) redis> zadd pset 5 "foo" (integer) 1

    redis> zadd pset 3 "bar" (integer) 1 redis> zadd pset 6 "asdf" (integer) 1 redis> zrank pset “bar" (integer) 0
  78. Sorted Set (redis) redis> zadd pset 5 "foo" (integer) 1

    redis> zadd pset 3 "bar" (integer) 1 redis> zadd pset 6 "asdf" (integer) 1 redis> zrank pset “bar" (integer) 0 redis> zrank pset "asdf" (integer) 2
  79. Sorted Set (redis) redis> zadd pset 5 "foo" (integer) 1

    redis> zadd pset 3 "bar" (integer) 1 redis> zadd pset 6 "asdf" (integer) 1 redis> zrank pset “bar" (integer) 0 redis> zrank pset "asdf" (integer) 2 redis> zincrby pset -2 "asdf" "4"
  80. Sorted Set (redis) redis> zadd pset 5 "foo" (integer) 1

    redis> zadd pset 3 "bar" (integer) 1 redis> zadd pset 6 "asdf" (integer) 1 redis> zrank pset “bar" (integer) 0 redis> zrank pset "asdf" (integer) 2 redis> zincrby pset -2 "asdf" "4" redis> zrank pset "asdf" (integer) 1 redis>
  81. Sorted Set (PostgreSQL) CREATE TABLE ssets ( name varchar(255), key

    varchar(255), score int, PRIMARY KEY (name, key) );
  82. Sorted Set ZADD(PostgreSQL) INSERT INTO ssets (name, key, score) VALUES

    ("pset", "foo", 5) ON CONFLICT (name, key) DO UPDATE SET score = excluded.score;
  83. Sorted Set (PostgreSQL) SELECT ranked.* FROM ( SELECT name, key,

    score, rank() over (ORDER BY pub_date ASC) AS rank from ssets where name = 'pset' ) as ranked where key = 'asdf';
  84. Sorted Set ZINCRBY(PostgreSQL) INSERT INTO ssets (name, key, score) VALUES

    ("pset", "asdf", -2) ON CONFLICT (name, key) DO UPDATE SET score = score + excluded.score;
  85. None
  86. InfluxDB data temperature,device=dev1,building=b1 internal=80,external=18 1443782126

  87. InfluxDB data temperature,device=dev1,building=b1 internal=80,external=18 1443782126 Measurement

  88. InfluxDB data temperature,device=dev1,building=b1 internal=80,external=18 1443782126 Measurement Tags

  89. InfluxDB data temperature,device=dev1,building=b1 internal=80,external=18 1443782126 Measurement Tags Fields

  90. InfluxDB data temperature,device=dev1,building=b1 internal=80,external=18 1443782126 Measurement Tags Fields Timestamp

  91. InfluxDB data temperature,device=dev1,building=b1 internal=80,external=18 1443782126 Measurement Tags Fields Timestamp We

    actually store up to ns scale timestamps but I couldn’t fit on the slide
  92. InfluxDB data T1 T2 T3 T4 T5 cpu, host=A, usage_user

    1.2 1.3 1.1 1.1 1.0 cpu, host=B, usage_user 2.1 1.8 1.9 2.0 2.2 cpu, host=C, usage_user 4.5 4.8 4.9 5.0 5.0
  93. InfluxDB data T1 T2 T3 T4 T5 cpu, host=A, usage_user

    1.2 1.3 1.1 1.1 1.0 cpu, host=B, usage_user 2.1 1.8 1.9 2.0 2.2 cpu, host=C, usage_user 4.5 4.8 4.9 5.0 5.0 Time Series
  94. Simple Average SELECT mean(usage_user) FROM cpu WHERE host = 'serverA'

    AND time > now() - 24h GROUP BY time(10m)
  95. InfluxQL 2.0? select(where: { host = 'serverA' AND metric =

    'usage_user' AND system = 'cpu' }) .range(start:-24h) .window(every:10m) .mean()
  96. Functional > SQL* *for time series

  97. fill nulls? T1 T2 T3 T4 T5 cpu, host=A, usage_user

    1.2 null 1.1 1.1 1.0 cpu, host=B, usage_user 2.1 1.8 1.9 2.0 null cpu, host=C, usage_user 4.5 4.8 null 5.0 5.0
  98. select(where: { host = 'serverA' AND metric = 'usage_user' AND

    system = 'cpu' }) .range(start:-5m) .fill(f:mean($))
  99. interpolate? T1 T1.5 T3 T3.2 T3.6 cpu, host=A, usage_user 1.2

    1.1 cpu, host=B, usage_user 1.8 1.9 cpu, host=C, usage_user 4.5 5.0
  100. interpolate? T1 T1.5 T3 T3.2 T3.6 cpu, host=A, usage_user 1.2

    1.1 cpu, host=B, usage_user 1.8 1.9 cpu, host=C, usage_user 4.5 5.0 T1 T3 cpu, host=A, usage_user 1.2 1.1 cpu, host=B, usage_user 1.8 1.9 cpu, host=C, usage_user 4.5 5.0
  101. select(where: { host = 'serverA' AND metric = 'usage_user' AND

    system = 'cpu' }) .range(start:-5m) .interpolate()
  102. GraphQL -> Polyglot Persistence

  103. Innovation in waves 1970 2020

  104. Innovation in waves 1970 2020 Can’t come quick enough

  105. Innovation in waves Relational Revolution 1970 2020 2004

  106. Innovation in waves Relational Revolution 1970 2020 2004 2005 NoSQL

    2009
  107. Innovation in waves Relational Revolution 1970 2020 2004 2005 NoSQL

    2009 2008 NewSQL 2016
  108. Innovation in waves Relational Revolution 1970 2020 2004 2005 NoSQL

    2009 2008 NewSQL 2016 Polyglot Persistence
  109. Spotting a wave is like spotting a recession

  110. Python the best?

  111. C++ the best?

  112. Break free from the 40 year shackles of SQL!

  113. Incremental innovation is a great thing

  114. Not the ONLY thing

  115. Polyglot persistence is breakthrough innovation in programmer productivity.

  116. Data challenges

  117. Thank you. Paul Dix @pauldix paul@influxdb.com