SQL to NoSQL to NewSQL and the rise of polyglot persistence

Slide 1

Slide 1 text

SQL to NoSQL to NewSQL and the rise of polyglot persistence Paul Dix @pauldix paul@inﬂuxdb.com

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

A bit about me…

Slide 5

Slide 5 text

CTO & founder of… makers of

Slide 6

Slide 6 text

Author

Slide 7

Slide 7 text

I am of your tribe

Slide 8

Slide 8 text

Nervous at GORUCO 2007

Slide 9

Slide 9 text

To the talk!

Slide 10

Slide 10 text

SQL’s complete dominance is over

Slide 11

Slide 11 text

SQL is a DSL

Slide 12

Slide 12 text

An API for…

Slide 13

Slide 13 text

Working with

Slide 14

Slide 14 text

Multi-paradigm is here to stay

Slide 15

Slide 15 text

many programming languages

Slide 16

Slide 16 text

many query languages

Slide 17

Slide 17 text

SQL in the beginning, the Lord Ellison created SQL and it was good

Slide 18

Slide 18 text

NoSQL because SQL can’t scale, yo

Slide 19

Slide 19 text

Not only SQL because SQL isn’t the ONLY thing

Slide 20

Slide 20 text

NewSQL because SQL can scale, and it’s the one true ring… err language

Slide 21

Slide 21 text

Programmers obsessed with SQL

Slide 22

Slide 22 text

Polyglot Persistence

Slide 23

Slide 23 text

SQL is not the end state

Slide 24

Slide 24 text

NoSQL is about programmer productivity!

Slide 25

Slide 25 text

Query languages are APIs for working with data

Slide 26

Slide 26 text

This talk is about database history, query languages, and APIs history, hand-wavy arguments, examples

Slide 27

Slide 27 text

Beginning of SQL

Slide 28

Slide 28 text

1970 1986

Slide 29

Slide 29 text

1970 - Edgar F. Codd’s “A Relational Model of Data for Large Shared Data Banks” 1970 1986

Slide 30

Slide 30 text

1970’s - IBM System R 1970 - Edgar F. Codd’s “A Relational Model of Data for Large Shared Data Banks” 1970 1986

Slide 31

Slide 31 text

1970’s - IBM System R 1970 - Edgar F. Codd’s “A Relational Model of Data for Large Shared Data Banks” 1979 - Relational Software Oracle V2 1970 1986

Slide 32

Slide 32 text

1970’s - IBM System R 1970 - Edgar F. Codd’s “A Relational Model of Data for Large Shared Data Banks” 1979 - Relational Software Oracle V2 1970 1986

Slide 33

Slide 33 text

Oracle

Slide 34

Slide 34 text

San Carlos Airport

Slide 35

Slide 35 text

No content

Slide 36

Slide 36 text

SQL

Slide 37

Slide 37 text

1970’s - IBM System R 1970 - Edgar F. Codd’s “A Relational Model of Data for Large Shared Data Banks” 1979 - Relational Software Oracle V2 1970 1986

Slide 38

Slide 38 text

1970’s - IBM System R 1970 - Edgar F. Codd’s “A Relational Model of Data for Large Shared Data Banks” 1979 - IBM System/38 1979 - Relational Software Oracle V2 1970 1986

Slide 39

Slide 39 text

1970’s - IBM System R 1970 - Edgar F. Codd’s “A Relational Model of Data for Large Shared Data Banks” 1979 - IBM System/38 1979 - Relational Software Oracle V2 1970 1986

Slide 40

Slide 40 text

1970’s - IBM System R 1970 - Edgar F. Codd’s “A Relational Model of Data for Large Shared Data Banks” 1981 - IBM SQL/DS 1979 - IBM System/38 1979 - Relational Software Oracle V2 1970 1986

Slide 41

Slide 41 text

1970’s - IBM System R 1970 - Edgar F. Codd’s “A Relational Model of Data for Large Shared Data Banks” 1982 - IBM DB2 1981 - IBM SQL/DS 1979 - IBM System/38 1979 - Relational Software Oracle V2 1970 1986

Slide 42

Slide 42 text

1986 - SQL-86 1970’s - IBM System R 1970 - Edgar F. Codd’s “A Relational Model of Data for Large Shared Data Banks” 1982 - IBM DB2 1981 - IBM SQL/DS 1979 - IBM System/38 1979 - Relational Software Oracle V2 1970 1986

Slide 43

Slide 43 text

SQL dominance took time!

Slide 44

Slide 44 text

QUEL

Slide 45

Slide 45 text

QUEL range of E is EMPLOYEE retrieve into W (COMP = E.Salary / (E.Age - 18)) where E.Name = "Jones"

Slide 46

Slide 46 text

QUEL range of E is EMPLOYEE retrieve into W (COMP = E.Salary / (E.Age - 18)) where E.Name = "Jones" select (e.salary / (e.age - 18)) as comp from employee as e where e.name = "Jones" SQL

Slide 47

Slide 47 text

Berkeley -> Ingress -> POSTGRES

Slide 48

Slide 48 text

POSTGRESQUEL

Slide 49

Slide 49 text

PostgreSQL

Slide 50

Slide 50 text

Slide 51

Slide 51 text

Slide 52

Slide 52 text

No content

Slide 53

Slide 53 text

SQL isn’t ﬁxed! • 1986 - ﬁrst ANSI standard • 1989 - minor revision, integrity constraints • 1992 - major revision • 1999 - regexes, triggers, procedural statements, arrays • 2003 - XML • 2006 - moar XML, XQuery • 2008 - ORDER BY outside of cursor, INSTEAD OF triggers, TRUNCATE, FETCH • 2011 - Temporal data • 2016 - JSON, more

Slide 54

Slide 54 text

SQL isn’t standard! • MySQL • PostgreSQL • Microsoft SQL Server • Oracle • DB2 • Informix

Slide 55

Slide 55 text

ActiveRecord!

Slide 56

Slide 56 text

Let’s talk web scale

Slide 57

Slide 57 text

NoSQL • 2006 - “Bigtable: A Distributed Storage System for Structured Data” • 2007 - “Dynamo: Amazon’s Highly Available Key-value Store” • 2008 - Cassandra Open Sourced, paper in 2010 • 2008 - Basho, creators of Riak, founded • 2008 - HBase started out of Powerset • 2009 - ﬁrst “NoSQL” event organized by Johan Oskarsson

Slide 58

Slide 58 text

not only SQL!

Slide 59

Slide 59 text

NoSQL • 2006 - “Bigtable: A Distributed Storage System for Structured Data” • 2007 - “Dynamo: Amazon’s Highly Available Key-value Store” • 2007 - 10Gen founded, starting MongoDB • 2008 - Cassandra Open Sourced, paper in 2010 • 2008 - Basho, creators of Riak, founded • 2008 - HBase started out of Powerset

Slide 60

Slide 60 text

Slide 61

Slide 61 text

MongoDB & Redis are really what NoSQL is about…

Slide 62

Slide 62 text

NewSQL history • 2008 - NuoDB founded • 2009 - VoltDB spun out of Vertica • 2010 - “Dremel, Interactive Analysis of Web-Scale Datasets” • 2010 - CitusData founded • 2011 - NewSQL coined by 451 analyst Matthew Assets • 2012 - Cloudera releases Impala • 2012 - “Spanner: Google’s Globally-Distributed Database” • 2014 - Cockroach Labs founded

Slide 63

Slide 63 text

NewSQL: Scale with familiarity!

Slide 64

Slide 64 text

InﬂuxQL select mean(usage_user) from cpu where time > now() - 1d group by time(10m), host

Slide 65

Slide 65 text

Familiarity != best option

Slide 66

Slide 66 text

–Henry Ford? “If I had asked people what they wanted, they would have said faster horses”

Slide 67

Slide 67 text

–Michael Fassbender as Steve Jobs “Whoever said the customer is always right was, I promise you, a customer.”

Slide 68

Slide 68 text

Innovation can happen either incrementally or with a signiﬁcant shift.

Slide 69

Slide 69 text

Incremental Innovation • 1986 - ﬁrst ANSI standard • 1989 - minor revision, integrity constraints • 1992 - major revision • 1999 - regexes, triggers, procedural statements, arrays • 2003 - XML?! • 2006 - moar XML, XQuery • 2008 - ORDER BY outside of cursor, INSTEAD OF triggers, TRUNCATE, FETCH • 2011 - Temporal data • 2016 - JSON, more

Slide 70

Slide 70 text

SQL is best for all data tasks

Slide 71

Slide 71 text

SQL is best for all data tasks

Slide 72

Slide 72 text

Breaking Innovation

Slide 73

Slide 73 text

Example: sorted set

Slide 74

Slide 74 text

Sorted Set (redis) redis> zadd pset 5 "foo" (integer) 1

Slide 75

Slide 75 text

Sorted Set (redis) redis> zadd pset 5 "foo" (integer) 1 redis> zadd pset 3 "bar" (integer) 1

Slide 76

Slide 76 text

Sorted Set (redis) redis> zadd pset 5 "foo" (integer) 1 redis> zadd pset 3 "bar" (integer) 1 redis> zadd pset 6 "asdf" (integer) 1

Slide 77

Slide 77 text

Sorted Set (redis) redis> zadd pset 5 "foo" (integer) 1 redis> zadd pset 3 "bar" (integer) 1 redis> zadd pset 6 "asdf" (integer) 1 redis> zrank pset “bar" (integer) 0

Slide 78

Slide 78 text

Sorted Set (redis) redis> zadd pset 5 "foo" (integer) 1 redis> zadd pset 3 "bar" (integer) 1 redis> zadd pset 6 "asdf" (integer) 1 redis> zrank pset “bar" (integer) 0 redis> zrank pset "asdf" (integer) 2

Slide 79

Slide 79 text

Slide 80

Slide 80 text

Slide 81

Slide 81 text

Sorted Set (PostgreSQL) CREATE TABLE ssets ( name varchar(255), key varchar(255), score int, PRIMARY KEY (name, key) );

Slide 82

Slide 82 text

Sorted Set ZADD(PostgreSQL) INSERT INTO ssets (name, key, score) VALUES ("pset", "foo", 5) ON CONFLICT (name, key) DO UPDATE SET score = excluded.score;

Slide 83

Slide 83 text

Sorted Set (PostgreSQL) SELECT ranked.* FROM ( SELECT name, key, score, rank() over (ORDER BY pub_date ASC) AS rank from ssets where name = 'pset' ) as ranked where key = 'asdf';

Slide 84

Slide 84 text

Sorted Set ZINCRBY(PostgreSQL) INSERT INTO ssets (name, key, score) VALUES ("pset", "asdf", -2) ON CONFLICT (name, key) DO UPDATE SET score = score + excluded.score;

Slide 85

Slide 85 text

No content

Slide 86

Slide 86 text

InﬂuxDB data temperature,device=dev1,building=b1 internal=80,external=18 1443782126

Slide 87

Slide 87 text

InﬂuxDB data temperature,device=dev1,building=b1 internal=80,external=18 1443782126 Measurement

Slide 88

Slide 88 text

InﬂuxDB data temperature,device=dev1,building=b1 internal=80,external=18 1443782126 Measurement Tags

Slide 89

Slide 89 text

InﬂuxDB data temperature,device=dev1,building=b1 internal=80,external=18 1443782126 Measurement Tags Fields

Slide 90

Slide 90 text

InﬂuxDB data temperature,device=dev1,building=b1 internal=80,external=18 1443782126 Measurement Tags Fields Timestamp

Slide 91

Slide 91 text

InﬂuxDB data temperature,device=dev1,building=b1 internal=80,external=18 1443782126 Measurement Tags Fields Timestamp We actually store up to ns scale timestamps but I couldn’t ﬁt on the slide

Slide 92

Slide 92 text

InﬂuxDB data T1 T2 T3 T4 T5 cpu, host=A, usage_user 1.2 1.3 1.1 1.1 1.0 cpu, host=B, usage_user 2.1 1.8 1.9 2.0 2.2 cpu, host=C, usage_user 4.5 4.8 4.9 5.0 5.0

Slide 93

Slide 93 text

InﬂuxDB data T1 T2 T3 T4 T5 cpu, host=A, usage_user 1.2 1.3 1.1 1.1 1.0 cpu, host=B, usage_user 2.1 1.8 1.9 2.0 2.2 cpu, host=C, usage_user 4.5 4.8 4.9 5.0 5.0 Time Series

Slide 94

Slide 94 text

Simple Average SELECT mean(usage_user) FROM cpu WHERE host = 'serverA' AND time > now() - 24h GROUP BY time(10m)

Slide 95

Slide 95 text

InﬂuxQL 2.0? select(where: { host = 'serverA' AND metric = 'usage_user' AND system = 'cpu' }) .range(start:-24h) .window(every:10m) .mean()

Slide 96

Slide 96 text

Functional > SQL* *for time series

Slide 97

Slide 97 text

ﬁll nulls? T1 T2 T3 T4 T5 cpu, host=A, usage_user 1.2 null 1.1 1.1 1.0 cpu, host=B, usage_user 2.1 1.8 1.9 2.0 null cpu, host=C, usage_user 4.5 4.8 null 5.0 5.0

Slide 98

Slide 98 text

select(where: { host = 'serverA' AND metric = 'usage_user' AND system = 'cpu' }) .range(start:-5m) .fill(f:mean($))

Slide 99

Slide 99 text

interpolate? T1 T1.5 T3 T3.2 T3.6 cpu, host=A, usage_user 1.2 1.1 cpu, host=B, usage_user 1.8 1.9 cpu, host=C, usage_user 4.5 5.0

Slide 100

Slide 100 text

interpolate? T1 T1.5 T3 T3.2 T3.6 cpu, host=A, usage_user 1.2 1.1 cpu, host=B, usage_user 1.8 1.9 cpu, host=C, usage_user 4.5 5.0 T1 T3 cpu, host=A, usage_user 1.2 1.1 cpu, host=B, usage_user 1.8 1.9 cpu, host=C, usage_user 4.5 5.0

Slide 101

Slide 101 text

select(where: { host = 'serverA' AND metric = 'usage_user' AND system = 'cpu' }) .range(start:-5m) .interpolate()

Slide 102

Slide 102 text

GraphQL -> Polyglot Persistence

Slide 103

Slide 103 text

Innovation in waves 1970 2020

Slide 104

Slide 104 text

Innovation in waves 1970 2020 Can’t come quick enough

Slide 105

Slide 105 text

Innovation in waves Relational Revolution 1970 2020 2004

Slide 106

Slide 106 text

Innovation in waves Relational Revolution 1970 2020 2004 2005 NoSQL 2009

Slide 107

Slide 107 text

Innovation in waves Relational Revolution 1970 2020 2004 2005 NoSQL 2009 2008 NewSQL 2016

Slide 108

Slide 108 text

Innovation in waves Relational Revolution 1970 2020 2004 2005 NoSQL 2009 2008 NewSQL 2016 Polyglot Persistence