SQL to NoSQL to NewSQL and the rise of polyglot persistence

Paul Dix @pauldix paul@influxdb.com

A bit about me…

CTO & founder of… makers of

Author

I am of your tribe

Nervous at GORUCO 2007

To the talk!

SQL’s complete dominance is over

SQL is a DSL

An API for…

Working with

Multi-paradigm is here to stay

many programming languages

many query languages

SQL: in the beginning, the Lord Ellison created SQL and it was good

NoSQL: because SQL can’t scale, yo

Not only SQL: because SQL isn’t the ONLY thing

NewSQL: because SQL can scale, and it’s the one true ring… err language

Programmers obsessed with SQL

Polyglot Persistence

SQL is not the end state

NoSQL is about programmer productivity!

Query languages are APIs for working with data

This talk is about database history, query languages, and APIs (history, hand-wavy arguments, examples)

Beginning of SQL

1970 to 1986
• 1970 - Edgar F. Codd’s “A Relational Model of Data for Large Shared Data Banks”
• 1970s - IBM System R
• 1979 - Relational Software Oracle V2

Oracle

San Carlos Airport

1970 to 1986
• 1970 - Edgar F. Codd’s “A Relational Model of Data for Large Shared Data Banks”
• 1970s - IBM System R
• 1979 - Relational Software Oracle V2
• 1979 - IBM System/38
• 1981 - IBM SQL/DS
• 1982 - IBM DB2
• 1986 - SQL-86

SQL dominance took time!

QUEL

    range of E is EMPLOYEE
    retrieve into W (COMP = E.Salary / (E.Age - 18))
    where E.Name = "Jones"

SQL

    select (e.salary / (e.age - 18)) as comp
    from employee as e
    where e.name = 'Jones'
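
To try the SQL form of this query, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database and the two employee rows are made up for illustration; SQLite only stands in for whatever SQL database you have at hand.

```python
import sqlite3

# In-memory SQLite database; the table contents are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary REAL, age INTEGER)")
conn.executemany(
    "INSERT INTO employee (name, salary, age) VALUES (?, ?, ?)",
    [("Jones", 44000.0, 40), ("Smith", 52000.0, 31)],
)

# Same shape as the SQL on the slide: one computed column, one filter.
rows = conn.execute(
    "SELECT (e.salary / (e.age - 18)) AS comp "
    "FROM employee AS e WHERE e.name = 'Jones'"
).fetchall()
comp = rows[0][0]
print(comp)
```

With the sample row for Jones, the computed value is 44000 / (40 - 18).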

Berkeley -> Ingres -> POSTGRES

POSTGRESQUEL

PostgreSQL

SQL isn’t fixed!
• 1986 - first ANSI standard
• 1989 - minor revision, integrity constraints
• 1992 - major revision
• 1999 - regexes, triggers, procedural statements, arrays
• 2003 - XML (WTF?!!)
• 2006 - moar XML, XQuery
• 2008 - ORDER BY outside of cursor, INSTEAD OF triggers, TRUNCATE, FETCH
• 2011 - Temporal data
• 2016 - JSON, more

SQL isn’t standard!
• MySQL
• PostgreSQL
• Microsoft SQL Server
• Oracle
• DB2
• Informix

ActiveRecord!

Let’s talk web scale

NoSQL
• 2006 - “Bigtable: A Distributed Storage System for Structured Data”
• 2007 - “Dynamo: Amazon’s Highly Available Key-value Store”
• 2008 - Cassandra Open Sourced, paper in 2010
• 2008 - Basho, creators of Riak, founded
• 2008 - HBase started out of Powerset
• 2009 - first “NoSQL” event organized by Johan Oskarsson

not only SQL!

NoSQL
• 2006 - “Bigtable: A Distributed Storage System for Structured Data”
• 2007 - “Dynamo: Amazon’s Highly Available Key-value Store”
• 2007 - 10Gen founded, starting MongoDB
• 2008 - Cassandra Open Sourced, paper in 2010
• 2008 - Basho, creators of Riak, founded
• 2008 - HBase started out of Powerset
• 2009 - Redis started

MongoDB & Redis are really what NoSQL is about…

NewSQL history
• 2008 - NuoDB founded
• 2009 - VoltDB spun out of Vertica
• 2010 - “Dremel: Interactive Analysis of Web-Scale Datasets”
• 2010 - CitusData founded
• 2011 - NewSQL coined by 451 analyst Matthew Aslett
• 2012 - Cloudera releases Impala
• 2012 - “Spanner: Google’s Globally-Distributed Database”
• 2014 - Cockroach Labs founded

NewSQL: Scale with familiarity!

InfluxQL

    select mean(usage_user)
    from cpu
    where time > now() - 1d
    group by time(10m), host

Familiarity != best option

“If I had asked people what they wanted, they would have said faster horses” –Henry Ford?

“Whoever said the customer is always right was, I promise you, a customer.” –Michael Fassbender as Steve Jobs

Innovation can happen either incrementally or with a significant shift.

Incremental Innovation
• 1986 - first ANSI standard
• 1989 - minor revision, integrity constraints
• 1992 - major revision
• 1999 - regexes, triggers, procedural statements, arrays
• 2003 - XML?!
• 2006 - moar XML, XQuery
• 2008 - ORDER BY outside of cursor, INSTEAD OF triggers, TRUNCATE, FETCH
• 2011 - Temporal data
• 2016 - JSON, more

SQL is best for all data tasks

Breaking Innovation

Example: sorted set

Sorted Set (redis)

    redis> zadd pset 5 "foo"
    (integer) 1
    redis> zadd pset 3 "bar"
    (integer) 1
    redis> zadd pset 6 "asdf"
    (integer) 1
    redis> zrank pset "bar"
    (integer) 0
    redis> zrank pset "asdf"
    (integer) 2
    redis> zincrby pset -2 "asdf"
    "4"
    redis> zrank pset "asdf"
    (integer) 1
    redis>
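
To make the semantics of those three commands concrete, here is a toy in-memory model in Python. The SortedSet class and its method signatures are hypothetical and only mimic the behaviour seen in the session; this is not how Redis implements sorted sets, and it re-sorts on every rank lookup.

```python
class SortedSet:
    """Toy sorted set: members ordered by (score, member)."""

    def __init__(self):
        self._scores = {}  # member -> score

    def zadd(self, member, score):
        """Set the member's score; return 1 if the member is new, else 0."""
        is_new = member not in self._scores
        self._scores[member] = score
        return int(is_new)

    def zincrby(self, member, delta):
        """Add delta to the member's score (missing members start at 0)."""
        self._scores[member] = self._scores.get(member, 0) + delta
        return self._scores[member]

    def zrank(self, member):
        """Zero-based position by ascending score, or None if absent."""
        if member not in self._scores:
            return None
        ordered = sorted(self._scores, key=lambda m: (self._scores[m], m))
        return ordered.index(member)


# Replay the session from the slide.
pset = SortedSet()
pset.zadd("foo", 5)
pset.zadd("bar", 3)
pset.zadd("asdf", 6)
rank_before = pset.zrank("asdf")
new_score = pset.zincrby("asdf", -2)
rank_after = pset.zrank("asdf")
print(rank_before, new_score, rank_after)
```

The point of the sketch is the size of the API: three small operations cover insert, re-score, and rank lookup.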

Sorted Set (PostgreSQL)

    CREATE TABLE ssets (
      name varchar(255),
      key varchar(255),
      score int,
      PRIMARY KEY (name, key)
    );

Sorted Set ZADD (PostgreSQL)

    INSERT INTO ssets (name, key, score)
    VALUES ('pset', 'foo', 5)
    ON CONFLICT (name, key)
    DO UPDATE SET score = excluded.score;

Sorted Set ZRANK (PostgreSQL)

    SELECT ranked.*
    FROM (
      SELECT name, key, score,
             rank() OVER (ORDER BY score ASC) - 1 AS rank
      FROM ssets
      WHERE name = 'pset'
    ) AS ranked
    WHERE key = 'asdf';

Sorted Set ZINCRBY (PostgreSQL)

    INSERT INTO ssets (name, key, score)
    VALUES ('pset', 'asdf', -2)
    ON CONFLICT (name, key)
    DO UPDATE SET score = ssets.score + excluded.score;
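
The same emulation can be exercised end to end with a small Python sketch. It uses the built-in sqlite3 module as a stand-in for PostgreSQL (an assumption made so the example runs without a server; the upsert syntax needs SQLite 3.24 or newer). The helper names zadd, zincrby, and zrank are hypothetical wrappers, and the rank is computed by counting members with a lower score rather than with a window function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ssets (name TEXT, key TEXT, score INTEGER, "
    "PRIMARY KEY (name, key))"
)

def zadd(name, key, score):
    # Insert the member, or overwrite its score if it already exists.
    conn.execute(
        "INSERT INTO ssets (name, key, score) VALUES (?, ?, ?) "
        "ON CONFLICT (name, key) DO UPDATE SET score = excluded.score",
        (name, key, score),
    )

def zincrby(name, key, delta):
    # Insert the member with score=delta, or add delta to its current score.
    conn.execute(
        "INSERT INTO ssets (name, key, score) VALUES (?, ?, ?) "
        "ON CONFLICT (name, key) DO UPDATE SET score = score + excluded.score",
        (name, key, delta),
    )

def zrank(name, key):
    # Zero-based rank: how many members of the set have a strictly lower score.
    row = conn.execute(
        "SELECT (SELECT COUNT(*) FROM ssets AS o "
        "        WHERE o.name = s.name AND o.score < s.score) "
        "FROM ssets AS s WHERE s.name = ? AND s.key = ?",
        (name, key),
    ).fetchone()
    return None if row is None else row[0]

zadd("pset", "foo", 5)
zadd("pset", "bar", 3)
zadd("pset", "asdf", 6)
rank_before = zrank("pset", "asdf")
zincrby("pset", "asdf", -2)
rank_after = zrank("pset", "asdf")
print(rank_before, rank_after)
```

Three wrappers and three SQL statements to recover what the sorted set offers as single commands, which is the contrast the slides are drawing.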

InfluxDB data

    temperature,device=dev1,building=b1 internal=80,external=18 1443782126

• Measurement: temperature
• Tags: device=dev1,building=b1
• Fields: internal=80,external=18
• Timestamp: 1443782126

We actually store up to ns scale timestamps but I couldn’t fit on the slide
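
As a rough illustration of how the four parts split apart, here is a simplified Python parser for a line like the one on the slide. The function name parse_point is hypothetical, and the sketch assumes no escaped spaces or commas and only numeric field values, which the real format does not guarantee.

```python
def parse_point(line):
    """Split one line into measurement, tags, fields, and timestamp.

    Simplified: assumes no escaped spaces or commas and numeric field values.
    """
    head, field_part, timestamp = line.split(" ")
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = {}
    for pair in field_part.split(","):
        name, value = pair.split("=", 1)
        fields[name] = float(value)
    return {
        "measurement": measurement,
        "tags": tags,
        "fields": fields,
        "timestamp": int(timestamp),
    }


point = parse_point(
    "temperature,device=dev1,building=b1 internal=80,external=18 1443782126"
)
print(point)
```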

InfluxDB data

    Time Series                T1   T2   T3   T4   T5
    cpu, host=A, usage_user    1.2  1.3  1.1  1.1  1.0
    cpu, host=B, usage_user    2.1  1.8  1.9  2.0  2.2
    cpu, host=C, usage_user    4.5  4.8  4.9  5.0  5.0

Simple Average

    SELECT mean(usage_user)
    FROM cpu
    WHERE host = 'serverA' AND time > now() - 24h
    GROUP BY time(10m)

InfluxQL 2.0?

    select(where: { host = 'serverA' AND metric = 'usage_user' AND system = 'cpu' })
      .range(start:-24h)
      .window(every:10m)
      .mean()
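
The chained style can be sketched in plain Python to show what each step does to the data. The Series and Windowed classes below are hypothetical and are not the InfluxDB API; timestamps are plain seconds and the sample values are made up.

```python
from collections import defaultdict


class Series:
    """A list of (timestamp_seconds, value) points with chainable transforms."""

    def __init__(self, points):
        self.points = sorted(points)

    def range(self, start, stop):
        """Keep points with start <= timestamp < stop."""
        return Series([(t, v) for t, v in self.points if start <= t < stop])

    def window(self, every):
        """Group points into fixed-width buckets keyed by bucket start time."""
        buckets = defaultdict(list)
        for t, v in self.points:
            buckets[t - t % every].append(v)
        return Windowed(buckets)


class Windowed:
    """Buckets of values waiting for an aggregate."""

    def __init__(self, buckets):
        self.buckets = buckets

    def mean(self):
        """Reduce every bucket to its average, returning a new Series."""
        return Series(
            [(start, sum(vs) / len(vs)) for start, vs in self.buckets.items()]
        )


# Made-up usage_user samples for one host, one every 5 minutes (300 s).
cpu = Series([(0, 1.0), (300, 3.0), (600, 2.0), (900, 4.0), (1200, 5.0)])
result = cpu.range(0, 1200).window(every=600).mean().points
print(result)
```

Each call returns a new value to chain on, so steps can be added or reordered without reshaping a single large statement.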

Functional > SQL*
*for time series

fill nulls?

                               T1   T2    T3    T4   T5
    cpu, host=A, usage_user    1.2  null  1.1   1.1  1.0
    cpu, host=B, usage_user    2.1  1.8   1.9   2.0  null
    cpu, host=C, usage_user    4.5  4.8   null  5.0  5.0

    select(where: { host = 'serverA' AND metric = 'usage_user' AND system = 'cpu' })
      .range(start:-5m)
      .fill(f:mean($))
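
One plausible reading of filling with a mean is to replace each null with the average of the non-null values in the same series. The slide does not spell out the semantics, so that reading is an assumption, and fill_mean is a hypothetical helper name.

```python
def fill_mean(values):
    """Replace None entries with the mean of the non-null values in the series."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to average, leave the series untouched
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


# The host=A row from the slide, with its null at T2.
host_a = [1.2, None, 1.1, 1.1, 1.0]
filled = fill_mean(host_a)
print(filled)
```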

interpolate?

Timestamps: T1 T1.5 T3 T3.2 T3.6

    cpu, host=A, usage_user    1.2  1.1
    cpu, host=B, usage_user    1.8  1.9
    cpu, host=C, usage_user    4.5  5.0

                               T1   T3
    cpu, host=A, usage_user    1.2  1.1
    cpu, host=B, usage_user    1.8  1.9
    cpu, host=C, usage_user    4.5  5.0

    select(where: { host = 'serverA' AND metric = 'usage_user' AND system = 'cpu' })
      .range(start:-5m)
      .interpolate()
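
What an interpolate step might do is sketched below with plain linear interpolation onto a chosen set of timestamps. The slides do not say which method is intended, so linear interpolation is an assumption, the interpolate function is hypothetical, and the sample points are made up. It expects points sorted by strictly increasing time.

```python
def interpolate(points, targets):
    """Linearly interpolate sorted (time, value) points at each target time.

    Targets outside the sampled range yield None rather than extrapolating.
    """
    results = []
    for t in targets:
        value = None
        # Find the pair of neighbouring samples that brackets the target time.
        for (t0, v0), (t1, v1) in zip(points, points[1:]):
            if t0 <= t <= t1:
                value = v0 + (v1 - v0) * (t - t0) / (t1 - t0)
                break
        results.append((t, value))
    return results


# Irregularly spaced samples, aligned onto a regular grid.
samples = [(1.0, 1.0), (1.5, 2.0), (3.5, 4.0)]
aligned = interpolate(samples, [1.0, 2.0, 3.0, 4.0])
print(aligned)
```

Returning None outside the sampled range keeps the sketch from inventing values where there is no data on both sides.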

GraphQL -> Polyglot Persistence

Innovation in waves (timeline, 1970 to 2020)
• 1970 to 2004 - Relational Revolution
• 2005 to 2009 - NoSQL
• 2008 to 2016 - NewSQL
• Polyglot Persistence

Can’t come quick enough

Spotting a wave is like spotting a recession

Python the best?

C++ the best?

Break free from the 40-year shackles of SQL!

Incremental innovation is a great thing

Not the ONLY thing

Polyglot persistence is breakthrough innovation in programmer productivity.

Data challenges

Thank you. Paul Dix @pauldix paul@influxdb.com
