Slide 1

Slide 1 text

NoSQL Solutions Łukasz Wróbel

Slide 2

Slide 2 text

When to use them?

Slide 3

Slide 3 text

About me ● Architect, team leader ● high-traffic websites: ○ nk.pl ○ Gadu-Gadu ● “Memoirs of a Software Team Leader” ● @lukaszwrobel

Slide 4

Slide 4 text

Agenda 1. Introduction to NoSQL 2. Taxonomy 3. Representative solutions 4. When to use them?

Slide 5

Slide 5 text

1. Introduction

Slide 6

Slide 6 text

Origin ● the “NoSQL” term ● late 90s: the name of a relational database ● June 2009, SF: a meetup on non-relational databases ● NoSQL? Not Only SQL? ⇒ polyglot persistence

Slide 7

Slide 7 text

What’s wrong with RDBMSs? ● rigid schema ● schema migration ● moderate performance

Slide 8

Slide 8 text

● unnatural data modelling: object-relational mapping, aggregates ● poor clustering support

Slide 9

Slide 9 text

Does NoSQL shine? ● easier schema migration, or no schema at all ● performance, thanks to relaxed consistency ● natural modelling ● clustering

Slide 10

Slide 10 text

No free lunch ● migration is hidden ● CAP theorem ● ACID vs. BASE

Slide 11

Slide 11 text

Consistency, Availability, Partition tolerance

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

Consistency, Availability, Partition tolerance: pick two!

Slide 14

Slide 14 text

Consistency, Availability, Partition tolerance: pick two!

Slide 15

Slide 15 text

Consistency, Availability, Partition tolerance: pick two! (Availability concerns non-failing nodes.)

Slide 16

Slide 16 text

No free lunch B A S E

Slide 17

Slide 17 text

No free lunch: Basically Available, Soft state, Eventually consistent?

Slide 18

Slide 18 text

2. Taxonomy

Slide 19

Slide 19 text

Families ● key-value ● document ● columnar ● graph

Slide 20

Slide 20 text

Families ● key-value ● document ● columnar ● graph

Slide 21

Slide 21 text

3. Representative solutions

Slide 22

Slide 22 text

Key-value

Slide 23

Slide 23 text

Redis ● much more than a key-value store ● RAM (not the sky) is the limit ● much more than a memcached replacement ● really fast: tens of thousands of operations/s

Slide 24

Slide 24 text

Redis ● lack of clustering support (master-slave replication, though) ● a plethora of client libraries

Slide 25

Slide 25 text

Data structures ● hashes ● lists ● (sorted) sets

Slide 26

Slide 26 text

Additional capabilities ● HyperLogLog ● publish/subscribe ● transactions, but… ● persistence: from none… to fsync at every query

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

Where to take keys from? Obvious: ● e-mail (uniqueness) ● date (an event takes place once a day)

Slide 29

Slide 29 text

Where to take keys from? Need to generate them: ● UUID ● Snowflake (retired) ● …?
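Both generation options can be sketched in a few lines of Python. This is only an illustration: `SnowflakeLikeGenerator`, its bit layout (timestamp, then 10 bits of worker id, then 12 bits of sequence), the epoch value and the injectable `clock` are assumptions made for the example, not the code of the original Snowflake service.

```python
import threading
import time
import uuid


class SnowflakeLikeGenerator:
    """Hypothetical 64-bit ID: milliseconds since epoch | worker id | sequence."""

    WORKER_BITS = 10      # assumed layout, chosen for this sketch
    SEQUENCE_BITS = 12
    MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1

    def __init__(self, worker_id, epoch_ms=1_400_000_000_000, clock=None):
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.epoch_ms = epoch_ms  # arbitrary custom epoch (an assumption)
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                # Same millisecond: bump the per-millisecond counter.
                self._sequence = (self._sequence + 1) & self.MAX_SEQUENCE
                if self._sequence == 0:
                    # Counter exhausted: wait for the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            shift = self.WORKER_BITS + self.SEQUENCE_BITS
            return (((now - self.epoch_ms) << shift)
                    | (self.worker_id << self.SEQUENCE_BITS)
                    | self._sequence)


# Option 1: a random UUID needs no coordination between nodes.
random_key = str(uuid.uuid4())

# Option 2: IDs that sort roughly by creation time and embed a worker id.
generator = SnowflakeLikeGenerator(worker_id=3)
ordered_key = generator.next_id()
```

The trade-off: UUIDs are longer and unordered, while time-based IDs are compact and sortable but depend on each worker having a unique id and a sane clock.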

Slide 30

Slide 30 text

Resembles a scalable RDBMS ● queried by key only ● key-based access ⇒ easier caching ● no relationships, no joins

Slide 31

Slide 31 text

Applications ● wherever high performance is required ● data structures easier to describe than using SQL ● features

Slide 32

Slide 32 text

Document

Slide 33

Slide 33 text

mongoDB ● JSON ● JavaScript querying ● Map-reduce ● clustering

Slide 34

Slide 34 text

Documents ● no migrations required ● nice querying ● aggregates

Slide 35

Slide 35 text

Map-reduce
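A minimal sketch of the map-reduce idea over documents, in plain Python rather than through MongoDB's own API. The helper `map_reduce`, the functions `emit_tags` and `sum_counts`, and the sample `posts` are all made up for illustration.

```python
from collections import defaultdict


def map_reduce(documents, map_fn, reduce_fn):
    """Run map_fn on every document, group emitted values by key, then reduce."""
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def emit_tags(doc):
    # Map step: emit (tag, 1) for every tag found in a document.
    for tag in doc.get("tags", []):
        yield tag, 1


def sum_counts(key, values):
    # Reduce step: collapse all values emitted for one key.
    return sum(values)


posts = [
    {"title": "Intro", "tags": ["nosql", "redis"]},
    {"title": "Documents", "tags": ["nosql", "mongodb"]},
    {"title": "Untagged"},
]

tag_counts = map_reduce(posts, emit_tags, sum_counts)
```

In a distributed store the map step would run next to the data on each node and only the grouped intermediate results would travel over the network; here everything runs in one process.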

Slide 36

Slide 36 text

Distribution ● replica set ● sharding
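The sharding idea, routing each document to one of several shards by its key, can be sketched as below. This is a deliberately naive hash-modulo scheme, not MongoDB's actual chunk-based routing; `shard_for` and the shard names are hypothetical.

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]  # hypothetical shard names


def shard_for(shard_key, shards=SHARDS):
    """Pick a shard deterministically from a hash of the shard key."""
    digest = hashlib.sha256(str(shard_key).encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


# The same key always lands on the same shard.
target = shard_for("user:42")
```

A known weakness of plain modulo routing is that changing the number of shards remaps most keys, which is one reason real systems use ranges, chunks or consistent hashing instead.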

Slide 37

Slide 37 text

No content

Slide 38

Slide 38 text

Performance ● "humongous", but… ● not as fast as advertised ● eats up all resources ● indexes required ● global write lock

Slide 39

Slide 39 text

Problems ● a honeymoon and then… ● too many promises made

Slide 40

Slide 40 text

Applications ● uhmm… ● small data sets? ● no performance requirements? ● Map-reduce analytics

Slide 41

Slide 41 text

Columnar

Slide 42

Slide 42 text

Cassandra ● big data ● data analytics ● fully distributed

Slide 43

Slide 43 text

datastax.com

Slide 44

Slide 44 text

datastax.com

Slide 45

Slide 45 text

Distribution ● peer-to-peer cluster ● replication ● scales well ● no SPOF
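A peer-to-peer cluster with replication and no single point of failure is often explained with a token ring: every node owns a position on the ring, and a key is stored on the next few nodes clockwise from its hash. The sketch below is a simplified illustration of that idea only; the `Ring` class, the MD5-based `token` function and the node names are assumptions, not Cassandra's implementation.

```python
import bisect
import hashlib


def token(value):
    """Map a string to a position on the ring (first 8 bytes of its MD5)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class Ring:
    def __init__(self, nodes, replication_factor=3):
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("replication_factor must be between 1 and len(nodes)")
        self.replication_factor = replication_factor
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas_for(self, key):
        """Return the nodes holding a key: walk clockwise from the key's token."""
        start = bisect.bisect_right(self._tokens, token(key)) % len(self._ring)
        return [self._ring[(start + i) % len(self._ring)][1]
                for i in range(self.replication_factor)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas_for("sensor:17")
```

Any node can compute `replicas_for` on its own, so no coordinator is needed to locate data, and losing one node still leaves the other replicas reachable.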

Slide 46

Slide 46 text

Applications ● analytics ● time series ● column scanning ● when almost real-time is enough

Slide 47

Slide 47 text

Graph

Slide 48

Slide 48 text

Neo4j ● natural modelling ● simple querying: Cypher, Gremlin ● built-in REST API ● not that performant

Slide 49

Slide 49 text

No content

Slide 50

Slide 50 text

Graphs ● relations ● friends of friends of people who like…
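To show why such questions feel natural on a graph, here is a small in-memory sketch in Python. It is not Neo4j code; the adjacency dictionaries `friends` and `likes` and both helper functions are hypothetical.

```python
def friends_of_friends(friends, person):
    """People exactly two hops away: friends of friends, minus direct friends."""
    direct = friends.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= friends.get(friend, set())
    return two_hops - direct - {person}


def friends_of_friends_of_fans(friends, likes, thing):
    """Friends of friends of everyone who likes the given thing."""
    result = set()
    for person, liked in likes.items():
        if thing in liked:
            result |= friends_of_friends(friends, person)
    return result


friends = {
    "ann": {"bob", "cid"},
    "bob": {"ann", "dan"},
    "cid": {"ann", "eve"},
    "dan": {"bob"},
    "eve": {"cid"},
}
likes = {"ann": {"jazz"}, "dan": {"jazz"}, "eve": {"rock"}}

suggestions = friends_of_friends(friends, "ann")
audience = friends_of_friends_of_fans(friends, likes, "jazz")
```

Each extra hop is just another walk along edges, whereas in a relational schema it would typically mean one more self-join.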

Slide 51

Slide 51 text

Distribution ● master + slaves ● master is not a master ● ZooKeeper (not anymore)

Slide 52

Slide 52 text

Applications ● |data| ≤ one instance ● ACID required ● fewer round-trips (procedure-like) ● no massive updates

Slide 53

Slide 53 text

Applications ● recommendations ● fraud detection

Slide 54

Slide 54 text

4. When to use them?

Slide 55

Slide 55 text

Common problems ● lack of knowledge and experience ● investment ● not as good as advertised ● possible failure ● limited capabilities

Slide 56

Slide 56 text

Should you use them?

Slide 57

Slide 57 text

No content

Slide 58

Slide 58 text

YES, but:

Slide 59

Slide 59 text

● Only if you really can’t solve your problems right now. ● Don’t try for trying’s sake. ● Cost-effective?

Slide 60

Slide 60 text

● Expect the unexpected. A honeymoon, remember? ● Preparations. ● Admit the failure.

Slide 61

Slide 61 text

Thank you! @lukaszwrobel