NoSQL Solutions - When to Use Them?
by
Lukasz Wrobel
Slide 1
Slide 1 text
NoSQL Solutions Łukasz Wróbel
Slide 2
Slide 2 text
When to use them?
Slide 3
Slide 3 text
About me
● Architect, team leader
● high-traffic websites:
  ○ nk.pl
  ○ Gadu-Gadu
● “Memoirs of a Software Team Leader”
● @lukaszwrobel
Slide 4
Slide 4 text
Agenda
1. Introduction to NoSQL
2. Taxonomy
3. Representative solutions
4. When to use them?
Slide 5
Slide 5 text
1. Introduction
Slide 6
Slide 6 text
Origin
● the “NoSQL” term
● late 90s
  ○ relational database
● June 2009, SF
● NoSQL? Not Only SQL?
  ○ polyglot persistence
Slide 7
Slide 7 text
What’s wrong with RDBMSs?
● rigid schema
● schema migration
● moderate performance
Slide 8
Slide 8 text
● unnatural data modelling
  ○ object-relational mapping
  ○ aggregates
● clustering support
Slide 9
Slide 9 text
Does NoSQL shine?
● easier schema migration or no schema at all
● performance
  ○ relaxed consistency
● natural modelling
● clustering
Slide 10
Slide 10 text
No free lunch
● migration is hidden
● CAP theorem
● ACID vs. BASE
Slide 11
Slide 11 text
Consistency, Availability, Partition tolerance
Slide 13
Slide 13 text
Consistency, Availability, Partition tolerance: pick two!
Slide 15
Slide 15 text
Consistency, Availability (of nonfailing nodes), Partition tolerance: pick two!
Slide 16
Slide 16 text
No free lunch B A S E
Slide 17
Slide 17 text
No free lunch
Basically Available
Soft state
Eventually consistent
?
Slide 18
Slide 18 text
2. Taxonomy
Slide 19
Slide 19 text
Families
● key-value
● document
● columnar
● graph
Slide 21
Slide 21 text
3. Representative solutions
Slide 22
Slide 22 text
Key-value
Slide 23
Slide 23 text
Redis
● much more than (≫) a key-value store
● RAM (not the sky) is the limit
● much more than (≫) a memcached replacement
● really fast
  ○ tens of thousands of operations/s
Slide 24
Slide 24 text
Redis
● lack of clustering support
  ○ master-slave replication, though
● a plethora of client libraries
Slide 25
Slide 25 text
Data structures
● hashes
● lists
● (sorted) sets
Slide 26
Slide 26 text
Additional capabilities
● HyperLogLog
● publish/subscribe
● transactions
  ○ but…
● persistence
  ○ none…fsync at every query
Slide 28
Slide 28 text
Where to take keys from?
Obvious:
● e-mail
  ○ uniqueness
● date
  ○ an event takes place once a day
Slide 29
Slide 29 text
Where to take keys from?
Need to generate them:
● UUID
● Snowflake
  ○ retired
● …?
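The two generation strategies named on this slide can be sketched as follows. This is a minimal illustration, not the original Snowflake service: the class name `SnowflakeLikeGenerator`, the custom epoch, and the worker id are assumptions made for the example, and the bit layout (41-bit timestamp, 10-bit worker id, 12-bit sequence) follows the commonly described Snowflake scheme.

```python
import threading
import time
import uuid

# Arbitrary custom epoch for this sketch (2020-01-01 UTC, in milliseconds).
EPOCH_MS = 1_577_836_800_000


def uuid_key() -> str:
    """Random (version 4) UUID: no coordination between nodes is needed."""
    return str(uuid.uuid4())


class SnowflakeLikeGenerator:
    """Hypothetical Snowflake-style generator of roughly time-ordered 64-bit ids."""

    def __init__(self, worker_id: int) -> None:
        if not 0 <= worker_id < 1024:  # the worker id must fit in 10 bits
            raise ValueError("worker_id must be in the range 0..1023")
        self.worker_id = worker_id
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = int(time.time() * 1000)
            if now < self._last_ms:
                # The clock moved backwards: keep using the last timestamp.
                now = self._last_ms
            if now == self._last_ms:
                # Same millisecond: bump the 12-bit sequence.
                self._sequence = (self._sequence + 1) & 0xFFF
                if self._sequence == 0:
                    # Sequence exhausted: wait for the next millisecond.
                    while now <= self._last_ms:
                        now = int(time.time() * 1000)
            else:
                self._sequence = 0
            self._last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.worker_id << 12) | self._sequence


generator = SnowflakeLikeGenerator(worker_id=1)
ids = [generator.next_id() for _ in range(10_000)]
```

UUIDs need no shared state at all, while Snowflake-style ids trade a small amount of configuration (a unique worker id per node) for keys that sort roughly by creation time.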
Slide 30
Slide 30 text
Resembles a scalable RDBMS
● queried by key only
● key-based access ⇒ easier caching
● no relationships, no joins
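Why key-based access makes caching easier can be shown with a small read-through cache. This is only a sketch: `CachedStore` is a hypothetical name, and a plain dictionary stands in for both the cache and the backing store. Because every read is identified by exactly one key, the same key serves as the cache key and as the unit of invalidation.

```python
from typing import Callable, Dict, Optional


class CachedStore:
    """Hypothetical read-through cache in front of a key-only backend."""

    def __init__(self, backend_get: Callable[[str], Optional[str]]) -> None:
        self._backend_get = backend_get
        self._cache: Dict[str, Optional[str]] = {}
        self.backend_reads = 0  # counts how often the backend is actually hit

    def get(self, key: str) -> Optional[str]:
        if key in self._cache:
            return self._cache[key]
        self.backend_reads += 1
        value = self._backend_get(key)
        self._cache[key] = value
        return value

    def invalidate(self, key: str) -> None:
        # A write to the backend only has to drop this single entry.
        self._cache.pop(key, None)


backend = {"user:1": "alice"}
store = CachedStore(backend.get)
first = store.get("user:1")
second = store.get("user:1")  # served from the cache
```

With joins or arbitrary queries, a single write could affect many cached results; with key-only access, it affects one.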
Slide 31
Slide 31 text
Applications
● wherever high performance is required
● data structures easier to describe than using SQL
● features
Slide 32
Slide 32 text
Document
Slide 33
Slide 33 text
mongoDB
● JSON
● JavaScript querying
● Map-reduce
● clustering
Slide 34
Slide 34 text
Documents
● no migrations required
● nice querying
● aggregates
Slide 35
Slide 35 text
Map-reduce
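The map-reduce idea itself can be sketched in a few lines. In mongoDB the map and reduce functions are written in JavaScript; the version below is a language-neutral illustration in Python, and the `map_reduce` helper, the sample `orders` documents, and their field names are assumptions made for the example.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Document = Dict[str, Any]


def map_reduce(
    documents: Iterable[Document],
    map_fn: Callable[[Document], Iterator[Tuple[str, Any]]],
    reduce_fn: Callable[[str, List[Any]], Any],
) -> Dict[str, Any]:
    """Run map over every document, group emitted values by key, then reduce."""
    groups: Dict[str, List[Any]] = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def map_order(doc: Document) -> Iterator[Tuple[str, int]]:
    # Emit one (customer, amount) pair per order document.
    yield doc["customer"], doc["amount"]


def reduce_total(key: str, values: List[int]) -> int:
    # Collapse all amounts emitted for one customer into a total.
    return sum(values)


orders = [
    {"customer": "alice", "amount": 10},
    {"customer": "bob", "amount": 5},
    {"customer": "alice", "amount": 20},
]
totals = map_reduce(orders, map_order, reduce_total)
```

Because the map step looks at one document at a time and the reduce step at one key at a time, both can be spread across many nodes, which is what makes the pattern attractive for analytics.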
Slide 36
Slide 36 text
Distribution
● replica set
● sharding
Slide 38
Slide 38 text
Performance
● "humongous", but…
● not as fast as advertised
● eats up all resources
● indexes required
● global write lock
Slide 39
Slide 39 text
Problems
● a honeymoon and then…
● too many promises made
Slide 40
Slide 40 text
Applications
● uhmm…
● small data sets?
● no performance requirements?
● Map-reduce analytics
Slide 41
Slide 41 text
Columnar
Slide 42
Slide 42 text
Cassandra
● big data
● data analytics
● fully distributed
Slide 43
Slide 43 text
datastax.com
Slide 44
Slide 44 text
datastax.com
Slide 45
Slide 45 text
Distribution
● peer-to-peer cluster
● replication
● scales well
● no SPOF
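How a peer-to-peer cluster can place and replicate data without a central coordinator can be sketched with a token ring. This is a simplified illustration of consistent hashing, not Cassandra's actual partitioner: the `Ring` class and node names are assumptions, MD5 stands in for the real hash function, and each node owns a single token.

```python
import bisect
import hashlib
from typing import List


def token(value: str) -> int:
    """Map a string onto the ring (MD5 is a stand-in hash for this sketch)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Hypothetical token ring: every node can compute where any key lives."""

    def __init__(self, nodes: List[str], replication_factor: int = 3) -> None:
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("replication_factor must be between 1 and len(nodes)")
        self.replication_factor = replication_factor
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str) -> List[str]:
        # First node clockwise from the key's token, then its successors.
        start = bisect.bisect_right(self._tokens, token(key)) % len(self._ring)
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("user:42")
```

Since placement is a pure function of the key and the node list, any node can route a request, and losing one node still leaves the other replicas reachable, which is the "no SPOF" property from the slide.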
Slide 46
Slide 46 text
Applications
● analytics
● time series
● column scanning
● when almost real-time is enough
Slide 47
Slide 47 text
Graph
Slide 48
Slide 48 text
Neo4j
● natural modelling
● simple querying
  ○ Cypher, Gremlin
● built-in REST API
● not that performant
Slide 50
Slide 50 text
Graphs
● relations
● friends of friends of people who like…
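The "friends of friends" style of question can be sketched on a plain adjacency map. This is not Neo4j or Cypher, only an in-memory illustration in Python; the `friends_of_friends` function and the sample graph are assumptions made for the example.

```python
from typing import Dict, Set


def friends_of_friends(graph: Dict[str, Set[str]], person: str) -> Set[str]:
    """People exactly two hops away: friends of friends who are not yet friends."""
    direct = graph.get(person, set())
    two_hops: Set[str] = set()
    for friend in direct:
        two_hops |= graph.get(friend, set())
    # Exclude the person and the people they already know directly.
    return two_hops - direct - {person}


# Undirected friendship graph stored as adjacency sets.
graph = {
    "ann": {"bob", "cid"},
    "bob": {"ann", "dan"},
    "cid": {"ann", "dan", "eve"},
    "dan": {"bob", "cid"},
    "eve": {"cid"},
}
suggestions = friends_of_friends(graph, "ann")
```

In a relational schema each extra hop typically means another self-join on a friendship table, whereas a graph store follows the relationships directly.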
Slide 51
Slide 51 text
Distribution
● master + slaves
● master is not a master
● ZooKeeper
  ○ not anymore
Slide 52
Slide 52 text
Applications
● |data| ≤ one instance
● ACID required
● fewer round-trips
  ○ procedure-like
● no massive updates
Slide 53
Slide 53 text
Applications
● recommendations
● fraud detection
Slide 54
Slide 54 text
4. When to use them?
Slide 55
Slide 55 text
Common problems
● lack of knowledge and experience
● investment
● not as good as advertised
● possible failure
● limited capabilities
Slide 56
Slide 56 text
Should you use them?
Slide 58
Slide 58 text
YES, but:
Slide 59
Slide 59 text
● Only if you really can’t solve your problems right now.
● Don’t try for trying’s sake.
● Cost-effective?
Slide 60
Slide 60 text
● Expect the unexpected. A honeymoon, remember?
● Preparations.
● Admit the failure.
Slide 61
Slide 61 text
Thank you! @lukaszwrobel