NoSQL Solutions - When to Use Them?

For one of the IT events I'm attending, I prepared a presentation that gives an overview of NoSQL solutions and considers whether they are worth our attention. Enjoy!

Lukasz Wrobel

June 14, 2014

Transcript

  1. NoSQL Solutions
    Łukasz Wróbel

  2. When to use them?

  3. About me
    ● Architect, team leader
    ● high-traffic websites:
    ○ nk.pl
    ○ Gadu-Gadu
    ● “Memoirs of a Software Team Leader”
    ● @lukaszwrobel

  4. Agenda
    1. Introduction to NoSQL
    2. Taxonomy
    3. Representative solutions
    4. When to use them?

  5. 1. Introduction

  6. Origin
    ● the “NoSQL” term
    ● late 90s
    ○ relational database
    ● June 2009, SF
    ● NoSQL? Not Only SQL?
    ○ polyglot persistence

  7. What’s wrong with RDBMSs?
    ● rigid schema
    ● schema migration
    ● moderate performance

  8. ● unnatural data modelling
    ○ object-relational mapping
    ○ aggregates
    ● clustering support
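
The modelling mismatch can be illustrated with a small sketch. The order data, the three "tables", and the `load_order` helper are hypothetical and only stand in for what an object-relational mapper would do; the last structure shows the same order kept as one aggregate.

```python
# Hypothetical relational rows: one order spread over three "tables".
orders = [{"id": 1, "customer_id": 7}]
customers = [{"id": 7, "name": "Alice"}]
order_lines = [
    {"order_id": 1, "product": "keyboard", "qty": 1},
    {"order_id": 1, "product": "cable", "qty": 2},
]


def load_order(order_id):
    """Assemble the aggregate by hand, roughly what an ORM does with joins."""
    order = next(o for o in orders if o["id"] == order_id)
    customer = next(c for c in customers if c["id"] == order["customer_id"])
    lines = [
        {"product": line["product"], "qty": line["qty"]}
        for line in order_lines
        if line["order_id"] == order_id
    ]
    return {"id": order["id"], "customer": {"name": customer["name"]}, "lines": lines}


# The same order stored as a single aggregate (document), read in one lookup.
order_documents = {
    1: {
        "id": 1,
        "customer": {"name": "Alice"},
        "lines": [
            {"product": "keyboard", "qty": 1},
            {"product": "cable", "qty": 2},
        ],
    }
}
```

Both paths yield the same structure; the difference is how many places had to be read and stitched together to get it.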

  9. Does NoSQL shine?
    ● easier schema migration
    ○ or no schema at all
    ● performance
    ○ relaxed consistency
    ● natural modelling
    ● clustering

  10. No free lunch
    ● migration is hidden
    ● CAP theorem
    ● ACID vs. BASE
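
One way to read "migration is hidden": without a schema, old and new document shapes coexist in the store, and the application has to reconcile them when reading. A minimal sketch, assuming a hypothetical `version` field and an `upgrade` helper that are not part of the talk:

```python
CURRENT_VERSION = 2


def upgrade(doc):
    """Bring a stored document up to the current shape at read time."""
    doc = dict(doc)  # work on a copy, leave the stored document untouched
    if doc.get("version", 1) == 1:
        # Version 1 kept a single "name" field; version 2 splits it in two.
        first, _, last = doc.pop("name").partition(" ")
        doc["first_name"], doc["last_name"] = first, last
        doc["version"] = CURRENT_VERSION
    return doc


stored = [
    {"name": "Ada Lovelace"},  # written before the change, no version field
    {"version": 2, "first_name": "Alan", "last_name": "Turing"},
]
users = [upgrade(d) for d in stored]
```

The migration still exists; it has moved from a one-off schema change into code that runs on every read.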

  11. Consistency, Availability, Partition tolerance

  12. Consistency, Availability, Partition tolerance
    Pick two!

  13. Consistency, Availability, Partition tolerance
    Pick two!

  14. Consistency, Availability, Partition tolerance
    Pick two!
    nonfailing nodes

  15. No free lunch
    BASE

  16. No free lunch
    Basically
    Available
    Soft state
    Eventually consistent
    ?
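
What "eventually consistent" can mean in practice is sketched below with two toy replicas. The `Replica` class, the caller-supplied logical timestamps, and the last-write-wins rule are assumptions made for illustration; real systems use a variety of conflict-resolution strategies.

```python
class Replica:
    """A toy replica: accepts writes locally, converges later (last write wins)."""

    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def sync_from(self, other):
        """Anti-entropy step: pull every entry and keep the newer one."""
        for key, (timestamp, value) in other.data.items():
            self.write(key, value, timestamp)


a, b = Replica(), Replica()
a.write("cart", ["book"], timestamp=1)
b.write("cart", ["book", "pen"], timestamp=2)  # accepted while a is unreachable

stale = a.read("cart")  # a has not heard about the newer write yet
a.sync_from(b)
b.sync_from(a)
converged = a.read("cart") == b.read("cart")
```

Both replicas stay available for writes, a reader can briefly see the old cart, and the two copies agree only after they have exchanged state.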

  17. Families
    ● key-value
    ● document
    ● columnar
    ● graph

  18. Families
    ● key-value
    ● document
    ● columnar
    ● graph

  19. 3. Representative solutions

  20. Redis
    ● much more than a key-value store
    ● RAM, not the sky, is the limit
    ● much more than a memcached replacement
    ● really fast
    ○ tens of thousands of operations/s

  21. Redis
    ● lack of clustering support
    ○ master-slave replication, though
    ● a plethora of client libraries

  22. Data structures
    ● hashes
    ● lists
    ● (sorted) sets
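
To give a feel for these structures without a running server, the sketch below uses plain Python stand-ins. The data is made up, and the Redis commands named in the comments are only the rough counterparts of each step; a Python dict sorted at read time is not a real sorted set.

```python
from collections import deque

profile = {}              # hash: field -> value         (Redis: HSET / HGETALL)
recent = deque(maxlen=3)  # list capped at three entries (Redis: LPUSH + LTRIM)
scores = {}               # sorted set: member -> score  (Redis: ZADD / ZREVRANGE)

profile.update(name="Ada", city="London")

for page in ["/home", "/docs", "/blog", "/pricing"]:
    recent.appendleft(page)  # newest first, the oldest entry falls off the end

scores.update(ada=120, alan=95, grace=150)
# A sorted set keeps members ordered by score; here we sort when reading.
top_two = sorted(scores, key=scores.get, reverse=True)[:2]
```

The point is that a "recently viewed" list or a leaderboard maps directly onto one structure, rather than onto a table plus an `ORDER BY ... LIMIT` query.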

  23. Additional capabilities
    ● HyperLogLog
    ● publish/subscribe
    ● transactions
    ○ but…
    ● persistence
    ○ from none to fsync at every query

  24. Where to take keys from?
    Obvious:
    ● e-mail
    ○ uniqueness
    ● date
    ○ an event takes place once a day
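
A minimal sketch of such "obvious" keys, using a dict as a stand-in for the key-value store. The `user:` and `event:` prefixes, the helper names, and the choice to lower-case the whole address are assumptions for illustration only.

```python
import datetime

store = {}  # stand-in for a key-value store


def user_key(email):
    # E-mail addresses are unique per user; normalise them so lookups match.
    return "user:" + email.strip().lower()


def daily_event_key(name, day):
    # The event takes place once a day, so the ISO date identifies it.
    return f"event:{name}:{day.isoformat()}"


store[user_key("Ada@Example.com")] = {"name": "Ada"}
store[daily_event_key("backup", datetime.date(2014, 6, 14))] = "ok"
```

Because the key is derived from data the application already has, a lookup needs no extra index or sequence generator.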

  25. Where to take keys from?
    Need to generate them:
    ● UUID
    ● Snowflake
    ○ retired
    ● …?
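
Both options can be sketched in a few lines. `uuid.uuid4()` comes from the Python standard library. The generator class is only loosely modelled on the Snowflake layout (a millisecond timestamp, a worker number, and a per-millisecond sequence packed into 64 bits); its name, the custom epoch, and the injectable `clock` parameter are assumptions, not the original implementation.

```python
import threading
import time
import uuid

random_id = uuid.uuid4()  # 128-bit random identifier, no coordination needed


class SnowflakeLikeGenerator:
    """64-bit IDs: 41 bits of milliseconds, 10 bits worker, 12 bits sequence."""

    EPOCH_MS = 1_400_000_000_000  # hypothetical custom epoch (May 2014)

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.clock = clock
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:
                    # 4096 IDs used in this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - self.EPOCH_MS) << 22) | (self.worker_id << 12) | self.sequence
```

IDs built this way sort roughly by creation time and can be generated on many workers without a central counter, as long as every worker has a distinct `worker_id` and a reasonably well-behaved clock.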

  26. Resembles a scalable RDBMS
    ● queried by key only
    ● key-based access ⇒ easier caching
    ● no relationships, no joins
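
Why key-based access makes caching easier can be shown with a cache-aside sketch: the key used to query the store doubles as the cache key, and a write only has to invalidate that one entry. Both classes are hypothetical stand-ins, not a real client library.

```python
class KeyValueStore:
    """Stand-in for a slow backing store that can only be queried by key."""

    def __init__(self, data):
        self.data = dict(data)
        self.reads = 0  # counts how often the backing store is actually hit

    def get(self, key):
        self.reads += 1
        return self.data.get(key)

    def put(self, key, value):
        self.data[key] = value


class CachedStore:
    """Cache-aside wrapper: reads fill the cache, writes invalidate one key."""

    def __init__(self, store):
        self.store = store
        self.cache = {}

    def get(self, key):
        if key not in self.cache:
            self.cache[key] = self.store.get(key)
        return self.cache[key]

    def put(self, key, value):
        self.store.put(key, value)
        self.cache.pop(key, None)  # drop the stale entry, if any


backend = KeyValueStore({"user:1": "Ada"})
cached = CachedStore(backend)
cached.get("user:1")
cached.get("user:1")  # served from the cache, the backend is not touched
```

With joins or ad-hoc queries, deciding which cached results a write invalidates is much harder; here it is always exactly one key.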

  27. Applications
    ● wherever high performance is required
    ● data structures
    ○ easier to describe than using SQL
    ● features

  28. mongoDB
    ● JSON
    ● JavaScript querying
    ● Map-reduce
    ● clustering
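
The map-reduce model mentioned above can be sketched in plain Python. This illustrates the idea only; it is not MongoDB's API (where the map and reduce functions are written in JavaScript), and the `map_reduce` helper and the order documents are hypothetical.

```python
from collections import defaultdict


def map_reduce(documents, map_fn, reduce_fn):
    """Run map over every document, group by key, then reduce each group."""
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


orders = [
    {"customer": "ada", "total": 30},
    {"customer": "alan", "total": 12},
    {"customer": "ada", "total": 8},
]

# Total spent per customer: emit (customer, total), then sum each group.
totals = map_reduce(
    orders,
    map_fn=lambda doc: [(doc["customer"], doc["total"])],
    reduce_fn=lambda key, values: sum(values),
)
```

Because the map step looks at one document at a time, it can run on each node that holds part of the data, with only the grouped values travelling to the reduce step.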

  29. Documents
    ● no migrations required
    ● nice querying
    ● aggregates

  30. Distribution
    ● replica set
    ● sharding
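
The idea behind sharding, splitting one collection across several servers by a shard key, can be sketched as follows. This is a deliberately naive hash-modulo scheme with hypothetical shard names; it does not reproduce how MongoDB actually routes documents, and it leaves replica sets out entirely.

```python
import hashlib

SHARDS = ["shard-a", "shard-b", "shard-c"]  # hypothetical shard names


def shard_for(shard_key):
    """Pick a shard from a stable hash of the shard key."""
    digest = hashlib.md5(str(shard_key).encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


# Distribute 1000 hypothetical user IDs and record where each one lands.
placement = {}
for user_id in range(1000):
    placement.setdefault(shard_for(user_id), []).append(user_id)
```

A query that includes the shard key touches a single shard; one that does not has to ask all of them. Note that with plain modulo, changing the number of shards moves most keys, which is one reason real systems use more elaborate schemes.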

  31. Performance
    ● "humongous", but…
    ● not as fast as advertised
    ● eats up all resources
    ● indexes required
    ● global write lock

  32. Problems
    ● a honeymoon and then…
    ● too many promises made

  33. Applications
    ● uhmm…
    ● small data sets?
    ● no performance requirements?
    ● Map-reduce analytics

  34. Cassandra
    ● big data
    ● data analytics
    ● fully distributed

  35. datastax.com

  36. datastax.com

  37. Distribution
    ● peer-to-peer cluster
    ● replication
    ● scales well
    ● no SPOF
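
A peer-to-peer cluster with replication and no single point of failure is commonly built on a hash ring. The sketch below shows the general technique with one token per node; the `Ring` class and node names are hypothetical, and it does not reproduce Cassandra's actual partitioner or its virtual nodes.

```python
import bisect
import hashlib


def token(value):
    """Map a string to a position on the ring."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Every node owns a slice of the ring; there is no coordinator node."""

    def __init__(self, nodes, replication_factor=2):
        self.replication_factor = min(replication_factor, len(nodes))
        self.entries = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas_for(self, key):
        """Walk clockwise from the key's token and collect the next nodes."""
        start = bisect.bisect_right(self.tokens, token(key))
        return [
            self.entries[(start + i) % len(self.entries)][1]
            for i in range(self.replication_factor)
        ]


ring = Ring(["node-1", "node-2", "node-3", "node-4"], replication_factor=3)
owners = ring.replicas_for("sensor-17")
```

Any node can compute `replicas_for` on its own, so any node can accept a request, and losing one of the owners still leaves other copies of the key reachable.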

  38. Applications
    ● analytics
    ● time series
    ● column scanning
    ● when almost real-time is enough
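
Time series and column scanning fit together because readings can be grouped into a partition and kept sorted by time inside it. A minimal in-memory sketch of that layout, assuming a hypothetical (sensor, day) partition key and ISO 8601 timestamps; it models the idea, not Cassandra's storage engine.

```python
import bisect
from collections import defaultdict

# Partition key: (sensor, day). Inside a partition, rows stay sorted by time.
partitions = defaultdict(list)


def record(sensor, timestamp, value):
    day = timestamp[:10]  # "YYYY-MM-DD" prefix of an ISO 8601 timestamp
    bisect.insort(partitions[(sensor, day)], (timestamp, value))


def scan(sensor, day, start, end):
    """Return readings with start <= timestamp < end from one partition."""
    rows = partitions.get((sensor, day), [])
    lo = bisect.bisect_left(rows, (start,))
    hi = bisect.bisect_left(rows, (end,))
    return rows[lo:hi]


record("s1", "2014-06-14T10:00:00", 21.5)
record("s1", "2014-06-14T09:00:00", 20.1)
record("s1", "2014-06-14T11:00:00", 22.3)
morning = scan("s1", "2014-06-14", "2014-06-14T09:00:00", "2014-06-14T11:00:00")
```

A range query then reads one contiguous slice of one partition instead of filtering the whole data set.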

  39. Neo4j
    ● natural modelling
    ● simple querying
    ○ Cypher, Gremlin
    ● built-in REST API
    ● not that performant

  40. Graphs
    ● relations
    ● friends of friends of people who like…
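
A "friends of friends" query is a two-hop traversal, which can be sketched over a plain adjacency map. The names and the `friends_of_friends` helper are made up for illustration; a graph database expresses the same walk in its own query language.

```python
# Hypothetical friendship graph: person -> set of direct friends.
friends = {
    "ada": {"alan", "grace"},
    "alan": {"ada", "linus"},
    "grace": {"ada", "linus", "ken"},
    "linus": {"alan", "grace"},
    "ken": {"grace"},
}


def friends_of_friends(person):
    """People exactly two hops away: not the person, not a direct friend."""
    direct = friends.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= friends.get(friend, set())
    return two_hops - direct - {person}


suggestions = friends_of_friends("ada")
```

Each additional hop is one more loop over neighbours here, whereas in a relational model it is typically one more self-join on the friendship table.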

  41. Distribution
    ● master + slaves
    ● master is not a master
    ● ZooKeeper
    ○ not anymore

  42. Applications
    ● |data| ≤ one instance
    ● ACID required
    ● fewer round-trips
    ○ procedure-like
    ● no massive updates

  43. Applications
    ● recommendations
    ● fraud detection

  44. 4. When to use them?

  45. Common problems
    ● lack of knowledge and experience
    ● investment
    ● not as good as advertised
    ● possible failure
    ● limited capabilities

  46. Should you use them?

  47. ● Only if you really can’t solve your problems right now.
    ● Don’t try for trying’s sake.
    ● Cost-effective?

  48. ● Expect the unexpected.
    ○ A honeymoon, remember?
    ● Preparations.
    ● Admit the failure.

  49. Thank you!
    @lukaszwrobel
