NoSQL Solutions - When to Use Them?

For one of the IT events I'm attending, I prepared a presentation that gives an overview of NoSQL solutions and considers whether they are worth our attention. Enjoy!

Lukasz Wrobel

June 14, 2014

Transcript

  1. NoSQL Solutions Łukasz Wróbel

  2. When to use them?

  3. About me • Architect, team leader • high-traffic websites: ◦ nk.pl ◦ Gadu-Gadu • “Memoirs of a Software Team Leader” • @lukaszwrobel
  4. Agenda 1. Introduction to NoSQL 2. Taxonomy 3. Representative solutions 4. When to use them?
  5. 1. Introduction

  6. Origin • the “NoSQL” term ◦ late 90s: a relational database ◦ June 2009, SF • NoSQL? Not Only SQL? ◦ polyglot persistence
  7. What’s wrong with RDBMSs? • rigid schema • schema migration • moderate performance
  8. • unnatural data modelling ◦ object-relational mapping ◦ aggregates • clustering support

  9. Does NoSQL shine? • easier schema migration or no schema at all • performance ◦ relaxed consistency • natural modelling • clustering
  10. No free lunch • migration is hidden • CAP theorem • ACID BASE
  11. Consistency Availability Partition tolerance

  13. Consistency Availability Partition tolerance Pick two!

  15. Consistency Availability Partition tolerance Pick two! nonfailing nodes

  16. No free lunch B A S E

  17. No free lunch Basically Available Soft state Eventually consistent ?

  18. 2. Taxonomy

  19. Families • key-value • document • columnar • graph

  21. 3. Representative solutions

  22. Key-value

  23. Redis • much more than a key-value store • RAM (not the sky) is the limit • much more than a memcached replacement • really fast: tens of thousands of operations/s
  24. Redis • lack of clustering support (master-slave replication, though) • a plethora of client libraries
  25. Data structures • hashes • lists • (sorted) sets

  26. Additional capabilities • HyperLogLog • publish/subscribe • transactions (but…) • persistence: anywhere from none… to fsync at every query
  28. Where to take keys from? Obvious: • e-mail (uniqueness) • date (an event takes place once a day)
  29. Where to take keys from? Need to generate them: • UUID • Snowflake (retired) • …?
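
The two generated-key options named on this slide can be sketched roughly as follows. The random key uses Python's standard `uuid` module. The `SnowflakeLikeGenerator` class is a hypothetical illustration, loosely modelled on the timestamp / worker / sequence layout commonly described for Snowflake; the bit widths, the custom epoch and all names are assumptions made for this sketch, not the original implementation.

```python
import threading
import time
import uuid

# Assumed layout for this sketch: milliseconds since a custom epoch,
# then 10 bits of worker id, then 12 bits of per-millisecond sequence.
EPOCH_MS = 1_400_000_000_000  # arbitrary custom epoch (assumption)
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


def random_key():
    """Return a random (version 4) UUID as a string; needs no coordination."""
    return str(uuid.uuid4())


class SnowflakeLikeGenerator:
    """Hypothetical generator of roughly time-ordered 64-bit integer keys."""

    def __init__(self, worker_id):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    @staticmethod
    def _now_ms():
        return int(time.time() * 1000)

    def next_id(self):
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                # Same millisecond: bump the sequence number.
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted: wait for the next millisecond.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - EPOCH_MS
            return (
                (timestamp << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self._sequence
            )


generator = SnowflakeLikeGenerator(worker_id=7)
ids = [generator.next_id() for _ in range(1000)]
```

Random UUIDs need no shared state at all, while the time-based scheme yields keys that sort roughly by creation time, at the cost of assigning each process a distinct worker id.
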
  30. Resembles a scalable RDBMS • queried by key only • key-based access ⇒ easier caching • no relationships, no joins
  31. Applications • wherever high performance is required • data structures easier to describe than using SQL • features
  32. Document

  33. mongoDB • JSON • JavaScript querying • Map-reduce • clustering

  34. Documents • no migrations required • nice querying • aggregates

  35. Map-reduce
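
The map-reduce idea named on this slide can be illustrated in plain Python: a map step emits key/value pairs per document, the pairs are grouped by key, and a reduce step folds each group into one result. This is a minimal in-memory sketch of the general technique, not MongoDB's API; `map_reduce`, the sample `orders` and both callbacks are hypothetical names chosen for the example.

```python
from collections import defaultdict


def map_reduce(documents, map_fn, reduce_fn):
    """Run map_fn over every document, group emitted values by key,
    then reduce each group to a single value."""
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


# Hypothetical sample documents.
orders = [
    {"customer": "alice", "total": 10},
    {"customer": "bob", "total": 7},
    {"customer": "alice", "total": 25},
]


def emit_total_per_customer(doc):
    # Map step: one (key, value) pair per order.
    yield doc["customer"], doc["total"]


def sum_values(key, values):
    # Reduce step: fold all values emitted for one key.
    return sum(values)


totals = map_reduce(orders, emit_total_per_customer, sum_values)
```

Because each group is reduced independently, the same shape of computation can be spread across many nodes, which is what makes it attractive for analytics.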

  36. Distribution • replica set • sharding
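
As a rough illustration of the sharding half of this slide: each document is routed to one shard based on its key. The sketch below hashes the key and takes the result modulo the shard count. This is one simple routing scheme, assumed here for illustration only; it is not a description of how MongoDB partitions data, and `shard_for` is a hypothetical helper.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a string key to a shard index in range(shard_count)."""
    if shard_count < 1:
        raise ValueError("shard_count must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % shard_count


# Route a batch of hypothetical user keys across 4 shards.
placement = {f"user:{n}": shard_for(f"user:{n}", 4) for n in range(1000)}
```

A plain modulo scheme like this moves most keys when the shard count changes, which is one reason real systems tend to use chunk ranges or consistent hashing instead.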

  38. Performance • "humongous", but… • not as fast as advertised • eats up all resources • indexes required • global write lock
  39. Problems • a honeymoon and then… • too many promises made
  40. Applications • uhmm… • small data sets? • no performance requirements? • Map-reduce analytics
  41. Columnar

  42. Cassandra • big data • data analytics • fully distributed

  43. datastax.com

  45. Distribution • peer-to-peer cluster • replication • scales well • no SPOF
  46. Applications • analytics • time series • column scanning • when almost real-time is enough
  47. Graph

  48. Neo4j • natural modelling • simple querying (Cypher, Gremlin) • built-in REST API • not that performant
  50. Graphs • relations • friends of friends of people who like…
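
The "friends of friends" query on this slide is a two-hop traversal. A graph database expresses it directly in its query language; the sketch below shows the same idea over an in-memory adjacency dictionary. The `friends` data and the `friends_of_friends` helper are hypothetical and exist only for this illustration.

```python
def friends_of_friends(graph, person):
    """Return people exactly two hops away from person:
    friends of friends who are neither direct friends nor the person."""
    direct = graph.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= graph.get(friend, set())
    return two_hops - direct - {person}


# Hypothetical undirected friendship graph as adjacency sets.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}

suggestions = friends_of_friends(friends, "alice")
```

In a relational schema the same question typically needs a self-join per hop, which is why this kind of query is a common argument for graph modelling.
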
  51. Distribution • master + slaves • master is not a master • ZooKeeper (not anymore)
  52. Applications • |data| ≤ one instance • ACID required • fewer round-trips (procedure-like) • no massive updates
  53. Applications • recommendations • fraud detection

  54. 4. When to use them?

  55. Common problems • lack of knowledge and experience • investment • not as good as advertised • possible failure • limited capabilities
  56. Should you use them?

  58. YES, but:

  59. • Only if you really can’t solve your problems right now. • Don’t try for trying’s sake. • Cost-effective?
  60. • Expect the unexpected. A honeymoon, remember? • Preparations. • Admit the failure.
  61. Thank you! @lukaszwrobel