Riak on Retail

Overview of Riak, the open source distributed database, for retail and eCommerce platforms and services. Covers use cases including shopping carts, product catalogs, and mobile apps; data modeling and querying; and architecture and operations.


Basho Technologies

February 12, 2013


  1. Riak on Retail

  3. What's in store?
    •  At a High Level
    •  For Developers
    •  Under the Hood
    •  When and Why
    •  Some Use Cases
    •  Commercial Extensions
    •  Latest Release and 1.3
  4. At a High Level

  5. •  Built on Amazon principles (Dynamo paper)
    •  Key/value data model
    •  with some extras: search, MapReduce, 2i, links, pre- and post-commit hooks, pluggable backends, HTTP and binary interfaces
    •  Written in Erlang with C/C++
    •  Open source under Apache 2 License
  6. Riak’s Design Goals
    •  High-availability
    •  Low-latency
    •  Horizontal Scalability
    •  Fault Tolerance
    •  Ops Friendliness
    •  Predictability
  7. Retail / eCommerce Use Cases
    •  Shopping cart functionality
    •  Must be highly available
    •  High latency is perceived as unavailability
    •  Withstands node failure, network partition, datacenter failure
    •  Many of the same architectural principles that power Amazon’s shopping cart
  8. Retail / eCommerce Use Cases
    •  Product Catalog
    •  Tens of thousands of inventory items, or more
    •  Content agnostic: images, video, text, JSON/XML/HTML documents
    •  Add and serve product data even under failure conditions
    •  Scale out without sharding
  9. Retail / eCommerce Use Cases
    •  API Platforms
    •  Expose data as a platform to internal and external clients, developers, and partners/affiliates
    •  Flexible, schemaless design
    •  RESTful HTTP API, protocol buffers, and many client libraries
    •  Throughput and capacity scale linearly with growth
  10. Retail / eCommerce Use Cases
    •  Mobile Applications
    •  Riak powers top consumer mobile apps including Bump and Voxer
    •  Fast, small object storage
    •  Designed for concurrency to meet mobile client request patterns
  11. For Developers

  12. Riak is a database that stores values against keys. Keys are grouped into higher-level namespaces called buckets.
  13. Riak doesn’t care what you store. It will accept any data type; things are stored on disk as binaries.
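
To make the bucket/key/value model concrete, here is a minimal in-memory sketch in Python. The class `TinyKV` and its method names are hypothetical and exist only for illustration; this is a toy that mimics the shape of the data model, not Riak's client API or storage engine.

```python
import json
from typing import Dict, Optional, Tuple


class TinyKV:
    """Hypothetical in-memory stand-in for a bucket/key/value store."""

    def __init__(self) -> None:
        # (bucket, key) -> (content type, opaque bytes)
        self._data: Dict[Tuple[str, str], Tuple[str, bytes]] = {}

    def put(self, bucket: str, key: str, value: bytes, content_type: str) -> None:
        # The store never inspects the value; callers encode to bytes first.
        if not isinstance(value, bytes):
            raise TypeError("values are stored as opaque bytes; encode first")
        self._data[(bucket, key)] = (content_type, value)

    def get(self, bucket: str, key: str) -> Optional[Tuple[str, bytes]]:
        # Returns None when the bucket/key pair has never been written.
        return self._data.get((bucket, key))

    def delete(self, bucket: str, key: str) -> None:
        self._data.pop((bucket, key), None)


store = TinyKV()

# A JSON product record and a binary image live side by side,
# separated only by their bucket and key.
product = {"name": "Trail Shoe", "price": 89.0}
store.put("products", "sku-1001", json.dumps(product).encode("utf-8"), "application/json")
store.put("images", "sku-1001-front", b"\x89PNG\r\n\x1a\n", "image/png")
```

The same key can appear in two buckets without conflict, because the bucket is part of the lookup.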
  18. Examples

    Type                       Key                          Value
    Item in Product Inventory  Product Name, SKU or ID      JSON, XML or Text, HTML doc
    Product Advertising        Campaign ID                  Ad Content
    User Profile               Login, Email, UUID           User attributes (often, JSON doc)
    Image or Video Content     Content Name, ID or Integer  Image or video file format
    Session Information        User/Session ID              Session Data
  19. Two APIs
    1.  HTTP (just like the web)
    2.  Protocol Buffers (thank you, Google)
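
As a rough illustration of the HTTP interface, the sketch below builds (but does not send) the requests for storing, fetching, and deleting one object, using only Python's standard library. The node address, bucket, key, and helper function names are assumptions made for this example; the `/buckets/<bucket>/keys/<key>` path follows Riak's HTTP API.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumption: a node is reachable here (8098 is the usual HTTP port).
BASE_URL = "http://127.0.0.1:8098"


def object_url(bucket: str, key: str) -> str:
    """URL of a single object, with bucket and key percent-encoded."""
    return f"{BASE_URL}/buckets/{quote(bucket, safe='')}/keys/{quote(key, safe='')}"


def store_request(bucket: str, key: str, body: bytes, content_type: str) -> Request:
    # PUT writes the value; the Content-Type is kept alongside it.
    return Request(
        object_url(bucket, key),
        data=body,
        method="PUT",
        headers={"Content-Type": content_type},
    )


def fetch_request(bucket: str, key: str) -> Request:
    return Request(object_url(bucket, key), method="GET")


def delete_request(bucket: str, key: str) -> Request:
    return Request(object_url(bucket, key), method="DELETE")
```

Against a running node, any of these could be passed to `urllib.request.urlopen`; the example stops short of that so it runs without a cluster.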
  20. Querying
    •  GET/PUT/DELETE
    •  MapReduce: Filtering product info by tag, counting items, extracting links
    •  Full-Text Search: Searching product info or descriptions
    •  Secondary Indexes (2i): Tagging products with categories, promotion identifiers, etc.
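
The secondary-index case can be sketched over HTTP as follows: tags travel as `x-riak-index-<field>_bin` headers on the write, and an exact-match lookup uses the `/buckets/<bucket>/index/<field>_bin/<value>` path. The node address, the bucket and field names, and the helper functions are hypothetical; the requests are only constructed, not sent.

```python
from typing import Dict
from urllib.parse import quote
from urllib.request import Request

# Assumption: a node is reachable here.
BASE_URL = "http://127.0.0.1:8098"


def tagged_put(bucket: str, key: str, body: bytes,
               content_type: str, tags: Dict[str, str]) -> Request:
    """Build a PUT that attaches one binary (string) index entry per tag."""
    headers = {"Content-Type": content_type}
    for field, value in tags.items():
        # The _bin suffix marks a string-valued index.
        headers[f"x-riak-index-{field}_bin"] = value
    url = f"{BASE_URL}/buckets/{quote(bucket, safe='')}/keys/{quote(key, safe='')}"
    return Request(url, data=body, method="PUT", headers=headers)


def index_query_url(bucket: str, field: str, value: str) -> str:
    """URL for an exact-match lookup on a string-valued index."""
    return (f"{BASE_URL}/buckets/{quote(bucket, safe='')}"
            f"/index/{field}_bin/{quote(value, safe='')}")


# Tag a product with a category and a promotion identifier.
req = tagged_put(
    "products", "sku-1001", b'{"name": "Trail Shoe"}', "application/json",
    {"category": "shoes", "promo": "spring-sale"},
)
lookup = index_query_url("products", "category", "shoes")
```

A lookup of this kind returns the matching keys, which the application then fetches with ordinary GETs.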
  21. Client Libraries: Ruby, Node.js, Java, Python, Perl, OCaml, Erlang, PHP, C, Squeak, Smalltalk, Pharo, Clojure, Scala, Haskell, Lisp, Go, .NET, Play, and more (supported by either Basho or the community).
  22. Under the Hood

  23. Hard problems in databases: Single points of failure.

  24. Availability: Relational Architecture [diagram: master and two slaves]

  25. Availability [diagram: master and two slaves, with a write]

  26. Availability [diagram: master and two slaves, with a write]

  27. Masterless; deployed as a cluster of nodes

  28. ALL NODES ARE DECLARED EQUAL. [diagram: every node accepts both reads and writes]
  29. Hard problems in databases: Where to put the data.

  30. Sharding in Relational Systems… [diagram: shards by key range: A-D, E-K, L-P, Q-T, U-Z]
  31. It Hurts.
    •  Hot spots
    •  Unevenly spread data and request patterns
    •  Resharding is operationally intensive, often manual
    [diagram: shards by key range: A-D, E-K, L-P, Q-T, U-Z]
  32. Don’t Shard. Riak’s Consistent Hashing
    •  Evenly spreads data around the cluster
    •  Automatically rebalances data when machines are added
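
The idea behind consistent hashing can be sketched in a few lines of Python. This is a simplified illustration, not Riak's implementation: the partition count, the replica count, the way bucket and key are combined before hashing, and the `assign_partitions` / `add_node` ownership logic are all assumptions chosen to keep the example short.

```python
import hashlib
from typing import Dict, List

RING_SIZE = 2 ** 160      # size of the SHA-1 output space
NUM_PARTITIONS = 64       # assumed number of equal ring partitions
N_VAL = 3                 # assumed number of replicas per object


def partition_for(bucket: str, key: str) -> int:
    """Hash bucket+key onto the ring and return the partition index."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    return position // (RING_SIZE // NUM_PARTITIONS)


def assign_partitions(nodes: List[str]) -> Dict[int, str]:
    """Initial ownership: hand partitions out round-robin (a simplification)."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


def add_node(owners: Dict[int, str], new_node: str) -> Dict[int, str]:
    """Give the new node a fair share, leaving every other partition in place."""
    updated = dict(owners)
    node_count = len(set(owners.values())) + 1
    target = NUM_PARTITIONS // node_count
    # Take every node_count-th partition so the new node's share is spread out.
    for p in list(range(0, NUM_PARTITIONS, node_count))[:target]:
        updated[p] = new_node
    return updated


def preference_list(bucket: str, key: str, owners: Dict[int, str],
                    n: int = N_VAL) -> List[str]:
    """Owners of the key's partition and the next n-1 partitions on the ring."""
    first = partition_for(bucket, key)
    return [owners[(first + i) % NUM_PARTITIONS] for i in range(n)]


owners4 = assign_partitions(["node-a", "node-b", "node-c", "node-d"])
owners5 = add_node(owners4, "node-e")
moved = sum(1 for p in owners4 if owners4[p] != owners5[p])
```

In this sketch, growing from four nodes to five changes the owner of 12 of the 64 partitions; keys never change partition, only some partitions change hands, which is why no application-level resharding is needed.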
  44. Riak: when and why

  45. When Might Riak Make Sense
    •  When you have enough data to require >1 physical machine (preferably >5)
    •  When availability is more important than consistency (think “critical data” on “big data”)
    •  When your data can be modeled as keys and values; don’t be afraid to denormalize
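
As one illustration of what denormalizing for a key/value store can look like, the sketch below folds customer, product, and order-line records into a single self-contained order document that could be stored under one key and read back with one GET. All record shapes, names, and values here are made up for the example.

```python
import json

# Hypothetical normalized records, as they might sit in relational tables.
customers = {"c-7": {"name": "Ada", "email": "ada@example.com"}}
products = {"sku-1001": {"name": "Trail Shoe", "price": 89.0}}
order_lines = [{"order": "o-1", "sku": "sku-1001", "qty": 2}]


def denormalize_order(order_id: str, customer_id: str) -> bytes:
    """Build one document holding everything needed to display the order."""
    lines = [
        # Copy product details into each line instead of referencing them.
        {"sku": line["sku"], "qty": line["qty"], **products[line["sku"]]}
        for line in order_lines
        if line["order"] == order_id
    ]
    doc = {
        "order_id": order_id,
        "customer": {"id": customer_id, **customers[customer_id]},
        "lines": lines,
        "total": sum(line["price"] * line["qty"] for line in lines),
    }
    return json.dumps(doc, sort_keys=True).encode("utf-8")


order_value = denormalize_order("o-1", "c-7")
```

The trade-off is duplication: product details copied into the order do not change when the catalog entry does, which is often the desired behavior for order history.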
  46. •  Case study on Basho.com
    •  Millions of users
    •  Highly available, event-based shopping experience
    •  “Riak is one of those things that just works and doesn’t need our attention on a day-to-day basis, saving both time and money.”
  47. http://vimeo.com/54384814

  48. Ad Serving
    •  OpenX will serve ~4T ads in 2012
    •  Started with CouchDB and Cassandra for various parts of infrastructure
    •  Now consolidating on Riak and Riak Core
    •  Video on Ricon2012.com
  49. Mobile Apps
    •  Bump – easy to share contact info, photos, other objects
    •  Picked Riak for operational ease of use
    •  “It does what it’s supposed to do; nodes can go down but Riak will still work. It’s great to be able to deal with node failures the next day instead of at 3am.”
  50. •  Copious – eCommerce marketplace
    •  Uses Riak to store all registered accounts and tokens for social media login
    •  100s of thousands of keys
  51. Application Essentials…
    •  Session storage
    •  Log files
    •  User

  52. Riak: Hybrid Solutions
    •  Riak with Postgres
    •  Riak with Elasticsearch
    •  Riak with Hadoop
    •  Secondary analytics clusters
  53. Try Us On…
    •  Amazon AMIs
    •  Engine Yard beta (more details next week)
    •  Microsoft Azure VM Depot
    •  Riakon.com
  54. Buy Some Software...

  55. Riak Enterprise
    •  Multi-datacenter replication
    •  Real-time or full sync

  56. Use Cases
    •  Data locality to serve clients and partners at low latency anywhere in the world
    •  Failover to other sites in the event of data center failure
    •  Full sync and real-time sync can each be configured unidirectionally or bidirectionally
  57. Riak Cloud Storage
    •  Large object support
    •  S3-compatible API
    •  Multi-tenancy
    •  Reporting on usage
  58. •  docs.basho.com
    •  @basho
    •  github.com/basho