Redis - Lightning Talk @ RefME

Luke Williams

February 18, 2016

Transcript

  1. what?
    • “Redis is an open source, BSD licensed, advanced key-value cache and store. It is often referred to as a data structure server”
    • Not quite a NoSQL database
    • Not quite a key-value store
  2. what?
    • Single threaded and event driven
    • Everything in memory
    • Individual operations are atomic
    • Fast (~10k ops/sec on an EC2 t1.micro, ~30k on an m1.medium)
    • Clients in many languages
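A toy Ruby model of what "single threaded" buys you. It is not Redis itself, and `ToyServer` is a hypothetical name: every client's command goes into one queue and runs to completion before the next, so a single command such as INCR cannot be interleaved with another. The same model shows the limit of that guarantee: a GET followed by a SET from two clients is two separate commands, so one increment can be lost.

```ruby
# Toy model (hypothetical, not Redis): every client's command is queued and
# executed strictly one at a time, like Redis's single-threaded event loop.
class ToyServer
  attr_reader :data

  def initialize
    @data = {}
    @queue = []
  end

  # A "command" here is just a block that receives the data hash.
  def submit(&command)
    @queue << command
  end

  # Run queued commands in order; each finishes before the next starts.
  def run
    @queue.shift.call(@data) until @queue.empty?
  end
end

# Read-modify-write split across two commands per client (GET, then SET):
racy = ToyServer.new
racy.data["views"] = 0
seen = []
racy.submit { |d| seen << d["views"] }         # client A: GET views
racy.submit { |d| seen << d["views"] }         # client B: GET views
racy.run
racy.submit { |d| d["views"] = seen[0] + 1 }   # client A: SET views
racy.submit { |d| d["views"] = seen[1] + 1 }   # client B: SET views
racy.run
puts racy.data["views"] # => 1 (one increment was lost)

# The increment as a single command per client, like INCR:
atomic = ToyServer.new
atomic.data["views"] = 0
2.times { atomic.submit { |d| d["views"] += 1 } }
atomic.run
puts atomic.data["views"] # => 2
```

In real Redis, MULTI/EXEC transactions or Lua scripts (EVAL) are the usual ways to make a multi-step update atomic.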
  3. BASICS
    SET mykey 1000   # -> OK
    GET mykey        # -> "1000"
    EXPIRE mykey 5   # -> 1
    # 6 seconds later ...
    GET mykey        # -> nil
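A minimal sketch of the SET / EXPIRE / GET behaviour above, assuming a hypothetical `TtlStore` class with an injectable clock so the example can jump forward six seconds without sleeping. It models lazy expiry only (an expired key disappears when next touched); Redis also sweeps expired keys in the background, which this sketch leaves out.

```ruby
# Toy key-value store with per-key TTLs (hypothetical; illustrates the
# SET / EXPIRE / GET behaviour, not how Redis is implemented).
class TtlStore
  # clock: any callable returning "now" in seconds; injectable so the
  # example can move time forward without sleeping.
  def initialize(clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock
    @values = {}
    @deadlines = {}
  end

  def set(key, value)
    @values[key] = value.to_s   # Redis strings: GET returns "1000", not 1000
    @deadlines.delete(key)      # a plain SET discards any existing TTL
    "OK"
  end

  def expire(key, seconds)
    purge_if_expired(key)
    return 0 unless @values.key?(key)   # EXPIRE on a missing key replies 0
    @deadlines[key] = @clock.call + seconds
    1
  end

  def get(key)
    purge_if_expired(key)
    @values[key]                        # nil once the key is gone
  end

  private

  # Lazy expiry: an expired key is removed the next time it is touched.
  def purge_if_expired(key)
    deadline = @deadlines[key]
    return unless deadline && @clock.call >= deadline
    @values.delete(key)
    @deadlines.delete(key)
  end
end

now = 0
store = TtlStore.new(-> { now })
store.set("mykey", 1000)   # => "OK"
store.expire("mykey", 5)   # => 1
store.get("mykey")         # => "1000"
now = 6                    # "6 seconds later ..."
store.get("mykey")         # => nil
```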
  4. BASICS
    GET counter           # -> nil
    INCR counter          # -> 1
    SET counter_2 "100"
    DECRBY counter_2 10   # -> 90
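The counter commands treat a missing key as 0 and parse the stored string as an integer. A small sketch of those rules, using hypothetical helpers `incrby`, `incr` and `decrby` over a plain Hash (not a Redis client):

```ruby
# Toy model (hypothetical helper names) of INCR/INCRBY/DECRBY semantics:
# values are stored as strings, a missing key counts as 0, and the
# result is returned as an integer.
def incrby(store, key, delta)
  current = store.fetch(key, "0")
  unless current.match?(/\A-?\d+\z/)
    raise ArgumentError, "value is not an integer"   # Redis replies with an error here
  end
  store[key] = (current.to_i + delta).to_s
  store[key].to_i
end

def incr(store, key)
  incrby(store, key, 1)
end

def decrby(store, key, delta)
  incrby(store, key, -delta)
end

store = {}
store["counter"]                 # => nil   (GET counter)
incr(store, "counter")           # => 1     (INCR counter)
store["counter_2"] = "100"       # SET counter_2 "100"
decrby(store, "counter_2", 10)   # => 90    (DECRBY counter_2 10)
```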
  5. LISTS
    LPUSH mylist 100
    LPUSH mylist 200
    LRANGE mylist 0 100   # -> 1) "200" 2) "100" (LPUSH prepends)
    RPUSH mylist -100
    LLEN mylist           # -> 3
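A sketch of the list semantics using a plain Ruby Array: LPUSH maps to `unshift`, RPUSH to `push`, and LLEN to `length`. The hypothetical `lrange` helper mimics LRANGE's inclusive stop index, negative indices counted from the end, and clamping of out-of-range indices.

```ruby
# Toy list model (hypothetical helper): LRANGE takes an inclusive stop
# index; negative indices count from the end, out-of-range ones are clamped.
def lrange(list, start, stop)
  len = list.length
  start += len if start < 0
  stop += len if stop < 0
  start = 0 if start < 0
  stop = len - 1 if stop >= len
  return [] if start > stop
  list[start..stop]
end

mylist = []
mylist.unshift("100")    # LPUSH mylist 100
mylist.unshift("200")    # LPUSH mylist 200
lrange(mylist, 0, 100)   # => ["200", "100"]
mylist.push("-100")      # RPUSH mylist -100
lrange(mylist, 0, -1)    # => ["200", "100", "-100"]
mylist.length            # => 3 (LLEN mylist)
```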
  6. LISTS AS QUEUES
    # assuming myqueue already holds 998 and 999
    RPUSH myqueue 1000
    RPUSH myqueue 1001
    LRANGE myqueue 0 -1   # -> 998, 999, 1000, 1001
    LPOP myqueue          # -> 998
    LTRIM myqueue 0 99    # keep only the first 100 entries
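Producers append with RPUSH and consumers take from the head with LPOP, which gives first-in, first-out order. A sketch with a hypothetical `ToyQueue` class wrapping an Array; its `ltrim` handles only non-negative indices, unlike the real command.

```ruby
# Toy list used as a FIFO queue (hypothetical class): producers RPUSH onto
# the tail, consumers LPOP from the head, and LTRIM caps the length.
class ToyQueue
  def initialize(items = [])
    @items = items.map(&:to_s)
  end

  # RPUSH replies with the new length of the list.
  def rpush(*values)
    @items.concat(values.map(&:to_s))
    @items.length
  end

  # LPOP returns the oldest entry, or nil when the list is empty.
  def lpop
    @items.shift
  end

  # LTRIM start stop keeps only that inclusive range
  # (this sketch handles non-negative indices only).
  def ltrim(start, stop)
    @items = @items[start..stop] || []
    "OK"
  end

  def to_a
    @items.dup
  end
end

q = ToyQueue.new([998, 999])   # assumed to already hold two jobs
q.rpush(1000)                  # => 3
q.rpush(1001)                  # => 4
q.to_a                         # => ["998", "999", "1000", "1001"]
q.lpop                         # => "998" (first in, first out)
q.ltrim(0, 99)                 # keep at most the first 100 entries
```

Worker processes usually call the blocking BLPOP rather than polling LPOP, so an idle worker waits for the next job instead of spinning.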
  7. SETS
    # assuming the other kids and the nightswatch set were added earlier
    SADD neds_kids 'Arya'
    SMEMBERS neds_kids             # -> Jon, Robb, Arya, Sansa, Bran, Rickon
    SCARD neds_kids                # -> 6
    SISMEMBER nightswatch 'Bran'   # -> 0
    SINTER nightswatch neds_kids   # -> Jon
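Ruby's standard `Set` has the same shape as a Redis set (unordered, unique members), so the commands map almost one to one. The five pre-existing kids and the Night's Watch member "Samwell" are assumptions for the example.

```ruby
require "set"

# Ruby's Set behaves much like a Redis set: unordered, unique members.
# The other five kids and the Night's Watch members are assumed here.
neds_kids = Set.new(%w[Jon Robb Sansa Bran Rickon])
neds_kids.add("Arya")                    # SADD neds_kids 'Arya'
neds_kids.add?("Arya")                   # => nil: already a member (SADD would reply 0)
neds_kids.size                           # => 6 (SCARD neds_kids)

nightswatch = Set.new(%w[Jon Samwell])   # hypothetical members
nightswatch.include?("Bran")             # => false (SISMEMBER replies 0)
(nightswatch & neds_kids).to_a           # => ["Jon"] (SINTER nightswatch neds_kids)
```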
  8. SORTED SETS
    ZADD scoreboard 8 "Jesse"
    ZADD scoreboard 7 "Walter"
    ZADD scoreboard 10 "Badger"
    ZREVRANGE scoreboard 0 1   # -> Badger, Jesse
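A sketch of the sorted-set behaviour with a Hash of member to score and two hypothetical helpers. Re-adding a member only updates its score, and ZREVRANGE returns the highest scores first. Unlike this sketch, Redis keeps members in score order at all times (a skip list for larger sets), so it does not re-sort on every query.

```ruby
# Toy sorted set (hypothetical helpers): a Hash of member => score.
def zadd(zset, score, member)
  added = zset.key?(member) ? 0 : 1
  zset[member] = score            # re-adding a member just updates its score
  added                           # ZADD replies with the number of new members
end

# Highest score first; ties ordered by member, descending, as ZREVRANGE does.
# This sketch re-sorts on every call, which Redis does not need to do.
def zrevrange(zset, start, stop)
  ordered = zset.sort_by { |member, score| [score, member] }.reverse.map(&:first)
  ordered[start..stop] || []
end

scoreboard = {}
zadd(scoreboard, 8, "Jesse")
zadd(scoreboard, 7, "Walter")
zadd(scoreboard, 10, "Badger")
zrevrange(scoreboard, 0, 1)      # => ["Badger", "Jesse"]
zadd(scoreboard, 11, "Walter")   # => 0 (existing member, score updated)
zrevrange(scoreboard, 0, 1)      # => ["Walter", "Badger"]
```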
  9. HASHES
    HSET user:1 name "Luke"
    HSET user:1 role "Developer"
    HGET user:1 name   # -> "Luke"
    # also HVALS, HKEYS, HINCRBY, HGETALL
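A hash lets one key such as `user:1` hold several fields, so a single field can be read or updated without re-serialising the whole object. A sketch with a hypothetical `ToyHashStore` class; the `logins` field is an assumption added to show HINCRBY.

```ruby
# Toy hash store (hypothetical class): each Redis key such as "user:1"
# maps to its own field => value hash, with values stored as strings.
class ToyHashStore
  def initialize
    @keys = {}
  end

  # HSET replies 1 if the field is new, 0 if an existing field was updated.
  def hset(key, field, value)
    hash = (@keys[key] ||= {})
    new_field = !hash.key?(field)
    hash[field] = value.to_s
    new_field ? 1 : 0
  end

  def hget(key, field)
    @keys.dig(key, field)   # nil if the key or the field is missing
  end

  # HINCRBY treats a missing field as 0 and returns the new integer value.
  def hincrby(key, field, delta)
    hash = (@keys[key] ||= {})
    hash[field] = (hash.fetch(field, "0").to_i + delta).to_s
    hash[field].to_i
  end

  def hgetall(key)
    @keys.fetch(key, {}).dup
  end

  def hkeys(key)
    hgetall(key).keys
  end

  def hvals(key)
    hgetall(key).values
  end
end

db = ToyHashStore.new
db.hset("user:1", "name", "Luke")        # => 1
db.hset("user:1", "role", "Developer")   # => 1
db.hget("user:1", "name")                # => "Luke"
db.hincrby("user:1", "logins", 1)        # => 1 ("logins" is a hypothetical field)
db.hvals("user:1")                       # => ["Luke", "Developer", "1"]
```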
  10. USES?
    • Counters (view counters, sign-in counters)
    • Logs
    • Queues / background jobs (Resque, Sidekiq)
    • Cache
    • Inter-process and inter-machine communication
    • Pub/Sub
  11. BATCH PROCESSING
    • A 5 to 10 million line file on AWS S3, split into 1,000-line chunks
    • Parallelize work across N processes on M workers
    • Use Redis keys to track overall job progress for the UI, e.g. job:1010:status, job:1010:parts_remaining
    • Heavy API usage: a Redis-based queue cycles through API keys across machines; Redis also provides the mutex locks
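A minimal sketch of the chunk-and-track pattern, assuming hypothetical helpers `split_into_chunks`, `start_job` and `finish_chunk`. A plain Hash stands in for Redis; against a real server these would be SET and DECR calls on shared keys, so every worker and the UI see the same numbers.

```ruby
# Toy sketch of the batch pattern (hypothetical helper names).
# A plain Hash stands in for Redis; with a real client these would be
# SET / DECR calls on a shared server.
CHUNK_SIZE = 1000

def split_into_chunks(lines, size = CHUNK_SIZE)
  lines.each_slice(size).to_a
end

def start_job(store, job_id, chunk_count)
  store["job:#{job_id}:status"] = "running"
  store["job:#{job_id}:parts_remaining"] = chunk_count
end

# Called by a worker when it finishes one chunk. In Redis the decrement
# would be a single atomic DECR, and only the worker that sees 0 marks
# the whole job as done.
def finish_chunk(store, job_id)
  remaining = (store["job:#{job_id}:parts_remaining"] -= 1)
  store["job:#{job_id}:status"] = "done" if remaining.zero?
  remaining
end

lines = (1..2500).map { |i| "line #{i}" }   # stand-in for the S3 file
chunks = split_into_chunks(lines)           # 3 chunks: 1000, 1000 and 500 lines
store = {}
start_job(store, 1010, chunks.size)
chunks.each { finish_chunk(store, 1010) }   # each chunk would really be a background job
store["job:1010:status"]                    # => "done"
```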
  12. BAYES CLASSIFIER
    require 'bayes_classifier'
    c = BayesClassifier.new
    c.train('Liked', 'Fantastic product, would recommend')
    c.train('Liked', 'Very sturdy bed, easy to build.')
    c.train('Disliked', 'Terrible design, already broken')
    c.train('Disliked', 'I am disappointed')
    c.classify('Terrible and rubbish') # -> Disliked
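A self-contained sketch of a classifier with the same `train` / `classify` shape. This is not the `bayes_classifier` gem's code: `TinyBayes` is a hypothetical name, and the choices in it (multinomial naive Bayes, add-one smoothing, a lowercase word tokenizer that keeps every word, priors from document counts) are assumptions.

```ruby
# Hypothetical TinyBayes: multinomial naive Bayes with add-one smoothing.
# Not the bayes_classifier gem's implementation.
class TinyBayes
  def initialize
    @doc_counts = Hash.new(0)                               # category => training docs
    @word_totals = Hash.new(0)                              # category => total words seen
    @word_counts = Hash.new { |h, k| h[k] = Hash.new(0) }   # category => {word => count}
    @vocabulary = {}                                        # every word seen, any category
  end

  def train(category, text)
    words = tokenize(text)
    @doc_counts[category] += 1
    @word_totals[category] += words.size
    words.each do |word|
      @word_counts[category][word] += 1
      @vocabulary[word] = true
    end
  end

  def classify(text)
    words = tokenize(text)
    total_docs = @doc_counts.values.sum.to_f
    vocab_size = @vocabulary.size
    @doc_counts.keys.max_by do |category|
      # Sum logs instead of multiplying small probabilities (avoids underflow).
      prior = Math.log(@doc_counts[category] / total_docs)
      likelihood = words.sum do |word|
        count = @word_counts[category][word]
        Math.log((count + 1.0) / (@word_totals[category] + vocab_size))
      end
      prior + likelihood
    end
  end

  private

  # Assumed tokenizer: lowercase runs of letters and apostrophes.
  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end
end

c = TinyBayes.new
c.train('Liked', 'Fantastic product, would recommend')
c.train('Liked', 'Very sturdy bed, easy to build.')
c.train('Disliked', 'Terrible design, already broken')
c.train('Disliked', 'I am disappointed')
c.classify('Terrible and rubbish') # => "Disliked"
```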
  13. BAYES CLASSIFIER
    c.train('Disliked', 'Terrible design, already broken')
    # under the hood
    @words_count['Disliked'] += 3
    @category_counts['Disliked']['terrible'] += 1
    @category_counts['Disliked']['design'] += 1
    @category_counts['Disliked']['broken'] += 1
    c.classify("Terrible and rubbish")
    # .... maths! (logarithms)
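The "maths" is the standard multinomial naive Bayes decision rule. The slide does not show the gem's exact formula, so add-one smoothing below is an assumption. Summing logarithms rather than multiplying many small probabilities avoids floating-point underflow. For the worked numbers, the four training sentences from slide 12 are assumed to be tokenised into lowercase words with every word kept (including "already"), giving words(Disliked) = 7, words(Liked) = 10, vocabulary size |V| = 17, and equal priors of 1/2:

```latex
\begin{aligned}
\hat{c} &= \arg\max_{c} \Big[ \ln P(c) + \sum_{i=1}^{n} \ln P(w_i \mid c) \Big],
\qquad P(w \mid c) = \frac{\mathrm{count}(w, c) + 1}{\mathrm{words}(c) + |V|} \\
\text{Disliked} &: \ln\tfrac{1}{2} + \ln\tfrac{2}{24} + 2\ln\tfrac{1}{24} \approx -9.53 \\
\text{Liked} &: \ln\tfrac{1}{2} + 3\ln\tfrac{1}{27} \approx -10.58
\end{aligned}
```

"Disliked" has the larger score, so it wins. Scoring touches one count per (category, input word) pair, which is why its cost does not depend on how much data has been trained.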
  14. BAYES CLASSIFIER
    c.train('Disliked', 'Terrible design, already broken')
    # under the hood
    redis.incrby('category_counts:disliked', 3)
    redis.hincrby('category_words:disliked', 'terrible', 1)
    redis.hincrby('category_words:disliked', 'design', 1)
    redis.hincrby('category_words:disliked', 'broken', 1)
    c.classify("Terrible and rubbish")
    # .... same maths, but more redis
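A sketch of the same maths with the counts kept in Redis keys laid out as on the slide (`category_counts:<category>` as a counter, `category_words:<category>` as a hash of word counts). `RedisBayes` and `FakeRedis` are hypothetical; `FakeRedis` is an in-memory stand-in implementing only the redis-rb calls used (`incrby`, `get`, `hincrby`, `hget`, `sadd`, `smembers`, `scard`), so the example runs without a server. The `categories` and `vocabulary` set keys are assumptions, and this sketch uses a uniform prior. It keeps every word, so the Disliked counter reaches 4 here; the slide's `incrby(..., 3)` suggests the original drops a word such as "already".

```ruby
require "set"

# In-memory stand-in (hypothetical) for a redis-rb client, implementing only
# the calls used below. With a real server you would pass Redis.new instead.
class FakeRedis
  def initialize
    @data = {}
  end

  def incrby(key, amount)
    @data[key] = @data.fetch(key, 0) + amount
  end

  def get(key)
    @data[key]&.to_s   # Redis replies with a string, or nil if missing
  end

  def hincrby(key, field, amount)
    hash = (@data[key] ||= {})
    hash[field] = hash.fetch(field, 0) + amount
  end

  def hget(key, field)
    @data.dig(key, field)&.to_s
  end

  def sadd(key, member)
    (@data[key] ||= Set.new).add(member)
    nil   # the return value is not used below
  end

  def smembers(key)
    @data.fetch(key, Set.new).to_a
  end

  def scard(key)
    @data.fetch(key, Set.new).size
  end
end

# Hypothetical Redis-backed classifier: same maths, counts live in Redis keys.
class RedisBayes
  def initialize(redis)
    @redis = redis
  end

  def train(category, text)
    words = tokenize(text)
    key = category.downcase
    @redis.sadd("categories", category)
    @redis.incrby("category_counts:#{key}", words.size)
    words.each do |word|
      @redis.hincrby("category_words:#{key}", word, 1)
      @redis.sadd("vocabulary", word)
    end
  end

  # Uniform prior over categories (per-category document counts are not stored).
  def classify(text)
    words = tokenize(text)
    vocab_size = @redis.scard("vocabulary")
    @redis.smembers("categories").max_by do |category|
      key = category.downcase
      total = @redis.get("category_counts:#{key}").to_i
      words.sum do |word|
        count = @redis.hget("category_words:#{key}", word).to_i
        Math.log((count + 1.0) / (total + vocab_size))
      end
    end
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end
end

c = RedisBayes.new(FakeRedis.new)
c.train('Liked', 'Fantastic product, would recommend')
c.train('Liked', 'Very sturdy bed, easy to build.')
c.train('Disliked', 'Terrible design, already broken')
c.train('Disliked', 'I am disappointed')
c.classify('Terrible and rubbish') # => "Disliked"
```

Because every count lives in Redis, several workers on different machines can train the same model at once; INCRBY and HINCRBY are atomic, so concurrent updates are not lost.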
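  15. BATCH PROCESSING
    • 10 million keywords trained in 2 hours using AWS ElastiCache and Resque
    • Classification is O(1) with respect to the training set: each count is a constant-time hash lookup, so classifying takes the same time regardless of how much data has been trained (it grows only with the number of input words and categories)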
  16. CAVEATS
    • Persistence is not easy: RDB snapshots, or an Append Only File (AOF) replayed on restart
    • No evictions by default
    • No partitioning by default
    • The new Redis Cluster (released with Redis 3.0 on April 1st 2015) may address some of the above, partitioning in particular