Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

what?
• “Redis is an open source, BSD licensed, advanced key-value cache and store. It is often referred to as a data structure server”
• Not quite a NoSQL database
• Not quite a key-value store

Slide 3

Slide 3 text

what?
• Single threaded and event driven
• Everything in memory
• Individual operations are atomic
• Fast (~10k ops/sec on an EC2 t1.micro, ~30k on an m1.medium)
• Clients in many languages

Slide 4

Slide 4 text

GETTING STARTED
brew install redis
# start the server
redis-server
# run commands
redis-cli

Slide 5

Slide 5 text

BASICS
SET mykey 1000      # -> OK
GET mykey           # -> 1000
EXPIRE mykey 5      # -> 1 (timeout set)
# 6 seconds later ...
GET mykey           # -> nil
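The expiry behaviour on this slide can be modelled in a few lines of Ruby. This is only a sketch of the semantics, not how Redis implements them: `TinyStore` and its injectable clock are hypothetical names introduced here so the example runs without a server and without sleeping.

```ruby
# Hypothetical in-memory model of SET / GET / EXPIRE semantics.
class TinyStore
  def initialize(clock = -> { Time.now.to_f })
    @clock = clock        # injectable so a demo can advance time by hand
    @values = {}
    @deadlines = {}
  end

  # SET: store the value as a string and clear any previous timeout.
  def set(key, value)
    @values[key] = value.to_s
    @deadlines.delete(key)
    'OK'
  end

  # EXPIRE: 1 if the timeout was set, 0 if the key does not exist.
  def expire(key, seconds)
    return 0 unless live?(key)

    @deadlines[key] = @clock.call + seconds
    1
  end

  # GET: nil for missing or expired keys.
  def get(key)
    live?(key) ? @values[key] : nil
  end

  private

  # Drops the key if its deadline has passed, then reports whether it exists.
  def live?(key)
    deadline = @deadlines[key]
    if deadline && @clock.call >= deadline
      @values.delete(key)
      @deadlines.delete(key)
    end
    @values.key?(key)
  end
end

now = 0.0
store = TinyStore.new(-> { now })
store.set('mykey', 1000)    # "OK"
store.expire('mykey', 5)    # 1
now += 6                    # six seconds later
store.get('mykey')          # nil
```

Expired keys are dropped lazily on access here, which keeps the sketch short.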

Slide 6

Slide 6 text

BASICS
GET counter             # -> nil
INCR counter            # -> 1
SET counter_2 "100"
DECRBY counter_2 10     # -> 90

Slide 7

Slide 7 text

LISTS
LPUSH mylist 100
LPUSH mylist 200
LRANGE mylist 0 100
# -> 1) 200
# -> 2) 100
RPUSH mylist -100
LLEN mylist             # -> 3

Slide 8

Slide 8 text

LISTS AS QUEUES
RPUSH myqueue 1000
RPUSH myqueue 1001
LRANGE myqueue 0 -1     # -> 998,999,1000,1001
LPOP myqueue            # -> 998
LTRIM myqueue 0 99
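To make the push, pop, and trim semantics concrete, here is a minimal sketch that uses a Ruby Array as a stand-in for the Redis list. `ListModel` is a hypothetical name, the starting contents (998, 999) are inferred from the LRANGE output on the slide, and the range handling only covers the in-range indexes used here.

```ruby
# Hypothetical model: a Ruby Array standing in for a Redis list.
class ListModel
  def initialize(items = [])
    @items = items.map(&:to_s)
  end

  # RPUSH: append at the tail, return the new length.
  def rpush(value)
    @items.push(value.to_s).length
  end

  # LPUSH: prepend at the head, return the new length.
  def lpush(value)
    @items.unshift(value.to_s).length
  end

  # LPOP: remove and return the head (nil when empty).
  def lpop
    @items.shift
  end

  # LRANGE: inclusive range; -1 means the last element.
  def lrange(start, stop)
    @items[start..stop] || []
  end

  # LTRIM: keep only the given inclusive range.
  def ltrim(start, stop)
    @items = lrange(start, stop)
    'OK'
  end
end

queue = ListModel.new([998, 999])   # assumed earlier contents
queue.rpush(1000)
queue.rpush(1001)
queue.lrange(0, -1)   # ["998", "999", "1000", "1001"]
queue.lpop            # "998"
queue.ltrim(0, 99)    # caps the list at 100 entries
```

Pushing on the right and popping on the left gives first-in, first-out order. The LTRIM call is what keeps a log-style list from growing without bound.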

Slide 9

Slide 9 text

SETS
SADD neds_kids 'Arya'
SMEMBERS neds_kids              # -> Jon,Robb,Arya,Sansa,Bran,Rickon
SCARD neds_kids                 # -> 6
SISMEMBER nightswatch 'Bran'    # -> 0
SINTER nightswatch neds_kids    # -> Jon
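The same operations map directly onto Ruby's `Set`, which may help if the Redis command names are unfamiliar. This is an illustration only: the slide never lists the members of `nightswatch`, so the second name below is an assumption made for the example.

```ruby
require 'set'

NEDS_KIDS   = Set['Jon', 'Robb', 'Arya', 'Sansa', 'Bran', 'Rickon']
NIGHTSWATCH = Set['Jon', 'Sam']          # assumed members, not from the slide

NEDS_KIDS.add('Arya')                    # SADD: no change, already a member
NEDS_KIDS.size                           # SCARD     -> 6
NIGHTSWATCH.include?('Bran')             # SISMEMBER -> false (Redis replies 0)
(NIGHTSWATCH & NEDS_KIDS).to_a           # SINTER    -> ["Jon"]
```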

Slide 10

Slide 10 text

SORTED SETS
ZADD scoreboard 8 "Jesse"
ZADD scoreboard 7 "Walter"
ZADD scoreboard 10 "Badger"
ZREVRANGE scoreboard 0 1        # -> Badger, Jesse
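As a rough model of what ZREVRANGE returns, the sketch below ranks a plain Ruby hash of member to score. The `zrevrange` helper is a hypothetical name written for this example. It sorts on every call, whereas Redis keeps the set ordered as members are added.

```ruby
# Hypothetical helper: highest score first, inclusive start/stop indexes.
# Ties are broken by member name, then the whole ranking is reversed.
def zrevrange(scores, start, stop)
  ranked = scores.sort_by { |member, score| [score, member] }.reverse.map(&:first)
  ranked[start..stop] || []
end

SCOREBOARD = { 'Jesse' => 8, 'Walter' => 7, 'Badger' => 10 }.freeze
zrevrange(SCOREBOARD, 0, 1)   # ["Badger", "Jesse"]
```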

Slide 11

Slide 11 text

HASHES
HSET user:1 name "Luke"
HSET user:1 role "Developer"
HGET user:1 name        # -> "Luke"
# also HVALS, HKEYS, HINCRBY, HGETALL

Slide 12

Slide 12 text

USES?
• Counters (view counters, sign in counters)
• Logs
• Queues / Background Jobs (Resque, Sidekiq)
• Cache
• Inter-process & Inter-machine communication
• Pub/Sub

Slide 13

Slide 13 text

BATCH PROCESSING
• 5-10 million line file on AWS S3, split into 1000-line chunks
• Parallelize work across N processes on M workers
• Use Redis keys to track overall job progress in the UI, e.g. job:1010:status, job:1010:parts_remaining
• Heavy API usage: use a Redis-based queue to cycle through API keys across machines. Also mutex locks
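The progress-tracking bullet can be sketched as follows. The key names come from the slide; everything else is an assumption: `start_job`, `finish_part`, and the status values are hypothetical, and `FakeProgressStore` is a stand-in that models only `set`, `get`, and `decr` so the example runs without a server. A real client object exposing those three calls could be passed in instead.

```ruby
# Hypothetical stand-in for a Redis client (set, get and decr only).
class FakeProgressStore
  def initialize
    @data = {}
  end

  def set(key, value)
    @data[key] = value.to_s
    'OK'
  end

  def get(key)
    @data[key]
  end

  # Decrements the stored integer and returns the new value.
  def decr(key)
    value = (@data[key] || '0').to_i - 1
    @data[key] = value.to_s
    value
  end
end

CHUNK_SIZE = 1000

# Record how many chunks the job has before any worker starts.
def start_job(redis, job_id, total_lines)
  parts = (total_lines + CHUNK_SIZE - 1) / CHUNK_SIZE   # ceiling division
  redis.set("job:#{job_id}:parts_remaining", parts)
  redis.set("job:#{job_id}:status", 'running')
  parts
end

# Called by a worker after it finishes one chunk.
def finish_part(redis, job_id)
  remaining = redis.decr("job:#{job_id}:parts_remaining")
  redis.set("job:#{job_id}:status", 'done') if remaining.zero?
  remaining
end

redis = FakeProgressStore.new
start_job(redis, 1010, 2500)            # 3 chunks
3.times { finish_part(redis, 1010) }
redis.get('job:1010:status')            # "done"
```

Because a Redis decrement is atomic, only one worker sees the counter reach zero, so the status flips to done exactly once however many workers run in parallel.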

Slide 14

Slide 14 text

BAYES CLASSIFIER
require 'bayes_classifier'
c = BayesClassifier.new
c.train('Liked', 'Fantastic product, would recommend')
c.train('Liked', 'Very sturdy bed, easy to build.')
c.train('Disliked', 'Terrible design, already broken')
c.train('Disliked', 'I am disappointed')
c.classify('Terrible and rubbish')
# -> Disliked

Slide 15

Slide 15 text

BAYES CLASSIFIER
c.train('Disliked', 'Terrible design, already broken')
# under the hood
@words_count['Disliked'] += 3
@category_counts['Disliked']['terrible'] += 1
@category_counts['Disliked']['design'] += 1
@category_counts['Disliked']['broken'] += 1
c.classify("Terrible and rubbish")
# .... maths! (logarithms)
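The slide skips the classification maths, so here is one plausible way to fill it in. It is a sketch, not the `bayes_classifier` gem's implementation. It assumes a naive Bayes model with add-one smoothing and equal priors for every category, and the tokenizer and stop-word list are guesses chosen so that the training example above counts three words. `TinyBayes` is a hypothetical name.

```ruby
# Hypothetical sketch; the smoothing, priors and stop words are assumptions.
class TinyBayes
  STOP_WORDS = %w[a already am and i to very would].freeze

  def initialize
    @words_count = Hash.new(0)                                  # category => total words
    @category_counts = Hash.new { |h, k| h[k] = Hash.new(0) }   # category => word => count
  end

  def train(category, text)
    words = tokenize(text)
    @words_count[category] += words.size
    words.each { |w| @category_counts[category][w] += 1 }
  end

  # Picks the category with the highest summed log-probability.
  def classify(text)
    words = tokenize(text)
    vocab = @category_counts.values.flat_map(&:keys).uniq.size
    @words_count.keys.max_by do |category|
      words.sum do |w|
        # Add-one smoothing keeps unseen words from producing log(0).
        count = @category_counts[category][w] + 1.0
        Math.log(count / (@words_count[category] + vocab))
      end
    end
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z']+/) - STOP_WORDS
  end
end

c = TinyBayes.new
c.train('Liked', 'Fantastic product, would recommend')
c.train('Liked', 'Very sturdy bed, easy to build.')
c.train('Disliked', 'Terrible design, already broken')
c.train('Disliked', 'I am disappointed')
c.classify('Terrible and rubbish')   # "Disliked"
```

Summing logarithms instead of multiplying probabilities avoids floating-point underflow when a text has many words.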

Slide 16

Slide 16 text

BAYES CLASSIFIER
c.train('Disliked', 'Terrible design, already broken')
# under the hood
redis.incrby('category_counts:disliked', 3)
redis.hincrby('category_words:disliked', 'terrible', 1)
redis.hincrby('category_words:disliked', 'design', 1)
redis.hincrby('category_words:disliked', 'broken', 1)
c.classify("Terrible and rubbish")
# .... same maths, but more redis
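A training method that produces exactly these calls could look like the sketch below. The key layout is taken from the slide; the tokenizer and stop-word list are assumptions. `FakeCounterStore` is a hypothetical stand-in that models only `incrby`, `hincrby`, `get`, and `hget`, so the example runs without a server. A real client exposing those calls could be passed in instead.

```ruby
# Hypothetical stand-in for a Redis client (four calls only).
class FakeCounterStore
  def initialize
    @strings = Hash.new(0)
    @hashes = Hash.new { |h, k| h[k] = Hash.new(0) }
  end

  def incrby(key, amount)
    @strings[key] += amount
  end

  def hincrby(key, field, amount)
    @hashes[key][field] += amount
  end

  def get(key)
    @strings.key?(key) ? @strings[key].to_s : nil
  end

  def hget(key, field)
    @hashes.key?(key) && @hashes[key].key?(field) ? @hashes[key][field].to_s : nil
  end
end

STOP_WORDS = %w[already].freeze   # assumed; the slide does not list stop words

# Store one training example using the key layout from the slide.
def train(redis, category, text)
  words = text.downcase.scan(/[a-z']+/) - STOP_WORDS
  suffix = category.downcase
  redis.incrby("category_counts:#{suffix}", words.size)
  words.each { |w| redis.hincrby("category_words:#{suffix}", w, 1) }
  words
end

redis = FakeCounterStore.new
train(redis, 'Disliked', 'Terrible design, already broken')
redis.get('category_counts:disliked')               # "3"
redis.hget('category_words:disliked', 'terrible')   # "1"
```

Classification then needs one hash lookup per word per category, so its cost depends on the length of the input text and the number of categories, not on how much has been trained.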

Slide 17

Slide 17 text

BATCH PROCESSING
• 10 million keywords trained in 2 hours using AWS ElastiCache and Resque
• Classification cost does not grow with the size of the trained data set: each word is an O(1) hash lookup

Slide 18

Slide 18 text

CAVEATS
• Persistence is not easy: snapshots, or an Append Only File (AOF) that is played back on restart
• No eviction by default
• No partitioning by default
• The new Redis Cluster may solve the above (April 1st 2015)

Slide 19

Slide 19 text

STEP 1:

Slide 20

Slide 20 text

STEP 2: ???

Slide 21

Slide 21 text

STEP 3: PROFIT