NoSQL
Not only a fairy tale
Sebastian Cohnen
@tisba
tisba.de
Timo Derstappen
@teemow
adcloud.com
http://en.wikipedia.org/wiki/File:Old_book_-_Timeless_Books.jpg
Slide 2
Preface
Slide 3
Terms
• placement & ads
• ad priority
Slide 4
System Overview
platform
• administrative back office
• worker queue
• almost no NoSQL
adserver
• serving ads
• tracking
• here be NoSQLs!
platform → adserver: publishing ads & placements
adserver → platform: stats & tracking data
Slide 5
Once upon a time…
…way back in 2008
Slide 6
Simple Storage Service
Slide 7
Publishing to S3
• gather ad & placement data
• add some JavaScript
• publish everything to S3
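The three publishing steps above can be sketched roughly as follows. This is only an illustration under assumptions: the callback name `renderPlacement`, the key layout `placements/<id>.js`, the `priority` field and the `upload` callable are all hypothetical and not taken from the talk. The upload is injected so the sketch runs without an S3 account.

```python
import json


def build_placement_script(placement, ads, callback="renderPlacement"):
    """Denormalize one placement and its ads into a small JavaScript snippet.

    `callback` is a hypothetical client-side function name.
    """
    payload = {
        "placement": placement,
        # Assumption: ads carry a numeric priority, highest first.
        "ads": sorted(ads, key=lambda ad: ad["priority"], reverse=True),
    }
    # sort_keys keeps the output stable for identical input.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return f"{callback}({body});"


def object_key(placement_id):
    # Hypothetical key layout for the published object.
    return f"placements/{placement_id}.js"


def publish_all(placements, ads_by_placement, upload):
    """Render every placement and hand it to `upload(key, body)`."""
    keys = []
    for placement in placements:
        ads = ads_by_placement.get(placement["id"], [])
        key = object_key(placement["id"])
        upload(key, build_placement_script(placement, ads))
        keys.append(key)
    return keys
```

In a real setup `upload` would wrap an S3 client call; here any callable taking a key and a body will do, for example the `__setitem__` method of a dict.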
Slide 8
Ad Delivery via S3
• user visits a website
• deliver JavaScript via CDN
• choose and display ads
Slide 9
but,
• publishing to S3 was rather expensive
• no incremental updates of denormalized data
CouchDB only
• normalize the data (a bit)
• split by update frequency
• BUT… n-m relations are hard to model
• and persistent, incremental views are of little use to us
Slide 13
:-(
Slide 14
CouchDB + node.js
• use node.js to assemble data (n-m relation)
• cache response using nginx
• also cache some data in node.js
Slide 15
Request flow
• incoming request
• nginx cache miss
• fetch placement & priorities
• process data & fetch ads
• send response
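The request flow above might look roughly like this in code. It is a sketch only: the talk used nginx for the response cache and node.js for the assembly step, while this version uses a plain dict and injected fetch functions so it stays self-contained. All function and field names are hypothetical.

```python
def handle_request(placement_id, cache, fetch_placement, fetch_priorities, fetch_ads):
    """Assemble the response for one placement, using the cache when possible."""
    # Incoming request: a cache hit short-circuits everything else.
    if placement_id in cache:
        return cache[placement_id]

    # Cache miss: fetch placement & priorities.
    placement = fetch_placement(placement_id)
    priorities = fetch_priorities(placement_id)  # assumed shape: {ad_id: priority}

    # Process data & fetch ads: resolve the n-m relation in application code.
    ad_ids = sorted(priorities, key=priorities.get, reverse=True)
    ads = fetch_ads(ad_ids)

    response = {
        "placement": placement,
        "ads": [{**ad, "priority": priorities[ad["id"]]} for ad in ads],
    }

    # Store for later hits, then send the response.
    cache[placement_id] = response
    return response
```

Passing the fetch functions in as arguments keeps the assembly logic independent of the data store behind it.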
Slide 16
How to monitor consistency?
• write tracer documents
• measure replication delay
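One way to picture the tracer approach: write a timestamped marker document to the primary, then poll a replica until the marker shows up. The sketch below uses plain dicts in place of CouchDB databases, and every name in it (`write_tracer`, `measure_replication_delay`, the `tracer-` id prefix) is an assumption for illustration. Clock and sleep are injectable so the logic can be exercised without waiting.

```python
import time
import uuid


def write_tracer(primary, now=time.time):
    """Write a small marker document to the primary and return its id."""
    doc_id = f"tracer-{uuid.uuid4().hex}"
    primary[doc_id] = {"type": "tracer", "written_at": now()}
    return doc_id


def measure_replication_delay(doc_id, primary, replica, now=time.time,
                              sleep=time.sleep, poll_interval=0.1, timeout=30.0):
    """Poll the replica until the tracer is visible.

    Returns the observed delay in seconds, or None if the tracer
    did not arrive before the timeout.
    """
    written_at = primary[doc_id]["written_at"]
    deadline = written_at + timeout
    while now() <= deadline:
        if doc_id in replica:
            return now() - written_at
        sleep(poll_interval)
    return None
```

The measured delay is only as precise as `poll_interval`, and a `None` result is a signal worth alerting on by itself.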
Slide 17
Achievements
• reduced turnaround for publishing priorities by >50%
• built a foundation for new features
Slide 18
New Feature Requests
…ahead in early 2011
Slide 19
The Problem
• requests will eventually all be unique
• therefore fewer requests can be cached
• CouchDB is too slow for our needs
• caching things within a node.js process was a bad idea too
Slide 20
Redis
• during a cache warmup phase we pre-fill Redis with placement and ad data
• all live requests are served out of Redis
• data is updated in the background
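The pattern above (warm up first, serve live traffic only from the cache, refresh in the background) can be sketched as follows. A dict stands in for Redis here, and the class and method names are hypothetical; the point is only that the live path never touches the slow source.

```python
class WarmCache:
    """Cache that must be pre-filled before it serves any request."""

    def __init__(self, load_all, load_one):
        self._load_all = load_all   # returns {key: value} for the warmup phase
        self._load_one = load_one   # returns the current value for one key
        self._data = {}             # stands in for Redis in this sketch
        self.ready = False

    def warm_up(self):
        """Warmup phase: pre-fill with all placement and ad data."""
        self._data.update(self._load_all())
        self.ready = True

    def get(self, key):
        """Live path: read from the cache only, never from the source."""
        if not self.ready:
            raise RuntimeError("cache not warmed up yet")
        return self._data.get(key)

    def refresh(self, keys):
        """Background path: re-read changed keys from the source."""
        for key in keys:
            self._data[key] = self._load_one(key)
```

Refusing to serve before the warmup has finished makes a cold instance fail loudly instead of quietly answering with missing data.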
Slide 21
Scalability
…in late 2011
Slide 22
How we used CouchDB
usage
• >10k updates/h
• single source of changes
not required
• multi-master replication
• append-only
• durability
• MVCC
Slide 23
Resulting Issues
• problems with replication and high load
• more instances, more replication, even more load
• compaction was a pain too
Slide 24
Whose fault?
• not only CouchDB’s fault
• simply the wrong use case
• one source for updates
• no need for append-only reliability
Slide 25
What now?
Slide 26
Back to S3!
• with Redis caching in place…
• move placement and ad data to S3
• cache warming upfront and background updates work just fine!
Slide 27
S3 vs CouchDB
• S3 simply fits our needs
• no need to implement sync checks or run compaction
• fewer moving parts
• less state on our application servers
Slide 28
Once again,
more features
…ahead in early 2012
Slide 29
Status Quo
• the first S3-based “adserver” did the ad selection on the client side
• to a certain degree this is still the case
Slide 30
The Challenge
• prepare the systems for real-time bidding
• enable the adserver to select ads server-side
• do it fast, say within 25 ms
Slide 31
Remember Redis?
• we know and trust Redis’ performance
• it has sorted sets
• we have sets of ads to display for a placement
Eureka!
Slide 32
Redis Reloaded!
• heavily use sorted sets
• create sets of ads…
  • we can choose from
  • which cannot be displayed at all
• use ZUNIONSTORE & ZRANGEBYSCORE to precisely select ads
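The talk does not show how the two commands were combined, so the following is only one plausible reading, not the authors' method: union the candidate set with the blocked set, give the blocked set a large negative weight, and then keep only members with a non-negative score. To stay runnable without a Redis server, `zunionstore` and `zrangebyscore` below are small in-memory stand-ins that mimic the documented behaviour of the Redis commands (SUM aggregation, inclusive score bounds, ascending order). The key names and the `BLOCK` sentinel are hypothetical.

```python
def zunionstore(store, dest, keys, weights=None):
    """Stand-in for ZUNIONSTORE with WEIGHTS and the default AGGREGATE SUM."""
    weights = weights or [1.0] * len(keys)
    result = {}
    for key, weight in zip(keys, weights):
        for member, score in store.get(key, {}).items():
            result[member] = result.get(member, 0.0) + score * weight
    store[dest] = result
    return len(result)  # like Redis: size of the resulting set


def zrangebyscore(store, key, lo, hi):
    """Stand-in for ZRANGEBYSCORE: inclusive bounds, ascending by score."""
    hits = [(score, member) for member, score in store.get(key, {}).items()
            if lo <= score <= hi]
    return [member for _score, member in sorted(hits)]


# Assumed sentinel: far larger in magnitude than any real ad score.
BLOCK = -1_000_000.0


def select_ads(store, candidate_key, blocked_key, dest="tmp:selection"):
    """Return candidate ads that are not blocked, lowest score first."""
    # Blocked members end up with a negative score after the weighted union.
    zunionstore(store, dest, [candidate_key, blocked_key], weights=[1.0, BLOCK])
    return zrangebyscore(store, dest, 0.0, float("inf"))
```

This assumes candidate scores are non-negative and blocked members are stored with score 1. Against a real server the upper bound would be written as `+inf`.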
Slide 33
Redis Reloaded!
• Redis became a deeply integrated part of the core business logic
• it was very easy to model our needs with Redis
• besides enabling new features, we reduced the response payload by >75%
Slide 34
Conclusion
Slide 35
What worked for us…
• try to go as incremental as possible
• drivers for architectural decisions…
  • features
  • quality & performance
  • scalability
Slide 36
The End!
Slide 37
• Questions (if time permits)
• Visit us at the adcloud booth
Sebastian Cohnen
@tisba
tisba.de
Timo Derstappen
@teemow
adcloud.com