Slide 1

Slide 1 text

Building a relevance platform with Couchbase and Elasticsearch
Hippo GetTogether, 21 June 2013
Jeroen Reijn | @jreijn | #hgt2013
Hippo GetTogether 2013 follow the Hippo trail

Slide 2

Slide 2 text

About me
• Architect @ Hippo
• DevOps guy
• Blogger @ http://blog.jeroenreijn.com

Slide 3

Slide 3 text

Relevance?

Slide 4

Slide 4 text

“The capability of a search engine or function to retrieve data appropriate to a user's needs.”
http://www.thefreedictionary.com/relevance

Slide 5

Slide 5 text


Slide 6

Slide 6 text

How we deliver relevant content @Hippo

Slide 7

Slide 7 text

Registration
Visitor - entity making HTTP requests
Collector - records data about a visitor or his behavior
  Example: location collector (GeoIPCollector)
Targeting Data - all data about a specific visitor
  Example: IP address is located in Amsterdam
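
The registration terms above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Hippo's implementation: the class names, the `collect` method, the `register` helper, and the `GEO_TABLE` lookup (standing in for a real GeoIP database) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical lookup table standing in for a real GeoIP database.
GEO_TABLE = {"192.0.2.10": {"city": "Amsterdam", "country": "NL"}}


@dataclass
class Visitor:
    """An entity making HTTP requests."""
    visitor_id: str
    # Targeting data: everything known about this visitor, keyed by collector id.
    targeting_data: Dict[str, dict] = field(default_factory=dict)


class GeoIPCollector:
    """Hypothetical location collector: maps an IP address to a location."""
    collector_id = "geo"

    def collect(self, remote_addr: str) -> Optional[dict]:
        # Returns None when the address is unknown.
        return GEO_TABLE.get(remote_addr)


def register(visitor: Visitor, collectors: List[GeoIPCollector], remote_addr: str) -> Visitor:
    """Run every collector for one request and store what it found."""
    for collector in collectors:
        data = collector.collect(remote_addr)
        if data is not None:
            visitor.targeting_data[collector.collector_id] = data
    return visitor
```

Calling `register(Visitor("v1"), [GeoIPCollector()], "192.0.2.10")` would leave the visitor with targeting data saying the IP address is located in Amsterdam.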

Slide 8

Slide 8 text

Matching
Characteristic - a type of fact about visitors
  Example: "comes from a city", "experiences a type of weather"
Target Group - the specification of a Characteristic
  Example: "comes from a European city", "comes from Amsterdam"
Persona - one or more target groups that describe a certain type of visitor
  Example: "Jim, the European urban consumer", "Alice, the Pet owner"
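
A minimal sketch of how matching could work, assuming a persona's score is simply the fraction of its target groups that a visitor matches. The slides do not say how scores are actually computed, so the scoring rule, the `EUROPEAN_COUNTRIES` set, the `interests` key, and all function names here are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical, deliberately incomplete set of country codes.
EUROPEAN_COUNTRIES = {"NL", "DE", "FR"}


@dataclass
class TargetGroup:
    """A specification of a characteristic, expressed as a predicate."""
    name: str
    matches: Callable[[dict], bool]


@dataclass
class Persona:
    """One or more target groups describing a type of visitor."""
    name: str
    target_groups: List[TargetGroup]


def score(persona: Persona, targeting_data: dict) -> float:
    """Assumed rule: fraction of the persona's target groups that match."""
    hits = sum(1 for group in persona.target_groups if group.matches(targeting_data))
    return hits / len(persona.target_groups)


def best_persona(personas: List[Persona], targeting_data: dict) -> Persona:
    """Pick the persona with the highest score."""
    return max(personas, key=lambda p: score(p, targeting_data))


from_europe = TargetGroup(
    "comes from a European city",
    lambda d: d.get("geo", {}).get("country") in EUROPEAN_COUNTRIES,
)
from_amsterdam = TargetGroup(
    "comes from Amsterdam",
    lambda d: d.get("geo", {}).get("city") == "Amsterdam",
)
owns_pet = TargetGroup(
    "owns a pet",  # hypothetical characteristic, not from the slides
    lambda d: bool(d.get("interests", {}).get("pets")),
)

jim = Persona("Jim, the European urban consumer", [from_europe, from_amsterdam])
alice = Persona("Alice, the Pet owner", [owns_pet])
```

With targeting data `{"geo": {"city": "Amsterdam", "country": "NL"}}`, `best_persona([jim, alice], ...)` would select Jim under this assumed scoring rule.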

Slide 9

Slide 9 text

What do we store?
Request log
Targeting data
Statistics
  Averages, e.g. how many visitors became which persona
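
As a rough illustration of the statistics item, the share of visitors per persona could be derived as follows. The input shape (a mapping from visitor id to persona name) and the function name `persona_shares` are assumptions, not the stored format.

```python
from collections import Counter
from typing import Dict


def persona_shares(assignments: Dict[str, str]) -> Dict[str, float]:
    """Return persona name -> fraction of visitors assigned to that persona."""
    counts = Counter(assignments.values())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {persona: count / total for persona, count in counts.items()}
```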

Slide 10

Slide 10 text

BIG DATA !!

Slide 11

Slide 11 text

Real-time analysis

Slide 12

Slide 12 text

Architecture

Slide 13

Slide 13 text

[Architecture diagram. Labels: RDBMS, Hippo Delivery Tier, Hippo Repository, App server, XML, JSON, (X)HTML]

Slide 14

Slide 14 text

Delivery Tier
[Diagram. Labels: URL Matching, Fetch content, Compose output, Request, Response]

Slide 15

Slide 15 text

Delivery Tier
[Diagram. Labels: URL Matching, Targeting Data Collection, Compose output, Request, Response, Fetch content, Scoring]

Slide 16

Slide 16 text

Scaling

Slide 17

Slide 17 text

Scaling out
[Diagram. Labels: RDBMS, and twice: Hippo Delivery Tier, Hippo Repository, App server]

Slide 18

Slide 18 text

Scaling out
[Diagram. Labels: RDBMS, Targeting Datastore, and twice: Delivery Tier, Repository, App server]

Slide 19

Slide 19 text

What kind of ‘storage’?

Slide 20

Slide 20 text

Question?

Slide 21

Slide 21 text

Distributed Cache?

Slide 22

Slide 22 text

We have a winner!

Slide 23

Slide 23 text

Requirements change!

Slide 24

Slide 24 text

NoSQL to the rescue

Slide 25

Slide 25 text

Suitable types
• Key-value store
• Document database

Slide 26

Slide 26 text

Assessment Criteria
• Maturity
• Data model
• Consistency model
• Performance
• Replication
• Caching model
• Query model
• Monitoring
• Scalability
• Reliability
• Support

Slide 27

Slide 27 text

Selection Criteria
• Performance
• Scalability
• Schema flexibility
• Simplicity
• Monitoring
• Support

Slide 28

Slide 28 text

Performance !! Performance !!!!

Slide 29

Slide 29 text

Scalability

Slide 30

Slide 30 text

Schema flexibility

Slide 31

Slide 31 text

Request log document

{
  "visitorId": "7a1c7e75-8539-40",
  "pageUrl": "http://localhost:8080/site/news",
  "pathInfo": "/news",
  "remoteAddr": "127.0.0.1",
  "referer": "http://localhost:8080/site/",
  "timestamp": 1371419505909,
  "collectorData": {
    "geo": {
      "country": "",
      "city": "",
      "latitude": 0,
      "longitude": 0
    },
    "returningvisitor": false,
    "channel": "English Website"
  },
  "personaIdScores": [],
  "globalPersonaIdScores": []
}
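
To show how such a schema-free document can be consumed, here is a small sketch that parses the request log document with Python's standard `json` module. It assumes the `timestamp` field holds milliseconds since the Unix epoch, which its magnitude suggests but the slide does not state; the `summarize` helper is hypothetical.

```python
import json
from datetime import datetime, timezone

# The request log document from the slide, as a JSON string.
REQUEST_LOG = """
{
  "visitorId": "7a1c7e75-8539-40",
  "pageUrl": "http://localhost:8080/site/news",
  "pathInfo": "/news",
  "remoteAddr": "127.0.0.1",
  "referer": "http://localhost:8080/site/",
  "timestamp": 1371419505909,
  "collectorData": {
    "geo": {"country": "", "city": "", "latitude": 0, "longitude": 0},
    "returningvisitor": false,
    "channel": "English Website"
  },
  "personaIdScores": [],
  "globalPersonaIdScores": []
}
"""


def summarize(raw: str) -> dict:
    """Extract a few fields from a request log document."""
    doc = json.loads(raw)
    # Assumption: timestamp is in milliseconds since the Unix epoch.
    when = datetime.fromtimestamp(doc["timestamp"] / 1000, tz=timezone.utc)
    return {
        "visitor": doc["visitorId"],
        "path": doc["pathInfo"],
        "day": when.date().isoformat(),
        "channel": doc["collectorData"]["channel"],
        "returning": doc["collectorData"]["returningvisitor"],
    }
```

Because the document is plain JSON, adding a new collector only adds a key under `collectorData`; code like `summarize` that ignores unknown keys keeps working.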

Slide 32

Slide 32 text

Visitor document

{
  "geo": {
    "collectorId": "geo",
    "city": "",
    "country": "",
    "latitude": 0,
    "longitude": 0
  },
  "channel": {
    "collectorId": "channel",
    "channels": [
      "English Website"
    ],
    "lastVisitedChannel": "English Website"
  }
}

Slide 33

Slide 33 text

Simplicity

Slide 34

Slide 34 text

Monitoring

Slide 35

Slide 35 text

Support

Slide 36

Slide 36 text

Couchbase

Slide 37

Slide 37 text

Why Couchbase?
• Drop-in replacement for memcached
• Read/Write-through cache
• High throughput
• Easy scalability
• Schema flexibility
• Low latency

Slide 38

Slide 38 text

Couchbase
• Open Source
• Document-oriented
• Easily scalable
• Consistent High Performance
• Apache license

Slide 39

Slide 39 text

Performance
• Object managed cache
• Write Queue to disk
• Avoids Cold Cache

Slide 40

Slide 40 text

Source: http://www.slideshare.net/Couchbase/benchmarking-couchbase
Copyright © Altoros Systems, Inc.

Slide 41

Slide 41 text

Easily scalable
• Auto sharding
• Cross cluster replication (XDCR)
• Master-master replication

Slide 42

Slide 42 text

Flexible data model
• Native JSON support
• Incremental Map Reduce
• Gives power to the developer

Slide 43

Slide 43 text

How we run Couchbase @Hippo

Slide 44

Slide 44 text

[Diagram. Labels: Load Balancer, Database cluster, Hippo Delivery Tier, Couchbase cluster]
Couchbase cluster
• Request log data
• Targeting data
• Statistics data

Slide 45

Slide 45 text

Query capabilities
• Querying via views
• Secondary indexes via views
• Views based on Map-Reduce
• Lacks some advanced query capabilities
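
The idea behind a map-reduce view can be simulated in plain Python. This is not Couchbase's API: real Couchbase views are defined as JavaScript map and reduce functions that the server runs and updates incrementally. The sketch below only mimics the two steps in memory, counting requests per `pathInfo`; all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_doc(doc: dict) -> Iterator[Tuple[str, int]]:
    """Map step: emit (key, value) pairs for one document."""
    # Documents without a pathInfo field emit nothing and drop out of the index.
    if "pathInfo" in doc:
        yield doc["pathInfo"], 1


def reduce_values(values: List[int]) -> int:
    """Reduce step: combine all values emitted for one key."""
    return sum(values)


def run_view(docs: Iterable[dict]) -> Dict[str, int]:
    """Group emitted values by key, then reduce each group (keys sorted)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: reduce_values(values) for key, values in sorted(grouped.items())}
```

The emitted key acts as the secondary index: results are looked up and ordered by that key, which also illustrates why ad hoc queries such as full text search do not fit this model well.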

Slide 46

Slide 46 text

Elasticsearch
• Apache Lucene
• Designed to be distributed
• Schema free
• Apache license
• RESTful API

Slide 47

Slide 47 text

Added value of ES
• Full text search
• Faceted search
• Geospatial search
• All in (near) real-time
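
As a hedged sketch, the three search features above could be combined in one request body for Elasticsearch's RESTful search API. The body below uses the query DSL of recent Elasticsearch versions (`bool` query, `geo_distance` filter, `terms` aggregation); the release current at the time of the talk expressed facets with a different syntax. The field names `pageUrl`, `location` (assumed to be mapped as a geo point), and `channel`, as well as the helper name, are assumptions about the index and not taken from the slides.

```python
import json


def build_search_body(text: str, lat: float, lon: float, distance: str = "50km") -> dict:
    """Build a search body: full text match, geo filter, and a facet-like aggregation."""
    return {
        "query": {
            "bool": {
                # Full text search on an assumed text field.
                "must": {"match": {"pageUrl": text}},
                # Geospatial search: only documents within `distance` of the point.
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        "location": {"lat": lat, "lon": lon},
                    }
                },
            }
        },
        # Faceted search: bucket the hits per channel.
        "aggs": {"by_channel": {"terms": {"field": "channel"}}},
    }


if __name__ == "__main__":
    # Approximate coordinates of Amsterdam.
    print(json.dumps(build_search_body("news", 52.37, 4.89), indent=2))
```

The resulting JSON would be sent as the body of a search request against the index that receives the replicated documents.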

Slide 48

Slide 48 text

Replicating to ES
[Diagram. Labels: Couchbase Server Cluster, Elasticsearch Server Cluster, Hippo Delivery Tier, Java API, Write, Read, XDCR, Couchbase ES Transport plugin]

Slide 49

Slide 49 text

What’s Next?

Slide 50

Slide 50 text

What’s Next?

Slide 51

Slide 51 text

Advanced analytics

Slide 52

Slide 52 text

Demo time!

Slide 53

Slide 53 text

Thank you! Questions?
[email protected] | @jreijn
ps. We’re hiring!