Slide 1

Slide 1 text

OneHippo @ Goto: follow the Hippo trail
Building a relevance platform with Couchbase and Elasticsearch
@jreijn | Hippo
#gotoams, June 18

Slide 2

Slide 2 text

About me
• Architect @ Hippo
• DevOps guy
• Blogger @ http://blog.jeroenreijn.com

Slide 3

Slide 3 text

About Hippo

Slide 4

Slide 4 text

Relevance?

Slide 5

Slide 5 text

“The capability of a search engine or function to retrieve data appropriate to a user's needs.”
http://www.thefreedictionary.com/relevance

Slide 6

Slide 6 text


Slide 7

Slide 7 text

How we deliver relevant content @Hippo

Slide 8

Slide 8 text

Registration
• Visitor - entity making HTTP requests
• Collector - records data about a visitor or his behavior
  Example: location collector (GeoIPCollector)
• Targeting Data - all data about a specific visitor
  Example: IP address is located in Amsterdam
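The registration terms above can be sketched in a few lines of Python. This is only an illustration of the vocabulary, not Hippo's actual API: `Request`, `Collector.collect`, `TargetingData`, `register`, and the hard-coded lookup table are all hypothetical names and data; only `GeoIPCollector` is named on the slide, and its behavior here is a toy.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Protocol


@dataclass
class Request:
    """Hypothetical stand-in for an incoming HTTP request."""
    visitor_id: str
    remote_addr: str


class Collector(Protocol):
    """A collector records one kind of data about a visitor."""
    collector_id: str

    def collect(self, request: Request) -> Optional[Dict[str, Any]]:
        ...


class GeoIPCollector:
    """Toy location collector backed by a made-up lookup table."""
    collector_id = "geo"

    # Assumption: a real collector would query a GeoIP database instead.
    _TABLE = {"203.0.113.7": {"city": "Amsterdam", "country": "NL"}}

    def collect(self, request: Request) -> Optional[Dict[str, Any]]:
        return self._TABLE.get(request.remote_addr)


@dataclass
class TargetingData:
    """All data collected about one specific visitor, keyed by collector id."""
    visitor_id: str
    by_collector: Dict[str, Dict[str, Any]] = field(default_factory=dict)


def register(request: Request, collectors: List[Collector],
             store: Dict[str, TargetingData]) -> TargetingData:
    """Run every collector for this request and merge the results."""
    data = store.setdefault(request.visitor_id, TargetingData(request.visitor_id))
    for collector in collectors:
        collected = collector.collect(request)
        if collected is not None:
            data.by_collector[collector.collector_id] = collected
    return data
```

Calling `register(Request("v1", "203.0.113.7"), [GeoIPCollector()], {})` would yield targeting data stating that the visitor's IP address is located in Amsterdam.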

Slide 9

Slide 9 text

Matching
• Characteristic - a type of fact about visitors
  Example: "comes from a city", "experiences a type of weather"
• Target Group - the specification of a Characteristic
  Example: "comes from a European city", "comes from Amsterdam"
• Persona - one or more target groups that describe a certain type of visitor
  Example: "Jim, the European urban consumer", "Alice, the Pet owner"
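To make the matching vocabulary concrete, here is a minimal sketch. The scoring rule (fraction of target groups matched) is an assumption chosen for illustration; the slides do not say how Hippo actually computes persona scores. `TargetGroup`, `Persona`, and `score` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Targeting data keyed by characteristic / collector id, e.g. {"geo": {...}}.
TargetingData = Dict[str, Dict[str, Any]]


@dataclass
class TargetGroup:
    """A concrete specification of a characteristic."""
    name: str
    characteristic: str  # which kind of fact is inspected, e.g. "geo"
    matches: Callable[[Dict[str, Any]], bool]


@dataclass
class Persona:
    """One or more target groups describing a type of visitor."""
    name: str
    target_groups: List[TargetGroup]

    def score(self, data: TargetingData) -> float:
        """Assumed rule: fraction of target groups the visitor matches."""
        if not self.target_groups:
            return 0.0
        hits = sum(
            1 for group in self.target_groups
            if group.matches(data.get(group.characteristic, {}))
        )
        return hits / len(self.target_groups)


european = TargetGroup("comes from a European city", "geo",
                       lambda d: d.get("country") in {"NL", "DE", "FR"})
amsterdam = TargetGroup("comes from Amsterdam", "geo",
                        lambda d: d.get("city") == "Amsterdam")
jim = Persona("Jim, the European urban consumer", [european, amsterdam])
```

With these definitions a visitor located in Amsterdam matches both of Jim's target groups, while a visitor from Berlin matches only one of them.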

Slide 10

Slide 10 text

What do we store?
• Request log
• Targeting data
• Statistics (averages, e.g. how many visitors became which persona)

Slide 11

Slide 11 text

Real-time analysis

Slide 12

Slide 12 text

Architecture

Slide 13

Slide 13 text

[Diagram] App server running the Hippo Delivery Tier and the Hippo Repository, backed by an RDBMS. Output formats: XML, JSON, (X)HTML

Slide 14

Slide 14 text

Delivery Tier
[Diagram] Request -> URL Matching -> Fetch content -> Compose output -> Response

Slide 15

Slide 15 text

Delivery Tier
[Diagram] Request in, Response out. Stages shown: URL Matching, Targeting Data Collection, Scoring, Fetch content, Compose output
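The extended delivery pipeline can be pictured as a chain of stages that each enrich a request context. The sketch below is a toy model: the stage order is an assumption (the diagram's exact wiring is not recoverable from the transcript), and every function body is invented for illustration.

```python
from typing import Callable, Dict, List

Context = Dict[str, object]
Stage = Callable[[Context], Context]


def url_matching(ctx: Context) -> Context:
    # Map the request path to a logical page name.
    ctx["page"] = str(ctx["path"]).strip("/") or "home"
    return ctx


def targeting_data_collection(ctx: Context) -> Context:
    # Toy collector: a visitor with a cookie counts as a returning visitor.
    ctx["targeting"] = {"returningvisitor": ctx.get("cookie") is not None}
    return ctx


def scoring(ctx: Context) -> Context:
    # Toy scoring: pick a persona directly from the collected data.
    returning = ctx["targeting"]["returningvisitor"]
    ctx["persona"] = "returning" if returning else "newcomer"
    return ctx


def fetch_content(ctx: Context) -> Context:
    ctx["content"] = f"{ctx['page']} content for {ctx['persona']}"
    return ctx


def compose_output(ctx: Context) -> Context:
    ctx["response"] = f"<html>{ctx['content']}</html>"
    return ctx


# Assumed order: collect and score before fetching, so content can be targeted.
PIPELINE: List[Stage] = [
    url_matching,
    targeting_data_collection,
    scoring,
    fetch_content,
    compose_output,
]


def handle(request: Context) -> str:
    ctx: Context = dict(request)
    for stage in PIPELINE:
        ctx = stage(ctx)
    return str(ctx["response"])
```

The point of the design is that targeting adds stages to the existing request flow rather than replacing it: removing the two middle stages gives back the plain pipeline of the previous slide.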

Slide 16

Slide 16 text

Scaling

Slide 17

Slide 17 text

Scaling out
[Diagram] Two app servers, each running a Hippo Delivery Tier and a Hippo Repository, sharing one RDBMS

Slide 18

Slide 18 text

Scaling out
[Diagram] Two app servers, each running a Delivery Tier and a Repository, sharing one RDBMS and a separate Targeting Datastore

Slide 19

Slide 19 text

What kind of ‘storage’?

Slide 20

Slide 20 text

Distributed Cache?

Slide 21

Slide 21 text

We have a winner!

Slide 22

Slide 22 text

Requirements change!

Slide 23

Slide 23 text

NoSQL to the rescue

Slide 24

Slide 24 text

Suitable types
• Key-value store
• Document database

Slide 25

Slide 25 text

Assessment Criteria
• Maturity
• Data model
• Consistency model
• Performance
• Replication
• Caching model
• Query model
• Monitoring
• Scalability
• Reliability
• Support

Slide 26

Slide 26 text

Selection Criteria
• Performance!
• Scalability
• Schema flexibility
• Simplicity
• Monitoring
• Support

Slide 27

Slide 27 text

Performance !!

Slide 28

Slide 28 text

Scalability

Slide 29

Slide 29 text

Schema flexibility

Slide 30

Slide 30 text

Request log document

    {
      "visitorId": "7a1c7e75-8539-40",
      "pageUrl": "http://localhost:8080/site/news",
      "pathInfo": "/news",
      "remoteAddr": "127.0.0.1",
      "referer": "http://localhost:8080/site/",
      "timestamp": 1371419505909,
      "collectorData": {
        "geo": {
          "country": "",
          "city": "",
          "latitude": 0,
          "longitude": 0
        },
        "returningvisitor": false,
        "channel": "English Website"
      },
      "personaIdScores": [],
      "globalPersonaIdScores": []
    }
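Schema flexibility matters on the reading side too: code that consumes request log documents should tolerate missing or empty fields. The sketch below parses a trimmed copy of the document above with only the standard library. The `summarize` helper and its defaults are hypothetical, and treating `timestamp` as milliseconds since the Unix epoch is an assumption based on its magnitude.

```python
import json
from datetime import datetime, timezone

# Trimmed copy of the request log document shown on the slide.
RAW = """
{
  "visitorId": "7a1c7e75-8539-40",
  "pathInfo": "/news",
  "timestamp": 1371419505909,
  "collectorData": {
    "geo": {"country": "", "city": "", "latitude": 0, "longitude": 0},
    "returningvisitor": false,
    "channel": "English Website"
  },
  "personaIdScores": []
}
"""


def summarize(raw: str) -> dict:
    """Extract a few fields, falling back to defaults when they are absent."""
    doc = json.loads(raw)
    collector_data = doc.get("collectorData", {})
    geo = collector_data.get("geo", {})
    return {
        "visitor": doc["visitorId"],
        "path": doc.get("pathInfo", "/"),
        # Assumption: the timestamp is in milliseconds since the epoch.
        "when": datetime.fromtimestamp(doc["timestamp"] / 1000, tz=timezone.utc),
        # An empty string means the collector found nothing; normalize to None.
        "city": geo.get("city") or None,
        "channel": collector_data.get("channel"),
        "persona_count": len(doc.get("personaIdScores", [])),
    }
```

A document that carries only `visitorId` and `timestamp` is summarized without errors, which is the practical benefit of not enforcing a fixed schema in the store.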

Slide 31

Slide 31 text

Visitor document

    {
      "geo": {
        "collectorId": "geo",
        "city": "",
        "country": "",
        "latitude": 0,
        "longitude": 0
      },
      "channel": {
        "collectorId": "channel",
        "channels": [
          "English Website"
        ],
        "lastVisitedChannel": "English Website"
      }
    }
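The visitor document groups data per collector, so a collector can update its own section without touching the others. As a rough illustration, the hypothetical helper below records a channel visit in a document of this shape; the update rule is an assumption inferred from the field names, not taken from Hippo's code.

```python
from typing import Any, Dict


def record_channel_visit(visitor_doc: Dict[str, Any], channel: str) -> Dict[str, Any]:
    """Add the channel to the visited list (once) and mark it as last visited."""
    entry = visitor_doc.setdefault("channel", {"collectorId": "channel"})
    channels = entry.setdefault("channels", [])
    if channel not in channels:
        channels.append(channel)
    entry["lastVisitedChannel"] = channel
    return visitor_doc
```

Starting from an empty document, two visits to "English Website" followed by one to a second channel leave two entries in `channels` and the second channel as `lastVisitedChannel`.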

Slide 32

Slide 32 text

Simplicity

Slide 33

Slide 33 text

Monitoring

Slide 34

Slide 34 text

Support

Slide 35

Slide 35 text

Couchbase

Slide 36

Slide 36 text

Why Couchbase?
• Drop-in replacement for memcached
• Read/Write-through cache
• High throughput
• Easy scalability
• Schema flexibility
• Low latency

Slide 37

Slide 37 text

Couchbase
• Open source
• Document-oriented
• Easily scalable
• Consistent high performance

Slide 38

Slide 38 text

Performance
• Object managed cache
• Write Queue to disk
• Avoids Cold Cache
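The combination of an in-memory object cache and a write queue to disk can be pictured with a toy write-behind cache. This is a conceptual model only, not how Couchbase is implemented: the class name, the dictionary standing in for the disk, and the explicit `flush` call are all assumptions made for illustration.

```python
from collections import deque
from typing import Any, Deque, Dict


class WriteBehindCache:
    """Writes are acknowledged from memory and persisted later from a queue."""

    def __init__(self) -> None:
        self.memory: Dict[str, Any] = {}
        self.disk: Dict[str, Any] = {}  # stand-in for persistent storage
        self.queue: Deque[str] = deque()

    def set(self, key: str, value: Any) -> None:
        # Fast path: update memory and remember that the key must be persisted.
        self.memory[key] = value
        self.queue.append(key)

    def get(self, key: str) -> Any:
        if key in self.memory:
            return self.memory[key]
        # Cache miss: read from disk and keep the value in memory afterwards.
        value = self.disk[key]  # raises KeyError for unknown keys
        self.memory[key] = value
        return value

    def flush(self) -> int:
        """Drain the write queue to disk; returns the number of writes done."""
        written = 0
        while self.queue:
            key = self.queue.popleft()
            if key in self.memory:
                self.disk[key] = self.memory[key]
                written += 1
        return written
```

Reads and writes are served from memory, while persistence happens asynchronously in the background, which is why this style of store can keep latency low under write-heavy load such as request logging.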

Slide 39

Slide 39 text

Easily scalable
• Auto sharding
• Cross cluster replication (XDCR)
• Master - Master replication
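Auto sharding in Couchbase works by hashing each document key to a virtual bucket (vBucket) and mapping vBuckets to nodes. The sketch below shows the idea in simplified form: the plain CRC32-modulo hash and the round-robin map are assumptions for illustration, the node names are made up, and Couchbase's real hashing and map layout differ in detail. 1024 is the usual default number of vBuckets.

```python
import zlib
from typing import List

NUM_VBUCKETS = 1024  # usual Couchbase default; treated as an assumption here


def vbucket_for(key: str) -> int:
    """Hash a document key to a vBucket id (simplified CRC32 scheme)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_vbucket_map(nodes: List[str]) -> List[str]:
    """Assign vBuckets to nodes round-robin; index = vBucket id."""
    return [nodes[i % len(nodes)] for i in range(NUM_VBUCKETS)]


def node_for(key: str, vbucket_map: List[str]) -> str:
    """Find the node that owns a key by going through its vBucket."""
    return vbucket_map[vbucket_for(key)]
```

Because a key always hashes to the same vBucket, adding a node only requires moving some vBuckets and publishing a new map; keys never need to be rehashed, and clients can route requests directly to the owning node.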

Slide 40

Slide 40 text

Flexible data model
• Native JSON support
• Incremental Map Reduce
• Gives power to the developer

Slide 41

Slide 41 text

How we run Couchbase @Hippo

Slide 42

Slide 42 text

[Diagram] Load Balancer in front of the Hippo Delivery Tier, which uses a Database cluster and a Couchbase cluster
Couchbase cluster stores:
• Request log data
• Targeting data
• Statistics data

Slide 43

Slide 43 text

Query capabilities
• Querying via views
• Secondary indexes via views
• Views based on Map - Reduce
• Lacks some advanced query capabilities
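Couchbase views are written as JavaScript map and reduce functions that run inside the server. The Python simulation below only mimics the concept on in-memory documents: the map step emits a key and value per document, and the reduce step aggregates values per key. The function names and the choice to count requests per channel are assumptions for illustration; the field names come from the request log document shown earlier.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List, Tuple


def map_doc(doc: Dict[str, Any]) -> Iterator[Tuple[str, int]]:
    """Map step: emit (channel, 1) for each request log document."""
    channel = doc.get("collectorData", {}).get("channel")
    if channel:
        yield channel, 1


def reduce_values(values: List[int]) -> int:
    """Reduce step: count the requests emitted for one key."""
    return sum(values)


def run_view(docs: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Group emitted values by key, then reduce each group."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: reduce_values(values) for key, values in sorted(grouped.items())}
```

A view like this acts as a secondary index keyed by channel. Anything beyond key lookups and range scans over the emitted keys, such as full-text or geo-spatial queries, is outside what this model expresses, which is the gap the last bullet refers to.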

Slide 44

Slide 44 text

Elasticsearch
• Apache Lucene
• Designed to be distributed
• Schema free
• Apache 2 licensed
• RESTful API

Slide 45

Slide 45 text

Added value of ES
• Full text search
• Faceted search
• Geo spatial search
• All in (near) real-time
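To give an impression of what these capabilities look like in a request, the sketch below builds a search body that combines a full-text match, a geo-distance filter, and a per-channel count. Several things here are assumptions: the body uses the query DSL of recent Elasticsearch versions (`bool`, `geo_distance`, and `aggs`, where releases from the time of this talk used facets), `location` is a hypothetical field mapped as a geo point, and `build_search` is an invented helper. The body would be sent as JSON to an index's `_search` endpoint.

```python
import json
from typing import Any, Dict


def build_search(text: str, lat: float, lon: float, distance: str = "50km") -> Dict[str, Any]:
    """Build a search body: text match + geo filter + count per channel."""
    return {
        "query": {
            "bool": {
                # Full-text search on the requested URL.
                "must": [{"match": {"pageUrl": text}}],
                # Geo-spatial filter; "location" is a hypothetical geo_point field.
                "filter": [{
                    "geo_distance": {
                        "distance": distance,
                        "location": {"lat": lat, "lon": lon},
                    }
                }],
            }
        },
        # Faceted view of the results: number of hits per channel.
        "aggs": {
            "by_channel": {"terms": {"field": "collectorData.channel"}}
        },
    }


# Example: requests for news pages from within 50 km of Amsterdam.
body = json.dumps(build_search("news", 52.37, 4.89))
```

A single request of this kind answers a question that would be awkward to express as a Map-Reduce view.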

Slide 46

Slide 46 text

Replicating to ES
[Diagram] Labels: Hippo Delivery Tier, Java API (Write, Read), Couchbase Server Cluster, XDCR, Couchbase ES Transport plugin, Elasticsearch Server Cluster

Slide 47

Slide 47 text

Demo time!

Slide 48

Slide 48 text

What’s Next?

Slide 49

Slide 49 text

Advanced analytics

Slide 50

Slide 50 text

Thank you! Questions?
[email protected]
@jreijn
ps. We’re hiring!