Real-time visitor analysis with Couchbase and Elasticsearch

These slides are from my NoSQL Matters Barcelona 2013 presentation. During the presentation I went into detail about the architecture behind our high-performance real-time visitor analysis platform. The talk also covered why we chose Couchbase for storage and how Elasticsearch can be used for advanced search and analytics. I shared how we integrated and leveraged both products full-circle from within our Hippo CMS product.

Jeroen Reijn

November 30, 2013

Transcript

  1. Real-time visitor analysis with Couchbase and Elasticsearch

    Jeroen Reijn | @jreijn | #nosql13 follow the Hippo trail
  2. follow the Hippo trail NoSQL Matters 2013 About me

    Jeroen Reijn, Software engineer, Hippo, @jreijn, http://blog.jeroenreijn.com
  3. About Hippo

  4. OneHippo @ Goto

    Visitor Analysis
  5. (no text on slide)

  6. (no text on slide)

  7. Journey based Targeting

  8. How we analyse visitors @ Hippo
  9. Registration

    Visitor - entity making HTTP requests
    Collector - records data about a visitor or his behaviour. Example: location collector (GeoIPCollector)
    Targeting Data - all data about a specific visitor. Example: IP address is located in Amsterdam
  10. Matching

    Characteristic - a type of fact about visitors. Example: "comes from a city", "experiences a type of weather"
    Target Group - the specification of a Characteristic. Example: "comes from a European city", "comes from Amsterdam"
    Persona - one or more target groups that describe a certain type of visitor. Example: "Jim, the European urban consumer", "Alice, the Pet owner"
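
The registration and matching terms above can be sketched as a small data model. This is a minimal illustration, not Hippo's implementation: every class and function name, the IP-to-city table, and the scoring rule (fraction of matched target groups) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

# Hypothetical names throughout; they mirror the slide terminology only.
TargetingData = Dict[str, str]  # all data collected about one visitor


def geo_ip_collector(request: Dict[str, str]) -> TargetingData:
    """Toy collector: derives a city from the request's IP address."""
    ip_to_city = {"192.0.2.10": "Amsterdam", "198.51.100.7": "Boston"}  # made-up table
    return {"city": ip_to_city.get(request["ip"], "unknown")}


@dataclass(frozen=True)
class TargetGroup:
    """A specification of a characteristic, e.g. 'comes from a European city'."""
    name: str
    characteristic: str           # key in the targeting data, e.g. "city"
    accepted_values: FrozenSet[str]

    def matches(self, data: TargetingData) -> bool:
        return data.get(self.characteristic) in self.accepted_values


@dataclass(frozen=True)
class Persona:
    """One or more target groups describing a type of visitor."""
    name: str
    target_groups: List[TargetGroup]

    def score(self, data: TargetingData) -> float:
        """Assumed scoring rule: fraction of target groups the visitor matches."""
        if not self.target_groups:
            return 0.0
        matched = sum(1 for group in self.target_groups if group.matches(data))
        return matched / len(self.target_groups)


european_city = TargetGroup(
    "comes from a European city", "city", frozenset({"Amsterdam", "Barcelona"})
)
jim = Persona("Jim, the European urban consumer", [european_city])
```

A visitor whose request is collected as `geo_ip_collector({"ip": "192.0.2.10"})` ends up with targeting data `{"city": "Amsterdam"}` and matches the single target group of the `jim` persona.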
  11. What do we store?

    • Request log • Targeting data • Statistics: averages, e.g. how many visitors became which persona
  12. Real-time analysis

  13. How about YOU?

    • Do you analyse your visitors? • Do you do it ‘real-time’?
  14. Architecture
  15. [Diagram: RDBMS, Hippo Delivery Tier, Hippo Repository, App server; output formats XML, JSON, (X)HTML]

  16. Delivery Tier

    [Diagram: Request, URL matching, Fetch content, Compose output, Response]
  17. Delivery Tier

    [Diagram: Request, URL matching, Collect data, Scoring, Fetch content, Compose output, Response]
  18. Scaling

  19. Scaling out

    [Diagram: RDBMS; two app servers, each with a Hippo Delivery Tier and a Hippo Repository]
  20. Scaling out

    [Diagram: RDBMS; two app servers, each with a Delivery Tier and a Repository; Targeting Datastore]
  21. What kind of storage?

  22. Typical Data Access Pattern

    [Diagram: Writer, single write, Datastore, several reads]
  23. Analytics Data Access Pattern

    [Diagram: Writers, several writes, Datastore, single read, CMS user]
  24. Targeting Data Access Pattern

    [Diagram: Visitors, several writes, several reads, Datastore, single read, CMS user]
  25. Distributed Cache

  26. Requirements change!

  27. NoSQL?
  28. Suitable types

    • Key-value store • Document database • Column oriented store
  29. Assessment Criteria

    • Maturity • Data model • Consistency model • Performance • Replication • Caching model • Query model • Monitoring • Scalability • Reliability • Support
  30. Selection Criteria

    • Performance • Scalability • Schema flexibility • Simplicity
  31. Couchbase

  32. Why Couchbase?

    • Drop-in replacement for memcached • Read/Write-through cache • High throughput • Easily scalable • Schema flexibility • Low latency
  33. Couchbase

    • Open Source • Document-oriented • Easily scalable • Consistent high performance • Apache licensed
  34. Performance

    • Object managed cache • Write Queue to disk
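
The combination of a managed cache and a write queue to disk can be illustrated with a toy in-memory model. This is a sketch of the general pattern only, assuming writes are acknowledged from memory and persisted later; the class and method names are hypothetical and do not reflect Couchbase internals.

```python
from collections import deque
from typing import Any, Deque, Dict, Optional, Tuple


class WriteQueueCache:
    """Toy store: serves reads from memory, persists writes via a queue."""

    def __init__(self) -> None:
        self.memory: Dict[str, Any] = {}                    # managed cache
        self.disk: Dict[str, Any] = {}                      # stand-in for persistent storage
        self.write_queue: Deque[Tuple[str, Any]] = deque()  # pending disk writes

    def set(self, key: str, value: Any) -> None:
        """Store in memory right away and queue the write for disk."""
        self.memory[key] = value
        self.write_queue.append((key, value))

    def get(self, key: str) -> Optional[Any]:
        """Read from memory; fall back to disk and re-cache on a miss."""
        if key in self.memory:
            return self.memory[key]
        value = self.disk.get(key)
        if value is not None:
            self.memory[key] = value
        return value

    def flush(self, max_items: Optional[int] = None) -> int:
        """Drain queued writes to disk, oldest first; returns items written."""
        flushed = 0
        while self.write_queue and (max_items is None or flushed < max_items):
            key, value = self.write_queue.popleft()
            self.disk[key] = value
            flushed += 1
        return flushed
```

In this model a `set` returns before anything reaches `disk`, which keeps write latency low at the cost of a window in which the newest data exists only in memory.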
  35. Easily scalable

    • Auto sharding • Cross cluster replication (XDCR) • Master-master replication
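
Auto sharding can be pictured as hashing each key to one of a fixed number of logical partitions and mapping those partitions to nodes. The sketch below is a simplified illustration under that assumption: the partition count, the use of plain CRC32, the round-robin assignment, and all function names are choices made for the example, not Couchbase's exact algorithm.

```python
import zlib
from typing import List

NUM_PARTITIONS = 1024  # assumed fixed number of logical partitions


def partition_for(key: str) -> int:
    """Hash a document key to a partition; independent of cluster size."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def build_partition_map(nodes: List[str]) -> List[str]:
    """Assign partitions to nodes round-robin (illustrative placement only)."""
    return [nodes[i % len(nodes)] for i in range(NUM_PARTITIONS)]


def node_for(key: str, partition_map: List[str]) -> str:
    """Look up which node currently owns the key's partition."""
    return partition_map[partition_for(key)]


two_nodes = build_partition_map(["node-a", "node-b"])
three_nodes = build_partition_map(["node-a", "node-b", "node-c"])
```

Because a key's partition never changes, growing the cluster only requires moving partitions between nodes and publishing a new map; clients keep using the same hash.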
  36. Flexible data model

    • Native JSON support • Incremental Map Reduce • Gives power to the developer
  37. How we run Couchbase @ Hippo

  38. [Diagram: Load Balancer, Hippo Delivery Tier, Database cluster, Couchbase cluster]

    Couchbase cluster: • Request log data • Targeting data • Statistics data
  39. Analysis capabilities

    • Querying via views • Secondary indexes via views • Views based on Map-Reduce • Limited ad-hoc query capabilities
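
To make the map-reduce idea behind views concrete, here is a small stand-in written in Python. Couchbase views themselves are defined as JavaScript map and reduce functions; this sketch only imitates the flow, and the document shape (`type`, `persona`) and function names are assumptions. It computes the kind of statistic mentioned earlier in the deck: how many visitors became which persona.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_persona(doc: dict) -> Iterator[Tuple[str, int]]:
    """Map step: emit (persona, 1) for every targeting document with a persona."""
    if doc.get("type") == "targeting" and "persona" in doc:
        yield doc["persona"], 1


def reduce_count(values: List[int]) -> int:
    """Reduce step: sum the emitted values for one key."""
    return sum(values)


def run_view(docs: Iterable[dict]) -> Dict[str, int]:
    """Group the mapped rows by key, then reduce each group."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for doc in docs:
        for key, value in map_persona(doc):
            grouped[key].append(value)
    return {key: reduce_count(values) for key, values in grouped.items()}


sample_docs = [
    {"type": "targeting", "visitor": "v1", "persona": "Jim"},
    {"type": "targeting", "visitor": "v2", "persona": "Alice"},
    {"type": "targeting", "visitor": "v3", "persona": "Jim"},
    {"type": "requestlog", "visitor": "v1", "path": "/"},
]
persona_counts = run_view(sample_docs)
```

The map function is fixed when the view is defined, which is why a view answers one predefined question well but offers only limited ad-hoc querying.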
  40. Elasticsearch

    • Apache Lucene • Designed to be distributed • Schema free • Apache license • RESTful API
  41. Added value

    • Unstructured search • Structured search • Faceted search • Geo spatial search • Combine all • All in (near) real-time
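
A single Elasticsearch request body can combine these search types. The helper below builds such a body as a plain Python dict; it is a sketch, not the query used in the talk. The field names (`page_title`, `persona`, `location`, `city`) and the function name are hypothetical, `location` is assumed to be mapped as a `geo_point`, and the `bool`/`aggs` syntax follows more recent Elasticsearch releases, whereas releases from the time of the talk exposed faceting through a separate facets API.

```python
import json


def build_visitor_query(text: str, persona: str, lat: float, lon: float,
                        distance: str = "50km") -> dict:
    """Combine full-text, structured, geo and faceted search in one body."""
    return {
        "query": {
            "bool": {
                # Unstructured (full-text) search on a hypothetical field.
                "must": [{"match": {"page_title": text}}],
                "filter": [
                    # Structured search: exact match on a keyword field.
                    {"term": {"persona": persona}},
                    # Geo spatial search: visitors within a radius of a point.
                    {"geo_distance": {
                        "distance": distance,
                        "location": {"lat": lat, "lon": lon},
                    }},
                ],
            }
        },
        # Faceted search: bucket the matching visitors per city.
        "aggs": {"visitors_per_city": {"terms": {"field": "city"}}},
    }


body = build_visitor_query("pricing", "Jim", 52.37, 4.89)
request_json = json.dumps(body)
```

The resulting JSON would be sent as the body of a search request against the index that holds the replicated visitor documents.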
  42. [Diagram: Hippo Delivery Tier (Java API; Write, Read), Couchbase Server Cluster, Replication (XDCR) via the Couchbase Transport plugin, Elasticsearch Server Cluster (Read / Query)]
  43. What’s Next?

  44. Advanced analytics

  45. { Demo }

  46. Thanks!

    j.reijn@onehippo.com | @jreijn | www.onehippo.com