Hippo GetTogether: The architecture behind Hippo's relevance platform

These slides are from my Hippo GetTogether 2013 presentation, in which I went into detail about the architecture behind our high-performance relevance platform. The talk also covered why we chose Couchbase for storage and how Elasticsearch can be used for search and analytics. I showed how we integrated and leveraged both products full-circle from within our Hippo CMS product.

Jeroen Reijn

June 21, 2013

Transcript

  1. Building a relevance platform with Couchbase and Elasticsearch
    Hippo GetTogether, 21 June 2013
    Jeroen Reijn | @jreijn | #hgt2013
    Hippo GetTogether 2013 follow the Hippo trail
  2. About me • Architect @ Hippo • DevOps guy • Blogger @ http://blog.jeroenreijn.com
  3. OneHippo @ Goto
    “The capability of a search engine or function to retrieve data appropriate to a user's needs.” http://www.thefreedictionary.com/relevance
  4. How we deliver relevant content @Hippo
  5. Registration
    Visitor - entity making HTTP requests
    Collector - records data about a visitor or his behavior. Example: location collector (GeoIPCollector)
    Targeting Data - all data about a specific visitor. Example: IP address is located in Amsterdam
  6. Matching
    Characteristic - a type of fact about visitors. Example: "comes from a city", "experiences a type of weather"
    Target Group - the specification of a Characteristic. Example: "comes from a European city", "comes from Amsterdam"
    Persona - one or more target groups that describe a certain type of visitor. Example: "Jim, the European urban consumer", "Alice, the Pet owner"
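The terms on slides 5 and 6 can be sketched as a small data model. This is a hypothetical illustration in Python, not Hippo's actual API: the class names, the `matches` predicate, the "returning visitor" target group and the scoring rule (fraction of matched target groups) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Targeting data: everything the collectors recorded about one visitor,
# keyed by collector id (for example "geo").
TargetingData = Dict[str, object]


@dataclass
class TargetGroup:
    """A concrete specification of a characteristic, e.g. 'comes from Amsterdam'."""
    name: str
    characteristic: str  # e.g. "city"
    matches: Callable[[TargetingData], bool]


@dataclass
class Persona:
    """One or more target groups that together describe a type of visitor."""
    name: str
    target_groups: List[TargetGroup]

    def score(self, data: TargetingData) -> float:
        # Hypothetical scoring rule: fraction of target groups the visitor matches.
        hits = sum(1 for group in self.target_groups if group.matches(data))
        return hits / len(self.target_groups)


from_amsterdam = TargetGroup(
    name="comes from Amsterdam",
    characteristic="city",
    matches=lambda d: d.get("geo", {}).get("city") == "Amsterdam",
)
returning = TargetGroup(
    name="is a returning visitor",  # hypothetical target group
    characteristic="returningvisitor",
    matches=lambda d: d.get("returningvisitor") is True,
)
jim = Persona("Jim, the European urban consumer", [from_amsterdam, returning])

# Targeting data as a geo collector might have recorded it.
visitor = {"geo": {"city": "Amsterdam", "country": "NL"}}
print(jim.score(visitor))  # one of the two target groups matches
```

Keeping target groups as predicates over targeting data means a new characteristic only needs a new collector and a new predicate, which fits the collector-based registration described on slide 5.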
  7. What do we store? • Request log • Targeting data • Statistics (averages, e.g. how many visitors became which persona)
  8. RDBMS • App server (Hippo Delivery Tier, Hippo Repository) • XML • JSON • (X)HTML
  9. Delivery Tier: Request • URL Matching • Fetch content • Compose output • Response
  10. Delivery Tier: Request • URL Matching • Targeting Data Collection • Scoring • Fetch content • Compose output • Response
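The delivery tier steps named on slide 10 can be read as a request pipeline. The sketch below is only an illustration: the order of the steps is an assumption (the transcript does not preserve the diagram's arrows), and every function body, the site map and the persona rule are hypothetical.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Step = Callable[[Context], Context]


def url_matching(ctx: Context) -> Context:
    # Resolve the request path to a page (hypothetical site map).
    ctx["page"] = {"/news": "news-overview"}.get(ctx["path"], "not-found")
    return ctx


def collect_targeting_data(ctx: Context) -> Context:
    # Each collector would record one aspect of the visitor; hard-coded here.
    ctx["targeting_data"] = {"geo": {"city": "Amsterdam"}}
    return ctx


def scoring(ctx: Context) -> Context:
    # Pick a persona from the targeting data (hypothetical rule).
    city = ctx["targeting_data"]["geo"]["city"]
    ctx["persona"] = "urban" if city == "Amsterdam" else "default"
    return ctx


def fetch_content(ctx: Context) -> Context:
    # Select the content variant for the matched page and persona.
    ctx["content"] = f"{ctx['page']} for persona '{ctx['persona']}'"
    return ctx


def compose_output(ctx: Context) -> Context:
    ctx["response"] = f"<html><body>{ctx['content']}</body></html>"
    return ctx


# Assumed order: targeting and scoring run before content is fetched,
# so the fetched content can depend on the persona.
PIPELINE: List[Step] = [
    url_matching,
    collect_targeting_data,
    scoring,
    fetch_content,
    compose_output,
]


def handle(path: str) -> str:
    ctx: Context = {"path": path}
    for step in PIPELINE:
        ctx = step(ctx)
    return ctx["response"]


print(handle("/news"))
```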
  11. Scaling out: RDBMS • App server (Hippo Delivery Tier, Hippo Repository) • App server (Hippo Delivery Tier, Hippo Repository)
  12. Scaling out: RDBMS • App server (Delivery Tier, Repository) • App server (Delivery Tier, Repository) • Targeting Datastore
  13. Assessment Criteria • Maturity • Data model • Consistency model • Performance • Replication • Caching model • Query model • Monitoring • Scalability • Reliability • Support
  14. Selection Criteria • Performance • Scalability • Schema flexibility • Simplicity • Monitoring • Support
  15. Request log document
    {
      "visitorId": "7a1c7e75-8539-40",
      "pageUrl": "http://localhost:8080/site/news",
      "pathInfo": "/news",
      "remoteAddr": "127.0.0.1",
      "referer": "http://localhost:8080/site/",
      "timestamp": 1371419505909,
      "collectorData": {
        "geo": {
          "country": "",
          "city": "",
          "latitude": 0,
          "longitude": 0
        },
        "returningvisitor": false,
        "channel": "English Website"
      },
      "personaIdScores": [],
      "globalPersonaIdScores": []
    }
  16. Visitor document
    {
      "geo": {
        "collectorId": "geo",
        "city": "",
        "country": "",
        "latitude": 0,
        "longitude": 0
      },
      "channel": {
        "collectorId": "channel",
        "channels": [
          "English Website"
        ],
        "lastVisitedChannel": "English Website"
      }
    }
  17. Why Couchbase? • Drop-in replacement for memcached • Read/Write-through cache • High throughput • Easy scalability • Schema flexibility • Low latency
  18. Couchbase • Open Source • Document-oriented • Easily scalable • Consistent High Performance • Apache license
  19. Performance • Object managed cache • Write Queue to disk • Avoids Cold Cache
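The combination of a managed cache and a write queue to disk on slide 19 can be illustrated with a toy write-behind cache. This is a sketch of the general technique only, not Couchbase's internals: the class, its method names and the dictionary standing in for disk are assumptions, and the queue is drained synchronously to keep the example deterministic.

```python
from collections import deque


class WriteBehindCache:
    """Toy cache: writes are served from memory and persisted later via a queue."""

    def __init__(self):
        self.memory = {}            # in-memory object cache
        self.disk = {}              # stand-in for persistent storage
        self.write_queue = deque()  # pending (key, value) writes

    def set(self, key, value):
        # The write is visible in memory at once; persistence is deferred.
        self.memory[key] = value
        self.write_queue.append((key, value))

    def get(self, key):
        if key in self.memory:
            return self.memory[key]
        # Read-through: fall back to disk and keep the value in memory.
        value = self.disk[key]
        self.memory[key] = value
        return value

    def flush(self):
        """Drain the write queue to disk and return the number of writes."""
        written = 0
        while self.write_queue:
            key, value = self.write_queue.popleft()
            self.disk[key] = value
            written += 1
        return written

    def warm_up(self):
        """After a restart, reload persisted data so the cache is not cold."""
        self.memory = dict(self.disk)


cache = WriteBehindCache()
cache.set("visitor::42", {"city": "Amsterdam"})
print(cache.get("visitor::42"), len(cache.disk))  # served from memory, not yet on disk
cache.flush()
print(len(cache.disk))
```

In a real server the queue would be drained by a background writer; the point of the sketch is that reads and writes are answered from memory while persistence catches up.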
  20. Easily scalable • Auto sharding • Cross cluster replication (XDCR) • Master - Master replication
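Auto sharding in Couchbase works by hashing each document key to one of a fixed number of vBuckets and mapping vBuckets to nodes. The sketch below shows that idea in simplified form: the modulo step is an assumption (the real clients derive the vBucket from the CRC32 value somewhat differently), and the round-robin cluster map and node names are hypothetical.

```python
import zlib
from typing import List

NUM_VBUCKETS = 1024  # Couchbase's default number of vBuckets


def vbucket_for(key: str) -> int:
    # Simplified: CRC32 of the key, reduced to a vBucket id.
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_cluster_map(nodes: List[str]) -> List[str]:
    # Hypothetical layout: spread vBuckets over the nodes round-robin.
    return [nodes[i % len(nodes)] for i in range(NUM_VBUCKETS)]


def node_for(key: str, cluster_map: List[str]) -> str:
    return cluster_map[vbucket_for(key)]


two_nodes = build_cluster_map(["node-a", "node-b"])
three_nodes = build_cluster_map(["node-a", "node-b", "node-c"])

key = "visitor::7a1c7e75-8539-40"
# The key always hashes to the same vBucket; only the vBucket-to-node
# mapping changes when the cluster grows.
print(vbucket_for(key), node_for(key, two_nodes), node_for(key, three_nodes))
```

Because clients only need the current vBucket map, adding a node means moving vBuckets rather than rehashing every key.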
  21. Flexible data model • Native JSON support • Incremental Map Reduce • Gives power to the developer
  22. Load Balancer • Hippo Delivery Tier • Database cluster • Couchbase cluster: Request log data, Targeting data, Statistics data
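Slide 22 lists the kinds of data kept in the Couchbase cluster. A minimal sketch of how request log and visitor documents might be stored as JSON under separate keys follows. The key prefixes, the helper names and the in-memory `bucket` dictionary are assumptions for illustration; only the field names come from the documents on slides 15 and 16.

```python
import json
import time
import uuid


def request_log_key() -> str:
    # Hypothetical key scheme: one document per request.
    return f"requestlog::{uuid.uuid4()}"


def visitor_key(visitor_id: str) -> str:
    # Hypothetical key scheme: one targeting-data document per visitor.
    return f"visitor::{visitor_id}"


def build_request_log(visitor_id, page_url, path_info, remote_addr, collector_data):
    # Field names follow the request log document on slide 15.
    return {
        "visitorId": visitor_id,
        "pageUrl": page_url,
        "pathInfo": path_info,
        "remoteAddr": remote_addr,
        "timestamp": int(time.time() * 1000),  # epoch milliseconds
        "collectorData": collector_data,
        "personaIdScores": [],
        "globalPersonaIdScores": [],
    }


bucket = {}  # in-memory stand-in for a Couchbase bucket: key -> JSON string

log = build_request_log(
    "7a1c7e75-8539-40",
    "http://localhost:8080/site/news",
    "/news",
    "127.0.0.1",
    {"returningvisitor": False, "channel": "English Website"},
)
bucket[request_log_key()] = json.dumps(log)

# Visitor document shaped like the one on slide 16 (channel collector only).
bucket[visitor_key(log["visitorId"])] = json.dumps({
    "channel": {
        "collectorId": "channel",
        "channels": ["English Website"],
        "lastVisitedChannel": "English Website",
    }
})

print(sorted(key.split("::")[0] for key in bucket))
```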
  23. Query capabilities • Querying via views • Secondary indexes via views • Views based on Map - Reduce • Lacks some advanced query capabilities
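To make the map-reduce view idea concrete, here is a sketch of the kind of statistic mentioned on slide 7: how many visitors became which persona. Couchbase views are written as JavaScript map and reduce functions; the Python below only mimics the idea in memory. The `personaId` field and the sample documents are hypothetical, since the slides do not show how a visitor's persona is recorded.

```python
from collections import defaultdict


def map_doc(doc):
    # Map step: emit (key, value) pairs for each document that qualifies.
    if "personaId" in doc:
        yield doc["personaId"], 1


def reduce_values(values):
    # Reduce step: sum the emitted values, like a built-in count/sum reducer.
    return sum(values)


def run_view(docs):
    # Group the emitted values by key, then reduce each group.
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: reduce_values(values) for key, values in sorted(grouped.items())}


docs = [
    {"visitorId": "v1", "personaId": "jim"},
    {"visitorId": "v2", "personaId": "alice"},
    {"visitorId": "v3", "personaId": "jim"},
    {"visitorId": "v4"},  # no persona assigned, so nothing is emitted
]
print(run_view(docs))
```

A view like this answers a fixed, pre-declared question well; ad hoc questions are harder to express this way, which is consistent with the last bullet on slide 23.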
  24. Elasticsearch • Apache Lucene • Designed to be distributed • Schema free • Apache license • RESTful API
  25. Added value of ES • Full text search • Faceted search • Geo spatial search • All in (near) real-time
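The three search features on slide 25 can be combined in a single Elasticsearch request body. The sketch below builds such a body as a Python dictionary using the query DSL of more recent Elasticsearch releases (the releases current in 2013 used `facets` rather than `aggs`). The field names are assumptions: `pathInfo` and `collectorData.channel` are taken from the request log document on slide 15, while `location` is a hypothetical field that would need a `geo_point` mapping, and the terms aggregation assumes the channel field is indexed as an exact value.

```python
import json


def build_query(text: str, lat: float, lon: float, distance_km: int) -> dict:
    return {
        "query": {
            "bool": {
                # Full text search on the request path.
                "must": {"match": {"pathInfo": text}},
                # Geo spatial search: only documents within the given radius.
                "filter": {
                    "geo_distance": {
                        "distance": f"{distance_km}km",
                        "location": {"lat": lat, "lon": lon},  # hypothetical geo_point field
                    }
                },
            }
        },
        # Faceted search: number of matching requests per channel.
        "aggs": {
            "by_channel": {"terms": {"field": "collectorData.channel"}}
        },
    }


# Approximate coordinates of Amsterdam.
body = build_query("news", 52.37, 4.89, 50)
print(json.dumps(body, indent=2))
```

The body would be sent to an index's `_search` endpoint over the RESTful API mentioned on slide 24.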
  26. Replicating to ES: Hippo Delivery Tier • Java API (Write, Read) • Couchbase Server Cluster • XDCR • Couchbase ES Transport plugin • Elasticsearch Server Cluster
  27. Thank you! Questions? [email protected] | @jreijn PS. We’re hiring!