Seek - Elasticsearch: From Hack to Production

Brett Christensen, Reza Yousefzadeh and Nivantha Mandawala from Seek.com.au talk about migrating from a legacy search system to Elasticsearch for the Seek.com.au 'Talent Search' service.

Elastic Co

April 28, 2016
Transcript

  1. Elasticsearch: From Hack to Production

  2. About us
    • Brett Christensen ◦ https://www.linkedin.com/in/brett-christensen-a73a4010 ◦ @brett_c
    • Reza Yousefzadeh ◦ https://au.linkedin.com/in/reza-yousefzadeh-7052b939 ◦ @reza_yz
    • Nivantha Mandawala ◦ https://au.linkedin.com/in/nivantha ◦ @nivazone
  3. Agenda
    • Our journey
    • Continuous Delivery
    • SEEK Implementation of Elasticsearch
    • Questions
  4. SEEK Search Team Mission: Provide world-class search and matching
    • Highly customised search algorithms
    • Highly performant indexing/searching
    • World-class consumer experience
  5. SEEK Search Team - Continuous Delivery as a culture
    • Cloud first
    • Faster feedback cycles
    • Easy to diagnose/resolve issues
    • Zero downtime deployments
    • Speed to market++
    Credit to Łukasz Górnicki / @derberq
  6. The Products

  7. Job Search

  8. Job Search

  9. Talent Search

  10. Talent Search

  11. The problem: Hitting roadblocks

  12. The problem: Hitting roadblocks
    • Proprietary search engine
    • Complex queries were too slow
    • Not horizontally scalable
    • Not cloud ready
    • Limited set of APIs, limited SDKs
    • Very small community
  13. Pick a Target - Why Talent Search?
    • Decoupled/Continuously Delivered
    • All the components already in Cloud
    • Less high profile compared to Job Search
  14. Pick a Target - Why Elasticsearch?
    • Free and easily available
    • Doco/community++
    • It’s Elastic!
    • Highly customizable
    • High team engagement
  15. Spiking it out: A hackathon project
    • 3 days for vertical slice of functionality
    • No experience with Elasticsearch as core search
    • Started out with Amazon hosted service
  16. Hackathon learnings about Elastic
    • Good documentation
    • Large community
    • Feature rich
    • Customisable
    • Highly performant
    • High confidence in estimating switching costs
  17. Next step: Get it on the roadmap - Obstacles
    • Issues with current system known but...
    • Previous experience with migrating was painful
  18. Next step: Get it on the roadmap - Approach
    • Start small and demonstrate value quickly
    • Get people on board early
    • Build it fast and show it off...
    • ...but not just a hack
    • Test with a small percentage of traffic
  19. Outcome
    • (Most) of Talent Search is now powered by Elasticsearch!
    • Job Search to follow...
  20. Continuous Delivery
    “In software, when something is painful, the way to reduce the pain is to do it more frequently, not less.”
    ― David Farley, Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation
  21. Continuous Delivery
    • Architecture
    • Immutable Infrastructure
    • Automated Testing
    • Deployment Pipeline
    • Technologies
  22. Search architecture

  23. Old vs. New side by side...

  24. Immutable Infrastructure
    • No updates to running environments
    • Infrastructure as code
    • Blue/Green Deployments
  25. Immutable Infrastructure
    • No updates to running environments
    • Infrastructure as code
    • Blue/Green Deployments
  26. Automated Testing

  27. Automated Testing
    • Smoke test
    • Stop the pipeline on failed tests
    • Performance test before switching
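The gating described on slides 24 to 27 can be sketched roughly as follows. This is a minimal illustration, not SEEK's actual pipeline: the stage names, the `run_pipeline` and `deploy` helpers, and the "blue"/"green" labels are all assumptions made for the example, and the real checks are replaced by booleans.

```python
from typing import Callable, List, Tuple

# A stage is a (name, check) pair; check returns True when the stage passes.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failure.

    Returns (passed, names of the stages that were actually executed).
    """
    executed: List[str] = []
    for name, check in stages:
        executed.append(name)
        if not check():
            return False, executed  # stop the pipeline on a failed test
    return True, executed


def deploy(smoke_ok: bool, perf_ok: bool, live: str = "blue") -> str:
    """Return the environment that serves traffic after the pipeline run.

    The new stack is built next to the live one (blue/green); traffic is
    switched only if the smoke test and the performance test both pass.
    """
    candidate = "green" if live == "blue" else "blue"
    passed, _ = run_pipeline([
        ("smoke test", lambda: smoke_ok),
        ("performance test", lambda: perf_ok),  # runs before switching
    ])
    return candidate if passed else live
```

Because the running environment is never updated in place, a failed run simply leaves the old stack serving traffic.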
  28. Deployment Pipeline

  29. Immediate feedback

  30. Technologies

  31. SEEK Implementation of Elasticsearch

  32. Elasticsearch - High Level View

  33. Going custom...

  34. Elasticsearch - Indexing Pipeline

  35. Elasticsearch - Indexing Pipeline

  36. Elasticsearch - Indexing Pipeline

  37. Elasticsearch - Indexing Pipeline

  38. Elasticsearch - Indexing Pipeline

  39. Elasticsearch - Indexing Pipeline

  40. Elasticsearch - Indexing Pipeline

  41. Elasticsearch - Customizing Similarity Algorithm

      {
        "settings": {
          "similarity": {
            "algorithm_1_bm25": { "type": "BM25", "k1": 0.2, "b": 0.35 },
            "algorithm_2_bm25": { "type": "BM25", "k1": 0.001, "b": 0 }
          }
        }
      }

  42. Elasticsearch - Customizing Similarity Algorithm...

      {
        "mappings": {
          "profile": {
            "properties": {
              "jobTitle": { "type": "string", "similarity": "algorithm_1_bm25" },
              "jobDescription": { "type": "string", "similarity": "algorithm_2_bm25" }
            }
          }
        }
      }

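To see what the two parameter sets on slides 41 and 42 are likely doing, it can help to evaluate the term-frequency part of BM25 by hand. The sketch below uses the textbook BM25 formula and leaves out the IDF factor; it is an illustration of the parameters, not Elasticsearch's internal scoring code, and the function name `bm25_tf_component` is hypothetical.

```python
def bm25_tf_component(tf: float, k1: float, b: float,
                      doc_len: float, avg_doc_len: float) -> float:
    """Term-frequency part of textbook BM25 (IDF omitted).

    k1 controls how quickly repeated terms stop adding to the score;
    b controls how strongly long fields are penalised (0 = not at all).
    """
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return (tf * (k1 + 1.0)) / (tf + k1 * length_norm)


# Parameter sets copied from the slides.
JOB_TITLE = {"k1": 0.2, "b": 0.35}        # algorithm_1_bm25
JOB_DESCRIPTION = {"k1": 0.001, "b": 0}   # algorithm_2_bm25

if __name__ == "__main__":
    for tf in (1, 2, 10):
        title = bm25_tf_component(tf, doc_len=100, avg_doc_len=100, **JOB_TITLE)
        desc = bm25_tf_component(tf, doc_len=100, avg_doc_len=100, **JOB_DESCRIPTION)
        print(f"tf={tf:>2}  jobTitle={title:.4f}  jobDescription={desc:.4f}")
```

With `k1` close to zero and `b` at zero, the `jobDescription` setting behaves almost like a yes/no match: repeating a term or writing a longer description barely changes the score. The `jobTitle` setting still rewards repeated terms a little and applies mild length normalisation.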
  43. Elasticsearch - Tuning Relevance
    • Challenging and time consuming
    • Quick feedback cycle
    • Weightings/Boosting
    • Avoid re-indexing when we can
  44. Elasticsearch - Tuning Performance
    • Memory intensive, Java heap size
    • Enable replicas after re-indexing
    • SSD instance storage over EBS
    • HTTP Compression
    • Cluster rebuilt/re-indexed in 2 hours!
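The "enable replicas after re-indexing" point can be sketched as a sequence of index settings updates. The snippet below only builds the request bodies for Elasticsearch's index settings endpoint and sends nothing; the index name `profiles`, the helper names, and the replica count of 1 are assumptions for illustration, and the bulk load itself is left as a placeholder step.

```python
import json
from typing import List, Tuple


def replica_settings(count: int) -> str:
    """JSON body for an index settings update that sets the replica count."""
    if count < 0:
        raise ValueError("replica count must not be negative")
    return json.dumps({"index": {"number_of_replicas": count}})


def reindex_plan(index: str, replicas_after: int = 1) -> List[Tuple[str, str, str]]:
    """Ordered (method, path, body) steps around a full re-index.

    Replicas are switched off first so the bulk load is written once,
    then switched back on so the finished index is copied in one pass.
    """
    settings_path = f"/{index}/_settings"
    return [
        ("PUT", settings_path, replica_settings(0)),
        ("BULK-LOAD", f"/{index}", ""),  # placeholder for the indexing pipeline
        ("PUT", settings_path, replica_settings(replicas_after)),
    ]


if __name__ == "__main__":
    for method, path, body in reindex_plan("profiles"):
        print(method, path, body)
```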
  45. Prod vs. Dark, Champ vs. Challenger
    • Testing/tuning relevance with dummy data is quite hard
    • Why not create a “dark cluster” with prod data?
    • Site override, Split traffic 50/50...
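One way to implement the 50/50 split with a site override is a deterministic hash of some stable request key. The sketch below assumes a hypothetical `choose_cluster` helper, a user id as the key, and the labels "prod" and "dark"; the slides do not say how SEEK routed traffic, so treat this purely as an illustration.

```python
import hashlib
from typing import Optional


def choose_cluster(user_id: str,
                   challenger_share: float = 0.5,
                   override: Optional[str] = None) -> str:
    """Pick "prod" (champion) or "dark" (challenger) for a request.

    An explicit override wins, which lets testers pin themselves to one
    cluster. Otherwise the user id is hashed into [0, 1) so the same user
    always lands on the same cluster.
    """
    if override in ("prod", "dark"):
        return override
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "dark" if bucket < challenger_share else "prod"
```

Hashing rather than picking at random keeps each user's experience stable across requests, which makes relevance comparisons between the two clusters easier to interpret.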
  46. Summary
    • Elasticsearch is easy to work with, customizable and feature rich
    • Continuous delivery made a huge difference
    • Sell your idea to fellow team members, gather momentum
    • Start small but build it as if you are going to productionize it
  47. Keep in touch...
    • Brett Christensen ◦ https://www.linkedin.com/in/brett-christensen-a73a4010 ◦ @brett_c
    • Reza Yousefzadeh ◦ https://au.linkedin.com/in/reza-yousefzadeh-7052b939 ◦ @reza_yz
    • Nivantha Mandawala ◦ https://au.linkedin.com/in/nivantha ◦ @nivazone