Elastic{ON} 2018 - Replacing Legacy Product Search with Elasticsearch at IEEE GlobalSpec

Elastic Co
March 01, 2018

Transcript

  1. Replacing Legacy Product Search with Elasticsearch at IEEE GlobalSpec

    IEEE GlobalSpec, February 28, 2018
    Kathleen DeRusso, Principal Software Engineer
    Trevor Gray, Senior Software Engineer
  2. Agenda

    1. Introduction to IEEE GlobalSpec
    2. Our journey to the Elastic Stack
    3. Architecture
    4. Three search challenges and how we solved them
    5. The next steps in our journey
  3. Introduction to IEEE GlobalSpec

    • A community built for engineers, by engineers
    • Delivering trusted, expert engineering content, information, insight, tools and more
    • Nearly 9 million registered engineers and technical professionals
    • Business model: providing comprehensive media solutions that connect manufacturers, distributors and service providers with our audience
    www.globalspec.com
  4. What we had before: a home-grown Java search application

    • Text search over an on-disk Lucene index
    • Spec/facet data stored in memory
    • Additional, separate search content on the site:
      • In-memory Lucene indices
      • Full-text SQL search
      • Solr 6
  5. The Next Evolution

    An opportunity to rewrite search from the ground up:
    • Needed to scale
    • Wanted faster search
    • Wanted a simpler architecture
  6. How we came to Elastic

    • Open source: actively supported, growing community, familiar with Lucene
    • Ease of use: scalable, auto-recovery, shard rebalancing, easy to add new content
    • Features: custom features available out of the box, real-time log analytics, "more like this"
  7. Migration Process

    • SME feedback
    • A/B testing
    • Create the Gateway process
    • Build data for both systems at once
    • Flip the switch!
  8. IEEE GlobalSpec Search Architecture

    [Diagram] Web Server Instances (4) and Artemis Search Gateway Instances (3), each writing Log Files; Backend Datastores (Bulk); Search Cluster of Elasticsearch Master Nodes (3) and Data Nodes (4); logging stack of Logstash Nodes (2), Elasticsearch Nodes (2) and Kibana Instances (1).
  14. Product Search on GlobalSpec.com

    • Use case: product search
    • Suppliers provide catalogs of their products
    • Showcase the best result from each catalog
    • Two goals:
      • UX goal: display the most relevant results
      • Business goal: showcase the breadth of the index
  15. Product search diversification

    • Solution: diversified sampler aggregation + top hits aggregation
    • The diversified sampler aggregation restricts its sub-aggregations to a sample of the top-scoring documents from each shard, keeping at most max_docs_per_value documents per supplier
    • Top hits builds the search result output
    • Shard routing matters: the per-supplier limit is applied on each shard separately
    • Use with care, as it can be slow!

        {
          "query": { ... },
          "aggregations": {
            "diversifiedResults": {
              "diversified_sampler": {
                "field": "supplierId",
                "shard_size": 1000,
                "max_docs_per_value": 1
              },
              "aggregations": {
                "topDiversifiedDocuments": {
                  "top_hits": { "from": 0, "size": 10 }
                  ...
                }
              }
            }
          }
        }
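To make the per-shard behaviour concrete, here is a simplified in-memory Python simulation of what the diversified sampler plus top hits combination does. It is a sketch under stated assumptions, not Elasticsearch's implementation: the `Doc` type, the scores and the supplier IDs are hypothetical, and real shards score and sample documents inside Lucene.

```python
from collections import defaultdict
from typing import NamedTuple


class Doc(NamedTuple):
    """A hypothetical scored search hit."""
    doc_id: str
    supplier_id: str
    score: float


def diversified_sample(shard: list[Doc], shard_size: int,
                       max_docs_per_value: int) -> list[Doc]:
    """Simulate diversified_sampler on ONE shard: walk hits by descending
    score, keep at most max_docs_per_value per supplier, stop at shard_size."""
    kept_per_supplier: dict[str, int] = defaultdict(int)
    sample: list[Doc] = []
    for doc in sorted(shard, key=lambda d: d.score, reverse=True):
        if kept_per_supplier[doc.supplier_id] < max_docs_per_value:
            kept_per_supplier[doc.supplier_id] += 1
            sample.append(doc)
            if len(sample) == shard_size:
                break
    return sample


def diversified_top_hits(shards: list[list[Doc]], shard_size: int = 1000,
                         max_docs_per_value: int = 1, size: int = 10) -> list[Doc]:
    """Simulate top_hits over the merged per-shard samples."""
    merged = [doc for shard in shards
              for doc in diversified_sample(shard, shard_size, max_docs_per_value)]
    return sorted(merged, key=lambda d: d.score, reverse=True)[:size]


# Supplier "s1" has documents on both shards, so it survives sampling twice.
shard_a = [Doc("a1", "s1", 9.0), Doc("a2", "s1", 8.0), Doc("a3", "s2", 7.0)]
shard_b = [Doc("b1", "s3", 8.5), Doc("b2", "s1", 6.0)]
print([d.doc_id for d in diversified_top_hits([shard_a, shard_b])])
```

Because the per-supplier limit is enforced separately on each shard, supplier `s1` appears twice in the result. Routing each supplier's documents to a single shard (for example with a custom `routing` value at index time) is one way to make the limit effectively global, which is the "shard routing matters" point on the slide.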
  16. Ambiguous Queries: Example

    air control [valve]
    air [traffic] control
    air [temperature] control
    air control [industries, inc.]
  17. Ambiguous Queries: Strategy

    • Understand: create a ranked list of possible interpretations of the user's query
    • Plan: combine all the interpretations of the user's query
    • Translate: translate the plan into Elasticsearch query DSL
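A minimal Python sketch of the Plan and Translate steps, assuming the Understand step has already produced weighted interpretations. The `Interpretation` class, the field names (`product_type`, `description`, `company_name`) and the weights are hypothetical; the output uses standard Elasticsearch `bool`/`match` query DSL, with one boosted `should` clause per interpretation so that documents matching a more likely reading score higher.

```python
import json
from dataclasses import dataclass


@dataclass
class Interpretation:
    """One hypothetical reading of the user's query (output of Understand)."""
    weight: float            # relative likelihood of this reading
    fields: dict[str, str]   # field name -> text expected in that field


def translate(interpretations: list[Interpretation]) -> dict:
    """Plan: OR all readings together; Translate: emit Elasticsearch query DSL."""
    should = []
    # Sorting only keeps the generated JSON readable; it does not affect scoring.
    for interp in sorted(interpretations, key=lambda i: i.weight, reverse=True):
        must = [{"match": {field: text}} for field, text in interp.fields.items()]
        should.append({"bool": {"must": must, "boost": interp.weight}})
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


# Hypothetical readings of the ambiguous query "air control".
readings = [
    Interpretation(0.3, {"company_name": "air control industries"}),
    Interpretation(0.5, {"product_type": "valve", "description": "air control"}),
    Interpretation(0.2, {"description": "air traffic control"}),
]
print(json.dumps(translate(readings), indent=2))
```

Combining readings with `should` rather than picking one keeps every plausible interpretation in the result set while letting the boosts reflect the ranking from the Understand step.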
  18. Varying index sizes

    [Diagram] The Artemis Search Gateway searches an alias covering a very large index (type: A) and a very small index (type: B).
  19. Why we care about IDF

    • IDF = inverse document frequency
    • Plays a significant role in BM25
    • Common terms contribute significantly less to the score
    • The same term might be common in a large index and uncommon in a small index
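For reference, Lucene's BM25 implementation (the default similarity in Elasticsearch) computes IDF as ln(1 + (N - n + 0.5) / (n + 0.5)), where N is the number of documents and n is the number of documents containing the term. The short Python sketch below evaluates it for one term in two hypothetical indices; the document counts are made up for illustration.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """BM25 IDF as computed by Lucene: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical: the term appears in half of a very large index...
large = bm25_idf(doc_freq=500_000, doc_count=1_000_000)
# ...but in only one document of a very small index.
small = bm25_idf(doc_freq=1, doc_count=1_000)
print(f"large index idf={large:.3f}, small index idf={small:.3f}")
```

The values come out at roughly 0.69 and 6.50, so a match on the same term counts about nine times as much when it comes from the small index.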
  20. Consistent IDF Options

    • Option 1: Merge into a single index with a custom type field
    • Option 2: Change search_type to calculate global IDF values
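  21. search_type = query_then_fetch

    [Diagram] Website → Artemis Search Gateway → coordinating node → index shards. The user query becomes an Elastic query; the coordinating node runs a query phase and a fetch phase against the shards, and the Elastic response becomes the search results.
    • 2 round trips
    • Per-shard IDF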
  22. search_type = dfs_query_then_fetch

    [Diagram] Website → Artemis Search Gateway → coordinating node → index shards, with three phases: DFS, query (with global IDF) and fetch. The Elastic response becomes the search results.
    • 3 round trips
    • Global IDF
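The difference between the two search types can be simulated in a few lines of Python. This is a simplified sketch, not Elasticsearch's code: each hypothetical `ShardStats` holds one shard's document count and the term's document frequency. query_then_fetch scores each shard with its local statistics, while the extra DFS round trip lets the coordinating node sum the statistics first so every shard uses one global IDF. In Elasticsearch itself the choice is made per request, e.g. with the `search_type=dfs_query_then_fetch` URL parameter on `_search`.

```python
import math
from typing import NamedTuple


class ShardStats(NamedTuple):
    doc_count: int  # documents on this shard
    doc_freq: int   # documents on this shard containing the query term


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """Lucene-style BM25 IDF."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def query_then_fetch_idf(shards: list[ShardStats]) -> list[float]:
    """Each shard computes IDF from its own local statistics."""
    return [bm25_idf(s.doc_freq, s.doc_count) for s in shards]


def dfs_query_then_fetch_idf(shards: list[ShardStats]) -> list[float]:
    """DFS phase: gather and sum statistics, then give every shard the same IDF."""
    total_freq = sum(s.doc_freq for s in shards)
    total_count = sum(s.doc_count for s in shards)
    return [bm25_idf(total_freq, total_count)] * len(shards)


# Hypothetical shards: one from the very large index, one from the very small one.
shards = [ShardStats(doc_count=1_000_000, doc_freq=500_000),
          ShardStats(doc_count=1_000, doc_freq=1)]
print("per-shard:", [round(x, 3) for x in query_then_fetch_idf(shards)])
print("global:   ", [round(x, 3) for x in dfs_query_then_fetch_idf(shards)])
```

With global statistics both shards score the term identically, so a document from the small index no longer gets a large boost just because the term is locally rare; the cost is the extra round trip shown on the slide.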
  23. Success!

    • We took our search to the next level using built-in functionality:
      • Use aggregations to power our diversification business rules
      • Understand user queries, and send targeted, complex queries to Elastic
      • Leverage DFS query-then-fetch to search indices of varying sizes
    • Realized ROI:
      • Search performance improved 2x!
      • Improved user engagement: time on site, clicks, bails
      • More productive developers
  24. We're just getting started…

    • We didn't stop! Migrated additional services to Elastic
    • Single unified Elastic search backend
    • Server consolidation, leading to lower bottom-line costs
    • Better access to logs and metrics:
      • Debugging
      • Report generation
      • Rolling out changes in production
    • Recommendations and personalization