Elastic Co
March 01, 2018

Elastic{ON} 2018 - Nuts and Bolts of Product Search - How Fastenal Refreshes Large Indexes Nightly


Transcript

  1. Fastenal, February 28, 2018

    The Nuts and Bolts of Product Search
    How Fastenal Refreshes Large Indexes Nightly

    Kelly Sauke, Enterprise Architect
    Nicole Albee, DevOps Engineer
    Alarka Sanyal, Software Developer
  2. Topics

    • Who is Fastenal?
    • Infrastructure
    • Elasticsearch Cluster
    • Data Structure
    • Elasticsearch Indexing
  3. Local Service

    US store locations (30-minute drive, 60-minute drive)

    • 2,500+ Store Locations
    • ~19k Employees
    • 14 Distribution Hubs
    • 7,000+ Delivery Vehicles
    • ~500 Onsite Locations
  4. Industrial Services

    Value-add within the supply chain

    • Calibration & Repair
    • Hose Fabrication
    • Hoist Repair
    • Gas Detection Repair
    • Custom Packaging
    • Custom Chain Sling
    • Bandsaw Blades
    • Tool Repair
    • Tool & Cutter Regrind
    • Stud Cutting
    • Special Assemblies
    • Lifting & Rigging
  5. Elasticsearch Origin Story (2014)

    • February
      • Loaded data for first time: Product, Customer Warehouse (v1.0.0)
      • Initial discussions with first developer
    • March
      • First development cluster online
    • April
      • Official Product Search project charter
      • Fastenal becomes Elasticsearch Inc. support customer (v.1.1)
      • Built ugly POC
  6. Use Cases Within Fastenal

    • Product Search
      • What we're talking about today
    • Enterprise Search
      • Vendor
      • Customer
      • Vending
    • Operations – Telemetry
      • Logs
      • Performance Analytics
      • Product Search telemetry data
  7. Future

    • Elastic software upgrade
    • Dedicated node roles
      • Masters
      • Data
      • Coordinating
    • Dedicated hardware
      • Direct attached SSD
    • Leverage other software from Elastic
      • Machine Learning
      • Graph
      • APM
  8. Category index

    • Product classification tree
    • Category Image
    • Category Description
    • Metadata related to servicing search requests
    • No child docs

    Tree diagram labels: Root; Fasteners; Bolts; Eye Bolts; U-Bolts; Screws; Wood Screws; Lag Screws; Safety; Protective Garments; Sleeves; Hand Protection; Glove Bags. Depth labels: Level One, Level Two, Level Three.
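One way to picture a classification tree stored with "no child docs" is to flatten it so that every category becomes a standalone document carrying its own level and path. The sketch below is only an illustration: the nesting is an assumed reading of the diagram labels, and the field names (`name`, `level`, `path`, `parent`) are hypothetical, not taken from the talk.

```python
# Hypothetical sketch: flatten a classification tree into standalone
# category documents (no child docs). The nesting is an assumed reading of
# the slide's diagram; all field names are invented for illustration.
TREE = {
    "Fasteners": {
        "Bolts": {"Eye Bolts": {}, "U-Bolts": {}},
        "Screws": {"Wood Screws": {}, "Lag Screws": {}},
    },
    "Safety": {
        "Protective Garments": {"Sleeves": {}},
        "Hand Protection": {"Glove Bags": {}},
    },
}


def flatten(tree, path=()):
    """Yield one flat document per category, depth-first."""
    for name, children in tree.items():
        full_path = path + (name,)
        yield {
            "name": name,
            "level": len(full_path),          # 1 = Level One, 2 = Level Two, ...
            "path": " > ".join(full_path),
            "parent": path[-1] if path else "Root",
        }
        yield from flatten(children, full_path)


category_docs = list(flatten(TREE))
for doc in category_docs:
    print(doc["level"], doc["path"])
```

Because each document carries its full path, a search request can resolve a category's ancestry without a join against other documents.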
  9. Catalog Index

    • During the building of the product index, the catalog index is used to calculate which products belong to which catalogs based on the business rules
    • Initial structure
      • 1 parent type doc
      • 1 child doc
      • 1 grandchild
    • Challenges with grandchild docs
      • Inefficient
        • Cause: Slow search response time
        • Effect: Slow indexing rate of product index
      • Complex query structure
        • Cause: Nested "has_child" query
        • Effect: Increased dev and test time
    • Current structure
      • 1 parent type doc
      • 1 child doc
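To make the "nested has_child" point concrete, the sketch below builds the two query shapes as plain Python dictionaries: one that reaches a grandchild by wrapping a `has_child` clause inside another, and one that stops at the child level. The `has_child` clause layout (`type` plus `query`) follows the Elasticsearch query DSL; the type names and the `sku` field are hypothetical placeholders, since the slides do not show the real mapping.

```python
import json


def has_child(child_type, inner_query):
    """Wrap a query in a has_child clause for the given child type."""
    return {"has_child": {"type": child_type, "query": inner_query}}


# Hypothetical leaf condition; the real fields are not shown in the slides.
leaf = {"term": {"sku": "12345"}}

# Initial structure (parent -> child -> grandchild):
# matching on grandchild data needs has_child inside has_child.
grandchild_query = {
    "query": has_child("child_type", has_child("grandchild_type", leaf))
}

# Current structure (parent -> child): a single has_child level is enough.
child_query = {"query": has_child("child_type", leaf)}

print(json.dumps(grandchild_query, indent=2))
print(json.dumps(child_query, indent=2))
```

The extra nesting level is what the slide calls a complex query structure: every additional join has to be written, tested, and executed at query time.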
  10. What is Gearman?

    • According to gearman.org, “Gearman provides a generic application framework to farm out work to other machines or processes that are better suited to do the work.”
    • It has three main parts:
      • Client
      • Job server
      • Worker

    Ref: http://gearman.org/
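The three roles can be illustrated with a small in-process stand-in. This is not the Gearman API or protocol, and the talk's own workers are PHP; the sketch only mimics the division of labour using Python's standard `queue` and `threading` modules. The class and function names (`JobServer`, `client_submit`, `worker_loop`) are invented for the example.

```python
import queue
import threading


class JobServer:
    """Stand-in for the job server: one FIFO queue per function name."""

    def __init__(self):
        self._lock = threading.Lock()
        self._queues = {}
        self.results = queue.Queue()

    def queue_for(self, function_name):
        # Create the queue on first use; the lock keeps this thread-safe.
        with self._lock:
            if function_name not in self._queues:
                self._queues[function_name] = queue.Queue()
            return self._queues[function_name]


def client_submit(server, function_name, payload):
    """Client role: hand a job to the job server and return immediately."""
    server.queue_for(function_name).put(payload)


def worker_loop(server, function_name, handler):
    """Worker role: process jobs for one function until a None sentinel."""
    jobs = server.queue_for(function_name)
    while True:
        payload = jobs.get()
        if payload is None:
            break
        server.results.put((function_name, handler(payload)))


server = JobServer()
worker = threading.Thread(
    target=worker_loop, args=(server, "reverse", lambda s: s[::-1])
)
worker.start()

client_submit(server, "reverse", "fastener")
client_submit(server, "reverse", None)  # sentinel: tell the worker to stop
worker.join()

result = server.results.get()
print(result)
```

The client never talks to the worker directly; it only knows the function name, which is what lets work be spread across other machines or processes.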
  11. Indexing Data

    Diagram labels:
    • Servers: Server 1, Server 2, Server 3
    • Workers: Worker type 1, Worker type 2, Worker type 3
    • Queues: Worker type 1 queue, Worker type 2 queue, Worker type 3 queue
    • Search: product, category, catalog, …
    • Logging
  12. Indexing Data

    Diagram labels:
    • Servers: Server 1, Server 2, Server 3
    • Workers: Worker type 1, Worker type 2, Worker type 3
    • Index builder (PHP Code)
    • Queues: Worker type 1 queue, Worker type 2 queue, Worker type 3 queue
    • Search: product, category, catalog, …
    • Logging
  13. Indexing Data

    Diagram labels:
    • Servers: Server 1, Server 2, Server 3
    • Workers: Worker type 1, Worker type 2, Worker type 3
    • Index builder (PHP Code)
    • Actions: Query databases; Stream data; Enhance data from Elasticsearch
    • Queues: Worker type 1 queue, Worker type 2 queue, Worker type 3 queue
    • Search: product, category, catalog, …
    • Logging
  14. Indexing Data

    Diagram labels:
    • Servers: Server 1, Server 2, Server 3
    • Workers: Worker type 1, Worker type 2, Worker type 3
    • Index builder (PHP Code)
    • Actions: Query databases; Stream data; Enhance data from Elasticsearch
    • Queues: Worker type 1 queue, Worker type 2 queue, Worker type 3 queue
    • Search: product, category, catalog, …
    • Logging
  15. Indexing Data

    Diagram labels:
    • Servers: Server 1, Server 2, Server 3
    • Workers: Worker type 1, Worker type 2, Worker type 3
    • Index builder (PHP Code)
    • Actions: Query databases; Stream data; Enhance data from Elasticsearch
    • Queues: Worker type 1 queue, Worker type 2 queue, Worker type 3 queue
    • Search: product, category, catalog, …
    • Logging
  16. Indexing Data

    Diagram labels:
    • Servers: Server 1, Server 2, Server 3
    • Workers: Worker type 1, Worker type 2, Worker type 3
    • Index builder (PHP Code)
    • Actions: Query databases; Stream data; Enhance data from Elasticsearch; Index search data into Elasticsearch; Index log data into Elasticsearch
    • Queues: Worker type 1 queue, Worker type 2 queue, Worker type 3 queue
    • Search: product, category, catalog, …
    • Logging
  17. Indexing Data

    Diagram labels:
    • Servers: Server 1, Server 2, Server 3
    • Workers: Worker type 1, Worker type 2, Worker type 3
    • Index builder (PHP Code)
    • Actions: Query databases; Stream data; Enhance data from Elasticsearch; Index search data into Elasticsearch; Index log data into Elasticsearch
    • Queues: Worker type 1 queue, Worker type 2 queue, Worker type 3 queue
    • Search: product, category, catalog, …
    • Logging
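The "Stream data" and "Index search data into Elasticsearch" steps can be sketched as batching rows and serialising each batch for the Elasticsearch bulk endpoint. The slides do not say that the bulk API is what the workers use, so treat that as an assumption; the index name, the `sku` id field, and the helper names are hypothetical. The newline-delimited format itself (an action line followed by a source line, with a trailing newline) is the standard bulk request body. Older Elasticsearch versions also expect a `_type`, either in the action line or in the request path.

```python
import json
from itertools import islice


def batches(rows, size):
    """Yield lists of up to `size` rows from any iterable, without loading it all."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_body(index_name, docs, id_field="sku"):
    """Build a newline-delimited JSON body for the Elasticsearch _bulk endpoint."""
    lines = []
    for doc in docs:
        # Action line: index this document under its own id.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc[id_field]}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


# Hypothetical rows standing in for data streamed from a database query.
rows = ({"sku": str(i), "name": "item %d" % i} for i in range(5))
bodies = [bulk_body("product", chunk) for chunk in batches(rows, 2)]
print(bodies[0])
```

Each body would then be sent as the payload of one bulk request; sending is left out so the sketch runs without a cluster.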
  20. Product index

    Timeline diagram labels: Product index; helper index; Initial create SKUs; 10+ updates; Sleep; Single update by scrolling helper index; Time.

    • V1: 24+ hours
    • V2: 18+ hours
    • Current: < 12 hours
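The diagram contrasts "10+ updates" with a "Single update by scrolling helper index". One plausible reading, offered here only as an assumption, is that partial data is first accumulated per SKU in a helper structure and then applied to each product document once. The sketch below shows that idea with plain dictionaries; the field names and sample values are invented, and nothing here talks to Elasticsearch.

```python
from collections import defaultdict


def collapse_updates(partial_updates):
    """Merge many (sku, fields) partial updates into one dict per SKU.

    Later values overwrite earlier ones for the same field.
    """
    merged = defaultdict(dict)
    for sku, fields in partial_updates:
        merged[sku].update(fields)
    return dict(merged)


# Hypothetical partial updates arriving from different data sources.
updates = [
    ("A1", {"price": 1.25}),
    ("B2", {"catalogs": ["east"]}),
    ("A1", {"in_stock": True}),
    ("A1", {"price": 1.30}),
]

helper_index = collapse_updates(updates)
writes_before = len(updates)      # one write per partial update
writes_after = len(helper_index)  # one write per SKU

print(writes_before, writes_after)
print(helper_index["A1"])
```

In Elasticsearch an update rewrites the whole document, so fewer updates per document means less reindexing work during the build.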
  21. Additional Insights

    • Suggester limitations
    • "index" : "no" (2.X) to save disk space
    • Elastic Data modeling

    Future

    • Avoid updates for full index build
    • Really large docs
    • Get rid of parent-child/nested docs
    • Over-tokenize fields
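As an illustration of the `"index" : "no"` tip, the sketch below builds a field mapping that keeps a value in the stored document but leaves it out of the inverted index. In Elasticsearch 2.x this is written `"index": "no"` on a `string` field; from 5.x onward the equivalent is `"index": false`. The field names and the helper function are hypothetical.

```python
import json


def unindexed_field(es_major_version):
    """Return a mapping for a field that is stored in _source but not searchable."""
    if es_major_version < 5:
        # 2.x syntax, as referenced on the slide.
        return {"type": "string", "index": "no"}
    # 5.x and later replaced the "no" value with a boolean.
    return {"type": "keyword", "index": False}


# Hypothetical product fields: the description is searched, the image URL is only displayed.
properties_2x = {
    "description": {"type": "string"},
    "image_url": unindexed_field(2),
}

print(json.dumps(properties_2x, indent=2))
print(json.dumps(unindexed_field(6)))
```

Fields that are only returned for display, never queried, are the natural candidates for this setting.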