Powering a BI Application with Elasticsearch at Yodle

Elastic Co
October 19, 2016

Yodle recently transitioned from Hadoop/Scalding to Elasticsearch to power a tailored BI product for franchises. This talk details the impact of that transition on the product, our customers, and engineering efficiency.
Mark Drago | Director of Engineering | Yodle

Transcript

  1. Powering a BI Application with Elasticsearch

    Mark Drago, Director of Engineering
    Yodle, a web.com company
    October 19th, 2016
  2. Supporting local business since 2005.

    Yodle helps more than 50,000 local business owners in 250 different industries find and keep customers simply and profitably.
  3. @ Yodle

    2013 2014 2015

    Document Search
    •  Contacts in CRM
    •  Email Marketing List Criteria

    Elastic Stack (beats, logstash, kibana, elasticsearch)
    •  Centralized Log Store
    •  225 Microservices
    •  HTTP/JSON
    •  Docker Containers
    •  Marathon / Mesos
    •  Many non-microservice apps
    •  6 Environments
    •  dev, multiple QAs, production

    BI Product
    •  Website Traffic
    •  Marketing Data (SEM)
    •  Phone Call Data
    •  Search Engine Ranking
    •  Online Reviews
    •  etc.
  4. 4

  5. Sample Aggregation

    Franchise ID  Business ID  Date          Rating
    F1            B1           September 15  Good
    F2            B2           September 15  Good
    F1            B3           October 12    Good
    F1            B1           October 12    Good
    F2            B2           October 19    Good
    F2            B2           October 19    Good

    Franchise ID  Month      Good Calls
    F1            September  1
    F2            September  1
    F1            October    2
    F2            October    2
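The rollup on this slide can be reproduced in a few lines. The following is only a minimal sketch of the counting logic, not the pipeline used at Yodle; the function name `good_calls_by_month` and the tuple layout are illustrative assumptions, while the rows are copied from the slide.

```python
from collections import Counter

# Rows copied from the slide: (franchise_id, business_id, date, rating).
calls = [
    ("F1", "B1", "September 15", "Good"),
    ("F2", "B2", "September 15", "Good"),
    ("F1", "B3", "October 12", "Good"),
    ("F1", "B1", "October 12", "Good"),
    ("F2", "B2", "October 19", "Good"),
    ("F2", "B2", "October 19", "Good"),
]


def good_calls_by_month(rows):
    """Count calls rated "Good" per (franchise_id, month).

    Hypothetical helper for illustration; the month is taken as the
    first word of the date string, as written on the slide.
    """
    counts = Counter()
    for franchise_id, _business_id, date, rating in rows:
        if rating == "Good":
            month = date.split()[0]
            counts[(franchise_id, month)] += 1
    return counts


rollup = good_calls_by_month(calls)
for (franchise_id, month), good_calls in rollup.items():
    print(franchise_id, month, good_calls)
```

Note that the business ID is dropped during the rollup: the output table is keyed only by franchise and month.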
  6. Challenges

    •  Need ability to backfill historical data at the start
    •  Need ability to re-backfill as we add features or fix bugs
    •  Must handle fresh data coming in every day
    •  Must handle historical data changing
  7. Sample Aggregation

    Franchise ID  Business ID  Date          Rating
    F1            B1           September 15  Good
    F2            B2           September 15  Good
    F1            B3           October 12    Good
    F1            B1           October 12    Good
    F2            B2           October 19    Good
    F2            B2           October 19    Good

    Franchise ID  Month      Good Calls
    F1            September  1
    F2            September  1
    F1            October    2
    F2            October    2

    Problem    Problem
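For comparison, a rollup like this could also be computed at query time in Elasticsearch rather than stored as a precomputed table. The transcript does not show the queries Yodle uses, so the request body below is only a sketch under stated assumptions: the field names `franchise_id`, `date`, and `rating` are hypothetical, `rating` is assumed to be a non-analyzed string, and `date` is assumed to be mapped as a date. The `interval` parameter matches the 2.x line mentioned later in the deck.

```python
import json


def good_calls_request(franchise_field="franchise_id",
                       date_field="date",
                       rating_field="rating"):
    """Build a search body that counts "Good" calls per franchise per month.

    All field names are hypothetical defaults, not taken from the talk.
    """
    return {
        "size": 0,  # only aggregation buckets are needed, not hits
        "query": {
            "bool": {"filter": [{"term": {rating_field: "Good"}}]}
        },
        "aggs": {
            "by_franchise": {
                "terms": {"field": franchise_field},
                "aggs": {
                    "by_month": {
                        "date_histogram": {
                            "field": date_field,
                            "interval": "month",
                        }
                    }
                },
            }
        },
    }


body = good_calls_request()
print(json.dumps(body, indent=2))
```

Each `by_month` bucket's document count would then correspond to the "Good Calls" column for that franchise and month.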
  8. 5 nodes

    Each node:
    -  m4.2xlarge
    -  8 cores
    -  32G ram (½ to ES, ½ to page cache)
    -  EBS backed storage (general SSD)

    Data size:
    -  1TB (2 replicas -> 3TB)
    -  Largest index: 400G (replicas -> 1.2T)
    -  Index with most docs: 1.6B

    Elastic setup:
    -  Version 2.4.1 (excited about 5.x)
    -  Instant aggregations, Painless scripting, Lucene 6.2 performance improvements
    -  Use doc values almost exclusively (non analyzed strings / ints)
    -  Single availability-zone for now
    -  Novel backup/restore process to get prod data to test environment (see our blog)
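The sizing figures on this slide are consistent with each primary shard being stored once plus two replica copies. A small sketch of that arithmetic follows; the helper name `total_size` is illustrative, and the per-node figure assumes data is spread evenly across the 5 nodes, which the slide does not state.

```python
def total_size(primary_size, replicas):
    """Total on-disk size: one primary copy plus `replicas` replica copies."""
    return primary_size * (1 + replicas)


cluster_tb = total_size(1.0, replicas=2)        # 1TB of primaries, 2 replicas
largest_index_gb = total_size(400, replicas=2)  # largest index, 400G of primaries

# Assumption: shards are balanced evenly across the 5 nodes.
per_node_tb = cluster_tb / 5

# Half of each node's 32G of RAM goes to the ES heap, half to the page cache.
heap_gb = 32 / 2

print(cluster_tb, largest_index_gb, per_node_tb, heap_gb)
```

This gives 3TB for the cluster and 1200G (1.2T) for the largest index, matching the slide.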