
Extended Custom Scripting at Groupon

Elastic Co
November 19, 2015


Custom Scripting is very powerful, but using it requires every developer to be an Elasticsearch expert. By creating a Scoring API, Groupon is able to let developers write custom scoring functions without Elasticsearch knowledge.

Brian Humphrey | Elastic{ON} Tour Chicago | November 19, 2015




  1. Brian Humphrey, Groupon, Nov. 19, 2015 bhump@groupon.com Extending Custom


  4. Overview
    • Elasticsearch @ Groupon
      ▪ The switch to Elasticsearch
    • Elasticsearch Native Scripts
      ▪ Operation and Limitations
    • The Scoring API
      ▪ Machine learning models inside custom scripts
    • Request Data
      ▪ Moving data between custom scoring and data retrieval scripts
  5. Elasticsearch @ Groupon
    The Past:
    • Unsharded Solr + API Frontend
    • Selection in Solr
    • Ranking in API Frontend
      ▪ All selected deals transferred to frontend
    Today:
    • Elasticsearch + new API frontend
    • Selection and Ranking in ES
    • Frontend does mixing and display
  6. Elasticsearch @ Groupon [Diagram: The Past vs. Today]

  7. Elasticsearch @ Groupon
    • Relevance Infrastructure
      ▪ Ranking deals for web requests, mobile, and e-mail
      ▪ 3 Production Clusters, 266 Nodes
      ▪ Largest cluster – 4k requests/second, 65M deals ranked/second
      ▪ Data Size – 7GB, high replication
  8. Elasticsearch @ Groupon
    • The Results
      ▪ Went from 5% fallback rate to <0.1% fallback rate
      ▪ Latency decreased from 2 seconds to <1 second
      ▪ Ranking more deals per user!
      ▪ Run email sends using same code
    • Elasticsearch Load
      ▪ Low disk utilization – Small data set, caching everywhere
      ▪ Low memory footprint
      ▪ High CPU usage – CPU bound on 40-core server
  9. Elasticsearch @ Groupon
    • Why the odd load?
      ▪ Extensive use of native scripts for ranking
      ▪ Ranking is expensive, running machine learning models on every request
      ▪ Compute features based on current user and current deal
      ▪ Scales with inventory
        o ~16000 deals ranked per request (And growing!)
  10. Elasticsearch Native Scripts
    • Different ways to access document data (doc(), source(), fields())
      ▪ Requires ES knowledge to know which to use
      ▪ Requires knowledge of schema and analysis
      ▪ Easy to make a mistake
    • Boilerplate code required for even simplest functions
      ▪ Reading common data, exception handling, metrics, etc.
    • No object model for documents
    • Scoring restricted to returning a number
      ▪ No scoring debug information
  11. Groupon Ranking Models
    • Our Machine Learning models use feature vectors to make a score
    • Feature vectors hard to fit into ES model raw
      ▪ Need multiple versions for experiments
      ▪ Some require per-request data (User data, location, time)
      ▪ Need fast access, but never searched
  12. Scoring API – The Design
    • Infrastructure details should be common and removed from models
      ▪ Caching, returning debug information, error handling, metrics
      ▪ Models should be easy to read and understand
    • Wrap Elasticsearch data access, produce feature vectors
      ▪ Reduce the problem to one developers are already set up for
      ▪ Developers work decoupled from Elasticsearch APIs
      ▪ Ranking models are portable
        o Move to a new architecture == write new Scoring API implementation
        o Test performance of models and infrastructure separately
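One way to picture "infrastructure details removed from models" is a wrapper that adds metrics and error handling around any scoring function. The sketch below is only an illustration, not Groupon's implementation: the three interfaces mirror the ones shown on the interface slides, while `InstrumentedFunction`, its counters, and the fallback value are hypothetical names introduced here.

```java
public class InstrumentedScoringSketch {
    // Interfaces mirroring the deck's Scoring API slides (nested only to keep this one file).
    interface Scorable { }
    interface Score { double asDouble(); }
    interface ScoringFunction<S extends Scorable> { Score score(S sourceScorable); }

    // Hypothetical wrapper: the model stays free of metrics and exception handling.
    static final class InstrumentedFunction<S extends Scorable> implements ScoringFunction<S> {
        private final ScoringFunction<S> delegate;
        private final double fallback;
        // Plain counters for illustration; a real cluster would need thread-safe metrics.
        long calls;
        long failures;
        long totalNanos;

        InstrumentedFunction(ScoringFunction<S> delegate, double fallback) {
            this.delegate = delegate;
            this.fallback = fallback;
        }

        @Override
        public Score score(S sourceScorable) {
            long start = System.nanoTime();
            calls++;
            try {
                return delegate.score(sourceScorable);
            } catch (RuntimeException e) {
                // Assumption: a failing model degrades to a fixed fallback score.
                failures++;
                return () -> fallback;
            } finally {
                totalNanos += System.nanoTime() - start;
            }
        }
    }

    public static void main(String[] args) {
        ScoringFunction<Scorable> model = s -> () -> 0.75;
        InstrumentedFunction<Scorable> wrapped = new InstrumentedFunction<>(model, 0.0);
        System.out.println(wrapped.score(new Scorable() { }).asDouble());
        System.out.println("calls=" + wrapped.calls + " failures=" + wrapped.failures);
    }
}
```

Because the wrapper itself implements `ScoringFunction`, a model can be tested with or without it, which matches the slide's point about testing models and infrastructure separately.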
  13. Scoring API
    Scorable
    • Anything used to produce a Score
    Scoring Function
    • score() method
      ▪ Takes Scorable
      ▪ Makes Score
    Score
    • Number used for sorting
    • Any additional data
      ▪ Confidence, Scoring Debug, etc.
    • Examples:
      ▪ Empty Scorable – No Data
        o Constant or Random Scoring Function
      ▪ Deal Scorable – Deal Data
        o Overall Purchase Probability (popularity)
      ▪ User Deal Scorable – Deal + User Data
        o Customized Purchase Probability, including location, purchase history, time of day
  14. Scoring API - Interfaces
    public interface Scorable { }
    public interface ScoringFunction<S extends Scorable> {
        public Score score(S sourceScorable);
    }
  15. Scoring API - Interfaces
    public interface Score {
        public double asDouble();
    }
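To make the three interfaces concrete, here is a minimal sketch of a "Deal Scorable" with a popularity-style function and a Score that carries debug data. Everything beyond `Scorable`, `ScoringFunction`, and `Score` is an assumption for illustration: `DealScorable`, `DebugScore`, `PopularityFunction`, the field names, and the smoothed purchase-rate formula are not taken from the talk.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ScoringApiSketch {
    // The interfaces as shown on the slides (nested only to keep this one file).
    interface Scorable { }
    interface Score { double asDouble(); }
    interface ScoringFunction<S extends Scorable> { Score score(S sourceScorable); }

    // Hypothetical Scorable holding deal data; the fields are illustrative.
    static final class DealScorable implements Scorable {
        final String dealId;
        final long purchases;
        final long views;

        DealScorable(String dealId, long purchases, long views) {
            this.dealId = dealId;
            this.purchases = purchases;
            this.views = views;
        }
    }

    // Hypothetical Score that carries debug data alongside the sortable number.
    static final class DebugScore implements Score {
        private final double value;
        final Map<String, Object> debug;

        DebugScore(double value, Map<String, Object> debug) {
            this.value = value;
            this.debug = debug;
        }

        @Override
        public double asDouble() { return value; }
    }

    // Assumed formula: smoothed purchase rate, standing in for "purchase probability".
    static final class PopularityFunction implements ScoringFunction<DealScorable> {
        @Override
        public Score score(DealScorable deal) {
            double rate = (deal.purchases + 1.0) / (deal.views + 2.0);
            Map<String, Object> debug = new LinkedHashMap<>();
            debug.put("purchases", deal.purchases);
            debug.put("views", deal.views);
            return new DebugScore(rate, debug);
        }
    }

    public static void main(String[] args) {
        Score score = new PopularityFunction().score(new DealScorable("deal-1", 9, 98));
        System.out.println(score.asDouble()); // (9 + 1) / (98 + 2) = 0.1
    }
}
```

Note that `PopularityFunction` never touches Elasticsearch types, which is the decoupling the design slide describes.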
  16. Scoring API - Diagram

  17. Scoring API
    Scoring Script
    • Calls Provider to make Scorable
      ▪ Providers access script params, doc(), and source() data, can have caching
    • Calls ScoringFunction, producing a Score
      ▪ Score is stored in Request Data and returned using Script Fields
        o Includes all extra data on the Score
        o Returns more than a number!
    • All ES specifics are taken care of by the Provider and the Scoring Script
      ▪ ScoringFunction uses Object model!
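The flow on this slide (Provider makes a Scorable, ScoringFunction makes a Score, the Score is kept in Request Data) can be sketched without any Elasticsearch classes. This is a simplified stand-in under stated assumptions: plain `Map`s replace script params and document fields, and `ScorableProvider`, `ScoringScript`, `UserDealScorable`, and the budget-over-price formula are hypothetical names and logic, not the real API.

```java
import java.util.HashMap;
import java.util.Map;

public class ScoringScriptSketch {
    interface Scorable { }
    interface Score { double asDouble(); }
    interface ScoringFunction<S extends Scorable> { Score score(S sourceScorable); }

    // Hypothetical: the only place that knows how raw doc and request data are laid out.
    interface ScorableProvider<S extends Scorable> {
        S provide(Map<String, Object> docFields, Map<String, Object> scriptParams);
    }

    // Hypothetical Scorable combining deal data with per-request user data.
    static final class UserDealScorable implements Scorable {
        final double dealPrice;
        final double userBudget;

        UserDealScorable(double dealPrice, double userBudget) {
            this.dealPrice = dealPrice;
            this.userBudget = userBudget;
        }
    }

    // Stand-in for the scoring script; NOT an Elasticsearch class.
    static final class ScoringScript<S extends Scorable> {
        private final ScorableProvider<S> provider;
        private final ScoringFunction<S> function;
        private final Map<String, Object> scriptParams;
        // Stand-in for Request Data: full Score objects kept per document id.
        final Map<String, Score> requestData = new HashMap<>();

        ScoringScript(ScorableProvider<S> provider, ScoringFunction<S> function,
                      Map<String, Object> scriptParams) {
            this.provider = provider;
            this.function = function;
            this.scriptParams = scriptParams;
        }

        double run(String docId, Map<String, Object> docFields) {
            S scorable = provider.provide(docFields, scriptParams); // 1. Provider makes Scorable
            Score score = function.score(scorable);                 // 2. Function makes Score
            requestData.put(docId, score);                          // 3. Keep the whole Score
            return score.asDouble();                                // 4. Number used for sorting
        }
    }

    // Builds an example script; the scoring formula is purely illustrative.
    static ScoringScript<UserDealScorable> example(double budget) {
        Map<String, Object> params = new HashMap<>();
        params.put("budget", budget);
        ScorableProvider<UserDealScorable> provider = (doc, p) -> new UserDealScorable(
                ((Number) doc.get("price")).doubleValue(),
                ((Number) p.get("budget")).doubleValue());
        ScoringFunction<UserDealScorable> function =
                s -> () -> Math.min(1.0, s.userBudget / s.dealPrice);
        return new ScoringScript<>(provider, function, params);
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("price", 20.0);
        System.out.println(example(10.0).run("deal-1", doc)); // 10 / 20 = 0.5
    }
}
```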
  18. Scoring API - Request Data
    • Access data specific to a particular document in the query
      ▪ Pass Score object back via Script Fields
        o Extra data from the ScoringFunction!
      ▪ Mark docs with labels
        o Filter/Query labels accessible inside scripts
      ▪ Child Scores when scoring the parent
        o Score all children, use these Scores in parent ScoringFunction
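A rough sketch of what per-request, per-document storage might look like follows. It is an assumption-laden illustration: `RequestData`, its method names, and the "parent score is the best child score" rule are hypothetical, and the talk does not say how parent scores are actually derived from child Scores.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RequestDataSketch {
    interface Score { double asDouble(); }

    // Hypothetical per-request store, keyed by document id.
    static final class RequestData {
        private final Map<String, Score> scores = new HashMap<>();
        private final Map<String, Set<String>> labels = new HashMap<>();

        void putScore(String docId, Score score) { scores.put(docId, score); }

        Score getScore(String docId) { return scores.get(docId); }

        // Marks a document with a label, for example the filter that matched it.
        void label(String docId, String label) {
            labels.computeIfAbsent(docId, k -> new HashSet<>()).add(label);
        }

        boolean hasLabel(String docId, String label) {
            return labels.getOrDefault(docId, Collections.emptySet()).contains(label);
        }
    }

    // Assumed combination rule: the parent takes the best score among its scored children.
    static Score scoreParent(RequestData data, List<String> childIds) {
        double best = 0.0;
        for (String childId : childIds) {
            Score child = data.getScore(childId);
            if (child != null) {
                best = Math.max(best, child.asDouble());
            }
        }
        final double result = best;
        return () -> result;
    }

    public static void main(String[] args) {
        RequestData data = new RequestData();
        data.putScore("option-1", () -> 0.2);
        data.putScore("option-2", () -> 0.7);
        data.label("option-2", "nearby");
        Score parent = scoreParent(data, Arrays.asList("option-1", "option-2"));
        System.out.println(parent.asDouble());                  // 0.7
        System.out.println(data.hasLabel("option-2", "nearby")); // true
    }
}
```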
  19. Next Steps
    • Open Source
      • Release Scoring API as an Elasticsearch Plugin
    • Dynamic models
      • Using Predictive Model Markup Language (PMML) and a special ScoringFunction, update models without cluster restart
    • Multi-stage scoring
      • Partial computation
      • Fast, coarse ranking first, then apply finer ranking to top deals, reusing work from previous steps
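The multi-stage idea (coarse ranking over everything, finer ranking over the top deals, reusing earlier work) can be sketched generically. This is one possible reading, not the planned design: the `rank` helper, its parameters, and the choice to pass the cached coarse score into the fine stage are assumptions, and items are assumed to be distinct.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleBiFunction;
import java.util.function.ToDoubleFunction;

public class MultiStageSketch {
    // Ranks items in two stages; items are assumed distinct (they are used as map keys).
    static <T> List<T> rank(List<T> items,
                            ToDoubleFunction<T> coarse,
                            ToDoubleBiFunction<T, Double> fine,
                            int topK) {
        // Stage 1: cheap score for every item, computed once and cached.
        Map<T, Double> coarseScores = new HashMap<>();
        for (T item : items) {
            coarseScores.put(item, coarse.applyAsDouble(item));
        }
        List<T> byCoarse = new ArrayList<>(items);
        byCoarse.sort(Comparator.comparingDouble((T t) -> coarseScores.get(t)).reversed());

        // Stage 2: expensive score only for the head, reusing the cached coarse score.
        int cut = Math.min(Math.max(topK, 0), byCoarse.size());
        List<T> head = new ArrayList<>(byCoarse.subList(0, cut));
        Map<T, Double> fineScores = new HashMap<>();
        for (T item : head) {
            fineScores.put(item, fine.applyAsDouble(item, coarseScores.get(item)));
        }
        head.sort(Comparator.comparingDouble((T t) -> fineScores.get(t)).reversed());

        // The tail keeps its coarse order behind the re-ranked head.
        List<T> result = new ArrayList<>(head);
        result.addAll(byCoarse.subList(cut, byCoarse.size()));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> popularity = new HashMap<>();
        popularity.put("a", 0.9);
        popularity.put("b", 0.8);
        popularity.put("c", 0.1);
        popularity.put("d", 0.5);
        Map<String, Double> personalBoost = new HashMap<>();
        personalBoost.put("b", 0.3);

        List<String> ranked = rank(
                Arrays.asList("a", "b", "c", "d"),
                popularity::get,
                (id, coarseScore) -> coarseScore + personalBoost.getOrDefault(id, 0.0),
                2);
        System.out.println(ranked); // [b, a, d, c]
    }
}
```

In the example, only "a" and "b" reach the fine stage; "b" overtakes "a" there (0.8 + 0.3 versus 0.9), while "d" and "c" are never scored by the expensive function.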

  21. www.elastic.co