Reducing customer churn

MunichDataGeeks
September 08, 2016
In today's dynamic world, customer turnover is high, and companies spend more and more money to win new clients. However, it is usually much more sustainable and rewarding to keep hard-won customers than to chase new ones. B2C industries, led by telco and retail, are already implementing data-driven solutions for customer retention. In the B2B context, however, customer retention still remains outside the focus of data science. In this talk, we will present a newly developed and implemented data-driven solution to track customer churn and address potential threats within the Siemens client base worldwide. We will go through the different aspects of a real-world data initiative, from the setup, data acquisition and management to modeling and delivering the analytical results to the end user.

Transcript

  1. Reducing customer churn

    Andrey Sereda, Controlling and Finance Audit, Data Analytics practice. Restricted © Siemens AG 2016
  2. Siemens generates lots of data through IoT. It is utilized by our divisions to maintain installed devices and develop better products …

    Figure: Siemens productive IoT applications (examples)
    Restricted © Siemens AG 2016, 08.09.2016, Andrey Sereda / CF A DA
  3. … another big chunk of data is generated by all business activities of the company. The data is stored centrally in a data lake

    Figure labels: Siemens productive IoT applications (examples); Siemens business data lake; central data lake powered by SAP HANA; 100+ local SAP ERP systems; custom connectors. Icons from Flaticon.com
  4. Working on data-driven solutions for our customers, we always follow a clearly defined and aligned approach

    Smart-Data-Cycle:
    1. Understand the problem, scope and build hypotheses: ask the right questions
    2. Measure elements of the business case / hypotheses: extract and use the right data, and only this data
    3. Analyze data: apply various appropriate algorithms (pluralistic modeling) and derive the best solution
    4. Test continuously: establish ongoing monitoring of the solution quality and optimization measures
    5. Translate analysis results into business impact: find the best way to implement the analysis results to create tangible impact
    Then measure, reflect, ask new questions.
  5. Our vision of customer churn changed during the setup phase. A formal churn definition is a cornerstone of the modeling

    Before we initiated the project…
    • A customer churns when either
      • the contract is not renewed, or
      • we do not sell to the customer for XX months
    • When we decide internally that the customer has been lost, we go out there and do whatever it takes to win the customer back
  6. Our vision of customer churn changed during the setup phase. A formal churn definition is a cornerstone of the modeling

    …and after we carefully thought it through
    • Siemens is HUGE, we will never have a uniform churn definition
    • Instead: we forecast order placement for each customer in the short term (3 to 6 months)
    • Whenever a particular customer deviates from the predicted pattern, let's do something about that
    • And BTW, it's OK if we are not accurate all the time, we shall get the majority right
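
The revised definition on this slide (forecast order placement, then react when a customer deviates from the predicted pattern) can be illustrated with a minimal Python sketch. It is not the Siemens implementation: the `CustomerForecast` record, the `deviating_customers` helper and the 0.7 threshold are hypothetical names and values chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class CustomerForecast:
    customer_id: str
    predicted_order_probability: float  # model output for the forecast window
    placed_order: bool                  # what actually happened in that window


def deviating_customers(forecasts, expected_threshold=0.7):
    """Return ids of customers who were expected to order but did not.

    expected_threshold is an assumed cut-off, not a value from the talk.
    """
    return [
        f.customer_id
        for f in forecasts
        if f.predicted_order_probability >= expected_threshold and not f.placed_order
    ]


# Made-up example data
forecasts = [
    CustomerForecast("A", 0.92, False),  # expected to order, did not: deviation
    CustomerForecast("B", 0.85, True),   # expected to order, did: no deviation
    CustomerForecast("C", 0.10, False),  # not expected to order: no deviation
]
flagged = deviating_customers(forecasts)
print(flagged)
```

Because the flag depends only on a per-customer forecast and the observed orders, no company-wide churn definition is needed, which matches the argument on the slide.
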
  7. Requirements for the end product are defined by the users and their demands for day-to-day business tasks

    Management
    • Big picture view (depending on the position, from hundreds to tens of thousands of customers)
    • Reports on customer dynamics in the past and future, tailored to the Siemens internal reporting periods
    Sales force
    • Detailed view (usually working with ten to a hundred customers)
    • List of customers to be approached today
    End goal for the App
    • Provide both perspectives, combining a retrospective view (BI) and insights into the short- and mid-term future (BA)
    • Allow real-time filtering and aggregation of details
    • BUT: no need for real-time forecasts / data analysis
  8. With the problem defined, we could start looking into the data collected in the HANA data lake

    Data insights / data management
    Tools:
    • HANA cluster
    Insights:
    • Transactional variables carry a lot of information for the order forecast
    • Customer master data: only limited value
    • An automated data management pipeline is crucial already at this step
  9. With SAP HANA we had a great solution for data processing, but not for the machine learning part of the envisioned application

    • At Siemens, we use an SAP HANA based solution for the data lake with all business-related data
    • SAP offers native integration of standard R (version 2.15) as a separate server with single-threaded processing only
  10. We could have scaled the R server, but this does not address the single-thread problem

    Option 1: Scale vertically
    • Install the server on better hardware with more RAM
    • Very limited scalability, no parallel computations
  11. A cluster of R instances / servers allows for parallel processing. However, special skills are required

    Option 2: Scale horizontally
    • Use a cluster of R servers
    • Better scalability, but special skills and packages needed
  12. We decided in favor of the scale-out option: use R as an interface to work with a distributed framework from H2O

    Option 3: Scale out
    • Use R as an interface, calculate in other software (we use the H2O framework to scale and distribute)
    • Very good scalability, no special skills needed
  13. With a clear idea of the production infrastructure in mind, we had all the instruments to proceed with data mining

    Modeling / proof-of-concept
    Tools:
    • Local instance of Microsoft R and H2O
    Insights:
    • Random hyperparameter search rocks
    • Use a metric tailored to the needs of the end user to decide on the best model
    • GBM provides the best results, followed by RF and deep neural networks
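
As a rough illustration of the random hyperparameter search mentioned on this slide, the sketch below samples parameter combinations at random and keeps the one with the best score. It is plain Python and does not call H2O or R. The parameter names and values are illustrative, and `toy_metric` is a made-up stand-in for "train a model and compute the metric tailored to the end user".

```python
import random


def random_search(param_space, evaluate, n_trials=20, seed=0):
    """Sample n_trials random parameter combinations and keep the best one.

    evaluate(params) must return a score where higher is better.
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        # Draw one value per hyperparameter, independently and uniformly
        params = {name: rng.choice(values) for name, values in param_space.items()}
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Illustrative search space for a gradient boosting model (assumed values)
param_space = {
    "ntrees": [50, 100, 200, 400],
    "max_depth": [3, 5, 7, 9],
    "learn_rate": [0.01, 0.05, 0.1, 0.2],
}


def toy_metric(params):
    # Made-up stand-in for model training plus evaluation; peaks at
    # ntrees=200, max_depth=5, learn_rate=0.05
    return (
        -abs(params["ntrees"] - 200) / 100
        - abs(params["max_depth"] - 5)
        - abs(params["learn_rate"] - 0.05) * 10
    )


best_params, best_score = random_search(param_space, toy_metric, n_trials=30, seed=42)
print(best_params, best_score)
```

In a real setting, `evaluate` would fit a model on training data and score it on held-out data with whatever metric the end users care about. The search loop itself stays the same.
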
  14. In two months, we went from a vision to an end-user-ready pilot product

    Production
    Tools:
    • Fully automated bundle of HANA, R and H2O in the Siemens data center
    Insights:
    • Simplicity is king: no ensembles if possible, models should be simple
    • Scalability is queen: parallel processing and full automation to achieve the highest speed possible
  15. And here is what it looks like in production

    Figure labels: raw data from SAP systems; data preparation procedures; generated features; modeling; per customer order score; chart data; final model as R / POJO object; QV report
    • Current solution to deliver the app to the end customer
    • Can be replaced with SAP UI5, thus hosting the complete application inside one platform
    • All analytics delivered inside the SAP HANA platform
  16. Dry results of data analyses are translated into business language. Business users usually do not speak the language of statistics

    Matrix axes: order placement probability as of the current period; order placement probability predicted two periods ago
    Step 1: Estimate the order placement probability for all customers at two points in time: the current period, and the current period minus the forecast horizon
    Step 2: Capture the dynamics of the order probability and compare them to the actual customer behavior
    Step 3: Assign all customers to one of the quadrants of the matrix and decide on the measures
  17. We split all customers into four groups based on the predicted probability to place an order and their actual purchasing behavior

    Matrix axes: order placement probability as of the current period; order placement probability predicted two periods ago
    • "Hidden chances": a rising probability of order placement shows that the customer is about to place an order
    • "Threats": a decreasing probability of order placement, coupled with the actual buying behavior (no order in the last two months), points to a deviation from the order placement pattern
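
The two groups named on this slide can be sketched as a small rule in Python. This is a hedged illustration, not the production logic: the function name `classify_customer` and the `min_change` threshold are assumptions, and because the slide does not name the remaining two quadrants, they are lumped together here as "other".

```python
def classify_customer(prob_past, prob_now, ordered_recently, min_change=0.2):
    """Assign a customer to a group from two probability estimates.

    prob_past:        order probability predicted two periods ago
    prob_now:         order probability as of the current period
    ordered_recently: True if an order was placed in the last two months
    min_change:       assumed minimum shift that counts as rising or falling
    """
    change = prob_now - prob_past
    if change >= min_change:
        return "hidden chance"   # rising probability: an order may be coming
    if change <= -min_change and not ordered_recently:
        return "threat"          # falling probability and no recent order
    return "other"               # quadrants not named on the slide


# Made-up example customers
print(classify_customer(0.2, 0.8, ordered_recently=False))
print(classify_customer(0.9, 0.3, ordered_recently=False))
print(classify_customer(0.9, 0.3, ordered_recently=True))
```

Returning a plain label per customer keeps the output in business language, so the sales force sees a list of threats and hidden chances rather than raw probabilities.
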
  18. Recap: Working on data-driven solutions for our customers, we always follow a clearly defined and aligned approach

    Smart-Data-Cycle:
    1. Understand the problem, scope and build hypotheses: ask the right questions
    2. Measure elements of the business case / hypotheses: extract and use the right data, and only this data
    3. Analyze data: apply various appropriate algorithms (pluralistic modeling) and derive the best solution
    4. Test continuously: establish ongoing monitoring of the solution quality and optimization measures
    5. Translate analysis results into business impact: find the best way to implement the analysis results to create tangible impact
    Then measure, reflect, ask new questions.