Insight Project

mxie
February 20, 2017

Insight project: High-value customer profile extraction

Transcript

  1. Data • Customer: 3.5 million customers (transaction, geolocation) • Store: 300+ stores (annual sales, geolocation) • Neighborhoods: 50K neighborhoods (demographics, geolocation)
  2. Data • Customer: 3.5 million customers (transaction, geolocation) • Store: 300+ stores (annual sales, geolocation) • Neighborhoods: 50K neighborhoods (demographics, geolocation)
  3. Data • Customer: 3.5 million customers (transaction, geolocation) • Store: 300+ stores (annual sales, geolocation) • Neighborhoods: 50K neighborhoods (demographics, geolocation) • Valuable customers
  4. The Metric: Customer Attraction Score • How attracted customers are to a store • Customer Attraction Score = Store Sales / Nearby Population
  5. [Figure: neighborhoods with population sizes 4, 4, 1, 3, 5, 1; total 18] • Customer Attraction Score = Store Sales / Nearby Population
  6. Work Flow • Data loaded in PostgreSQL • Data cleaning (missing values, outliers) • Attraction score (AS) • Super store / Average store • Features • Model: logistic regression, random forest classification • Scikit-learn • KMeans cluster
  7. Work Flow • Data loaded in PostgreSQL • Data cleaning (missing values, outliers) • Attraction score (AS) • Super store / Average store • Features • Model: logistic regression, random forest classification • Scikit-learn • KMeans cluster
  8. Work Flow • Data loaded in PostgreSQL • Data cleaning • Customer attraction score • Super store / Average store • Features • Model: logistic regression, random forest classification • Scikit-learn • KMeans cluster
  9. Work Flow • Data loaded in PostgreSQL • Data cleaning • Customer attraction score • Super stores / Average stores • Features • Model: logistic regression, random forest classification • KMeans cluster
  10. Work Flow • Data loaded in PostgreSQL • Data cleaning • Customer attraction score • Super stores / Average stores • Features • Model: logistic regression, random forest classification
  11. People without a degree like the store • [Bar chart: feature importance from random forest model, axis 0 to 0.4; features: # No degree, $ Vegetable, Age > 65 yr]
  12. People with large spending in vegetables are less likely to visit the store • [Bar chart: feature importance from random forest model, axis 0 to 0.4; features: # No degree, $ Vegetable, Age > 65 yr]
  13. Seniors like the store • [Bar chart: feature importance from random forest model, axis 0 to 0.4; features: # No degree, $ Vegetable, Age > 65 yr]
  14. Miao Xie • PhD in Physical Chemistry at UCLA (characterizing superhard materials) • Product engineer in Silicon Valley
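
Slides 4 and 5 define the Customer Attraction Score as store sales divided by nearby population. The sketch below shows one way that ratio might be computed. The deck does not say how "nearby" is defined, so the 5 km radius, the haversine distance, the dictionary field names, the coordinates, and the sales figure are all assumptions made for illustration. Only the six population sizes (4, 4, 1, 3, 5, 1) come from slide 5.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def attraction_score(store, neighborhoods, radius_km=5.0):
    """Store sales divided by the total population within radius_km (assumed cutoff)."""
    nearby_population = sum(
        n["population"]
        for n in neighborhoods
        if haversine_km(store["lat"], store["lon"], n["lat"], n["lon"]) <= radius_km
    )
    if nearby_population == 0:
        return None  # score is undefined when nobody lives nearby
    return store["sales"] / nearby_population

# Hypothetical store; the sales value is made up.
store = {"lat": 34.05, "lon": -118.25, "sales": 36.0}

# Six nearby neighborhoods with the population sizes from slide 5,
# plus one hypothetical distant neighborhood that the radius filters out.
neighborhoods = [
    {"lat": 34.06, "lon": -118.25, "population": 4},
    {"lat": 34.04, "lon": -118.25, "population": 4},
    {"lat": 34.05, "lon": -118.24, "population": 1},
    {"lat": 34.05, "lon": -118.26, "population": 3},
    {"lat": 34.07, "lon": -118.25, "population": 5},
    {"lat": 34.05, "lon": -118.23, "population": 1},
    {"lat": 35.05, "lon": -118.25, "population": 100},
]

score = attraction_score(store, neighborhoods)
print(score)  # 36 / 18 = 2.0
```

Returning `None` for a store with no nearby population is a design choice of this sketch; a real pipeline might instead drop such stores during data cleaning.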
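
The work flow on slides 6 to 10 lists an attraction score, a split into super stores and average stores, and a KMeans cluster step with Scikit-learn. The deck does not state how these steps connect. One plausible reading, assumed here, is that KMeans with two clusters is run on the one-dimensional attraction scores and the cluster with the higher centre is labelled "super". The scores below are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical attraction scores, one per store (not taken from the deck).
scores = np.array([0.8, 1.1, 0.9, 1.0, 4.2, 3.9, 1.2, 4.5])

# KMeans expects a 2-D array: one row per store, one column per feature.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scores.reshape(-1, 1))

# Cluster ids are arbitrary, so pick the "super" cluster by its higher centre.
super_cluster = int(np.argmax(km.cluster_centers_.ravel()))
is_super = km.labels_ == super_cluster

for s, flag in zip(scores, is_super):
    print(f"{s:.1f} -> {'super store' if flag else 'average store'}")
```

The resulting boolean labels could then serve as the target for the classification models named on the same slides.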
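
Slides 11 to 13 pair random forest feature importances with directional statements ("like the store", "less likely to visit"). Random forest importances are non-negative and do not carry a direction by themselves, and the deck does not say how direction was determined. A minimal sketch of one common approach, assumed here, is to read the sign from the logistic regression coefficients that the work flow also lists. The data is synthetic, the column names are hypothetical, and the signs are planted purely for illustration; none of the numbers reproduce the deck's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
feature_names = ["no_degree", "vegetable_spend", "age_over_65"]  # hypothetical names

# Synthetic standardized features; the effect signs below are planted on purpose.
X = rng.normal(size=(1000, 3))
signal = 1.5 * X[:, 0] - 1.0 * X[:, 1] + 1.0 * X[:, 2]
y = (signal + rng.normal(scale=0.5, size=1000) > 0).astype(int)  # 1 = super store (assumed target)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
logit = LogisticRegression().fit(X, y)

# Importances say how much a feature matters; coefficient signs say in which direction.
for name, imp, coef in zip(feature_names, forest.feature_importances_, logit.coef_[0]):
    print(f"{name:16s} importance={imp:.2f} coefficient={coef:+.2f}")
```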