Geo-Spotting: Mining Online Location-based Services for Optimal Retail Store Placement

by Dmytro Karamshuk, Researcher @ King's College London. Talk at Data Science London @ds_ldn

Data Science London

October 29, 2013

Transcript

  1. Geo-Spotting: Mining Online Location-based Services for Optimal Retail Store Placement

    Dmytro Karamshuk, King's College London

    Based on the paper: D. Karamshuk, A. Noulas, S. Scellato, V. Nicosia, C. Mascolo. Geo-Spotting: Mining Online Location-based Services for Optimal Retail Store Placement. ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Chicago, 2013.
  2. Optimal Retail Location Problem

    Among L possible locations in the city, select the one where a new store would be most profitable/popular.
  3. Optimal Retail Location Problem

    The problem is not new:
    • A. Athiyaman. Location decision making: the case of retail service development in a closed population. In Academy of Marketing Studies, volume 15, page 13, 2010.
    • O. Berman and D. Krass. The generalized maximal covering location problem. Computers & Operations Research, 29(6):563–581, 2002.
    • A. Kubis and M. Hartmann. Analysis of location of large-area shopping centres. A probabilistic gravity model for the Halle-Leipzig area. Jahrbuch für Regionalwissenschaft, 27(1):43–57, 2007.
    • P. Jensen. Network-based predictions of retail store commercial categories and optimal locations. Phys. Rev. E, 74:035101, Sep 2006.

    Our approach: explore fine-grained and cheap data from location-based social networks (LBSN).
  4. Location-based social networks

    • check in at places
    • share with your friends
    • receive bonuses for check-ins
    • search for places
    • leave comments for others
  5. Collecting the Data

    Dataset collected in New York:
    • 37K venues
    • 47K users
    • 621K check-ins
    • May – November, 2010

    Accounts for ≈25% of the original data.
  6. How popular is a venue?

    The distance between the two places is only a few hundred meters.
  7. How popular is a venue?

    • popularity can differ by several orders of magnitude from place to place
    • it probably depends on the location and the types of places

    Figures: distribution of check-ins per place; geographic distribution of venues (size = number of check-ins).
  8. Popularity and type of venue

    • different types and chains of venues have different usage patterns
    • we cannot compare check-ins across venues of different chains, but we can compare them across venues of the same chain

    Figure: number of check-ins per place for individual chains of restaurants.
  9. Co-location with other venues

    How frequently do we observe a Starbucks close to a railway station? Does it influence the popularity of a restaurant?

    P. Jensen. Analyzing the localization of retail stores with complex systems tools. IDA ’09, pages 10–20, Berlin, Heidelberg, 2009. Springer-Verlag.
  10. User mobility between places

    How many users go to a Starbucks after a railway station?
    • there is a correspondence between co-location and mobility patterns
    • but there are also many discrepancies
  11. Optimal Retail Location Problem

    Among L possible locations in the city, select the one where a new store would be most popular.
  12. Define the area

    An area is defined as a disk of radius r around a point with geographical coordinates l. The area is described by a set of numeric features derived from check-ins at venues in the disk.
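The disk definition can be sketched as a simple radius filter. This is a minimal illustration, not the authors' code: the great-circle (haversine) distance, the `lat`/`lon` field names and the helper names are all assumptions made for the example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def venues_in_disk(venues, center, radius_m):
    """Return the venues lying within radius_m meters of center = (lat, lon).

    Each venue is a dict with hypothetical 'lat' and 'lon' keys.
    """
    lat0, lon0 = center
    return [v for v in venues if haversine_m(lat0, lon0, v["lat"], v["lon"]) <= radius_m]
```

For example, `venues_in_disk(venues, (40.0, -74.0), 200)` keeps only the venues within 200 m of the candidate point; the features on the following slides would then be computed over that subset.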
  13. Geographic features of an area

    • density: number of venues in the area
    • neighbors entropy: heterogeneity of venue types
    • competitiveness: percentage of competing venues
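A rough sketch of these three features, assuming the area is given as a list of venue type labels. The Shannon entropy form and the definition of a competitor as a venue of the same type as the new store are assumptions for illustration; the slide does not give the exact formulas, and the paper may normalize or sign them differently.

```python
import math
from collections import Counter


def geographic_features(venue_types, target_type):
    """Compute density, neighbors entropy and competitiveness for one area.

    venue_types: list of type labels, one per venue in the area.
    target_type: type of the store to be placed (assumed to define competitors).
    """
    n = len(venue_types)
    if n == 0:
        return {"density": 0, "entropy": 0.0, "competitiveness": 0.0}
    counts = Counter(venue_types)
    # Shannon entropy of the venue-type distribution (natural logarithm).
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    # Share of venues in the area that have the same type as the new store.
    competitiveness = counts[target_type] / n
    return {"density": n, "entropy": entropy, "competitiveness": competitiveness}
```

With `["coffee", "coffee", "station", "bar"]` and a target type of `"coffee"`, the density is 4 and the competitiveness is 0.5.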
  14. Geographic features of an area

    • quality by Jensen
      – define inter-type attractiveness coefficients
      – weight surrounding venues by their attractiveness
  15. Mobility features of an area

    • area popularity: total number of check-ins in the area
    • transition density: intensity of transitions inside the area
    • incoming flows: intensity of transitions from outside areas
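The mobility features can be sketched from raw check-ins as follows. Two assumptions are made here that the slide does not state: a transition is taken to be any pair of consecutive check-ins by the same user at different venues, and "intensity" is approximated by a raw count without normalization.

```python
from collections import defaultdict


def transitions_from_checkins(checkins):
    """Derive (from_venue, to_venue) pairs from (user, venue, timestamp) check-ins."""
    by_user = defaultdict(list)
    for user, venue, ts in checkins:
        by_user[user].append((ts, venue))
    transitions = []
    for visits in by_user.values():
        visits.sort()  # chronological order per user
        for (_, a), (_, b) in zip(visits, visits[1:]):
            if a != b:
                transitions.append((a, b))
    return transitions


def mobility_features(checkins, area_venues):
    """Count check-ins and transitions for the set of venues inside one area."""
    area = set(area_venues)
    transitions = transitions_from_checkins(checkins)
    return {
        # total number of check-ins at venues in the area
        "area_popularity": sum(1 for _, v, _ in checkins if v in area),
        # transitions that start and end inside the area
        "transition_density": sum(1 for a, b in transitions if a in area and b in area),
        # transitions that start outside the area and end inside it
        "incoming_flow": sum(1 for a, b in transitions if a not in area and b in area),
    }
```

A real pipeline would probably also cap the time gap between two check-ins before counting them as a transition; that threshold is left out here because the slides do not specify it.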
  16. Mobility features of an area

    • transition quality
      – define transition coefficients for each type
      – weight venues according to the product of the coefficient and the check-in volume
  17. Ranking problem

    Use area features to rank all areas in a given set L according to their potential popularity. Compare with the ground truth: the ranking of places based on their actual popularity.
  18. Evaluation metrics

    Compare the predicted and ground-truth rankings.
    • Top-K locations ranking: use NDCG@K
    • Accuracy of the best prediction: Accuracy@X%, the probability that the best predicted store lies in the Top-X% of the ground-truth ranking

    We use a random cross-validation approach and report average values across all experiments.
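Both metrics can be sketched in a few lines. The NDCG variant below uses the common exponential gain `2^rel - 1` with a `log2` discount; how relevance scores are derived from the ground-truth ranking is not given on the slide, so they are passed in by the caller. The function names are illustrative only.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k relevance scores."""
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(predicted_order, relevance, k):
    """NDCG@k of a predicted ordering, given a dict item -> relevance score."""
    gains = [relevance[item] for item in predicted_order]
    ideal = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def best_in_top_x(predicted_order, true_popularity, x_percent):
    """True if the top predicted item is in the top x% of the ground-truth ranking."""
    true_order = sorted(true_popularity, key=true_popularity.get, reverse=True)
    cutoff = max(1, math.ceil(len(true_order) * x_percent / 100))
    return predicted_order[0] in true_order[:cutoff]
```

Accuracy@X% would then be the fraction of cross-validation experiments in which `best_in_top_x` returns `True`.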
  19. Performance of individual features

    • some indicators are general across various chains, while others are chain-specific
    • a lack of competitors in the area plays a positive role, as does the presence of place attractors
    • the performance of the incoming flow feature (In.Flow) is consistent with the fact that McDonald's attracts more users from remote areas

    Figure: NDCG@10.
  20. Considering fusion of factors

    Explore the fusion of features in a supervised learning approach:
    • regression for ranking: fit a regression model (Linear Regression, SVR or M5P) and then rank according to the regressed values
    • pairwise ranking: learn from pairwise comparisons using the RankNet neural network

    Use the same evaluation methodology as for individual features.
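The regression-for-ranking idea can be sketched with a plain linear model: fit a regressor that maps area features to observed popularity, then sort candidate areas by the predicted value. The gradient-descent fit below is only a dependency-free stand-in for the Linear Regression variant; SVR or M5P would replace `fit_linear` with the corresponding learner. The learning rate, epoch count and function names are assumptions for the example.

```python
def fit_linear(X, y, lr=0.1, epochs=5000):
    """Fit y ≈ w·x + b by batch gradient descent on the mean squared error."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        # Residuals are computed once per epoch so all weights update together.
        errors = [
            sum(wj * xj for wj, xj in zip(w, row)) + b - target
            for row, target in zip(X, y)
        ]
        for j in range(d):
            w[j] -= lr * 2 / n * sum(e * row[j] for e, row in zip(errors, X))
        b -= lr * 2 / n * sum(errors)
    return w, b


def rank_candidates(candidates, w, b):
    """Rank candidate areas (dict name -> feature vector) by predicted popularity."""
    def predict(features):
        return sum(wj * xj for wj, xj in zip(w, features)) + b
    return sorted(candidates, key=lambda name: predict(candidates[name]), reverse=True)
```

The features are assumed to be scaled to a comparable range (for instance [0, 1]); with unscaled features this simple fit may need a smaller learning rate to converge.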
  21. Results of the supervised learning

    • supervised learning performs better than the best individual feature
    • combining geographic and mobility features yields better results than using geographic features alone
    • regression to rank with SVR is the best-performing technique

    Figure: NDCG@10, individual features vs. supervised learning.
  22. The best location prediction

    • supervised learning yields reliable and significantly improved results
    • the best prediction lies in the top 20% of the ground-truth ranking with probability over 80%

    Figure: individual features vs. supervised learning.
  23. Implications

    • we show how fine-grained data from location-based social networks can be used effectively in geographic retail analysis
    • this can inspire further work in location-based advertising, developing indexes of urban areas, provision of location-based services, etc.
    • in particular, we see a lot of potential in measuring user flows from check-ins for various applications
    • we also faced some challenges when scaling this approach to other chains and cities