Slide 1

Slide 1 text

Geo-Spotting: Mining Online Location-based Services for Optimal Retail Store Placement
Dmytro Karamshuk
King's College London
Based on the paper: D. Karamshuk, A. Noulas, S. Scellato, V. Nicosia, C. Mascolo. Geo-Spotting: Mining Online Location-based Services for Optimal Retail Store Placement. ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Chicago, 2013.

Slide 2

Slide 2 text

Optimal Retail Location Problem
Among L possible locations in the city, select the one where a new store would be most profitable/popular.

Slide 3

Slide 3 text

Optimal Retail Location Problem
The problem is not new:
● A. Athiyaman. Location decision making: the case of retail service development in a closed population. In Academy of Marketing Studies, volume 15, page 13, 2010.
● O. Berman and D. Krass. The generalized maximal covering location problem. Computers & Operations Research, 29(6):563–581, 2002.
● A. Kubis and M. Hartmann. Analysis of location of large-area shopping centres. A probabilistic gravity model for the Halle-Leipzig area. Jahrbuch für Regionalwissenschaft, 27(1):43–57, 2007.
● P. Jensen. Network-based predictions of retail store commercial categories and optimal locations. Phys. Rev. E, 74:035101, Sep 2006.
Our approach: explore fine-grained and cheap data from location-based social networks (LBSNs).

Slide 4

Slide 4 text

Location-based social networks
● check in at places
● share check-ins with your friends
● receive bonuses for check-ins
● search for places
● leave comments for others

Slide 5

Slide 5 text

Check-ins around the world
● over 40M users
● over 4.5B check-ins

Slide 6

Slide 6 text

Collecting the Data
Dataset collected in New York:
● 37K venues
● 47K users
● 621K check-ins
● May – November 2010
● the sample accounts for ≈25% of the original data

Slide 7

Slide 7 text

How popular is a venue?
The distance between the two places is only a few hundred meters.

Slide 8

Slide 8 text

How popular is a venue?
● popularity can differ by several orders of magnitude from place to place
● it probably depends on the location and the type of place
[Figures: Distribution of check-ins per place; Geographic distribution of venues, marker size = number of check-ins]

Slide 9

Slide 9 text

Popularity and type of venue
● different types and chains of venues have different usage patterns
● we cannot compare check-ins across venues of different chains, but we can compare venues within an individual chain
[Figure: Number of check-ins per place for individual chains of restaurants]

Slide 10

Slide 10 text

Co-location with other venues
How frequently do we observe a Starbucks close to a railway station? Does this influence the popularity of a restaurant?
P. Jensen. Analyzing the localization of retail stores with complex systems tools. IDA ’09, pages 10–20, Berlin, Heidelberg, 2009. Springer-Verlag.

Slide 11

Slide 11 text

User mobility between places
How many users go to a Starbucks after visiting a railway station?
● there is a correspondence between co-location and mobility patterns
● but also many discrepancies
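Mobility patterns like "railway station, then Starbucks" come from consecutive check-ins of the same user. Below is a minimal sketch of how such type-to-type transitions could be counted, assuming a transition is two consecutive check-ins by the same user at most a hypothetical `max_gap` seconds apart. The record layout, the `max_gap` value, and the function name are illustrative assumptions, not the paper's definitions; the sketch counts transitions rather than distinct users.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class CheckIn(NamedTuple):
    user: str
    timestamp: int  # seconds since epoch
    venue_type: str


def count_type_transitions(checkins, max_gap=3 * 3600):
    """Count transitions between venue types.

    Hypothetical rule: a transition is a pair of consecutive check-ins
    by the same user separated by at most `max_gap` seconds.
    """
    by_user = defaultdict(list)
    for c in checkins:
        by_user[c.user].append(c)

    transitions = Counter()
    for user_checkins in by_user.values():
        # Order each user's check-ins in time before pairing them up.
        user_checkins.sort(key=lambda c: c.timestamp)
        for prev, curr in zip(user_checkins, user_checkins[1:]):
            if curr.timestamp - prev.timestamp <= max_gap:
                transitions[(prev.venue_type, curr.venue_type)] += 1
    return transitions


checkins = [
    CheckIn("alice", 0, "Train Station"),
    CheckIn("alice", 600, "Starbucks"),
    CheckIn("bob", 100, "Train Station"),
    CheckIn("bob", 20000, "Starbucks"),  # gap too long: not counted
    CheckIn("carol", 50, "Office"),
    CheckIn("carol", 900, "Starbucks"),
]
counts = count_type_transitions(checkins)
print(counts[("Train Station", "Starbucks")])  # 1
```

Comparing these counts with how often the same types are merely co-located is one way to surface the discrepancies mentioned on the slide.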

Slide 12

Slide 12 text

Optimal Retail Location Problem
Among L possible locations in the city, select the one where a new store would be most popular.

Slide 13

Slide 13 text

Define the area
An area is defined as a disc of radius r around a point l with given geographic coordinates. The area is described by a set of numeric features derived from check-ins at the venues inside the disc.
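Selecting the venues that fall inside the disc requires a distance on the Earth's surface. Below is a minimal sketch using the haversine great-circle distance; the `Venue` record, the 200 m radius, and the coordinates in the example are illustrative assumptions rather than values taken from the paper.

```python
import math
from typing import NamedTuple


class Venue(NamedTuple):
    name: str
    lat: float  # degrees
    lon: float  # degrees
    venue_type: str


EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def venues_in_disc(venues, lat, lon, r):
    """Return the venues within r metres of the point l = (lat, lon)."""
    return [v for v in venues if haversine_m(lat, lon, v.lat, v.lon) <= r]


venues = [
    Venue("A", 40.7580, -73.9855, "Coffee Shop"),
    Venue("B", 40.7585, -73.9850, "Train Station"),  # roughly 70 m from A
    Venue("C", 40.7800, -73.9650, "Coffee Shop"),    # roughly 3 km from A
]
area = venues_in_disc(venues, 40.7580, -73.9855, r=200)
print([v.name for v in area])  # ['A', 'B']
```

A linear scan is enough for illustration; for tens of thousands of venues and many candidate points a spatial index would be the natural replacement.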

Slide 14

Slide 14 text

Geographic features of an area
● density – number of venues in the area
● neighbors entropy – heterogeneity of venue types
● competitiveness – percentage of competing venues
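The three features can be computed from the list of venue types inside the disc. Below is a minimal sketch, assuming that list is already available, that entropy uses the natural logarithm, and that competitiveness is the plain fraction of venues sharing the target type. Whether a model should treat a high value as good or bad is left to the downstream ranking; the function name is hypothetical.

```python
import math
from collections import Counter


def geographic_features(neighbour_types, target_type):
    """Density, neighbours entropy and competitiveness of an area.

    neighbour_types: venue types of all venues inside the disc.
    target_type: type (or chain) of the store we want to place.
    """
    n = len(neighbour_types)
    if n == 0:
        return {"density": 0, "entropy": 0.0, "competitiveness": 0.0}
    counts = Counter(neighbour_types)
    # Shannon entropy of the venue-type distribution in the disc.
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    # Fraction of venues that compete with the target type.
    competitiveness = counts[target_type] / n
    return {"density": n, "entropy": entropy, "competitiveness": competitiveness}


features = geographic_features(
    ["Coffee Shop", "Coffee Shop", "Train Station", "Office"], target_type="Coffee Shop"
)
print(features["density"], round(features["entropy"], 3), features["competitiveness"])
# 4 1.04 0.5
```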

Slide 15

Slide 15 text

Geographic features of an area
● quality by Jensen
– define inter-type attractiveness coefficients
– weight surrounding venues by their attractiveness
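Below is a simplified sketch of the weighting step, assuming the inter-type attractiveness coefficients (called `kappa` here) have already been estimated for each neighbour type relative to the target store type. Each neighbour contributes log(kappa), so attractive types (kappa > 1) raise the score and repulsive types (kappa < 1) lower it. The log weighting, the neutral default of 1, and the coefficient values are assumptions for illustration. The sketch omits any normalisation a full implementation of Jensen's quality measure might apply.

```python
import math


def jensen_quality(neighbour_types, kappa):
    """Sum of log-attractiveness over the venues surrounding a location.

    kappa maps a neighbour type to its (pre-estimated, hypothetical)
    attractiveness coefficient towards the target store type; values
    must be positive. Types without a coefficient count as neutral (1).
    """
    return sum(math.log(kappa.get(t, 1.0)) for t in neighbour_types)


kappa = {"Train Station": 2.0, "Coffee Shop": 0.5}  # illustrative values only
q = jensen_quality(["Train Station", "Train Station", "Coffee Shop", "Office"], kappa)
print(round(q, 3))  # 0.693
```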

Slide 16

Slide 16 text

Mobility features of an area
● area popularity – total number of check-ins in the area
● transition density – intensity of transitions inside the area
● incoming flows – intensity of transitions into the area from outside
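Below is a minimal sketch of the three mobility features as raw counts. It assumes per-venue check-in totals and a list of observed user transitions given as (from_venue, to_venue) pairs. The input format and names are hypothetical, and any normalisation (for example by disc area) is left out.

```python
def mobility_features(area_venues, checkins, transitions):
    """Area popularity, transition density and incoming flow.

    area_venues: set of venue ids inside the disc.
    checkins: dict venue id -> number of check-ins.
    transitions: iterable of (from_venue, to_venue) pairs, one per
        observed user transition (hypothetical input format).
    """
    popularity = sum(checkins.get(v, 0) for v in area_venues)
    inside = incoming = 0
    for src, dst in transitions:
        if dst in area_venues:
            if src in area_venues:
                inside += 1      # both ends inside the disc
            else:
                incoming += 1    # arriving from outside the disc
    return {
        "area_popularity": popularity,
        "transition_density": inside,
        "incoming_flow": incoming,
    }


area = {"A", "B"}
checkins = {"A": 120, "B": 30, "C": 500}
transitions = [("A", "B"), ("B", "A"), ("C", "A"), ("C", "B"), ("A", "C")]
feats = mobility_features(area, checkins, transitions)
print(feats)
# {'area_popularity': 150, 'transition_density': 2, 'incoming_flow': 2}
```

Transitions that leave the disc (here A to C) contribute to neither feature.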

Slide 17

Slide 17 text

Mobility features of an area
● transition quality
– define transition coefficients for each venue type
– weight venues by the product of their type's coefficient and their check-in volume
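Below is a sketch of transition quality. It assumes a hypothetical estimator for the coefficient `rho[type]`: the share of check-ins at venues of that type that are followed by a transition to the target type. Each venue in the disc then contributes `rho[type] * check-ins`, as the slide describes. The estimator, the input formats, and the numbers are assumptions for illustration.

```python
def transition_coefficients(type_checkins, type_transitions, target_type):
    """Hypothetical estimator of rho[type]: share of check-ins at venues of
    `type` that are followed by a transition to `target_type`."""
    rho = {}
    for venue_type, n_checkins in type_checkins.items():
        if n_checkins > 0:
            rho[venue_type] = type_transitions.get((venue_type, target_type), 0) / n_checkins
    return rho


def transition_quality(area_venues, rho):
    """Sum of rho[type] * check-ins over (venue_type, checkins) pairs in the disc."""
    return sum(rho.get(t, 0.0) * c for t, c in area_venues)


rho = transition_coefficients(
    type_checkins={"Train Station": 1000, "Office": 400},
    type_transitions={("Train Station", "Coffee Shop"): 100, ("Office", "Coffee Shop"): 20},
    target_type="Coffee Shop",
)
q = transition_quality([("Train Station", 300), ("Office", 80), ("Park", 50)], rho)
print(round(q, 6))  # 34.0
```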

Slide 18

Slide 18 text

Ranking problem
Use area features to rank all areas in a given set L according to their potential popularity. Compare with the ground truth: the ranking of places based on their actual popularity.

Slide 19

Slide 19 text

Evaluation metrics
Compare the predicted and ground-truth rankings.
● Top-K locations ranking – use NDCG@K
● Accuracy of the best prediction – Accuracy@X%: the probability that the best predicted store lies in the Top-X% of the ground-truth ranking
We use a random cross-validation approach and report average values across all experiments.
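Below is a minimal sketch of both metrics for a single experiment. It assumes the ground-truth relevance of a location is its actual number of check-ins, and it uses the linear-gain form of DCG. Other NDCG variants use 2^rel - 1 as the gain. Accuracy@X% would then be the average of `best_in_top_x` over all cross-validation runs. The names and data are hypothetical.

```python
import math


def ndcg_at_k(predicted_order, relevance, k):
    """NDCG@k of a predicted ranking (linear gain, log2 discount).

    predicted_order: location ids sorted by predicted score, best first.
    relevance: dict location id -> ground-truth relevance (assumed here
        to be the actual number of check-ins).
    """
    def dcg(order):
        return sum(relevance[loc] / math.log2(i + 2) for i, loc in enumerate(order[:k]))

    ideal = sorted(relevance, key=relevance.get, reverse=True)
    ideal_dcg = dcg(ideal)
    return dcg(predicted_order) / ideal_dcg if ideal_dcg > 0 else 0.0


def best_in_top_x(predicted_order, relevance, x_percent):
    """True if the top predicted location lies in the top x% of the ground truth."""
    truth = sorted(relevance, key=relevance.get, reverse=True)
    cutoff = max(1, math.ceil(len(truth) * x_percent / 100))
    return predicted_order[0] in truth[:cutoff]


relevance = {"a": 50, "b": 30, "c": 10, "d": 5, "e": 1}
predicted = ["b", "a", "c", "d", "e"]
print(round(ndcg_at_k(predicted, relevance, 2), 3))  # 0.893
print(best_in_top_x(predicted, relevance, 20), best_in_top_x(predicted, relevance, 40))
# False True
```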

Slide 20

Slide 20 text

Performance of individual features
● some indicators are general across various chains, while others are chain-specific
● the lack of competitors in the area plays a positive role, as does the presence of place attractors
● the performance of In.Flow is consistent with the fact that McDonald's attracts more users from remote areas
[Figure: NDCG@10 of individual features]

Slide 21

Slide 21 text

Considering fusion of factors
Explore the fusion of features in a supervised learning approach:
● regression for ranking – conduct regression using Linear Regression, SVR or M5P, then rank according to the regressed values
● pairwise ranking – learn from pairwise comparisons using RankNet neural networks
Use the same evaluation methodology as for individual features.
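Below is a minimal sketch of "regression for ranking" using the simplest of the three regressors, ordinary least squares. SVR, M5P and RankNet are not shown. The sketch fits a linear model on training areas (feature vector to popularity) and then sorts candidate areas by predicted popularity. It uses the normal equations with Gauss-Jordan elimination to stay dependency-free. The toy data follow an exact linear rule and are purely illustrative.

```python
def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept, via the normal equations.

    X: list of feature vectors (one per training area); y: popularity values.
    Returns weights [w0 (intercept), w1, ..., wd].
    """
    rows = [[1.0] + list(x) for x in X]
    d = len(rows[0])
    # Augmented system (A^T A | A^T y).
    m = [[sum(r[i] * r[j] for r in rows) for j in range(d)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(d)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        for i in range(d):
            if i != col:
                f = m[i][col] / m[col][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[col])]
    return [m[i][d] / m[i][i] for i in range(d)]


def predict(w, x):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def rank_by_regression(w, candidates):
    """Sort candidate areas (id -> features) by predicted popularity, best first."""
    return sorted(candidates, key=lambda c: predict(w, candidates[c]), reverse=True)


X_train = [[1, 0], [0, 1], [1, 1], [2, 1], [0, 2]]
y_train = [3, 4, 6, 8, 7]  # toy data: y = 1 + 2*x1 + 3*x2
w = fit_linear_regression(X_train, y_train)
candidates = {"north": [0.5, 0.2], "south": [0.1, 0.9], "east": [1.0, 1.0]}
ranking = rank_by_regression(w, candidates)
print(ranking)  # ['east', 'south', 'north']
```

Only the order of the predictions matters for ranking, so any regressor with a predict step could replace the linear model here.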

Slide 22

Slide 22 text

Results of the supervised learning
● supervised learning performs better than the best individual feature
● combining geographic and mobility features yields better results than geographic features alone
● regression to rank with SVR is the best-performing technique
[Figure panels: NDCG@10 for individual features and for supervised learning]

Slide 23

Slide 23 text

The best location prediction
● supervised learning yields reliable and significantly improved results
● the best prediction lies in the top 20% of the ground-truth ranking with probability over 80%
[Figure panels: Individual features; Supervised learning]

Slide 24

Slide 24 text

Implications
● we show how fine-grained data from location-based social networks can be effectively exploited in geographic retail analysis
● this can inspire further work in location-based advertising, the development of indexes of urban areas, the provision of location-based services, etc.
● in particular, we see a lot of potential in measuring user flows from check-ins for various applications
● we also faced some challenges when scaling this approach to other chains and cities

Slide 25

Slide 25 text

Thank you for your attention!
Dmytro Karamshuk
King's College London
Follow me on Twitter: @karamshuk