Geo-Spotting: Mining Online Location-based Services for Optimal Retail Store Placement

Dmytro Karamshuk, King's College London

Based on the paper: D. Karamshuk, A. Noulas, S. Scellato, V. Nicosia, C. Mascolo. Geo-Spotting: Mining Online Location-based Services for Optimal Retail Store Placement. ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Chicago, 2013.

Optimal Retail Location Problem
Among L possible locations in the city, select the one where a new store would be most profitable/popular.

Optimal Retail Location Problem
The problem is not new:
• A. Athiyaman. Location decision making: the case of retail service development in a closed population. In Academy of Marketing Studies, volume 15, page 13, 2010.
• O. Berman and D. Krass. The generalized maximal covering location problem. Computers & Operations Research, 29(6):563–581, 2002.
• A. Kubis and M. Hartmann. Analysis of location of large-area shopping centres. A probabilistic gravity model for the Halle-Leipzig area. Jahrbuch für Regionalwissenschaft, 27(1):43–57, 2007.
• Pablo Jensen. Network-based predictions of retail store commercial categories and optimal locations. Phys. Rev. E, 74:035101, Sep 2006.
Our approach: explore fine-grained and cheap data from location-based social networks (LBSN).

Location-based social networks
• check in at places
• share with your friends
• receive bonuses for check-ins
• search for places
• leave comments for others

Check-ins around the world
Over 40M users and over 4.5B check-ins.

Collecting the Data
Dataset collected in New York:
• 37K venues
• 47K users
• 621K check-ins
• May – November, 2010
Accounts for ≈25% of the original data.

How popular is a venue?
The distance between the two places is only a few hundred meters.

How popular is a venue?
• popularity can differ by several orders of magnitude from place to place
• it probably depends on the location and the types of places
[Figure: Distribution of check-ins per place]
[Figure: Geographic distribution of venues; marker size = number of check-ins]

Popularity and type of venue
• different types and chains of venues have different usage patterns
• we cannot compare check-ins across venues of different chains, but we can compare them across venues of the same chain
[Figure: Number of check-ins per place for individual chains of restaurants]

Co-location with other venues
How frequently do we observe a Starbucks close to a railway station? Does it influence the popularity of a restaurant?
Pablo Jensen. Analyzing the localization of retail stores with complex systems tools. IDA ’09, pages 10–20, Berlin, Heidelberg, 2009. Springer-Verlag.

User mobility between places
How many users go to a Starbucks after a railway station?
• there is a correspondence between co-location and mobility patterns
• but there are also many discrepancies

Optimal Retail Location Problem
Among L possible locations in the city, select the one where a new store would be most popular.

Define the area
An area is defined as a disc of radius r around a point with geographical coordinates l. The area is described by a set of numeric features derived from check-ins at venues in the disc.
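The disc definition above can be sketched as a simple distance filter. This is only an illustration: the venue dictionary layout (`name`, `lat`, `lon`), the function names and the use of the haversine great-circle distance are assumptions, not details given in the slides.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def venues_in_disc(venues, center, radius_m):
    """Return the venues lying within radius_m metres of center = (lat, lon)."""
    lat, lon = center
    return [v for v in venues if haversine_m(lat, lon, v["lat"], v["lon"]) <= radius_m]


# Hypothetical venues around a point in Manhattan (coordinates are illustrative).
sample_venues = [
    {"name": "a", "lat": 40.7580, "lon": -73.9855},
    {"name": "b", "lat": 40.7590, "lon": -73.9855},  # roughly 111 m north
    {"name": "c", "lat": 40.7680, "lon": -73.9855},  # roughly 1.1 km north
]
nearby = venues_in_disc(sample_venues, (40.7580, -73.9855), radius_m=200)
```

With a 200 m radius only the first two hypothetical venues fall inside the disc; every feature of the area would then be computed from that subset.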

Geographic features of an area
• density – number of venues in the area
• neighbors entropy – heterogeneity of venue types
• competitiveness – percentage of competing venues
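A minimal sketch of the three features listed above, assuming an area is represented simply as the list of venue types found inside the disc. The function names, the use of Shannon entropy with the natural logarithm, and reporting competitiveness as a plain fraction are assumptions made for illustration; the slides do not give exact formulas.

```python
import math
from collections import Counter


def density(area_types):
    """Number of venues in the area."""
    return len(area_types)


def neighbors_entropy(area_types):
    """Shannon entropy (natural log) of the venue-type distribution in the area."""
    n = len(area_types)
    if n == 0:
        return 0.0
    counts = Counter(area_types)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def competitiveness(area_types, target_type):
    """Fraction of venues in the area that share the type of the planned store."""
    n = len(area_types)
    if n == 0:
        return 0.0
    return sum(1 for t in area_types if t == target_type) / n


# Hypothetical area containing four venues of three types.
area = ["coffee", "coffee", "station", "bar"]
features = (density(area), neighbors_entropy(area), competitiveness(area, "coffee"))
```

An area with a single venue type has entropy 0, while a more mixed area scores higher; whether competition helps or hurts is left to the ranking step.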

Geographic features of an area
• quality by Jensen
  – define inter-type attractiveness coefficients
  – weight surrounding venues by their attractiveness

Mobility features of an area
• area popularity – total number of check-ins in the area
• transition density – intensity of transitions inside the area
• incoming flows – intensity of transitions from outside areas
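The mobility features above can be sketched as counts over a transition log. The data layout here is an assumption: `checkins` maps a venue id to its check-in count, and `transitions` is a list of (from_venue, to_venue) pairs, one per consecutive pair of check-ins by the same user. The sketch returns raw counts; any normalisation (for example by area size) is not specified in the slides and is left out.

```python
def area_popularity(area, checkins):
    """Total number of check-ins at venues inside the area."""
    return sum(checkins.get(v, 0) for v in area)


def transition_density(area, transitions):
    """Number of transitions that start and end inside the area."""
    return sum(1 for src, dst in transitions if src in area and dst in area)


def incoming_flow(area, transitions):
    """Number of transitions that start outside the area and end inside it."""
    return sum(1 for src, dst in transitions if src not in area and dst in area)


# Hypothetical data: venues v1 and v2 lie inside the area, v3 lies outside.
area_venues = {"v1", "v2"}
checkins = {"v1": 10, "v2": 5, "v3": 7}
transitions = [("v1", "v2"), ("v3", "v1"), ("v2", "v3"), ("v3", "v2"), ("v1", "v2")]

popularity = area_popularity(area_venues, checkins)
inside = transition_density(area_venues, transitions)
incoming = incoming_flow(area_venues, transitions)
```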

Mobility features of an area
• transition quality
  – define transition coefficients for each type
  – weight venues according to the product of coefficient and check-in volume

Ranking problem
Use area features to rank all areas in a given set L according to their potential popularity. Compare with the ground truth: the ranking of places based on their actual popularity.

Evaluation metrics
Compare the predicted and ground truth rankings.
• Top-K locations ranking – use NDCG@K
• Accuracy of the best prediction – Accuracy@X%: the probability of having the best predicted store in the Top-X% of the ground truth ranking
We use random cross-validation and report average values across all experiments.
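The two metrics above can be sketched as follows. The choice of relevance is an assumption: this sketch uses the raw ground-truth popularity (check-in count) as the gain and the common log2 discount, which may differ from the relevance definition used in the paper. `best_in_top_x` evaluates a single experiment; Accuracy@X% would be the fraction of cross-validation runs for which it returns True.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k relevance values."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(predicted_order, true_popularity, k):
    """NDCG@K of a predicted ordering of areas against ground-truth popularity."""
    gains = [true_popularity[a] for a in predicted_order]
    ideal = sorted(true_popularity.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def best_in_top_x(predicted_order, true_popularity, x_percent):
    """True if the top predicted area lies in the top X% of the ground-truth ranking."""
    truth = sorted(true_popularity, key=true_popularity.get, reverse=True)
    cutoff = max(1, math.ceil(len(truth) * x_percent / 100))
    return predicted_order[0] in truth[:cutoff]


# Hypothetical ground truth (check-ins per area) and a slightly wrong prediction.
true_popularity = {"a": 30, "b": 20, "c": 10, "d": 5, "e": 1}
predicted = ["b", "a", "c", "d", "e"]
score = ndcg_at_k(predicted, true_popularity, k=2)
```

A perfect ordering gives an NDCG of 1.0; swapping the two most popular areas, as in the hypothetical prediction, lowers the score below 1.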

Performance of individual features
• some indicators are general across various chains while others are chain-specific
• the lack of competitors in the area plays a positive role, as does the existence of place attractors
• the performance of In.Flow (incoming flows) is in accordance with the fact that McDonald's attracts more users from remote areas
[Figure: NDCG@10]

Considering fusion of factors
Explore the fusion of features in a supervised learning approach:
• regression for ranking – fit a regression model (Linear Regression, SVR or M5P) and then rank according to the regressed values
• pair-wise ranking – learn from pair-wise comparisons using the neural-network-based RankNet
Use the same evaluation methodology as for individual features.
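The "regression for ranking" idea above can be sketched with a plain least-squares linear model: fit popularity on area features, then sort candidate areas by the predicted value. This is a stand-in, not the SVR or M5P models named in the slides. The gradient-descent fit, the learning rate, the feature values and all names are illustrative assumptions, and the toy data is exactly linear so the result is easy to check.

```python
def fit_linear(X, y, lr=0.1, epochs=5000):
    """Fit y ~ b + w.x by batch gradient descent on the mean squared error."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = b + sum(wj * xj for wj, xj in zip(w, xi)) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


def rank_areas(candidates, w, b):
    """Order candidate areas by predicted popularity, highest first."""
    scores = {name: b + sum(wj * xj for wj, xj in zip(w, x)) for name, x in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical training areas: two scaled features and an observed popularity
# generated as 10 * x1 - 5 * x2 + 2.
X_train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
y_train = [2.0, 12.0, -3.0, 7.0, 4.5]
w, b = fit_linear(X_train, y_train)

candidates = {"p": [1.0, 0.0], "q": [0.5, 0.5], "r": [0.0, 1.0]}
ranking = rank_areas(candidates, w, b)
```

The resulting ranking can be scored with the same metrics as the individual features, which is what makes the two approaches directly comparable.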

Results of the supervised learning
• supervised learning performs better than the best individual feature
• the combination of geographic and mobility features yields better results than geographic features alone
• regression to rank with SVR is the best-performing technique
[Figure: NDCG@10, individual features vs. supervised learning]

The best location prediction
• supervised learning yields reliable and significantly improved results
• the best prediction lies in the top-20% of the ground truth ranking with probability over 80%
[Figure: individual features vs. supervised learning]

Implications
• we show how fine-grained data from location-based social networks can be effectively explored in geographic retail analysis
• this can inspire further work in location-based advertising, developing indexes of urban areas, provision of location-based services, etc.
• in particular, we see a lot of potential in measuring user flows from check-ins for various applications
• we also faced some challenges when scaling this approach to other chains and cities

Thank you for your attention!
Dmytro Karamshuk, King's College London
Follow me on Twitter: @karamshuk
