Augmenting official datasets with Volunteered Geographic Information: a case study of daily travel patterns

Robin Lovelace, Tom Berry, Mark Birkin
Slides: https://speakerdeck.com/robinlovelace

Motivation
- It's useful to know where people travel …
- As shown by the Propensity to Cycle Tool (live demo): http://pct.bike/

Issues with current data
Census data:
- Low resolution
- A subset of the population
- Only one trip purpose considered
Mobile telephone data:
- Ownership
- Availability
Crowdsourced GPS data:
- Same again...

Aims
- Establish key activity spaces from the frequency of Tweets, e.g. Home, Work, Retail, Leisure, other
- What are the key drivers of variation in certain areas, and when do they happen?
- Why do variations in Tweet locations occur between Days-of-the-Week, Time-of-Day and Seasons?
- Better understand the movement of people at a micro-level
- Use more than just residential location to improve service provision
Colour key: partially complete; not complete; new lines of enquiry

Pre-processing/Methods
Brief methods:
- Began with 120 million Tweets
- Reduced to 1.5 million by placing a bounding box around Leeds
- Removed duplicates, robots etc.
- Further reduction of the dataset:
  - Using the full dataset proved ineffective for highly detailed analysis
  - Most people only tweeted a few times
  - Wanted to focus on frequent tweeters, as they provide more geographic information
  - Therefore removed all Tweets from any user with a Tweet frequency below 200
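The filtering steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the bounding-box coordinates, the record fields (`user`, `text`, `time`, `lon`, `lat`) and the definition of a duplicate are all assumptions, and robot detection is not attempted.

```python
from collections import Counter

# Hypothetical bounding box roughly covering Leeds (lon/lat); an assumption,
# not the box used in the study.
LEEDS_BBOX = {"min_lon": -1.80, "max_lon": -1.29, "min_lat": 53.70, "max_lat": 53.95}
MIN_TWEETS_PER_USER = 200  # threshold quoted on the slide


def in_bbox(tweet, bbox=LEEDS_BBOX):
    """Return True if the tweet's coordinates fall inside the bounding box."""
    return (bbox["min_lon"] <= tweet["lon"] <= bbox["max_lon"]
            and bbox["min_lat"] <= tweet["lat"] <= bbox["max_lat"])


def preprocess(tweets, min_tweets=MIN_TWEETS_PER_USER):
    """Clip to the bounding box, drop exact duplicates, keep frequent users."""
    clipped = [t for t in tweets if in_bbox(t)]

    # Assumption: tweets with identical user, text, time and position are duplicates.
    seen = set()
    unique = []
    for t in clipped:
        key = (t["user"], t["text"], t["time"], t["lon"], t["lat"])
        if key not in seen:
            seen.add(key)
            unique.append(t)

    # Keep only users with at least `min_tweets` remaining tweets.
    counts = Counter(t["user"] for t in unique)
    return [t for t in unique if counts[t["user"]] >= min_tweets]
```

Counting users after clipping and de-duplication, rather than before, means the threshold applies to the Tweets that are actually available for analysis in Leeds.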

Wordcloud
- Began investigating the main body of the Tweets
- Lack of key phrases to look for
- Things such as LUFC were among the most popular
- Issues with different spellings, abbreviations etc.
- Decided the geolocations of the Tweets would be better suited to the study
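A simple term count illustrates the spelling problem noted above. The sketch below is an assumption about how such a count might be done, not the wordcloud code behind the slide; the tokenising rule is deliberately naive.

```python
import re
from collections import Counter


def top_terms(texts, n=10):
    """Count lower-cased word, hashtag and mention tokens across tweet texts."""
    counts = Counter()
    for text in texts:
        # A token is a run of word characters, optionally led by '#' or '@'.
        counts.update(re.findall(r"[#@]?\w+", text.lower()))
    return counts.most_common(n)
```

With this rule "lufc" and "#lufc" are counted as separate terms, which is one small instance of the variant-spelling issue that made text analysis less attractive than geolocation.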

Day of the Week

Time of the Day

Finding 10 User Case Studies
- Identify the users with the greatest potential to provide the information required
- Therefore require:
  - A large number of Tweets
  - A suitable spread across Leeds, not all at one point
- Top and tail: remove 15% of users at either end, ranked by the combined standard deviation of each user's X and Y coordinates
- Results in a final dataset of 708 users with 376,000 Tweets
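The "top and tail" step can be sketched as follows. How the X and Y standard deviations were combined is not stated on the slide; summing them, as done here, is an assumption, as are the function names and the input layout (a dict of coordinate lists per user).

```python
from statistics import pstdev


def user_spread(points):
    """Combined spread: SD of X plus SD of Y (one possible reading of the slide)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return pstdev(xs) + pstdev(ys)


def top_and_tail(points_by_user, trim=0.15):
    """Rank users by spread and drop the lowest and highest `trim` share."""
    ranked = sorted(points_by_user, key=lambda u: user_spread(points_by_user[u]))
    k = int(len(ranked) * trim)  # number of users dropped at each end
    return ranked[k:len(ranked) - k]
```

Dropping both ends removes users who tweet from a single point as well as those whose locations are scattered so widely that they say little about activity within Leeds.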

10 User Sample
- Can see the beginnings of user movement patterns
- Shows differences between user habits in terms of location, frequency and spread
- Clear identification of key activity spaces

Display Issues

10 User Time of the Day
- Clear movement of people away from the centre during non-working hours.
- The centre of Leeds maintains its status as the primary area for people to Tweet in.
- Eastern LSOAs are more residential: this is where people tweet during non-working hours.

Individual Case Study

Proposed Next Steps
- Gaining insight into non-work travel
- Look at instances where tweeters tweet within a 1 hour interval from different locations
- Provide highly accurate analysis to augment existing data sources
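The second step, finding Tweets sent within an hour of each other from different locations, might look like the sketch below. The 500 m threshold for "different locations", the use of Unix-style timestamps in seconds and the record fields are assumptions made for illustration.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))


def short_interval_moves(tweets, max_gap_s=3600, min_dist_m=500):
    """Pairs of consecutive tweets by one user: close in time, apart in space."""
    by_user = {}
    for t in sorted(tweets, key=lambda t: t["time"]):
        by_user.setdefault(t["user"], []).append(t)

    moves = []
    for user, seq in by_user.items():
        for a, b in zip(seq, seq[1:]):
            gap = b["time"] - a["time"]
            dist = haversine_m(a["lon"], a["lat"], b["lon"], b["lat"])
            if gap <= max_gap_s and dist >= min_dist_m:
                moves.append((user, a["time"], b["time"], round(dist)))
    return moves
```

Each returned tuple is a candidate trip: the user, the two timestamps and the straight-line distance in metres between the two Tweets.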

Fundamental issues with social media data for travel behaviour
- Biased
- Intermittent
- Point based
- Spatial skew
- Incommensurable
- Unwieldy

GPS data → model
Logic: we have good models to estimate regular interzonal flows: 'gravity models' and the radiation model of Simini et al. (2012).

My implementation of the radiation model in stplanr (licence: MIT):
https://github.com/ropensci/stplanr/blob/master/R/radiate.R
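For readers unfamiliar with the radiation model, the core formula from Simini et al. (2012) is T_ij = T_i * m_i * n_j / ((m_i + s_ij) * (m_i + n_j + s_ij)), where m_i and n_j are the origin and destination populations and s_ij is the population within a circle of radius r_ij around the origin, excluding both zones. The Python sketch below is an independent illustration of that formula and is not the stplanr implementation; the function name, input layout and use of planar coordinates are assumptions.

```python
from math import hypot


def radiation_flows(zones, trips_out):
    """Estimate flows T_ij between zones with the radiation model.

    zones: dict name -> (x, y, population), in planar coordinates
    trips_out: dict name -> total trips T_i leaving each zone
    """
    flows = {}
    for i, (xi, yi, m_i) in zones.items():
        for j, (xj, yj, n_j) in zones.items():
            if i == j:
                continue
            r_ij = hypot(xj - xi, yj - yi)
            # s_ij: population within radius r_ij of i, excluding i and j.
            s_ij = sum(
                pop for k, (xk, yk, pop) in zones.items()
                if k not in (i, j) and hypot(xk - xi, yk - yi) <= r_ij
            )
            flows[(i, j)] = trips_out[i] * m_i * n_j / (
                (m_i + s_ij) * (m_i + n_j + s_ij)
            )
    return flows
```

Unlike a gravity model, there is no distance-decay parameter to calibrate: distance enters only through the intervening population s_ij.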

Results from the radiation model
Population: proportional to circle size
Line thickness: proportional to flow
Code: devtools::install_github("ropensci/stplanr")

Creating a generalised theory of activity space movement
Source: https://www.openstreetmap.org/user/Canyonsrcool/traces/2137292
Random Walk, Brownian Motion and other ABM algorithms can help.
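As a starting point, a two-dimensional Gaussian random walk (a discrete-time approximation of Brownian motion) can be simulated in a few lines. This is a generic sketch rather than a model proposed in the talk; the step size, starting point and function name are assumptions.

```python
import random


def random_walk(n_steps, step_sd=1.0, start=(0.0, 0.0), seed=None):
    """Simulate a 2D Gaussian random walk (discrete-time Brownian motion)."""
    rng = random.Random(seed)  # local generator so runs are reproducible
    x, y = start
    path = [(x, y)]
    for _ in range(n_steps):
        # Independent normal increments in each coordinate.
        x += rng.gauss(0.0, step_sd)
        y += rng.gauss(0.0, step_sd)
        path.append((x, y))
    return path
```

Comparing simulated paths like these with real GPS traces is one way to test whether a candidate movement model reproduces the shape and extent of observed activity spaces.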

Manifesto for modelling travel patterns with large datasets
- 'Twitterlike' data is crap for geo*
- But can provide a testbed for ideas/code
- GPS data is much more promising
- Leading to the need to model 'activity spaces' and 'intrazonal flows' not captured by spatial interaction models
- Based on Brownian motion + ecological mathematics and theory

Key references
Jonsen, Ian D., Joanna Mills Flemming, and Ransom A. Myers. ‘Robust State–Space Modeling of Animal Movement Data’. Ecology 86, no. 11 (1 November 2005): 2874–80. doi:10.1890/04-1852.
Lovelace, Robin, Martin Clarke, Philip Cross, and Mark Birkin. ‘From Big Noise to Big Data: Towards the Verification of Large Datasets for Understanding Regional Retail Flows’. Geographical Analysis, 2015.
Lovelace, Robin, Anna Goodman, Rachel Aldred, Nikolai Berkoff, Ali Abbas, and James Woodcock. ‘The Propensity to Cycle Tool: An Open Source Online System for Sustainable Transport Planning’. arXiv:1509.04425 [cs], 15 September 2015. http://arxiv.org/abs/1509.04425.
Simini, Filippo, Marta C González, Amos Maritan, and Albert-László Barabási. ‘A Universal Model for Mobility and Migration Patterns’. Nature, February 2012, 8–12. doi:10.1038/nature10856.
