Slide 1

Augmenting official datasets with Volunteered Geographic Information: a case study of daily travel patterns

Robin Lovelace, Tom Berry, Mark Birkin

Slides: https://speakerdeck.com/robinlovelace

Slide 2

Motivation  It's useful to know where people travel … As shown by the propensity to cycle tool (live demo): http://pct.bike/

Slide 3

Issues with current data

Census data:
- Low resolution
- Covers only a subset of the population
- Only one trip purpose considered

Mobile telephone data:
- Ownership
- Availability

Crowdsourced GPS data:
- Same issues again...

Slide 4

Aims  Aims:  Establish key activity spaces from frequency of Tweets e.g. Home, Work, Retail, Leisure, other  What are the key drivers of variation in certain areas and when do they happen  Why variations in Tweet locations occur between Days-of-the-Week, Time- of-Day and Seasons  Better understand the movement of people at a micro-level  Using more than just residential location to improve service provision Colour key: Partially complete Not complete New lines of enquiry

Slide 5

Pre-processing/Methods

Brief methods:
- Began with 120 million Tweets
- Reduced to 1.5 million by placing a bounding box around Leeds
- Removed duplicates, robots (bots), etc.
- Further reduced the dataset:
  - Using the full dataset proved ineffective for highly detailed analysis
  - Most people only tweeted a few times
  - We wanted to focus on frequent tweeters, as they provide more geographic information
  - We therefore removed all Tweets from users with fewer than 200 Tweets
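
The filtering steps above can be sketched in Python as follows. This is a minimal illustration, not the code used in the study: the `Tweet` record, the `LEEDS_BBOX` coordinates and the `preprocess` helper are hypothetical, "duplicates" are assumed to mean identical user, text and location, and bot removal is not shown.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen -> hashable, so Tweets can go in a set
class Tweet:
    user_id: str
    text: str
    lon: float
    lat: float

# Hypothetical, approximate bounding box around Leeds:
# (min lon, min lat, max lon, max lat). Not the box used in the study.
LEEDS_BBOX = (-1.80, 53.70, -1.29, 53.95)

def in_bbox(tweet, bbox):
    lon_min, lat_min, lon_max, lat_max = bbox
    return lon_min <= tweet.lon <= lon_max and lat_min <= tweet.lat <= lat_max

def preprocess(tweets, bbox=LEEDS_BBOX, min_tweets=200):
    # 1. Keep only Tweets inside the bounding box.
    local = [t for t in tweets if in_bbox(t, bbox)]
    # 2. Drop exact duplicates (same user, text and location), keeping the first.
    seen, unique = set(), []
    for t in local:
        if t not in seen:
            seen.add(t)
            unique.append(t)
    # 3. Keep only users with at least `min_tweets` remaining Tweets.
    counts = Counter(t.user_id for t in unique)
    return [t for t in unique if counts[t.user_id] >= min_tweets]
```

Filtering users after deduplication means repeated copies of the same Tweet cannot push a user over the threshold.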

Slide 6

Wordcloud  Began investigating the main body of the Tweets  Lack of key phrases to look for  Things such as LUFC were among the most popular  Issues with different spellings, abbreviations etc  Decided the geolocations of the Tweets would be better suited to the study

Slide 7

Day of the Week

Slide 8

Time of the Day
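
The day-of-week and time-of-day profiles amount to counting Tweets per weekday and per hour. The following is a minimal sketch. It assumes the timestamps have already been converted from UTC to UK local time, and the helper name `temporal_profile` is hypothetical.

```python
from collections import Counter
from datetime import datetime

def temporal_profile(timestamps):
    """Count Tweets per weekday (0 = Monday ... 6 = Sunday) and per hour (0-23)."""
    timestamps = list(timestamps)  # allow generators, which are iterated twice below
    by_weekday = Counter(ts.weekday() for ts in timestamps)
    by_hour = Counter(ts.hour for ts in timestamps)
    return by_weekday, by_hour

# Usage: two Tweets on Monday 7 September 2015, one on Saturday 12 September 2015.
stamps = [datetime(2015, 9, 7, 8, 30), datetime(2015, 9, 7, 17, 5), datetime(2015, 9, 12, 8, 0)]
weekday_counts, hour_counts = temporal_profile(stamps)
```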

Slide 9

Finding 10 User Case Studies

- Identify the users with the greatest potential to provide the information required
- This requires:
  - A large number of Tweets
  - A suitable spread across Leeds, not all at one point
- Top and tail: remove 15% of users at either end, ranked by the combined standard deviation of each user's X and Y coordinates
- Results in a final dataset of 708 users with 376,000 Tweets
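
One reading of "combined standard deviation" is the standard distance, sqrt(σx² + σy²), of each user's Tweet locations. Under that assumption, trimming 15% of users from each end of the ranking could look like the sketch below. Coordinates are assumed to be projected (e.g. metres on the British National Grid) rather than longitude/latitude, and both function names are hypothetical.

```python
import math
from statistics import pstdev

def standard_distance(points):
    """sqrt(var_x + var_y) of a user's Tweet locations (projected x, y)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return math.sqrt(pstdev(xs) ** 2 + pstdev(ys) ** 2)

def trim_users_by_spread(user_points, tail=0.15):
    """Drop the `tail` share of users with the smallest and with the largest spread."""
    ranked = sorted((standard_distance(pts), user) for user, pts in user_points.items())
    k = int(len(ranked) * tail)  # number of users removed from each end
    kept = ranked[k:len(ranked) - k]
    return {user for _, user in kept}
```

Trimming both ends removes users who barely move (e.g. always tweeting from home) as well as those whose spread is implausibly large.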

Slide 10

10 User Sample

- Shows the beginnings of user movement patterns
- Shows differences between users' habits in terms of location, frequency and spread
- Clear identification of key activity spaces

Slide 11

Display Issues

Slide 12

10 User Time of the Day

- Clear movement of people away from the centre during non-working hours
- The centre of Leeds remains the primary area in which people tweet
- Eastern LSOAs (Lower Layer Super Output Areas) are more residential: this is where people tweet during non-working hours

Slide 13

Individual Case Study

Slide 14

Proposed Next Steps

- Gain insight into non-work travel
- Look at instances where users tweet from different locations within a 1-hour interval
- Provide highly accurate analysis to augment existing data sources
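
The proposed 1-hour rule can be sketched as a scan over each user's time-ordered Tweets, flagging consecutive pairs that are at most one hour apart but at different locations. The 500 m threshold for "different locations" and the `candidate_trips` name are assumptions for illustration, and coordinates are assumed to be projected in metres.

```python
import math
from datetime import datetime, timedelta

def candidate_trips(tweets, max_gap=timedelta(hours=1), min_dist_m=500.0):
    """Return consecutive same-user Tweet pairs that may indicate a trip.

    tweets: iterable of (user_id, timestamp, x, y), with x/y in metres.
    """
    by_user = {}
    for user, ts, x, y in tweets:
        by_user.setdefault(user, []).append((ts, x, y))
    trips = []
    for user, rows in by_user.items():
        rows.sort()  # chronological order (ties broken by coordinates)
        for (t0, x0, y0), (t1, x1, y1) in zip(rows, rows[1:]):
            close_in_time = t1 - t0 <= max_gap
            moved = math.hypot(x1 - x0, y1 - y0) >= min_dist_m
            if close_in_time and moved:
                trips.append((user, t0, t1, (x0, y0), (x1, y1)))
    return trips
```

Only consecutive pairs are compared, so each flagged pair is a candidate origin-destination leg rather than a full trip chain.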

Slide 15

Fundamental issues with social media data for travel behaviour

- Biased
- Intermittent
- Point based
- Spatial skew
- Incommensurable
- Unwieldy

Slide 16

GPS data → model

Logic: we have good models for estimating regular interzonal flows, such as gravity models and the radiation model of Simini et al. (2012).
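
For reference, one common textbook form of the gravity model is given below. It is not necessarily the exact specification the authors had in mind. The model estimates the flow T_ij from zone i to zone j as:

```latex
T_{ij} = k \, \frac{m_i \, n_j}{d_{ij}^{\beta}}
```

Here m_i and n_j are the populations of origin i and destination j, d_ij is the distance between them, and k and β are parameters calibrated against observed flows. The radiation model of Simini et al. (2012) was proposed as a parameter-free alternative that needs no such distance-decay fitting.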

Slide 17

My implementation of the radiation model in stplanr (licence: MIT): https://github.com/ropensci/stplanr/blob/master/R/radiate.R
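
The radiation model of Simini et al. (2012) predicts the flow from zone i to zone j as T_ij = T_i m_i n_j / ((m_i + s_ij)(m_i + n_j + s_ij)). In this formula, s_ij is the total population within distance r_ij of i, excluding i and j themselves. The Python sketch below evaluates that formula for a handful of zones. It is an independent illustration, not a port of the stplanr R code. The `out_share` parameter (the fraction of each zone's population that makes trips, giving T_i) and the function name are assumptions.

```python
import math

def radiation_flows(pops, coords, out_share=1.0):
    """Radiation-model flows T[(i, j)] between all ordered pairs of zones.

    pops: population of each zone; coords: projected (x, y) of each zone centroid.
    out_share: assumed fraction of a zone's population making trips, so T_i = out_share * m_i.
    """
    n = len(pops)
    flows = {}
    for i in range(n):
        m_i = pops[i]
        t_i = out_share * m_i
        for j in range(n):
            if i == j:
                continue
            n_j = pops[j]
            r_ij = math.dist(coords[i], coords[j])
            # s_ij: population within radius r_ij of i, excluding zones i and j
            # (zones exactly at r_ij are counted as inside; a modelling choice).
            s_ij = sum(pops[k] for k in range(n)
                       if k not in (i, j) and math.dist(coords[i], coords[k]) <= r_ij)
            flows[(i, j)] = t_i * m_i * n_j / ((m_i + s_ij) * (m_i + n_j + s_ij))
    return flows
```

The brute-force search for s_ij is O(n³), which is fine for a few hundred zones. Larger systems would benefit from a spatial index.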

Slide 18

Results from the radiation model

Population: proportional to circle size
Line thickness: proportional to flow
Code: devtools::install_github("ropensci/stplanr")

Slide 19

Creating a generalised theory of activity space movement

Source: https://www.openstreetmap.org/user/Canyonsrcool/traces/2137292

Random walks, Brownian motion and other agent-based modelling (ABM) algorithms can help.
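
As a starting point, a 2D random walk with Gaussian steps (a discrete approximation of Brownian motion) can be simulated, and its activity space summarised by the radius of gyration. The sketch below is illustrative only. The step size, the seed and both function names are hypothetical.

```python
import math
import random

def random_walk(n_steps, step_sd=1.0, start=(0.0, 0.0), seed=None):
    """2D random walk with independent Gaussian steps (discrete Brownian motion)."""
    rng = random.Random(seed)  # private generator so runs are reproducible
    x, y = start
    path = [(x, y)]
    for _ in range(n_steps):
        x += rng.gauss(0.0, step_sd)
        y += rng.gauss(0.0, step_sd)
        path.append((x, y))
    return path

def radius_of_gyration(points):
    """Root-mean-square distance of the points from their centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / len(points))
```

For pure Brownian motion the activity space keeps growing with the number of steps. A realistic model of daily travel would therefore need some pull back towards anchors such as home and work.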

Slide 20

Manifesto for modelling travel patterns with large datasets

- 'Twitter-like' data is crap for geo*
- But it can provide a testbed for ideas/code
- GPS data is much more promising
- This leads to the need to model 'activity spaces' and 'intrazonal flows', which are not captured by spatial interaction models
- Based on Brownian motion + ecological mathematics and theory

Slide 21

Key references

- Jonsen, Ian D., Joanna Mills Flemming, and Ransom A. Myers. ‘Robust State-Space Modeling of Animal Movement Data’. Ecology 86, no. 11 (1 November 2005): 2874–80. doi:10.1890/04-1852.
- Lovelace, Robin, Martin Clarke, Philip Cross, and Mark Birkin. ‘From Big Noise to Big Data: Towards the Verification of Large Datasets for Understanding Regional Retail Flows’. Geographical Analysis, 2015.
- Lovelace, Robin, Anna Goodman, Rachel Aldred, Nikolai Berkoff, Ali Abbas, and James Woodcock. ‘The Propensity to Cycle Tool: An Open Source Online System for Sustainable Transport Planning’. arXiv:1509.04425 [cs], 15 September 2015. http://arxiv.org/abs/1509.04425.
- Simini, Filippo, Marta C. González, Amos Maritan, and Albert-László Barabási. ‘A Universal Model for Mobility and Migration Patterns’. Nature, February 2012, 8–12. doi:10.1038/nature10856.