Augmenting official datasets with Volunteered Geographic Information: a case study of daily travel patterns

Robin
April 02, 2016


Slides presented at the AAG.





  1. Augmenting official datasets with Volunteered Geographic Information: a case study of daily travel patterns

    Robin Lovelace, Tom Berry, Mark Birkin. Slides:
  2. Motivation

    - It's useful to know where people travel …
    - As shown by the Propensity to Cycle Tool (live demo):
  3. Issues with current data

    Census data:
    - Low resolution
    - A subset of the population
    - Only one trip purpose considered

    Mobile telephone data:
    - Ownership
    - Availability

    Crowdsourced GPS data:
    - Same again...
  4. Aims

    - Establish key activity spaces from frequency of Tweets, e.g. Home, Work, Retail, Leisure, other
    - What are the key drivers of variation in certain areas and when do they happen
    - Why variations in Tweet locations occur between Days-of-the-Week, Time-of-Day and Seasons
    - Better understand the movement of people at a micro-level
    - Using more than just residential location to improve service provision

    Colour key: Partially complete; Not complete; New lines of enquiry
  5. Pre-processing/Methods

    Brief methods:
    - Began with 120 million Tweets
    - Reduced to 1.5 million by placing a bounding box around Leeds
    - Removed duplicates, robots etc.
    - Further reduction to dataset:
      - Using the full dataset proved ineffective when undertaking highly detailed analysis
      - Most people only tweeted a few times
      - Wanted to focus on frequent tweeters, as they provide more geographic information
      - Therefore removed all Tweets from users who tweeted fewer than 200 times
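The filtering steps on slide 5 can be sketched roughly as follows. This is a minimal Python illustration, not the authors' code: the bounding-box coordinates, the record layout (dicts with `user`, `lon`, `lat`, `time`, `text`) and the function names are assumptions, and robot removal is not attempted. Only the 200-Tweet threshold comes from the slide.

```python
from collections import Counter

# Hypothetical bounding box roughly around Leeds:
# (lon_min, lat_min, lon_max, lat_max). The slides do not give the real one.
LEEDS_BBOX = (-1.80, 53.70, -1.29, 53.95)
MIN_TWEETS_PER_USER = 200  # threshold stated on the slide


def in_bbox(lon, lat, bbox=LEEDS_BBOX):
    """Return True if the point lies inside the bounding box."""
    lon_min, lat_min, lon_max, lat_max = bbox
    return lon_min <= lon <= lon_max and lat_min <= lat <= lat_max


def preprocess(tweets, bbox=LEEDS_BBOX, min_tweets=MIN_TWEETS_PER_USER):
    """Apply the bounding box, drop exact duplicates, keep frequent users."""
    # 1. Spatial filter: keep only Tweets inside the bounding box.
    inside = [t for t in tweets if in_bbox(t["lon"], t["lat"], bbox)]

    # 2. Drop exact duplicates (same user, time, text and location),
    #    keeping the first occurrence.
    seen = set()
    unique = []
    for t in inside:
        key = (t["user"], t["time"], t["text"], t["lon"], t["lat"])
        if key not in seen:
            seen.add(key)
            unique.append(t)

    # 3. Keep only users with at least `min_tweets` remaining Tweets.
    counts = Counter(t["user"] for t in unique)
    return [t for t in unique if counts[t["user"]] >= min_tweets]
```

Counting Tweets per user after the spatial filter (rather than before) is a design choice made here; the slides do not say which order was used.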
  6. Wordcloud

    - Began investigating the main body of the Tweets
    - Lack of key phrases to look for
    - Things such as LUFC were among the most popular
    - Issues with different spellings, abbreviations etc.
    - Decided the geolocations of the Tweets would be better suited to the study
  7. Day of the Week

  8. Time of the Day

  9. Finding 10 User Case Studies

    - Identify the users with the greatest potential to provide the information required
    - Therefore require:
      - A large number of Tweets
      - Suitable spread across Leeds (not all at one point)
    - Top and tail 15% either side, based on the combined standard deviation of each user's X and Y coordinates
    - Results in final dataset of 708 users with 376,000 Tweets
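The "top and tail" step on slide 9 might look something like the sketch below. The slides do not define "combined standard deviation"; here it is assumed to be the sum of the standard deviations of X and Y, and the function names and input layout are hypothetical.

```python
from statistics import pstdev


def combined_spread(points):
    """Spread of one user's Tweets: sd(x) + sd(y).

    ASSUMPTION: the slides only say "combined standard deviation";
    summing the two population standard deviations is one reading.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return pstdev(xs) + pstdev(ys)


def top_and_tail(user_points, percent=15):
    """Drop the `percent` of users with the smallest and largest spread.

    user_points: dict mapping user id -> list of (x, y) coordinates.
    Returns the retained user ids, ordered by increasing spread.
    """
    ranked = sorted(user_points, key=lambda u: combined_spread(user_points[u]))
    k = len(ranked) * percent // 100  # users trimmed from each end
    return ranked[k:len(ranked) - k]
```

Trimming both ends removes users who tweet from a single point as well as those whose coordinates are scattered implausibly widely.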
  10. 10 User Sample

    - Can see beginnings of user movement patterns
    - Shows difference between user habits in terms of location, frequency and spread
    - Clear identification of key activity spaces
  11. Display Issues

  12. 10 User Time of the Day

    - Clear movement of people away from the centre during non-working hours.
    - The centre of Leeds maintains its status as the primary area for people to Tweet in.
    - Eastern LSOAs are more residential: this is where people tweet during non-working hours.
  13. Individual Case Study

  14. Proposed Next Steps

    - Gaining insight into non-work travel
    - Look at instances where tweeters tweet within a 1 hour interval from different locations
    - Provide highly accurate analysis to augment existing data sources
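The proposed search for Tweets sent within one hour from different locations could be prototyped as below. This is only a sketch: the input layout (tuples of user, Unix time in seconds, longitude, latitude), the function names and the 500 m threshold for "different locations" are assumptions, not values from the slides.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000.0


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def candidate_trips(tweets, max_gap_s=3600, min_dist_m=500.0):
    """Find consecutive Tweets by one user that are close in time, far in space.

    tweets: iterable of (user, unix_time_s, lon, lat).
    min_dist_m is a hypothetical threshold for "different locations".
    Returns a list of (user, t_start, t_end, distance_m).
    """
    by_user = {}
    for user, t, lon, lat in tweets:
        by_user.setdefault(user, []).append((t, lon, lat))

    trips = []
    for user, rows in by_user.items():
        rows.sort()  # chronological order within each user
        for (t0, lon0, lat0), (t1, lon1, lat1) in zip(rows, rows[1:]):
            dist = haversine_m(lon0, lat0, lon1, lat1)
            if t1 - t0 <= max_gap_s and dist >= min_dist_m:
                trips.append((user, t0, t1, dist))
    return trips
```

Each returned pair is only a candidate trip: it gives an origin, a destination and an upper bound on travel time, but says nothing about route or purpose.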
  15. Fundamental issues with social media data for travel behaviour

    - Biased
    - Intermittent
    - Point based
    - Spatial skew
    - Incommensurable
    - Unwieldy
  16. GPS data → model

    Logic: we have good models to estimate regular interzonal flow, such as 'Gravity Models' and the radiation model of Simini et al. (2012).
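For reference, the radiation model of Simini et al. (2012) estimates the flow from zone i to zone j as T_ij = T_i · m_i · n_j / ((m_i + s_ij)(m_i + n_j + s_ij)), where m_i and n_j are the origin and destination populations, s_ij is the population within radius r_ij of i (excluding i and j), and T_i is the total number of trips leaving i. The Python sketch below is an illustration of that formula only, not the stplanr implementation; the input layout, planar coordinates and function name are assumptions.

```python
from math import hypot


def radiation_flows(zones, trips_out):
    """Estimate interzonal flows with the radiation model formula.

    zones: dict mapping zone name -> (x, y, population), planar coordinates.
    trips_out: dict mapping zone name -> total trips leaving that zone (T_i).
    Returns a dict mapping (origin, destination) -> estimated flow.
    """
    flows = {}
    for i, (xi, yi, m_i) in zones.items():
        for j, (xj, yj, n_j) in zones.items():
            if i == j:
                continue
            r_ij = hypot(xj - xi, yj - yi)
            # s_ij: population within distance r_ij of i, excluding i and j.
            s_ij = sum(
                pop
                for k, (xk, yk, pop) in zones.items()
                if k not in (i, j) and hypot(xk - xi, yk - yi) <= r_ij
            )
            flows[(i, j)] = (
                trips_out[i] * m_i * n_j
                / ((m_i + s_ij) * (m_i + n_j + s_ij))
            )
    return flows
```

Unlike a gravity model, this has no parameters to calibrate: flows depend only on populations and on how much population lies between origin and destination.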
  17. My implementation of the radiation model in stplanr (licence: MIT):
  18. Results from the radiation model

    - Population: proportional to circle size
    - Line thickness: proportional to flow
    - Code: devtools::install_github("ropensci/stplanr")
  19. Creating a generalised theory of activity space movement

    Source:

    Random Walk, Brownian Motion and other ABM algorithms can help.
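As a very rough starting point for such algorithms, a Gaussian random walk (a discrete approximation of Brownian motion) can be simulated in a few lines of Python. The function name, step size and units are illustrative assumptions; the slides do not specify a model.

```python
import random


def random_walk(n_steps, start=(0.0, 0.0), step_sd=1.0, seed=None):
    """Simulate a 2D Gaussian random walk.

    Each step adds independent normal noise (mean 0, sd `step_sd`) to x and y.
    Returns the path as a list of (x, y) tuples, including the start point.
    """
    rng = random.Random(seed)  # local generator, so runs are reproducible
    x, y = start
    path = [(x, y)]
    for _ in range(n_steps):
        x += rng.gauss(0.0, step_sd)
        y += rng.gauss(0.0, step_sd)
        path.append((x, y))
    return path
```

A walk anchored on a home or work location, with the step size fitted to observed GPS traces, would be one simple way to generate synthetic activity spaces for testing.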
  20. Manifesto for modelling travel patterns with large datasets

    - 'Twitterlike' data is crap for geo*
    - But can provide testbed for ideas/code
    - GPS data is much more promising
    - Leading to the need to model 'activity spaces' and 'intrazonal flows' not captured by spatial interaction models
    - Based on Brownian motion + ecological mathematics and theory
  21. Key references

    - Jonsen, Ian D., Joanna Mills Flemming, and Ransom A. Myers. ‘Robust State–Space Modeling of Animal Movement Data’. Ecology 86, no. 11 (1 November 2005): 2874–80. doi:10.1890/04-1852.
    - Lovelace, Robin, Martin Clarke, Philip Cross, and Mark Birkin. ‘From Big Noise to Big Data: Towards the Verification of Large Datasets for Understanding Regional Retail Flows’. Geographical Analysis, 2015.
    - Lovelace, Robin, Anna Goodman, Rachel Aldred, Nikolai Berkoff, Ali Abbas, and James Woodcock. ‘The Propensity to Cycle Tool: An Open Source Online System for Sustainable Transport Planning’. arXiv:1509.04425 [cs], 15 September 2015.
    - Simini, Filippo, Marta C González, Amos Maritan, and Albert-László Barabási. ‘A Universal Model for Mobility and Migration Patterns’. Nature, February 2012, 8–12. doi:10.1038/nature10856.