Advances in datasets and models for simulating flows: opportunities and risks

Robin
June 05, 2014

Presentation for modelling daytime population movements, Leeds, 5th June 2014.

Transcript

  1. Advances in datasets and models for simulating population flows: opportunities and risks
    Robin Lovelace, University of Leeds
    5th June, Daytime Population Movements Workshop @ University of Leeds
    Slides available: robinlovelace.net
  2. What I’m going to talk about
    1. Advances in datasets
    2. Advances in models
    3. Opportunities
    4. Risks
    5. Issues + Conclusion
  3. Part I: Advances in the data
    • Ongoing and accelerating digital revolution
    • More data than ever before
    • Most growth in 'big data'
    • Or rather 'V data':
      • High Volume
      • High Velocity
      • Highly Variable
    • Continuing rush to access this data
  4. What flow data do/should researchers have access to?
    [Diagram. Axes: Availability -> ; Right to access/store/use -> ; Size/recent/(potential) utility; Direction of movement]
    Datasets shown: 2001 o-d flow data (imperfect); Twitter API; Historic tweets (pending Library of Congress action) (£); Mobile phone triangulation data (e.g. Telefonica); Strava data on running/cycling (£); Migration flow data; Anonymous, non-geo. ind. survey data; Google Location Services data; Primary survey data (e.g. Ian Kellar; LAs, Bogota)
  5. Trends
    • Public -> private data provision
    • Free (samples) -> paid data
    • Small -> big
    • Pre-processed by provider -> academics pre-process the data (e.g. Sandy Tweets)
    • Aggregate -> individual-level
    • Space + time snapshots -> spacetime
  6. Part II: New models
    “Modern computing is now sufficiently powerful to deal with most [urban] models ... models based on individuals are now feasible both in terms of their computation and their representation using new programming languages” (Batty, 2007, p. 5).
  7. The radiation model
    • Tij: flow from i to j
    • Ti: total flow out of i, i.e. the sum of Tij over all j ≠ i
    • mi: population of zone i (equiv: Pi)
    • nj: population of the destination zone j (equiv: Wj)
    • sij: population within the circle centred on i whose circumference passes through j
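For reference, in the notation of this slide the radiation model of Simini et al. (2012) can be written as below. One detail taken from that paper rather than from the slide: sij is counted excluding the populations of the origin i and the destination j themselves.

```latex
T_{ij} = T_i \, \frac{m_i \, n_j}{(m_i + s_{ij})\,(m_i + n_j + s_{ij})},
\qquad
T_i = \sum_{j \neq i} T_{ij}
```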
  8. A visualisation of sij
    [Figure: origin i and destination j, separated by radius rij]
    sij = sum(pop %in% circle of radius rij), i.e. the sum of the populations of all black circles
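A minimal R sketch of how sij and the resulting radiation flows might be computed from a distance matrix. The function name `radiation_flows`, the toy zones and the outflows `T_out` are hypothetical and not from the talk; the flow formula follows Simini et al. (2012), where sij excludes the populations of i and j. Zones exactly as far from i as j are counted as inside the circle here, which is an assumption.

```r
# Sketch of the radiation model (Simini et al., 2012); all inputs are toy values.
# pop:   vector of zone populations
# D:     square matrix of distances between zones
# T_out: total outflow from each zone (T_i)
radiation_flows <- function(pop, D, T_out) {
  n <- length(pop)
  flows <- matrix(0, nrow = n, ncol = n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i == j) next
      # s_ij: population within distance D[i, j] of i, excluding i and j
      inside <- D[i, ] <= D[i, j]
      inside[c(i, j)] <- FALSE
      s_ij <- sum(pop[inside])
      m_i <- pop[i]
      n_j <- pop[j]
      flows[i, j] <- T_out[i] * m_i * n_j /
        ((m_i + s_ij) * (m_i + n_j + s_ij))
    }
  }
  flows
}

# Three hypothetical zones on a line at x = 0, 1 and 3
x <- c(0, 1, 3)
D <- abs(outer(x, x, "-"))
pop <- c(100, 200, 300)
T_out <- c(10, 20, 30)
flows <- radiation_flows(pop, D, T_out)
```

In this toy case no zone lies between zones 1 and 2, so s12 is 0, whereas zone 2 lies inside the circle from zone 1 to zone 3, so s13 is 200 and the flow from 1 to 3 is damped accordingly.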
  9. Part III: Opportunities
    Opportunities of 'Big' data
    • New insight into questions previously beyond the reach of surveys
    • Diversity, low cost, comprehensive coverage
    • High spatial and temporal resolution
  10. Opportunities for new models
    • Offer better estimates of flow than before
    • Could impact policy:
      • Transport planning
      • Location analysis
      • Sustainable economy
    • Simplicity: "being parameter-free is a significant and desirable change from past practice." (Masucci et al. 2013)
    • Include insights from 'Big data' revolution
  11. Part IV: Risks
    Of new data
    • Data vs actual behaviour
    • Policy relevance
    • Time spent pre-processing
    • Unrepresentative (Strava)
    • Less use of official data
    Of new models
    • New is not always better
    • Oversimplification (Masucci et al. 2013)
  12. Part V: Issues
    New datasets call for:
    • New statistical tools (Bayesian) to deal with uncertainty
    • Ways to ingest continual data
    • Filtering
    • Aggregation
    New models call for:
    • More code sharing (e.g. Dennett, 2012)
    • Rigorous comparative testing
    • Ways to input new data streams
    • Visualisation of key processes
  13. Conclusion
    • Opportunities and risks are associated with both new datasets and new models
    • Risks are much greater in the area of new datasets
    • Little correspondence between advances in modelling and advances in data sources
    • Bridging this model-data gap is a research priority: it is the thinking behind my research at Leeds
  14. Key References
    • Dennett, A. (2012). Estimating flows between geographical locations: 'get me started in' spatial interaction modelling. UCL Working Papers Series, 44(0), 0–24.
    • Lovelace, R., Malleson, N., Harland, K., & Birkin, M. (2014). Geotagged tweets to inform a spatial interaction model: a case study of museums. arXiv preprint arXiv:1403.5118.
    • Masucci, A. P., Serras, J., Johansson, A., & Batty, M. (2013). Gravity versus radiation models: On the importance of scale and heterogeneity in commuting flows. Physical Review E, 88(2).
    • Simini, F., González, M. C., Maritan, A., & Barabási, A. L. (2012). A universal model for mobility and migration patterns. Nature, 484(7392), 96-100.
  15. Using V data to inform geog. model
    • Simple calibration procedure: reran model for many different beta values
    • Closest aggregated tweet/model fit selected for different model implementations
    • Opportunities for Bayesian approaches here
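The calibration procedure described above can be sketched in R as a simple grid search. Everything here is illustrative: the function names, the toy populations and distances, and the fit measure (root mean square error between destination shares) are assumptions, not the measure used in the talk. The "observed" shares are generated from the model itself with beta = 0.3, so the search should recover that value.

```r
# Gravity model as a matrix operation: rows are origin zones, columns destinations
gravity <- function(P, W, D, beta, inc = 0.1) {
  inc * outer(P, W) * exp(-beta * D)
}

# Share of total modelled flow arriving at each destination
dest_share <- function(S) colSums(S) / sum(S)

# Rerun the model for each candidate beta and keep the closest fit.
# Assumed fit measure: RMSE between modelled and observed destination shares.
calibrate_beta <- function(obs_share, P, W, D, betas) {
  rmse <- sapply(betas, function(b) {
    sqrt(mean((dest_share(gravity(P, W, D, b)) - obs_share)^2))
  })
  betas[which.min(rmse)]
}

# Hypothetical toy data: 3 origin zones, 2 destinations
P <- c(100, 200, 300)
W <- c(1, 1)
D <- matrix(c(1, 2, 3,
              4, 2, 1), nrow = 3)   # column 1: distances to destination 1

# Synthetic "observed" shares, standing in for aggregated tweet counts
obs_share <- dest_share(gravity(P, W, D, beta = 0.3))

best_beta <- calibrate_beta(obs_share, P, W, D, betas = seq(0.05, 1, by = 0.05))
```

A grid search like this returns a single best value with no measure of uncertainty, which is one reason a Bayesian treatment could be attractive here.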
  16. Aggregation
    • Necessary to compare aggregate flow model with individual Tweets
    • Also vital to 'smooth' the stochasticity inherent to VGI
    • In reality: LOTS more data needed for reliable results
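As a rough illustration of the aggregation step, individual tweets can be cross-tabulated into an origin-destination count matrix that is directly comparable with the output of an aggregate flow model. The data frame and its column names (`user_zone`, `museum`) are hypothetical, and how each tweet is assigned an origin zone is not covered here.

```r
# Hypothetical individual-level records: one row per geotagged tweet
tweets <- data.frame(
  user_zone = factor(c("A", "A", "B", "C", "B", "A"), levels = c("A", "B", "C")),
  museum    = factor(c("m1", "m2", "m1", "m1", "m2", "m1"), levels = c("m1", "m2"))
)

# Aggregate to a zone-by-museum matrix of counts
obs_flows <- xtabs(~ user_zone + museum, data = tweets)

# Share of all tweets at each museum, comparable with modelled destination shares
obs_share <- colSums(obs_flows) / sum(obs_flows)
```

With only six records a single extra tweet changes the shares substantially, which illustrates why far more data is needed before such comparisons become reliable.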
  17. Issue: visualising complexity
    • Too complex? (from Simini et al., 2012)
    • No direction, simplification by sampling (Lovelace et al., 2014)
    • With ubiquitous internet and JavaScript, do we need interactive visuals?
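  18. The classic 'gravity' model
    In maths: $S_{ij} = inc \cdot P_i \, W_j \, e^{-\beta d_{ij}}$
    • inc: income proxy
    • P: population
    • W: museum attractiveness
    • beta: distance decay constant
    • d: Euclidean distance
    • i, j: origins and destinations
    In R code:
        # m: museum locations, pops: population zones (spatial objects)
        D <- gDistance(m, pops, byid = TRUE) / 1000  # distance matrix; /1000 assumes metres -> km
        inc <- 0.1
        beta <- 0.3
        P <- pops$totpop  # zone population
        W <- A <- rep(1, times = nrow(m))  # equal attractiveness for every museum
        S <- D^0  # matrix of ones with the same shape as D, to hold the flows
        # The slide looped over 1:nrow(w), but w is never defined;
        # looping over the dimensions of S avoids that.
        for (i in seq_len(nrow(S))) {
          for (j in seq_len(ncol(S))) {
            S[i, j] <- inc * P[i] * W[j] * exp(-beta * D[i, j])
          }
        }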
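The gravity model can also be computed without a loop. The sketch below uses hypothetical toy coordinates in place of the `m` and `pops` spatial objects and base R in place of `gDistance` (which comes from the rgeos package, since retired from CRAN; `sf::st_distance` is a current alternative). It checks that the vectorised form matches the double loop.

```r
# Hypothetical toy data: 3 zone centroids and 2 museums, coordinates in km
pops_xy <- cbind(x = c(0, 3, 6), y = c(0, 4, 0))
m_xy    <- cbind(x = c(0, 6),    y = c(4, 8))
P <- c(1000, 2500, 500)       # zone populations (made up)
W <- rep(1, nrow(m_xy))       # equal museum attractiveness
inc  <- 0.1
beta <- 0.3

# Euclidean distance matrix: rows are zones, columns are museums
D <- sqrt(outer(pops_xy[, "x"], m_xy[, "x"], "-")^2 +
          outer(pops_xy[, "y"], m_xy[, "y"], "-")^2)

# Vectorised gravity model: outer(P, W)[i, j] equals P[i] * W[j]
S_vec <- inc * outer(P, W) * exp(-beta * D)

# Loop version, for comparison
S_loop <- D^0
for (i in seq_len(nrow(D))) {
  for (j in seq_len(ncol(D))) {
    S_loop[i, j] <- inc * P[i] * W[j] * exp(-beta * D[i, j])
  }
}

same <- isTRUE(all.equal(S_vec, S_loop))
```

The vectorised form is shorter and avoids indexing mistakes in the loop bounds, which matters when the model is rerun many times during calibration.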