Advances in datasets and models for simulating flows: opportunities and risks

Advances in datasets and models for simulating population flows: opportunities and risks
Robin Lovelace, University of Leeds
5th June, Daytime Population Movements Workshop @ University of Leeds
Slides available: robinlovelace.net

What I’m going to talk about
1. Advances in datasets
2. Advances in models
3. Opportunities
4. Risks
5. Issues + Conclusion

Part I: Advances in the data
Ongoing and accelerating digital revolution
More data than ever before
Most growth in 'big data'
Or rather 'V data':
• High Volume
• High Velocity
• Highly Variable
Continuing rush to access this data

What flow data do/should researchers have access to?
[Diagram. Axes: Availability -> and Right to access/store/use ->. Legend: Size/recent/(potential) utility; Direction of movement]
• 2001 o-d flow data (imperfect)
• Twitter API
• Historic tweets (pending Library of Congress action) (£)
• Mobile phone triangulation data (e.g. Telefonica)
• Strava data on running/cycling (£)
• Migration flow data
• Anonymous, non-geo. ind. survey data
• Google Location Services data
• Primary survey data (e.g. Ian Kellar; LAs, Bogota)

Trends
• Public -> private data provision
• Free (samples) -> paid data
• Small -> big
• Pre-processed by provider -> academics pre-process the data (e.g. Sandy Tweets)
• Aggregate -> individual-level
• Space + time snapshots -> spacetime

Part II: New models
“Modern computing is now sufficiently powerful to deal with most [urban] models ... models based on individuals are now feasible both in terms of their computation and their representation using new programming languages” (Batty, 2007, p. 5).

The radiation model
• Tij: flow from i to j
• Ti: total flow out of i, i.e. the sum of Tij over all j ≠ i
• mi: population of zone i (equiv: Pi)
• nj: population of destination zone j (equiv: Wj)
• sij: population in the circle centred on i, with circumference touching j
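For reference, the symbols above combine into the flow equation of the radiation model as published by Simini et al. (2012). It is restated here in the slide's notation and is not taken from the slides themselves:

```latex
\langle T_{ij} \rangle = T_i \, \frac{m_i \, n_j}{(m_i + s_{ij})\,(m_i + n_j + s_{ij})},
\qquad
T_i = \sum_{j \neq i} T_{ij}
```

Note that distance enters only through s_ij, which is why the model has no distance decay parameter to calibrate.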

A visualisation of sij
[Figure: zones i and j, with a circle of radius rij centred on i]
sij = sum(pop %in% circle of radius rij)
Sum of populations of all black circles
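The definition of sij can be sketched in a few lines of R. This is a minimal illustration, not the code behind the slides: the four-zone layout, the populations, the function names `s_matrix` and `radiation`, and the assumption that 10% of each zone travels are all hypothetical. Following Simini et al. (2012), the populations of i and j themselves are excluded from sij.

```r
# Hypothetical toy inputs: four zones on a line, one distance unit apart
pop <- c(100, 50, 200, 80)             # zone populations
xy  <- cbind(x = c(0, 1, 2, 3), y = 0) # zone centroids
D   <- as.matrix(dist(xy))             # Euclidean distances r_ij

# s_ij: population within distance r_ij of i, excluding zones i and j
s_matrix <- function(D, pop) {
  n <- length(pop)
  S <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i == j) next
      inside <- D[i, ] <= D[i, j]   # zones inside the circle of radius r_ij
      inside[c(i, j)] <- FALSE      # drop origin and destination
      S[i, j] <- sum(pop[inside])
    }
  }
  S
}

# Radiation flows: T_ij = T_i * m_i * n_j / ((m_i + s_ij) * (m_i + n_j + s_ij))
radiation <- function(D, pop, Ti) {
  S <- s_matrix(D, pop)
  m <- matrix(pop, length(pop), length(pop))  # m[i, j] = pop[i]
  n <- t(m)                                   # n[i, j] = pop[j]
  Tij <- Ti * m * n / ((m + S) * (m + n + S))
  diag(Tij) <- 0                              # no flow from a zone to itself
  Tij
}

S     <- s_matrix(D, pop)
flows <- radiation(D, pop, Ti = 0.1 * pop)  # assumed: 10% of each zone travels
```

In a finite study area the rows of `flows` sum to less than Ti, because some of the modelled trips would end outside the zones considered.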

Part III: Opportunities
Opportunities of 'Big' data
• New insight into questions previously beyond the reach of surveys
• Diversity, low cost, comprehensive coverage
• High spatial and temporal resolution

Opportunities for new models
• Offer better estimates of flow than before
• Could impact policy:
  • Transport planning
  • Location analysis
  • Sustainable economy
• Simplicity: "being parameter-free is a significant and desirable change from past practice." (Masucci et al. 2013)
• Include insights from 'Big data' revolution

Part IV: Risks
Of new data
• Data vs actual behaviour
• Policy relevance
• Time pre-processing
• Unrepresentative (Strava)
• Less use of official data
Of new models
• New is not always better
• Oversimplification (Masucci et al. 2013)

Part V: Issues
New datasets call for:
• New statistical tools (Bayesian) to deal with uncertainty
• Ways to ingest continual data
• Filtering
• Aggregation
New models call for:
• More code sharing (e.g. Dennett, 2012)
• Rigorous comparative testing
• Ways to input new data streams
• Visualisation of key processes

Conclusion
• Opportunities and risks associated with both new datasets and models
• Risks much greater in area of new datasets
• Little correspondence between advances in modelling and data sources
• Bridging this model-data gap = research priority: thinking behind my research at Leeds

Key References
• Dennett, A. (2012). Estimating flows between geographical locations: 'get me started in' spatial interaction modelling. UCL Working Papers Series, 44(0), 0–24.
• Lovelace, R., Malleson, N., Harland, K., & Birkin, M. (2014). Geotagged tweets to inform a spatial interaction model: a case study of museums. arXiv preprint arXiv:1403.5118.
• Masucci, A. P., Serras, J., Johansson, A., & Batty, M. (2013). Gravity versus radiation models: On the importance of scale and heterogeneity in commuting flows. Physical Review E, 88(2).
• Simini, F., González, M. C., Maritan, A., & Barabási, A. L. (2012). A universal model for mobility and migration patterns. Nature, 484(7392), 96–100.

Using V data to inform geog. model
Simple calibration procedure: reran model for many different beta values
Closest aggregated tweet/model fit selected for different model implementations
Opportunities for Bayesian approaches here
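The calibration procedure described above amounts to a grid search over beta. The sketch below is illustrative only: the toy populations and distances are hypothetical, the "observed" flows are synthetic (generated with a known beta so the search can be checked), and the sum of squared errors is an assumed fit measure, since the slide does not say which one was used.

```r
# Hypothetical toy data: 3 origin zones, 2 destinations
P <- c(1000, 500, 2000)   # zone populations
W <- c(1, 1)              # destination attractiveness
D <- matrix(c(1, 4,
              2, 2,
              5, 1), nrow = 3, byrow = TRUE)  # origin x destination distances (km)

# Vectorised gravity model: S[i, j] = inc * P[i] * W[j] * exp(-beta * D[i, j])
gravity <- function(beta, inc = 0.1) inc * outer(P, W) * exp(-beta * D)

# Synthetic stand-in for aggregated tweet flows, generated with beta = 0.3
obs <- gravity(0.3)

# Rerun the model for many beta values and keep the closest fit
betas <- seq(0.05, 1, by = 0.05)
sse   <- sapply(betas, function(b) sum((gravity(b) - obs)^2))  # assumed fit measure
best_beta <- betas[which.min(sse)]
```

With real, noisy data the error curve is flatter around its minimum, which is one motivation for the Bayesian approaches mentioned above: they return a distribution over beta instead of a single best value.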

Aggregation
Necessary to compare aggregate flow model with individual Tweets
Also vital to 'smooth' the stochasticity inherent to VGI
In reality: LOTS more data needed for reliable results
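As a rough illustration of the aggregation step, individual records can be cross-tabulated into an origin-destination matrix with the same shape as the model output. The data frame, zone labels and column names below are hypothetical.

```r
# Hypothetical individual-level records: one row per geotagged tweet,
# already assigned to an origin zone and a destination (museum)
tweets <- data.frame(
  origin = factor(c("A", "A", "B", "C", "C", "C"), levels = c("A", "B", "C", "D")),
  dest   = factor(c("m1", "m2", "m1", "m1", "m1", "m2"), levels = c("m1", "m2"))
)

# Cross-tabulate into an origin x destination matrix of counts.
# Using factors with explicit levels keeps zones with no tweets (here "D") as zero rows.
obs_flows <- table(tweets$origin, tweets$dest)
```

With so few records per cell the counts are dominated by noise, which is the point made above about needing far more data for reliable results.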

Issue: visualising complexity
Too complex? (from Simini et al., 2012)
No direction, simplification by sampling (Lovelace et al., 2014)
With ubiquitous internet and JavaScript, do we need interactive visuals?

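The classic 'gravity' model
In maths:
    S_{ij} = inc \cdot P_i \cdot W_j \cdot e^{-\beta d_{ij}}
• inc: income proxy
• P: population
• W: museum attractiveness
• beta: distance decay constant
• d: Euclidean distance
• i, j: origins and destinations
In R code:
    library(rgeos)  # provides gDistance()

    # pops: origin zones (with a totpop column); m: museums (destinations)
    D <- gDistance(m, pops, byid = TRUE) / 1000  # origin x destination distances, scaled by 1/1000
    inc <- 0.1                         # income proxy
    beta <- 0.3                        # distance decay constant
    P <- pops$totpop                   # zone population
    W <- A <- rep(1, times = nrow(m))  # museum attractiveness (uniform)
    S <- D^0                           # flow matrix with the same dimensions as D

    for (i in seq_len(nrow(S))) {      # origins
      for (j in seq_len(ncol(S))) {    # destinations
        S[i, j] <- inc * P[i] * W[j] * exp(-beta * D[i, j])
      }
    }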
