Nik Lomax
September 09, 2019

# What is... IPF and Microsimulation

Here is the summary of the session at the British Society for Population Studies annual conference in Cardiff, organised by my colleague Paul Norman.

This session is aimed at those who want to know more about getting skilled up. The audience might be PhD students, postdocs, local or national government researchers, or just anybody intrigued by something new to them. Each (c. 10 minute) presentation will introduce a quantitative method in an accessible manner and explain what you can do with it and with what kinds of data. Where possible there will be research examples and pointers to materials which can help people learn the method.

I covered Iterative Proportional Fitting (IPF) and Microsimulation:

Iterative Proportional Fitting
Iterative Proportional Fitting (IPF) is a technique that can be used to adjust a distribution reported in one data set by totals reported in others. There are various data situations in population research when values for population attributes might be missing due to being unknown, unreliable, outdated, or a sample. IPF provides a tool for estimating these missing data.
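
To make the idea concrete, here is a minimal Python sketch of IPF on a two-way table. It is not taken from the talk: the function name `ipf`, the seed values and the target totals are all made up for illustration, and the row and column targets are assumed to share the same grand total.

```python
def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Scale a two-way seed table until its margins match the targets."""
    table = [row[:] for row in seed]  # work on a copy of the seed
    for _ in range(max_iter):
        # Step 1: scale each row so it sums to its target total.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [value * target / total for value in table[i]]
        # Step 2: scale each column so it sums to its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
        # Columns now match exactly, so convergence is judged on the rows.
        gap = max(abs(sum(row) - t) for row, t in zip(table, row_targets))
        if gap < tol:
            break
    return table


# Hypothetical seed (e.g. a sample cross-tabulation) and known marginal totals.
seed = [[10.0, 20.0], [30.0, 40.0]]
row_targets = [40.0, 60.0]
col_targets = [45.0, 55.0]
fitted = ipf(seed, row_targets, col_targets)
```

The fitted table matches both sets of margins while keeping the association structure of the seed (its cross-product ratio), which is why the choice of seed matters.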

Microsimulation
Microsimulation is an approach used to estimate the characteristics of individuals within a population from a range of attribute-rich data sources and to model those individuals over time. The creation of a synthetic population of individuals is often called spatial microsimulation because spatial identifiers are added to the individuals. Dynamic models deal with time steps: they are used to age the individuals and alter their characteristics by introducing the probability of transitioning between different states. For example, dynamic models can be used in health policy to assess outcomes for individuals in the model under different intervention scenarios.

## Transcript

1. ### What is… Iterative Proportional Fitting?

Nik Lomax, School of Geography, University of Leeds. British Society for Population Studies Annual Conference, Cardiff, 9 September 2019
2. ### Iterative Proportional Fitting (IPF) is…

- A technique for reweighting a known multidimensional array (e.g. cross-tabulated data) to target marginal totals
- Used by demographers, transport planners, economists and computer scientists
- Can be done in a wide range of software, from Excel to bespoke packages
3. ### You might know it by another name

- RAS in economics (see Bacharach 1965)
- Cross–Fratar (Fratar 1954) or Furness (Furness 1965) in transport engineering
- Raking in computer science and statistics (Cohen 2008)
- IPF has also been referred to as rim-weighting or structure-preserving estimation (Simpson and Tranmer 2005)

5. ### Lomax, N., Norman, P., Rees, P. et al. (2013)

Subnational migration in the United Kingdom: producing a consistent time series using a combination of available data and estimates, J Pop Research, 30: 265. https://doi.org/10.1007/s12546-013-9115-z
6. ### Background

- The first (demographic) use of IPF is widely attributed to Deming and Stephan (1940), who applied the technique to data from the 1940 U.S. census
- Although there were complete counts of the population for certain characteristics, when these characteristics were cross-tabulated the output was limited to a sample of the population
- They used this sample as the starting distribution (the seeds) and applied IPF to derive an estimate of these cross-tabulated characteristics for the whole population
7. ### Some examples of IPF in action

- To estimate the characteristics of residents of small geographical areas (Birkin and Clarke 1988)
- To update the age and sex structure of small area populations in the UK (Rees 1994)
- To estimate small area population counts of car ownership and tenure type using 1991 Census data (Simpson and Tranmer 2005)
- To disaggregate migration data by age and sex (Willekens, Por, and Raquillet 1981; Willekens 1982)
- To estimate missing cross-border migration data for the United Kingdom (Lomax et al. 2013)

12. ### Software solutions

- Modules and user-produced syntax are available for Excel, SAS, Matlab, Stata, and SPSS
- I like the mipfp package in R
13. ### References

- Bacharach, M. 1965. Estimating nonnegative matrices from marginal data. International Economic Review, 6(3): 294–310.
- Birkin, M., and M. Clarke. 1988. SYNTHESIS—A synthetic spatial information system for urban and regional analysis: Methods and examples. Environment and Planning A, 20(12): 1645–71.
- Cohen, M. 2008. Raking. In Encyclopedia of survey research methods, ed. P. Lavrakas, 672–74. Thousand Oaks, CA: Sage.
- Fratar, T. J. 1954. Vehicular trip distribution by successive approximations. Traffic Quarterly, 8(1): 53–65.
- Furness, K. P. 1965. Time function iteration. Traffic Engineering and Control, 7(7): 458–60.
- Lomax, N., Norman, P., Rees, P. et al. 2013. Subnational migration in the United Kingdom: producing a consistent time series using a combination of available data and estimates. J Pop Research, 30: 265. https://doi.org/10.1007/s12546-013-9115-z
- Lomax, N., and P. Norman. 2016. Estimating Population Attribute Values in a Table: “Get Me Started in” Iterative Proportional Fitting. The Professional Geographer, 68(3): 451–61. DOI: 10.1080/00330124.2015.1099
- Rees, P. 1994. Estimating and projecting the populations of urban communities. Environment and Planning A, 26: 1671–97.
- Simpson, L., and M. Tranmer. 2005. Combining sample and census data in small area estimates: Iterative proportional fitting with standard software. The Professional Geographer, 57(2): 222–34.
- Willekens, F. 1982. Multidimensional population analysis with incomplete data. In Multidimensional mathematical demography, ed. K. Land and A. Rogers, 43–111. New York: Academic.
- Willekens, F., A. Por, and R. Raquillet. 1981. Entropy, multiproportional, and quadratic techniques for inferring patterns of migration from aggregate data. In Advances in multiregional demography, ed. A. Rogers, 84–106. Laxenburg, Austria: International Institute for Applied Systems Analysis.
14. ### What is… Microsimulation?

Nik Lomax, School of Geography, University of Leeds. British Society for Population Studies Annual Conference, Cardiff, 9 September 2019
15. ### Microsimulation is…

- A technique for producing synthetic microdata comprising individuals, where detailed attribute information is combined with more complete (often spatial) data

And…

- A technique for simulating individuals over time, drawing on estimated transition probabilities to estimate change in state
16. ### ‘Types’ of Microsimulation (1) Creating Synthetic Data

- Sample or survey data
- Target or constraining data

17. ### ‘Types’ of Microsimulation (1) Creating Synthetic Data

- Sample or survey data
- Target or constraining data
- Adding geography as a constraint makes this *spatial* microsimulation
18. ### ‘Types’ of Microsimulation (2) Dynamic Models

- Dynamic models incorporate time, and simulate changes in the state of characteristics for individuals from estimated transitions
- For example, a change in state might be from employed to unemployed, or from alive to dead
- Calculated by randomly drawing from discrete probability distributions
- In 10 minutes we will mainly focus on synthetic data generation (aka population synthesis)
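
As a rough sketch of what "randomly drawing from discrete probability distributions" can look like in practice, the Python below moves each individual between states once per time step. The states mirror the examples above, but the transition probabilities, the starting population and the names `TRANSITIONS`, `step` and `simulate` are invented for illustration and do not come from any model discussed in the talk.

```python
import random

# Hypothetical annual transition probabilities (illustration only).
TRANSITIONS = {
    "employed": {"employed": 0.90, "unemployed": 0.08, "dead": 0.02},
    "unemployed": {"employed": 0.30, "unemployed": 0.67, "dead": 0.03},
    "dead": {"dead": 1.0},  # absorbing state
}


def step(state, rng):
    """Draw an individual's next state from the distribution for their current state."""
    options = TRANSITIONS[state]
    return rng.choices(list(options), weights=list(options.values()), k=1)[0]


def simulate(population, years, seed=0):
    """Advance every individual one time step per year and record each year's states."""
    rng = random.Random(seed)  # fixed seed so that a run can be repeated
    history = [list(population)]
    for _ in range(years):
        population = [step(state, rng) for state in population]
        history.append(population)
    return history


# A made-up starting population of 1,000 individuals, simulated for 10 years.
history = simulate(["employed"] * 800 + ["unemployed"] * 200, years=10)
deaths_by_year = [year.count("dead") for year in history]
```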
19. ### ‘Types’ of Microsimulation (2) Dynamic Models

[Figure: bar chart, "England - Lifetime Chronic Disease Risk", showing prevalence at age 51-52 and incidence after age 51-52 for diabetes, cancer and heart disease, on a 0% to 100% axis.]
20. ### Background

- Many of today’s microsimulation developments are rooted in the work of Orcutt (1957) and Orcutt et al. (1961), who argued that theoretical models of socio-economic systems are best applied at the individual level because it is individuals who make decisions within the system
- There was concern that macroeconomic models were not able to assess the effect of government policy on things like income distribution or policy
21. ### Some uses of microsimulation

- To model future elderly health care demand (Clark et al. 2017)
- To project educational attainment (Nelissen 1991)
- To estimate commuting patterns (Lovelace, Ballas and Watson 2014)
- To produce small area population estimates (Lomax and Smith 2017)
- To correct missing data bias in historical demography (Ruggles 1992)
- To estimate completed fertility in France (Thomson et al. 2012)
22. ### Methods for creating synthetic data

- Deterministic reweighting
  - Iterative Proportional Fitting
  - Integerised IPF / GREGWT
- Probabilistic (combinatorial optimization) approaches
  - Simulated Annealing
  - Hill climbing
- For a discussion of pros/cons see Lovelace and Ballas (2013)
23. ### Microsimulation using Simulated Annealing

- See Harland (2013) Flexible Modelling Framework
- A good Graphical User Interface for MSM
- Starts by randomly sampling a population of individuals
- Calculate the fit of the population: Total Absolute Error between the count of individuals in each category in the synthetic population and the expected count from the constraint tables, summed across all constraint tables
- Replace a random individual from the synthetic population with a person from the sample population
- Calculate the fit of the synthetic population again. If the change improves the population fitness, the change is automatically accepted
- Repeat
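
The loop described above can be sketched in a few lines of Python. This is an illustrative toy, not the Flexible Modelling Framework's implementation: the names `total_absolute_error` and `synthesise`, the sample records and the constraint counts are hypothetical. The slide only states that improving swaps are accepted; the rule used here for occasionally accepting a worse swap (probability `exp(-delta / temperature)` with a cooling temperature) is the textbook simulated annealing rule and is an assumption on my part. Setting `temperature=0.0` reduces it to hill climbing.

```python
import math
import random
from collections import Counter


def total_absolute_error(population, constraints):
    """Sum |synthetic count - expected count| over every category of every constraint table."""
    error = 0
    for attribute, expected in constraints.items():
        counts = Counter(person[attribute] for person in population)
        error += sum(abs(counts[category] - n) for category, n in expected.items())
    return error


def synthesise(sample, constraints, size, iterations=5000,
               temperature=1.0, cooling=0.999, seed=0):
    rng = random.Random(seed)
    # Start by randomly sampling a population of individuals.
    population = [rng.choice(sample) for _ in range(size)]
    error = total_absolute_error(population, constraints)
    for _ in range(iterations):
        if error == 0:
            break
        # Replace a random individual with a person drawn from the sample.
        index = rng.randrange(size)
        previous = population[index]
        population[index] = rng.choice(sample)
        new_error = total_absolute_error(population, constraints)
        delta = new_error - error
        # Keep swaps that do not worsen the fit; sometimes keep worse ones (assumed rule).
        worse_ok = temperature > 1e-9 and rng.random() < math.exp(-delta / temperature)
        if delta <= 0 or worse_ok:
            error = new_error
        else:
            population[index] = previous  # undo the swap
        temperature *= cooling
    return population, error


# Hypothetical survey sample and constraint tables for one small area.
sample = [
    {"age": "young", "sex": "female"}, {"age": "young", "sex": "male"},
    {"age": "old", "sex": "female"}, {"age": "old", "sex": "male"},
]
constraints = {"age": {"young": 6, "old": 4}, "sex": {"female": 5, "male": 5}}
population, error = synthesise(sample, constraints, size=10)
```

Recomputing the full error after every swap keeps the sketch short; a real model would update the counts incrementally.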

25. ### Spatial Microsimulation example: estimates of household expenditure

- Living Costs and Food Survey: detailed attribute data
- Census and mid-year estimate population data
- Local authority geography
26. ### Detailed methodology paper

- The Living Costs and Food Survey (LCFS) provides a very detailed breakdown of expenditure behavior for a sample
- Matching variables (age, sex, socio-economic status) are available in the census
- Uses IPF to assign expenditure values from the LCFS to the census (and to mid-year populations, to produce a time series)

James, W., Lomax, N. and Birkin, M. (2019) Scientific Data, 6:56 | https://doi.org/10.1038/s41597-019-0064-z
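
To give a feel for how IPF can carry survey expenditure onto census counts, here is a toy Python sketch that reweights survey respondents so that their weighted totals match one area's census margins, then sums weighted spending. It is not the method from the paper: the function name `reweight`, the five survey records, the spending figures and the area counts are all invented, only age and sex are used as matching variables, and every constraint category is assumed to appear at least once in the survey.

```python
def reweight(survey, constraints, iterations=50):
    """Use IPF to find one weight per respondent so weighted counts match the area margins."""
    weights = [1.0] * len(survey)
    for _ in range(iterations):
        for attribute, targets in constraints.items():
            # Current weighted count in each category of this attribute.
            totals = {category: 0.0 for category in targets}
            for person, weight in zip(survey, weights):
                totals[person[attribute]] += weight
            # Scale each weight by target / current count for the person's category.
            weights = [
                weight * targets[person[attribute]] / totals[person[attribute]]
                for person, weight in zip(survey, weights)
            ]
    return weights


# Hypothetical survey records with a made-up weekly spend (illustration only).
survey = [
    {"age": "16-64", "sex": "female", "weekly_spend": 62.0},
    {"age": "16-64", "sex": "male", "weekly_spend": 58.0},
    {"age": "65+", "sex": "female", "weekly_spend": 41.0},
    {"age": "65+", "sex": "male", "weekly_spend": 45.0},
    {"age": "16-64", "sex": "male", "weekly_spend": 70.0},
]
# Hypothetical census counts for a single local authority.
area_constraints = {
    "age": {"16-64": 700, "65+": 300},
    "sex": {"female": 520, "male": 480},
}
weights = reweight(survey, area_constraints)
# Estimated total weekly spend for the area: weighted sum over respondents.
area_spend = sum(w * p["weekly_spend"] for p, w in zip(survey, weights))
```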
27. ### Software solutions

- R implementations: MicSim and others
- Stata, Python, C++ …
- … Most models are bespoke
28. ### References

- Clark S, Birkin M, Heppenstall A and Rees P (2017) Using 2011 Census data to estimate future elderly health care demand. In: Stillwell J and Duke-Williams O (eds) The Routledge Handbook of Census Resources, Methods and Applications: Unlocking the UK 2011 Census. London: Routledge; 305–319.
- Harland K (2013) Microsimulation Model User Guide (Flexible Modelling Framework). NCRM Working Paper. NCRM.
- Lomax N and Smith A (2017) Microsimulation for demography. Australian Population Studies 1(1): 73–85.
- Lovelace R (2016) Spatial Microsimulation with R. CRC Press.
- Lovelace R and Ballas D (2013) ‘Truncate, replicate, sample’: A method for creating integer weights for spatial microsimulation. Computers, Environment and Urban Systems 41: 1–11.
- Lovelace R, Ballas D and Watson M (2014) A spatial microsimulation approach for the analysis of commuter patterns: from individual to regional levels. Journal of Transport Geography 34: 282–296.
- Nelissen J (1991) Household and education projections by means of a microsimulation model. Economic Modelling 8(4): 480–511.
- Orcutt G H (1957) A new type of socio-economic system. The Review of Economics and Statistics 39(2): 116–123.
- Orcutt G, Greenberger M, Korbel J and Rivlin A (1961) Microanalysis of Socioeconomic Systems: A Simulation Study. New York: Harper and Row.
- Ruggles S (1992) Migration, marriage, and mortality: correcting sources of bias in English family reconstitutions. Population Studies 46(3): 507–522.
- Thomson E, Winkler-Dworak M, Spielauer M and Prskawetz A (2012) Union instability as an engine of fertility? A microsimulation model for France. Demography 49(1): 175–195.