Updating the OAC: Assessing Classification Uncertainty and Cluster Stability During Intercensal Periods

Dr Michail Pavlis, Dr Alex D Singleton
Department of Geography and Planning, University of Liverpool
The research presented in this paper is funded by the ESRC Secondary Data Analysis Initiative award.

Output Area Classification
• Geodemographic classifications are composite measures describing the socio-spatial structure of small geographical areas.
• The Output Area Classification (OAC) is the most widely used open-source geodemographic classification in the UK.
• It was developed by applying k-means cluster analysis to 41 variables from the 2001 UK census.
• The OAC typology is a three-tier hierarchy of clusters: 7 supergroups, 21 groups and 52 subgroups.

OAC – 1st Level of Hierarchy
• Supergroup 1 (Blue Collar)
• Supergroup 2 (City Living)
• Supergroup 3 (Countryside)
• Supergroup 4 (Prospering Suburbs)
• Supergroup 5 (Constrained by Circumstances)
• Supergroup 6 (Typical Traits)
• Supergroup 7 (Multicultural)
Figure: OAC 2001, Leeds

Updating the OAC
• The OAC does not allow the classification to be updated during intercensal periods using more recent data.
• It assumes that the characteristics of neighbourhoods do not change rapidly.
• However, temporal change and uncertainty have been found to be uniform neither in degree nor in distribution across England and Wales [1].
• The overall aim of the analysis was to develop a methodology to update the OAC by producing intercensal estimates of the OAC variables.
[1] Gale, C.G. and Longley, P.A. (2013) Temporal Uncertainty in a Small Area Open Geodemographic Classification. Transactions in GIS, 17(4), 563-588.

Objectives
• To evaluate the usefulness of publicly available data for providing temporal updates to OAC inputs.
• To establish a framework for temporally updating the OAC using both surrogate values and statistical estimates.
• To investigate the stability and integrity of the OAC over time by examining the output area flows between clusters for the period 2002-2010.

Available Data
• Although the objective was to update all 41 OAC variables, this was not possible.
• Constrained by the available open data and their spatial scale, 22 OAC variables were updated, for England only.
• The following data were used:
  i. Mid-year population estimates.
  ii. School data on ethnicity.
  iii. Indices of multiple deprivation (e.g. income, health).
  iv. Council tax bands.
  v. Jobseeker's allowance claimants.

Methodology
• Updates could be produced either by using the data as surrogates that directly update OAC variables (e.g. mid-year population estimates) or by combining the data in statistical models.
• Mixed effects statistical models were used:
  i. The available variables formed the fixed part of the regression.
  ii. The spatial hierarchy of the data formed the random part of the regression.
  iii. Both the fixed and random parts were used to make predictions for the period 2002-2010 (empirical best linear unbiased prediction).
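To illustrate how the random part enters a prediction, here is a minimal sketch of empirical best linear unbiased prediction for the simplest case: a linear model with a single random intercept per group. It assumes the variance components have already been estimated, and it does not cover the binomial models, which have no closed form of this kind. The function names, group labels and numbers are hypothetical and are not taken from the analysis.

```python
from statistics import mean


def eblup_random_intercepts(residuals_by_group, var_group, var_resid):
    """Predict one random intercept per group by shrinking its mean residual.

    residuals_by_group: dict mapping group id -> list of residuals
        (observed value minus the fixed-part prediction).
    var_group: between-group variance (assumed already estimated).
    var_resid: within-group residual variance (assumed already estimated).
    """
    intercepts = {}
    for group, residuals in residuals_by_group.items():
        n = len(residuals)
        # Small groups and noisy data are pulled more strongly towards zero.
        shrinkage = var_group / (var_group + var_resid / n)
        intercepts[group] = shrinkage * mean(residuals)
    return intercepts


def predict(fixed_part, group, intercepts):
    # Groups with no data fall back to the fixed part alone.
    return fixed_part + intercepts.get(group, 0.0)


# Hypothetical residuals for two groups (e.g. LSOAs).
residuals = {"LSOA_A": [2.0, 4.0], "LSOA_B": [-1.0]}
intercepts = eblup_random_intercepts(residuals, var_group=1.0, var_resid=2.0)
```

With these toy numbers the intercept for LSOA_A is half of its mean residual, because the shrinkage factor is 1 / (1 + 2/2) = 0.5.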

Methodology
• Of the 22 variables, 6 were updated using surrogates and 16 using statistical models.
• In the statistical models, 13 dependent variables were percentages (a binomial distribution was assumed) and 3 were continuous (a normal distribution was assumed).
• Overdispersion was a common issue in the binomial models; an observation-level random intercept was added to account for it.
• The predictions were logarithmically transformed and standardized to match the OAC methodology.
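The transformation step can be sketched as follows. Two details are assumptions made for illustration, since the slides do not state them: the log is taken as log(x + 1) so that zero values stay finite, and standardization is done by range (min-max) scaling. The input values are made up.

```python
import math


def log_transform(values):
    # log(x + 1) keeps zero percentages finite; the +1 offset is an assumption.
    return [math.log(v + 1.0) for v in values]


def range_standardise(values):
    # Rescale to the interval [0, 1]; range scaling is an assumption here.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up predicted percentages for three output areas.
predicted_pct = [0.0, 9.0, 99.0]
scaled = range_standardise(log_transform(predicted_pct))
```

The log step compresses the long upper tail before scaling, so a value of 9% lands midway between 0% and 99% rather than near the bottom of the range.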

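GLMM Example: Routine/Semi-routine occupation
\[ \operatorname{logit}(\pi_{ij}) = \beta_1 + \beta_2 \times \text{Tax Band A (\%)} + \beta_3 \times \text{Education Rank} + z_i + \varepsilon_{ij} \]
\[ z_i \sim N(0, \sigma^2_{\text{LSOA}}), \qquad \varepsilon_{ij} \sim N(0, \sigma^2_{\varepsilon}) \]
• Explanatory variables: % Tax Band A and Rank of Education Deprivation.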
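As a rough illustration of how a prediction is read off this model, the sketch below evaluates the linear predictor and maps it back to a proportion with the inverse logit. The coefficient values are hypothetical placeholders, not estimates from the analysis, and the two random terms default to zero unless predicted values are supplied.

```python
import math


def inverse_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def predict_proportion(beta, tax_band_a_pct, education_rank, z_lsoa=0.0, eps_obs=0.0):
    """Predicted proportion in routine/semi-routine occupations.

    beta: (beta_1, beta_2, beta_3), the fixed-effect coefficients.
    z_lsoa: predicted LSOA-level random intercept (z_i).
    eps_obs: predicted observation-level random intercept (epsilon_ij).
    """
    b1, b2, b3 = beta
    eta = b1 + b2 * tax_band_a_pct + b3 * education_rank + z_lsoa + eps_obs
    return inverse_logit(eta)


# Hypothetical coefficients, chosen only to make the arithmetic easy to follow.
beta = (-2.0, 0.02, 0.0)
p_fixed_only = predict_proportion(beta, tax_band_a_pct=100.0, education_rank=0.0)
p_with_lsoa = predict_proportion(beta, tax_band_a_pct=100.0, education_rank=0.0, z_lsoa=0.5)
```

Here the fixed part gives a linear predictor of zero and therefore a proportion of 0.5; a positive LSOA intercept raises the prediction above that.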

GLMM Example: Routine/Semi-routine occupation
• Internal validation using the residuals

Euclidean Distance

Euclidean Distance
• The assumption is that OAs at the margin of a cluster are more likely to be reclassified.
• The Euclidean distance to the cluster centre could therefore be used as an indicator of classification uncertainty.
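A minimal sketch of such an indicator is given below. It assigns an area to its nearest centroid and also reports the ratio of the nearest to the second-nearest distance. That ratio is one possible way to express closeness to the margin and is an assumption made here, not necessarily the indicator used in the analysis. The centroids and points are made up.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assignment_with_margin(point, centroids):
    """Return (nearest cluster, distance to it, nearest / second-nearest ratio).

    A ratio near 0 means the point sits close to its centroid; a ratio near 1
    means it lies almost halfway to another cluster, i.e. near the margin.
    """
    ranked = sorted((euclidean(point, c), name) for name, c in centroids.items())
    (d1, nearest), (d2, _) = ranked[0], ranked[1]
    ratio = d1 / d2 if d2 > 0 else 1.0
    return nearest, d1, ratio


# Made-up centroids in a two-variable space.
centroids = {"A": (0.0, 0.0), "B": (10.0, 0.0)}
core = assignment_with_margin((1.0, 0.0), centroids)
marginal = assignment_with_margin((4.5, 0.0), centroids)
```

Both points are assigned to cluster A, but the second has a much higher ratio, flagging it as the more likely candidate for reclassification.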

OAs Temporal Drift

Cluster Stability

OA-Level Temporal Drift Leeds Supergroup 2001 - 2010

OA-Level Temporal Drift: Leeds, 2001 OAC vs 2010 OAC
• Approximately 12% of the OAs changed cluster assignment between 2001 and 2010.
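A figure of this kind can be derived by cross-tabulating the two sets of assignments. The sketch below counts the flows between clusters and the share of areas that changed. The OA codes and assignments are toy values, not real OAC data, and the function name is hypothetical.

```python
from collections import Counter


def cluster_flows(before, after):
    """Count OAs moving between clusters and the share that changed.

    before, after: dicts mapping OA code -> cluster label at each date.
    Only OAs present in both dicts are compared.
    """
    common = before.keys() & after.keys()
    flows = Counter((before[oa], after[oa]) for oa in common)
    changed = sum(n for (old, new), n in flows.items() if old != new)
    share_changed = changed / len(common) if common else 0.0
    return flows, share_changed


# Toy assignments with made-up OA codes.
oac_2001 = {"OA1": "Blue Collar", "OA2": "City Living",
            "OA3": "City Living", "OA4": "Countryside"}
oac_2010 = {"OA1": "Blue Collar", "OA2": "Multicultural",
            "OA3": "City Living", "OA4": "Countryside"}
flows, share_changed = cluster_flows(oac_2001, oac_2010)
```

The diagonal entries of `flows` (same label at both dates) measure stability, while the off-diagonal entries show which clusters exchange areas.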

Uncertainty Indicators

Sensitivity Analysis

Conclusion
• The Euclidean distances can be used to judge how close an area's assignment is to the margin of its cluster.
• At the cluster level, data visualisation techniques showed the major patterns of change and revealed the stability of the OAC during 2002-2010.
• Not all of the OAC variables can yet be updated, which could affect the results of the analysis.
• Sensitivity analysis also showed the most significant variables for each cluster.
