The Role of Geographical Context in Building Geodemographic Classifications

This is the first part of a wider research project on the geographic sensitivity of socio-spatial patterns. It examines the case studies of Liverpool, Manchester and Leeds, and presents some preliminary findings from a more generalised approach to socio-spatial pattern homogeneity. The aim of the wider research is to produce geographic extents that maximize local socio-spatial variation, which is essentially an approach to the Modifiable Areal Unit Problem (MAUP).
Presented at the GIS Research UK Conference in Leeds, April 2015 and awarded by CASA-UCL as the best paper in Spatial Analysis.

A. Alexiou

April 16, 2015

Transcript

  1. The Role of Geographical Context in Building Geodemographic Classifications

     23rd GIS Research UK conference, Leeds, April 2015. Alexandros Alexiou, Alex Singleton - Dept. of Geography and Planning, University of Liverpool
  2. 23rd GISRUK, Leeds, April 2015 SCHOOL OF ENVIRONMENTAL SCIENCES Summary

     - Introduction to Geodemographic Classifications
     - Research Outline
     - Methodology and Data
     - Case studies
     - Results and Discussion
  3. Introduction

     - A Geodemographic Classification (GC) is a data reduction technique that uses spatial profiling to generate clusters of populations that share similarities across multiple socio-economic and built environment attributes.
     - Their composition differs based on the intended stakeholders’ perspective as well as the skills, experience and available data of the creator.
     - Webber, 1977: a pragmatic strategy; what is deemed to work and what is required, alongside some degree of empirical evaluation.
     - Among the conventional classification systems:
       - Proprietary classifications, primarily designed to describe consumption patterns. Databases are populated not only with census data but are also compiled from large consumer databases such as credit checking histories, product registrations and private surveys.
         - MOSAIC (Experian), ACORN (CACI), P2 People and Places (BD), PRiZM (Claritas) and CAMEO (EuroDirect).
       - Public/Open classifications: ONS Output Area Classification (OAC) 2001 and 2011.
       - Similar products have also been created in academia.
  4. Introduction

     - Geodemographic classifications create a typology that is usually presented as a hierarchy; clusters produce varying tiers of aggregated areas.
       - A top-down approach creates larger groups that are subsequently divided into smaller sub-groups. E.g. for the 2001 OAC, 7 super-groups split into 21 groups and further into 52 sub-groups.
       - A bottom-up approach creates numerous smaller groups, which are aggregated into larger groups based on their similarities (typically with hierarchical algorithms such as Ward’s clustering criterion).
     - Common clustering techniques used as classifiers:
       - K-means clustering
       - Self-Organizing Maps (SOM)
       - Fuzzy logic algorithms or “soft” classifiers
     - Clusters are usually described through pen portraits. An example from the 2011 OAC:
       - Supergroups: 1 – Rural residents; 2 – Cosmopolitans; 3 – Ethnicity central; 4 – Multicultural metropolitans; 5 – Urbanites; 6 – Suburbanites; 7 – Constrained city dwellers; 8 – Hard-pressed living
       - Groups within supergroup 5: 5a – Urban professionals and families; 5b – Ageing urban living
       - Subgroups within 5a: 5a1 – White professionals; 5a2 – Multi-ethnic professionals with families; 5a3 – Families in terraces and flats
       - Subgroups within 5b: 5b1 – Delayed retirement; 5b2 – Communal retirement; 5b3 – Self-sufficient retirement
  5. Research Outline

     - Main research question:
       - Can conventional national classifications be applied locally with satisfactory results?
       - If so, to what extent? What is the degree of differentiation?
       - How can this differentiation be measured effectively?
     - Rationale:
       - Conventional national classifications may not account for local socio-spatial patterns, increasing the risk of mistargeting when applied locally.
       - National aggregations sweep away contextual differences between proximal zones.
       - Researchers without the necessary expertise may find it difficult to produce specific-purpose GCs ad hoc. General-purpose classifications are more convenient to use.
       - This debate is long-standing, originating with the earliest UK classifications (see Openshaw, Cullingford and Gillard, 1980 and Webber, 1980).
  6. Methodology and Data

     - This research uses a fixed set of input attributes at the Output Area (OA) zonal geography to build classifications within different geographic contexts.
     - For this purpose, a number of geographic contexts are considered (local, regional, national) to demonstrate the impact on the final classification outcome when input variables are kept constant.
     - To demonstrate how much the output classifications differ, we analyse the sets of classifications for Liverpool, Manchester and Leeds.
     - Creation:
       - Initial 60+ Census 2011 variables covering Demographic, Housing and Economic Activity attributes.
       - Output Area aggregation level for England (>170,000 neighbourhoods).
       - K-Means clustering (Hartigan & Wong, 1979), single hierarchy (Supergroup level).
       - Analysis carried out using the R software.
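The effect of geographic context described on this slide can be sketched in a few lines of Python. This is an illustration only, under loud assumptions: the data are synthetic, the function names `zscore` and `kmeans` are hypothetical, and a plain Lloyd's k-means is swapped in for the Hartigan & Wong algorithm that the original R analysis used. The same areas and attributes are clustered twice, once standardised and clustered within a "national" set and once within a "local" subset.

```python
import math
import random

def zscore(rows, reference):
    """Scale `rows` column-wise using the mean and sd of `reference`."""
    cols = list(zip(*reference))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means (not Hartigan-Wong); returns one label per point."""
    centres = random.Random(seed).sample(points, k)
    labels = []
    for _ in range(iters):
        # Assignment step: nearest centre by Euclidean distance.
        labels = [min(range(k), key=lambda j: math.dist(p, centres[j]))
                  for p in points]
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties
                centres[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels

# Synthetic stand-in for OA attributes (assumption): 60 "national" areas,
# of which the first 20 form a "local" extent with a shifted distribution.
rng = random.Random(1)
local = [[rng.gauss(2.0, 0.5), rng.gauss(0.0, 1.0)] for _ in range(20)]
rest = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(40)]
national = local + rest

k = 2
# National context: standardise and cluster all areas, keep the local ones.
national_labels = kmeans(zscore(national, national), k)[:20]
# Local context: same areas and attributes, local mean/sd, local clustering.
local_labels = kmeans(zscore(local, local), k)

# Share of local areas assigned consistently, allowing for the arbitrary
# cluster IDs (this shortcut is only valid for k = 2).
same = sum(a == b for a, b in zip(local_labels, national_labels))
agreement = max(same, len(local_labels) - same) / len(local_labels)
print(f"agreement between local and national assignments: {agreement:.0%}")
```

With real census inputs and k = 7 the relabelling shortcut in the last step no longer applies, which is why the deck turns to cross-tabulation and radial plots for the comparison.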
  7. Methodology and Data

     - K-Means input dataset
     - Variable formatting:
       - “Unfit” data: variable distribution and correlation checks.
       - Obtaining ratios per areal unit:
         - Percentages, where x_{a,i} is the value of attribute i in area a and P_a is the reference population (denominator) of area a, i.e. total population, number of households, etc.
         - Standardised by group, where x_{a,i} is the value of attribute i in area a, r_{N,g} is the observed national ratio for group g and P_{a,g} is the population of group g in area a.
       - Normalisation using the Box-Cox transformation: the power λ that achieves the best normalisation can be estimated algorithmically.
       - Standardisation (for all three geographic scales separately) using z-score scaling, where x_{a,i} is the value of attribute i in area a, μ_S is the mean and σ_S is the standard deviation of the set of observations S.
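The variable-formatting steps on this slide can be sketched as follows, assuming the usual textbook forms of each transformation: a percentage 100·x/P, the Box-Cox transform (x^λ − 1)/λ (ln x when λ = 0), and the z-score (x − μ)/σ. The grid search for λ, the +1 offset used to keep values strictly positive, the function names and the input counts are all assumptions made for illustration; the deck does not say how λ was estimated.

```python
import math
import statistics

def percentage(count, denominator):
    """Attribute count as a percentage of the reference population."""
    return 100.0 * count / denominator

def box_cox(x, lam):
    """Box-Cox transform of a strictly positive value."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

def box_cox_loglik(values, lam):
    """Profile log-likelihood of lambda under a normal model."""
    n = len(values)
    transformed = [box_cox(v, lam) for v in values]
    variance = statistics.pvariance(transformed)
    return (-0.5 * n * math.log(variance)
            + (lam - 1.0) * sum(math.log(v) for v in values))

def estimate_lambda(values, grid=None):
    """Pick the lambda on a coarse grid that maximises the likelihood."""
    grid = grid or [i / 10 for i in range(-20, 21)]
    return max(grid, key=lambda lam: box_cox_loglik(values, lam))

def z_scores(values):
    """Centre on the mean and scale by the population sd of `values`."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical counts of one attribute in five areas and their populations.
counts = [2, 3, 5, 8, 40]
populations = [100, 100, 100, 100, 100]

# The +1 offset is an assumption: Box-Cox needs strictly positive inputs.
ratios = [percentage(c, p) + 1.0 for c, p in zip(counts, populations)]
lam = estimate_lambda(ratios)
normalised = [box_cox(r, lam) for r in ratios]
scaled = z_scores(normalised)
print(lam, [round(s, 2) for s in scaled])
```

Because `z_scores` takes its mean and standard deviation from whatever set it is given, running it on local, regional or national sets of areas yields different scaled values for the same area.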
  8. Methodology and Data

     Final dataset with variable definitions: 2011 Census (ONS)

     | Domain | ID | Variable | Definition |
     |---|---|---|---|
     | Demographic | V1 | Age0_4 | Percentage of resident population aged 0–4 years |
     | Demographic | V2 | Age5_14 | Percentage of resident population aged 5–14 years |
     | Demographic | V3 | Age15_24 | Percentage of resident population aged 15–24 years |
     | Demographic | V4 | Age45_64 | Percentage of resident population aged 45–64 years |
     | Demographic | V5 | Age65_ | Percentage of resident population aged 65 or more years |
     | Demographic | V6 | Eth_Arab | Percentage of people identifying as Arab |
     | Demographic | V7 | Eth_Black | Percentage of people identifying as black African, black Caribbean or other black |
     | Demographic | V8 | Eth_Asian | Percentage of people identifying as Indian, Pakistani, Bangladeshi, Chinese or Other Asian |
     | Demographic | V9 | Mar_Single | Percentage of population over 16 years who are single |
     | Housing | V10 | Density | Number of people per hectare |
     | Housing | V11 | Ten_Rent | Percentage of households that are private sector rented accommodation |
     | Housing | V12 | Ten_Social | Percentage of households that are public sector rented accommodation |
     | Housing | V13 | House_Share | Percentage of households that are shared accommodation |
     | Housing | V14 | House_Flat | Percentage of households which are flats |
     | Housing | V15 | CeH_No | Percentage of occupied household spaces without central heating |
     | Economic Activity | V16 | EA_Part | Percentage of household representatives who are working part-time |
     | Economic Activity | V17 | EA_Unemp | Percentage of household representatives who are unemployed |
     | Economic Activity | V18 | EA_Stud | Percentage of household representatives who are students |
     | Economic Activity | V19 | Edu_Low | Percentage of people over 16 years with some qualifications but not a HE qualification |
     | Economic Activity | V20 | Edu_HE | Percentage of people over 16 years for which the highest level of qualification is level 4 qualifications and above |
     | Economic Activity | V21 | NS_Manager | Percentage of household reference persons in higher managerial, administrative and professional occupations |
     | Economic Activity | V22 | NS_Semi | Percentage of household reference persons in intermediate occupations |
     | Economic Activity | V23 | Ind_Agr | Percentage of population aged 16–74 who work in the A, B and C industry sector |
     | Economic Activity | V24 | Ind_Man | Percentage of population aged 16–74 who work in the D, E and F industry sector |
     | Economic Activity | V25 | Ind_Sales | Percentage of population aged 16–74 who work in the G, H and I industry sector |
     | Economic Activity | V26 | Ind_Tech | Percentage of population aged 16–74 who work in the K, L and M industry sector |
     | Economic Activity | V27 | Ind_Adm | Percentage of population aged 16–74 who work in the N, O, P, Q, T, and U industry sector |
     | Economic Activity | V28 | Ind_Art | Percentage of population aged 16–74 who work in the R and S industry sector |
     | Travel behaviour | V29 | Car_0 | Percentage of households with no car |
     | Travel behaviour | V30 | Car_1 | Percentage of households with 1 car |
     | Travel behaviour | V31 | Car_3 | Percentage of households with 3 or more cars |
     | Travel behaviour | V32 | Tr_Public | Percentage of population aged 16–74 who travel to work by public transport |
     | Travel behaviour | V33 | Tr_Foot | Percentage of population aged 16–74 who travel to work on foot or by bicycle |
  9. Methodology and Data

     - Currently there is no best practice for comparing two different sets of classifications in order to find “best fits” between clusters (cluster IDs are assigned randomly):
       - Even if both derive from the same observation set S, a classification for a set of local observations L will produce cluster assignments dissimilar to those of a national classification derived from S.
     - Two sources of cluster assignment variance:
       - Standardisation (for different geographical contexts, the mean μ and standard deviation σ change)
       - Clustering process
     - We explore and illustrate the variation with a number of methods:
       1. Plotting the cluster mean centres (attribute means) so we can assess the nature of the cluster (pen portraits).
       2. Contingency tables: cross-tabulating the cluster distribution frequencies.
       3. Mapping our results.
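The contingency-table comparison listed above can be sketched as below. The similarity figure is read here as the share of a local cluster's areas that fall in its paired cluster from the wider classification, which is consistent with the Liverpool tables later in the deck (for example 203 of 332 OAs gives 61%). The toy labels, the `pairing` dictionary and the function names are hypothetical; the pairing is supplied by the analyst (for instance from attribute fit) rather than derived from the table.

```python
from collections import Counter

def cross_tabulate(labels_a, labels_b):
    """Contingency table: (label in A, label in B) -> number of shared areas."""
    return Counter(zip(labels_a, labels_b))

def cluster_similarity(labels_a, labels_b, pairing):
    """Per-cluster and overall share of areas falling in the paired cluster."""
    table = cross_tabulate(labels_a, labels_b)
    sizes = Counter(labels_a)
    per_cluster = {a: table[(a, pairing[a])] / sizes[a] for a in sizes}
    matched = sum(table[(a, pairing[a])] for a in sizes)
    return per_cluster, matched / len(labels_a)

# Toy assignments for six areas (assumption): local names vs. national IDs.
local = ["Student", "Student", "Suburban", "Suburban", "Suburban", "Retired"]
national = [5, 5, 4, 6, 4, 4]
# Hypothetical pairing of each local cluster with a national cluster.
pairing = {"Student": 5, "Suburban": 4, "Retired": 2}

per_cluster, overall = cluster_similarity(local, national, pairing)
print(per_cluster, round(overall, 3))

# Same arithmetic on the Liverpool vs. North West totals from the slides.
print(round(100 * 858 / 1584, 1))  # 54.2
```

A paired cluster that shares no areas, like "Retired" in the toy data, scores 0 even when its attribute profile looks similar, which is the gap between attribute fit and geographic fit that the case studies explore.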
  10. Case Studies

      - We compare 3 sets of classifications, one set for each case study, that were built using the same data set:

      | Geographic area | Local classification | Regional classification | National classification |
      |---|---|---|---|
      | Liverpool | Liverpool Local Authority | North West | England |
      | Manchester | Greater Manchester Area | North West | England |
      | Leeds | Leeds Local Authority | Yorkshire and the Humber | England |

      - We compare outcomes based on the k-means algorithm for 7 clusters:
        1. Radial plots to assess “attribute fit”.
        2. Cross-tabulation to assess “geographic fit”.
  11. Case Studies

      - Constructing Pen Portraits
  12. Case Studies - Liverpool

      - Cross-Tabulation vs. Radial Plots

      Liverpool vs. North West (NW) classification:

      | Liverpool cluster name | OA count | NW cluster | NW OA count | Cluster similarity |
      |---|---|---|---|---|
      | Urban Professionals | 332 | 3 | 203 | 61% |
      | Retired Communities | 185 | 2 | 0 | 0% |
      | Student Living | 81 | 5 | 81 | 100% |
      | Striving Ethnic Workers | 171 | 7 | 134 | 78% |
      | Suburban Living | 306 | 4 | 52 | 17% |
      | Hard-Pressed Families | 381 | 6 | 352 | 92% |
      | Young Cosmopolitans | 128 | 1 | 36 | 28% |
      | Sum / Mean | 1584 | | 858 | 54.2% |

      Liverpool vs. national classification:

      | Liverpool cluster name | OA count | National cluster | National OA count | Cluster similarity |
      |---|---|---|---|---|
      | Urban Professionals | 332 | 3 | 214 | 64% |
      | Retired Communities | 185 | 2 | 9 | 5% |
      | Student Living | 81 | 5 | 81 | 100% |
      | Striving Ethnic Workers | 171 | 7 | 126 | 74% |
      | Suburban Living | 306 | 4 | 103 | 34% |
      | Hard-Pressed Families | 381 | 6 | 381 | 100% |
      | Young Cosmopolitans | 128 | 1 | 36 | 28% |
      | Sum / Mean | 1584 | | 950 | 60.0% |
  14. Case Studies - Liverpool

      - Cross-Tabulation vs. Radial Plots

      Liverpool vs. North West (NW) classification:

      | Liverpool cluster name | OA count | NW cluster | NW OA count | Cluster similarity |
      |---|---|---|---|---|
      | Urban Professionals | 332 | 7 | 203 | 61% |
      | Retired Communities | 185 | 3 | 0 | 0% |
      | Student Living | 81 | 1 | 81 | 100% |
      | Striving Ethnic Workers | 171 | 5 | 134 | 78% |
      | Suburban Living | 306 | 6 | 52 | 17% |
      | Hard-Pressed Families | 381 | 4 | 352 | 92% |
      | Young Cosmopolitans | 128 | 2 | 36 | 28% |
      | Sum / Mean | 1584 | | 858 | 54.2% |

      Liverpool vs. national classification:

      | Liverpool cluster name | OA count | National cluster | National OA count | Cluster similarity |
      |---|---|---|---|---|
      | Urban Professionals | 332 | 3 | 214 | 64% |
      | Retired Communities | 185 | 2 | 9 | 5% |
      | Student Living | 81 | 5 | 81 | 100% |
      | Striving Ethnic Workers | 171 | 7 | 126 | 74% |
      | Suburban Living | 306 | 4 | 103 | 34% |
      | Hard-Pressed Families | 381 | 6 | 381 | 100% |
      | Young Cosmopolitans | 128 | 1 | 36 | 28% |
      | Sum / Mean | 1584 | | 950 | 60.0% |
  18. Case Studies – G. Manchester

      - Cross-Tabulation vs. Radial Plots

      [Radial plot of attribute means for the "Asian Communities" cluster; radial scale from -1.5 to 2.5, one axis per input attribute (Age0_4 to Ind_Art).]

      G. Manchester vs. North West (NW) classification:

      | G. Manchester cluster name | OA count | NW cluster | NW OA count | Cluster similarity |
      |---|---|---|---|---|
      | Urban Professionals | 2255 | G | 1 | 0.0% |
      | Asian Communities | 546 | Retired Communities | 1 | 0.2% |
      | Student Living | 360 | A | 359 | 99.7% |
      | Striving Ethnic Workers | 864 | E | 724 | 83.8% |
      | Suburban Living | 2202 | F | 945 | 42.9% |
      | Hard-Pressed Families | 1638 | D | 1389 | 84.8% |
      | Young Cosmopolitans | 819 | B | 764 | 93.3% |
      | Sum / Mean | 8684 | | 4183 | 48.2% |

      G. Manchester vs. national classification:

      | G. Manchester cluster name | OA count | National cluster | National OA count | Cluster similarity |
      |---|---|---|---|---|
      | Urban Professionals | 2255 | B | 1398 | 62.0% |
      | Asian Communities | 546 | Retired Communities | 0 | 0.0% |
      | Student Living | 360 | G | 287 | 79.7% |
      | Striving Ethnic Workers | 864 | F | 547 | 63.3% |
      | Suburban Living | 2202 | E | 1189 | 54.0% |
      | Hard-Pressed Families | 1638 | A | 1614 | 98.5% |
      | Young Cosmopolitans | 819 | D | 293 | 35.8% |
      | Sum / Mean | 8684 | | 5328 | 61.4% |
  22. Case Studies - Leeds

      - Cross-Tabulation vs. Radial Plots

      Leeds vs. Yorkshire and the Humber (YH) classification:

      | Leeds cluster name | OA count | YH cluster | YH OA count | Cluster similarity |
      |---|---|---|---|---|
      | Urban Professionals | 682 | C | 461 | 67.6% |
      | Young & Single “Techies” | 112 | Retired Communities | 0 | 0.0% |
      | Student Living | 116 | G | 116 | 100.0% |
      | Striving Ethnic Workers | 373 | D | 352 | 94.4% |
      | Suburban Living | 340 | E | 300 | 88.2% |
      | Hard-Pressed Families | 569 | A | 301 | 52.9% |
      | Young Cosmopolitans | 351 | B | 340 | 96.9% |
      | Sum / Mean | 2543 | | 1870 | 73.5% |

      Leeds vs. national classification:

      | Leeds cluster name | OA count | National cluster | National OA count | Cluster similarity |
      |---|---|---|---|---|
      | Urban Professionals | 682 | G | 342 | 50.1% |
      | Young & Single “Techies” | 112 | Retired Communities | 0 | 0.0% |
      | Student Living | 116 | D | 115 | 99.1% |
      | Striving Ethnic Workers | 373 | F | 253 | 67.8% |
      | Suburban Living | 340 | B | 298 | 87.6% |
      | Hard-Pressed Families | 569 | E | 470 | 82.6% |
      | Young Cosmopolitans | 351 | A | 121 | 34.5% |
      | Sum / Mean | 2543 | | 1599 | 62.9% |
  26. Results and Discussion

      - Geographic sensitivity of geodemographic classifications is very difficult to assess, given the complexity of the problem. Some remarks:
        - The notions of attribute fit and geographic fit are central to comparisons.
        - Attribute means do provide a basis for correlation between cluster pairs; however, they do not account for the magnitude of deviation of the OA attribute values from the mean.
        - Between geographic scales, the clusters formed can be completely different in nature, making comparisons inconclusive.
      - Policy implications:
        - Comparisons between classifications: small differences in attributes can demonstrate central tendencies of the local populations.
        - However, actual socio-spatial patterns can in fact be very different.
        - When assessing spatial policies, upper hierarchies (i.e. Supergroup level) from national classifications may not be suitable, as they can produce misleading results.

      [Radial plot: "Cluster Comparison - Hard-Pressed Households" for Liverpool, Manchester and Leeds; radial scale from -1.5 to 1.5, one axis per input attribute (Age0_4 to Ind_Art).]
  27. Results and Discussion

      - Methodological implications:
        - Standardising attributes directly affects cluster formation. Clusters at national scales appear more homogeneous due to reduced absolute distances.
          - E.g. for k = 7, the total variation lost (smoothed out) has a magnitude of ~9%.
        - A key research question is whether there are specific geographical contexts that maximise the efficiency of clustering with respect to local variation, and how unique clusters can be handled.
        - Administrative boundaries do not necessarily reflect the actual organisation of communities.
          - For instance, geographic boundaries could be calculated in non-Euclidean space.
  28. Results and Discussion

      - Future research and preliminary results (benchmark geographic boundaries)
        - We use the angular similarity measure to compare cluster attribute means.
        - Benchmark results:
          - LA (Local Authority) classification vs. national classification.
          - Standardised attributes per LA.
        - The aim is to produce geographic boundaries that maximize local efficiency, as an alternative to arbitrary administrative boundaries.
        - Such boundaries can be used in any research regarding population dynamics (e.g. retail analysis) and can easily be made publicly available.
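The angular similarity mentioned on this slide can be sketched as follows. The exact formula used in the deck is not given in the transcript, so this takes one common definition as an assumption: 1 − θ/π, where θ is the angle between the two vectors of cluster attribute means. The function name and the example vectors are hypothetical.

```python
import math

def angular_similarity(u, v):
    """1 - angle/pi between two attribute-mean vectors (assumed definition)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cosine = max(-1.0, min(1.0, dot / norm))
    return 1.0 - math.acos(cosine) / math.pi

# Hypothetical z-scored attribute means of a local and a national cluster.
local_means = [1.2, -0.4, 0.8, -1.0]
national_means = [0.9, -0.1, 0.5, -0.7]
print(round(angular_similarity(local_means, national_means), 3))
```

Under this definition the measure is 1 for profiles pointing in the same direction, 0.5 for orthogonal profiles and 0 for opposite ones. It ignores vector length, so two clusters with the same shape of profile but different magnitudes still score as highly similar.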
  29. Thank you for your time

      a.alexiou@liv.ac.uk
      https://speakerdeck.com/dblalex

      Acknowledgements: This work is funded as part of an ESRC PhD studentship and in collaboration with the Office for National Statistics North West Doctoral Training Centre