Some developments in the specification, estimation and testing of geodemographic classifications

13th April 2009, Kyoto, Ritsumeikan University

alexsingleton

April 13, 2009

Transcript

  1. SOME DEVELOPMENTS IN THE SPECIFICATION, ESTIMATION AND TESTING OF GEODEMOGRAPHIC CLASSIFICATIONS

    Dr Alex D Singleton
    Department of Geography and Centre for Advanced Spatial Analysis, University College London
    www.alex-singleton.com
  2. 2006 DEFINITION OF A PLANET

    • After much debate the International Astronomical Union (IAU) agreed…
      1. The object must be in orbit around the Sun.
      2. The object must be massive enough to be a sphere by its own gravitational force.
      3. It must have cleared the neighbourhood around its orbit.
  3. FORMAL CLASSIFICATION

    3 Main Divisions (Kingdoms)
      1) Minerals
      2) Vegetables
      3) Animals
    Carl von Linné, 18th Century Swedish Botanist
  4. BLACK: Lowest class. Vicious, semi-criminal.
     DARK BLUE: Very poor, casual. Chronic want.
     LIGHT BLUE: Poor. 18s. to 21s. a week for a moderate family.
     PURPLE: Mixed. Some comfortable, others poor.
     PINK: Fairly comfortable. Good ordinary earnings.
     RED: Middle class. Well-to-do.
     YELLOW: Upper-middle and Upper classes. Wealthy.
     Charles Booth – Booth Map London 1889
  5. WHY DO WE CLASSIFY?

    • Because the world is complex
    • Need to simplify reality
    • Improve our understanding and help us make decisions
    • Help us to navigate the world
    • No representation is correct!
  6. WHAT ARE GEODEMOGRAPHICS?

    “Analysis of people by where they live” or “locality marketing” (Sleight, 1993:3)
    [Diagram labels: Person, Home Address, “Area”]
  7. HISTORY OF GEODEMOGRAPHICS

    • Developed in the 1970s, attributed to Richard Webber
    • Target urban deprivation funding: identify clusters of similar neighbourhoods
    • Moved to CACI: linked ED (enumeration district) to postcode
      • ACORN (private sector)
    • Moved to Experian
      • MOSAIC (private sector)
  8. CREATING A GEODEMOGRAPHIC

    Census and Non Census
    [Two pie charts of census versus non-census input data, labelled Experian: Mosaic and CACI: Acorn. Values shown: 46% / 54% and 70% / 30%.]
  9. CREATING A GEODEMOGRAPHIC

    Census Only
    • Age
    • Ethnicity
    • Country of Birth
    • Population
    • Living Arrangements
    • Family Size
    • Tenure
    • House Type / Size
    • House Quality
    • House Ownership
    • Health of Population
    • Employment
    • Industry Sectors
    http://www.areaclassification.org.uk/
  10. SEGMENTATIONS ARE CREATED BY CLUSTER ANALYSIS

    Area    V1  V2  V3  V4  V5  V6  V7  V8  V9  V10  ...
    Area1
    Area2
    Area3
    Area4
    Area5
    Area6
    Area7
    Area8
    ...
  11. OUTPUT

    Area    Cluster
    Area1   1
    Area2   1
    Area3   2
    Area4   1
    Area5   3
    Area6   3
    Area7   3
    Area8   2
    ...
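
The step from an area-by-variable table to an area-to-cluster lookup can be sketched in a few lines of Python. This is a minimal illustration of plain k-means and not the procedure used to build any particular classification. The two variables per area and their values are made up, chosen so that the result mirrors the Area/Cluster table above, and the starting centres are passed in explicitly (in practice they are usually drawn at random).

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, start_centres, max_iter=100):
    """Plain k-means: alternate assignment and centre updates until stable."""
    centres = [list(c) for c in start_centres]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each area joins its nearest centre.
        new_labels = [
            min(range(len(centres)), key=lambda c: sq_dist(row, centres[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # no area changed cluster, so the solution is stable
        labels = new_labels
        # Update step: each centre moves to the mean of its member areas.
        for c in range(len(centres)):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical input matrix: one row per area, two made-up variables (V1, V2).
areas = ["Area1", "Area2", "Area3", "Area4", "Area5", "Area6", "Area7", "Area8"]
rows = [
    (1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (0.9, 1.1),
    (9.0, 1.0), (9.2, 1.1), (8.8, 0.9), (5.1, 4.9),
]

# Assumed starting centres: the rows for Area1, Area3 and Area5.
starts = [rows[0], rows[2], rows[4]]
clusters = [lab + 1 for lab in kmeans(rows, starts)]  # number clusters from 1

for name, cluster in zip(areas, clusters):
    print(name, cluster)
```

Real classifications standardise the input variables first so that no single variable dominates the distance calculation; that step is omitted here for brevity.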
  12. OUTPUT AREA CLASSIFICATION

    • 3 Levels
      – 52 Sub Groups
      – 21 Groups
      – 7 Super Groups
    • Open Methodology
      – Only classification with status of a National Statistic
    • Peer Reviewed
      – Vickers, D.W. and Rees, P.H. (2007). Creating the National Statistics 2001 Output Area Classification. Journal of the Royal Statistical Society, Series A
    • Free!
    • RSS active user group
  13. 1: Blue Collar Communities
        1a: Terraced Blue Collar
          1a1: Terraced Blue Collar (1)
          1a2: Terraced Blue Collar (2)
          1a3: Terraced Blue Collar (3)
        1b: Younger Blue Collar
          1b1: Younger Blue Collar (1)
          1b2: Younger Blue Collar (2)
        1c: Older Blue Collar
          1c1: Older Blue Collar (1)
          1c2: Older Blue Collar (2)
          1c3: Older Blue Collar (3)
      2: City Living
        2a: Transient Communities
          2a1: Transient Communities (1)
          2a2: Transient Communities (2)
        2b: Settled in the City
          2b1: Settled in the City (1)
          2b2: Settled in the City (2)
      3: Countryside
        3a: Village Life
          3a1: Village Life (1)
          3a2: Village Life (2)
        3b: Agricultural
          3b1: Agricultural (1)
          3b2: Agricultural (2)
        3c: Accessible Countryside
          3c1: Accessible Countryside
          3c2: Accessible Countryside
  14. These neighbourhoods are predominantly located in inner cities and are characterised by low quality, high density rented flats. They have multi-ethnic populations, many of whom are first generation immigrants. Other residents include students.
  15. MOSAIC

    • 2 Levels
      • 61 Types
      • 11 Groups
    • Closed Methodology
    • Unit Postcode
    • Commercial
    • Just data or analysis tools
  16. Urban Intelligence

    Young, single and mostly well educated, these people are cosmopolitan in tastes and liberal in attitudes.
    Group 5 / 11
  17. Symbols of Success
      Happy Families
      Suburban Comfort
      Ties of Community
      Urban Intelligence
      Welfare Borderline
      Municipal Dependency
      Blue Collar Enterprise
      Twilight Subsistence
      Grey Perspectives
      Rural Isolation
      Cheltenham, Gloucestershire, UK.
  18. WHAT DO WE TAKE FROM THIS?

    • Not all segmentations are the same…
    • OAC says society is made up of 52 different types of areas?
    • MOSAIC says society is made up of 61 different types of areas?
    • Which is right?
    • Commercial Providers will more than likely say: Ours is!
  19. RECENT “INNOVATIONS”

    • Commercial innovation: diversification / new markets
      • Region: Global, London, Scotland, Northern Ireland, Japan, Italy...
      • New Industries: Automotive, Financial, Grocery, Public Sector, Health
      • New Segments: Daytime, Names, Income, Street Values, Shareholder Activity, Unemployment, Individuals (Scale)
  20. “RECENT” INNOVATIONS

    • Long history of academic critique
    • Fuzzy Geodemographics (Feng & Flowerdew, 1998; See & Openshaw, 2001)
    • Description vs Modelling (Harris et al, 2007)
    • Bespoke Classifications (Longley and Singleton, 2009a)
    • Public Consultation (Longley and Singleton, 2009b)
    • Utility of public consultation data (Singleton, 2008)
  21. “RECENT” INNOVATIONS

    • Need for real time geodemographics:
      • Current classifications are created using static data sources.
      • The rate and scale of current population change are making large surveys (census) increasingly redundant.
      • Significant hidden value in transactional data
      • Data is increasingly available in near real time
      • Application specific (bespoke) classifications have demonstrated utility.
  22. REALTIME FEEDS OF DATA

    • Involve integration of large and possibly disparate databases
      • E.g. Doctor registrations; census data
    • Common protocol
      • XML: E.g. UK Neighbourhood Statistics API
    • Is there any value in other non-traditional data sources?
      • E.g. Flickr
  23. ONLINE SPECIFICATION INPUTS

    • Usability
    • Expert vs Non-Expert Users
    • Non-Expert Users
      • Pre-selection of variables and weighting for a specific application
    • Expert Users
      • Selection of any variable and any weighting
  24. CLUSTERING

    • K-Means algorithm: unstable, because the initial start conditions affect the results
    • Fit measured by the within-cluster sum of squares or R²
    K=4
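
The two bullets on this slide can be made concrete with a small worked example. The sketch below is an illustration only: the four data points are made up, and R² is taken here to mean one minus the within-cluster sum of squares divided by the total sum of squares, which is an assumed (though common) definition. It shows two different partitions of the same data that are both stable under the k-means reassignment step, so either could be the end point of a run depending on where it started, yet their fit differs sharply.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroids(rows, labels):
    """Mean of each cluster's rows, keyed by cluster label."""
    groups = {}
    for row, lab in zip(rows, labels):
        groups.setdefault(lab, []).append(row)
    return {
        lab: [sum(col) / len(members) for col in zip(*members)]
        for lab, members in groups.items()
    }


def within_ss(rows, labels):
    """Within-cluster sum of squares: squared distance of each row to its centroid."""
    cents = centroids(rows, labels)
    return sum(sq_dist(row, cents[lab]) for row, lab in zip(rows, labels))


def r_squared(rows, labels):
    """Share of total variation explained by the partition (assumed definition)."""
    total_ss = within_ss(rows, [0] * len(rows))  # a single cluster gives the total
    return 1 - within_ss(rows, labels) / total_ss


def is_stable(rows, labels):
    """True if no row is strictly closer to another cluster's centroid."""
    cents = centroids(rows, labels)
    return all(
        sq_dist(row, cents[lab]) <= min(sq_dist(row, c) for c in cents.values())
        for row, lab in zip(rows, labels)
    )


# Made-up data: the four corners of a wide rectangle, to be split into k=2.
rows = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
left_right = [0, 0, 1, 1]  # reached, for example, from starts (0, 0) and (4, 0)
top_bottom = [0, 1, 0, 1]  # reached, for example, from starts (0, 0) and (0, 1)

for name, labels in [("left/right", left_right), ("top/bottom", top_bottom)]:
    print(name, is_stable(rows, labels), within_ss(rows, labels), r_squared(rows, labels))
# Both partitions are stable; the within sums of squares are 1.0 and 16.0.
```

A common remedy is to run k-means from many different starts and keep the solution with the smallest within-cluster sum of squares.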
  25. CLUSTERING

    • Alternate algorithms?
      • PAM (Partitioning Around Medoids) tries to minimize the sum of distances of the objects to their cluster centers (medoids).
      • CLARA draws multiple samples of the dataset, applies PAM to each sample and returns the best result.
      • GA (Genetic Algorithm) is inspired by models of biological evolution and produces results through a breeding procedure.
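
The PAM idea described above can be sketched as follows. This is a simplified, swap-only variant written for illustration: it starts from the first k objects as medoids, whereas the full algorithm begins with a greedy BUILD phase, and it accepts the first improving swap it finds. The six one-dimensional values are made up.

```python
def total_cost(dist, medoids):
    """Sum over all objects of the distance to the nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k):
    """Swap-only PAM sketch on a precomputed distance matrix."""
    n = len(dist)
    medoids = list(range(k))  # naive start (assumption); full PAM uses BUILD
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                # Try replacing the medoid at position `pos` with `cand`.
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    # Each object is labelled with the index of its nearest medoid.
    labels = [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]
    return medoids, labels, best


# Made-up one-dimensional data with two obvious groups.
values = [1, 2, 3, 10, 11, 12]
dist = [[abs(a - b) for b in values] for a in values]

medoids, labels, cost = pam(dist, k=2)
print(sorted(values[m] for m in medoids), labels, cost)
```

Because medoids are actual objects from the dataset, PAM works with any distance matrix, not only Euclidean distances. CLARA would apply the same routine to random samples of the data and keep the best set of medoids.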
  26. CLUSTERING

    [Two maps compared]
    K-means result for 41 “OAC variables”
    K-means result for 26 OAC Principal Components
    99% Similar
    K=4
    or... refine k-means
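
The slide does not say how the 99% similarity between the two k-means results was measured. One possible way to compare two cluster assignments, offered here only as an assumption, is the Rand index: the share of area pairs that both results treat the same way (together in both, or apart in both). The label lists below are made up.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of pairs on which two partitions agree (same/different cluster)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical assignments for eight areas.
result_a = [1, 1, 2, 1, 3, 3, 3, 2]
relabelled = [2, 2, 3, 2, 1, 1, 1, 3]  # same grouping, different cluster numbers
one_moved = [1, 1, 2, 1, 3, 3, 2, 2]   # one area switched cluster

print(rand_index(result_a, relabelled))
print(rand_index(result_a, one_moved))
```

The measure ignores the cluster numbers themselves, which matters because k-means numbers its clusters arbitrarily from run to run.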
  27. STATE OF THE ART

    • Slowly moving beyond:
      • Idea of expert producers
      • General purpose classifications
      • There is only one correct representation
    • Creating classifications which are
      • Responsive to changes in local populations
      • Fit for purpose (bespoke classifications)
      • Open to scrutiny and verifiable by the public