Slide 1

Slide 1 text

SOME DEVELOPMENTS IN THE SPECIFICATION, ESTIMATION AND TESTING OF GEODEMOGRAPHIC CLASSIFICATIONS Dr Alex D Singleton Department of Geography and Centre for Advanced Spatial Analysis, University College London www.alex-singleton.com

Slide 2

Slide 2 text

WHAT IS CLASSIFICATION? How can this help us to understand the world?

Slide 3

Slide 3 text

Space Earth Land Water Pluto

Slide 4

Slide 4 text

2006 DEFINITION OF A PLANET • After much debate the International Astronomical Union (IAU) agreed… 1. The object must be in orbit around the Sun. 2. The object must be massive enough to be pulled into a sphere by its own gravitational force. 3. It must have cleared the neighbourhood around its orbit.

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

World Continents Countries

Slide 7

Slide 7 text

Road Sidewalk Shops Music Shops Food Shops Man Woman

Slide 8

Slide 8 text

FORMAL CLASSIFICATION 3 Main Divisions (Kingdoms) 1) Minerals 2) Vegetables 3) Animals Carl von Linné 18th Century Swedish Botanist

Slide 9

Slide 9 text

BLACK: Lowest class. Vicious, semi-criminal. DARK BLUE: Very poor, casual. Chronic want. LIGHT BLUE: Poor. 18s. to 21s. a week for a moderate family PURPLE: Mixed. Some comfortable others poor PINK: Fairly comfortable. Good ordinary earnings. RED: Middle class. Well-to-do. YELLOW: Upper-middle and Upper classes. Wealthy Charles Booth – Booth Map London 1889

Slide 10

Slide 10 text

Marr Map, Manchester, 1904

Slide 11

Slide 11 text

URBAN LAND USE MODELS c.1925 c.1939

Slide 12

Slide 12 text

WHY DO WE CLASSIFY? • Because the world is complex • Need to simplify reality • Improve our understanding and help us make decisions • Help us to navigate the world • No representation is correct!

Slide 13

Slide 13 text

WHAT ARE GEODEMOGRAPHICS? “Analysis of people by where they live” or “locality marketing” (Sleight, 1993:3) Person Home Address “Area”

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

London Terraces Blue Collar Council Flat Central Districts

Slide 17

Slide 17 text

HISTORY OF GEODEMOGRAPHICS • Developed in the 1970s - attributed to Richard Webber • Target urban deprivation funding: Identify clusters of similar neighbourhoods • Moved to CACI: Linked ED to Postcode • ACORN (Private Sector) • Moved to Experian • MOSAIC (Private sector)

Slide 18

Slide 18 text

CREATING A GEODEMOGRAPHIC 46% 54% Census Non Census 70% 30% Census Non Census Experian: Mosaic CACI: Acorn Census and Non Census

Slide 19

Slide 19 text

CREATING A GEODEMOGRAPHIC • Age • Ethnicity • Country of Birth • Population • Living Arrangements • Family Size • Tenure • House Type / Size • House Quality • House Ownership • Health of Population • Employment • Industry Sectors Census Only http://www.areaclassification.org.uk/

Slide 20

Slide 20 text

SEGMENTATIONS ARE CREATED BY CLUSTER ANALYSIS Area V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 ... Area1 Area2 Area3 Area4 Area5 Area6 Area7 Area8 ...

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

Variable 1 Variable 2 Cluster 1 Cluster 2 Cluster 3 CLUSTER ANALYSIS

Slide 23

Slide 23 text

Area Cluster Area1 1 Area2 1 Area3 2 Area4 1 Area5 3 Area6 3 Area7 3 Area8 2 ... OUTPUT
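
The step from the area-by-variable table to the area-to-cluster lookup can be sketched in a few lines. This is a minimal, illustrative k-means (Lloyd's algorithm) in plain Python, not the procedure used to build any particular classification. The area names, the two variables and their values are invented for the example, and the assignment it prints depends on the random seed.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(rows, k, seed=0, max_iter=100):
    """Lloyd's k-means: returns one cluster label (1..k) per row."""
    rng = random.Random(seed)
    centres = rng.sample(rows, k)  # k randomly chosen rows as starting centres
    labels = None
    for _ in range(max_iter):
        # Assign every row to its nearest centre.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(row, centres[c]))
            for row in rows
        ]
        if new_labels == labels:  # no change: converged
            break
        labels = new_labels
        # Move each centre to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centre if a cluster is empty
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return [lab + 1 for lab in labels]

# Hypothetical standardised values of two variables for six areas.
areas = {
    "Area1": [0.1, 0.2],
    "Area2": [0.2, 0.1],
    "Area3": [5.0, 5.1],
    "Area4": [5.2, 4.9],
    "Area5": [9.9, 0.1],
    "Area6": [10.1, 0.3],
}
labels = k_means(list(areas.values()), k=3)
output = dict(zip(areas, labels))
for area, cluster in output.items():
    print(area, cluster)
```

The cluster numbers themselves carry no meaning; only the grouping of areas does.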

Slide 24

Slide 24 text

OUTPUT AREA CLASSIFICATION • 3 Levels – 52 Sub Groups – 21 Groups – 7 Super Groups • Open Methodology – Only classification with status of a National Statistic • Peer Reviewed – Vickers, D.W. and Rees, P.H. (2007). Creating the National Statistics 2001 Output Area Classification. Journal of the Royal Statistical Society, Series A • Free! • RSS active user group

Slide 25

Slide 25 text

1: Blue Collar Communities 1a: Terraced Blue Collar 1a1: Terraced Blue Collar (1) 1a2: Terraced Blue Collar (2) 1a3: Terraced Blue Collar (3) 1b: Younger Blue Collar 1b1: Younger Blue Collar (1) 1b2: Younger Blue Collar (2) 1c: Older Blue Collar 1c1: Older Blue Collar (1) 1c2: Older Blue Collar (2) 1c3: Older Blue Collar (3) 2: City Living 2a: Transient Communities 2a1: Transient Communities (1) 2a2: Transient Communities (2) 2b: Settled in the City 2b1: Settled in the City (1) 2b2: Settled in the City (2) 3: Countryside 3a: Village Life 3a1: Village Life (1) 3a2: Village Life (2) 3b: Agricultural 3b1: Agricultural (1) 3b2: Agricultural (2) 3c: Accessible Countryside 3c1: Accessible Countryside 3c2: Accessible Countryside
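
The codes in this hierarchy are nested, so a Sub Group code already contains its Group and Super Group. A small sketch of reading the three levels out of one code follows; the function name `split_oac_code` is made up for illustration, and it assumes the digit-letter-digit pattern shown on the slide.

```python
def split_oac_code(code):
    """Split a Sub Group code such as '1a2' into its three hierarchy levels.

    Assumes the pattern on the slide: one digit (Super Group),
    one letter (Group), one digit (Sub Group).
    """
    return {"super_group": code[0], "group": code[:2], "sub_group": code}

print(split_oac_code("2b1"))
```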

Slide 26

Slide 26 text

These neighbourhoods are predominantly located in inner cities and are characterised by low quality, high density rented flats. They have multi-ethnic populations, many of whom are first generation immigrants. Other residents include students.

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

MOSAIC • 2 Levels • 61 Types • 11 Groups • Closed Methodology • Unit Postcode • Commercial • Just data or analysis tools

Slide 29

Slide 29 text

Urban Intelligence Young, single and mostly well educated, these people are cosmopolitan in tastes and liberal in attitudes. Group 5 / 11

Slide 30

Slide 30 text

Symbols of Success Happy Families Suburban Comfort Ties of Community Urban Intelligence Welfare Borderline Municipal Dependency Blue Collar Enterprise Twilight Subsistence Grey Perspectives Rural Isolation Cheltenham, Gloucestershire, UK.

Slide 31

Slide 31 text

WHAT DO WE TAKE FROM THIS? • Not all segmentations are the same… • OAC says society is made up of 52 different types of areas? • MOSAIC says society is made up of 61 different types of areas? • Which is right? • Commercial Providers will more than likely say: Ours is!

Slide 32

Slide 32 text

RECENT “INNOVATIONS” • Commercial innovation - diversification / new markets • Region: Global, London, Scotland, Northern Ireland, Japan, Italy... • New Industries: Automotive, Financial, Grocery, Public Sector, Health • New Segments: Daytime, Names, Income, Street Values, Shareholder Activity, Unemployment, Individuals (Scale)

Slide 33

Slide 33 text

http://www.mosaicjapan.com/

Slide 34

Slide 34 text

“RECENT” INNOVATIONS • Long history of academic critique • Fuzzy Geodemographics (Feng & Flowerdew, 1998; See & Openshaw, 2001) • Description V Modelling (Harris et al, 2007) • Bespoke Classifications (Longley and Singleton, 2009a) • Public Consultation (Longley and Singleton, 2009b) • Utility of public consultation data (Singleton, 2008)

Slide 35

Slide 35 text

“RECENT” INNOVATIONS • Need for real time geodemographics: • Current classifications are created using static data sources. • Rate and scale of current population change is making large surveys (census) increasingly redundant. • Significant hidden value in transactional data • Data is increasingly available in near real time • Application specific (bespoke) classifications have demonstrated utility.

Slide 36

Slide 36 text

Specification Estimation Testing REALTIME GEODEMOGRAPHICS

Slide 37

Slide 37 text

REALTIME FEEDS OF DATA • Involve integration of large and possibly disparate databases • E.g. Doctor registrations; census data • Common protocol • XML: E.g. UK Neighbourhood Statistics API • Is there any value in other non-traditional data sources? • E.g. Flickr
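
As an illustration of why a common XML protocol helps, the sketch below turns an XML feed into an area-by-variable table using Python's standard library. The payload is entirely hypothetical: the element and attribute names are invented and do not reflect the schema of the Neighbourhood Statistics API or any other real service.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: tag and attribute names are invented for this example.
payload = """
<areas>
  <area code="Area1">
    <value variable="V1">12.5</value>
    <value variable="V2">3.0</value>
  </area>
  <area code="Area2">
    <value variable="V1">8.0</value>
    <value variable="V2">4.5</value>
  </area>
</areas>
"""

def parse_feed(xml_text):
    """Return {area code: {variable name: value}} from the XML text."""
    root = ET.fromstring(xml_text.strip())
    table = {}
    for area in root.findall("area"):
        table[area.get("code")] = {
            v.get("variable"): float(v.text) for v in area.findall("value")
        }
    return table

table = parse_feed(payload)
print(table)
```

Once every source is reduced to the same table shape, feeds from different databases can be joined on the area code.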

Slide 38

Slide 38 text

What information can be extracted?

Slide 39

Slide 39 text

ONLINE SPECIFICATION INPUTS • Usability • Expert V Non-Expert Users • Non-Expert Users • Pre selection of variables and weighting for specific application • Expert Users • Selection of any variable and any weighting
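
One way to picture the two user modes: a non-expert picks a preset that fixes the variables and weights, while an expert supplies any mapping of variable to weight. The sketch below standardises each chosen variable and multiplies it by its weight before clustering. The variable names, values and preset are hypothetical, and this weighting scheme is an assumption made for illustration, not a description of a specific system.

```python
from statistics import mean, pstdev

def standardise(column):
    """Convert raw values to z-scores (0.0 for a constant column)."""
    m, s = mean(column), pstdev(column)
    return [(x - m) / s if s else 0.0 for x in column]

def weighted_inputs(table, weights):
    """table: {variable: [value per area]}; weights: {variable: weight}.

    Only variables named in `weights` are kept, each as weighted z-scores.
    """
    return {
        var: [w * z for z in standardise(table[var])]
        for var, w in weights.items()
    }

# Hypothetical data for three areas and a hypothetical application preset.
table = {"age_over_65": [10.0, 20.0, 30.0], "renting": [50.0, 40.0, 60.0]}
preset = {"age_over_65": 2.0, "renting": 1.0}

weighted = weighted_inputs(table, preset)
print(weighted)
```

With squared Euclidean distance, a weight of w multiplies a variable's contribution to the distance by w squared, so weights should be chosen with that in mind.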

Slide 40

Slide 40 text

CLUSTERING • K-Means algorithm: Unstable - initial start conditions affect the results • Measured by within-cluster sum of squares or R² K=4
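
Because different starting conditions can lead to different partitions, competing results need a common yardstick. The sketch below computes the within-cluster sum of squares and an R² value, taken here as 1 minus the ratio of within-cluster to total sum of squares, for two candidate partitions of the same invented data. The data and the two labelings are made up for illustration.

```python
def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def sum_of_squares(rows):
    """Sum of squared deviations of the rows from their own centroid."""
    c = centroid(rows)
    return sum((x - m) ** 2 for row in rows for x, m in zip(row, c))

def within_ss(rows, labels):
    """Within-cluster sum of squares for a given partition."""
    clusters = {}
    for row, lab in zip(rows, labels):
        clusters.setdefault(lab, []).append(row)
    return sum(sum_of_squares(members) for members in clusters.values())

def r_squared(rows, labels):
    """Share of total variation accounted for by the partition."""
    return 1 - within_ss(rows, labels) / sum_of_squares(rows)

# Hypothetical data: two tight pairs of areas, far apart on the first variable.
rows = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
good = [1, 1, 2, 2]  # groups each tight pair together
poor = [1, 2, 1, 2]  # splits each tight pair

for name, labels in (("good", good), ("poor", poor)):
    print(name, within_ss(rows, labels), round(r_squared(rows, labels), 3))
```

A common response to the instability is to run the algorithm from many starts and keep the partition with the lowest within-cluster sum of squares.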

Slide 41

Slide 41 text

CLUSTERING • Alternative algorithms? • PAM (Partitioning Around Medoids) tries to minimize the sum of distances of the objects to their cluster medoids. • CLARA draws multiple samples of the dataset, applies PAM to each sample and returns the best result. • GA (Genetic Algorithm) is inspired by models of biological evolution and produces results through a breeding procedure.
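
To make the PAM idea concrete, here is a simplified sketch of its swap step: medoids are actual observations, and a medoid is exchanged for a non-medoid whenever that lowers the total distance. It is not a full PAM implementation; it uses a naive start (the first k rows) instead of the usual build phase, and Manhattan distance is an arbitrary choice for the example. The data are invented.

```python
from itertools import product

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def total_cost(rows, medoids, dist):
    """Sum over all rows of the distance to the nearest medoid."""
    return sum(min(dist(row, rows[m]) for m in medoids) for row in rows)

def pam(rows, k, dist=manhattan):
    """Simplified PAM: repeat improving swaps until none is found."""
    medoids = list(range(k))  # naive start: the first k rows
    best = total_cost(rows, medoids, dist)
    improved = True
    while improved:
        improved = False
        for pos, candidate in product(range(k), range(len(rows))):
            if candidate in medoids:
                continue
            trial = medoids[:pos] + [candidate] + medoids[pos + 1:]
            cost = total_cost(rows, trial, dist)
            if cost < best:  # accept any swap that lowers the cost
                medoids, best, improved = trial, cost, True
    labels = [
        min(range(k), key=lambda c: dist(row, rows[medoids[c]]))
        for row in rows
    ]
    return medoids, labels, best

# Hypothetical data: two well-separated groups of three areas.
rows = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
medoids, labels, cost = pam(rows, k=2)
print(medoids, labels, cost)
```

CLARA would run the same routine on several random samples of the rows and keep the medoids that give the lowest cost on the full dataset.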

Slide 42

Slide 42 text

CLUSTERING K-means result for 41 “OAC variables” K-means result for 26 OAC Principal Components 99% Similar K=4 or... refine k-means
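
The slide does not say how the similarity figure was calculated. One plausible measure, sketched here as an assumption, is the share of areas that fall in matching clusters once the arbitrary cluster numbers of one result have been relabelled to fit the other as closely as possible. The two label lists are invented, and trying every relabelling is only practical for a small number of clusters such as K=4.

```python
from itertools import permutations

def best_agreement(labels_a, labels_b):
    """Share of areas in matching clusters under the best relabelling of B."""
    names_a = sorted(set(labels_a))
    names_b = sorted(set(labels_b))
    if len(names_a) != len(names_b):
        raise ValueError("both partitions must use the same number of clusters")
    best = 0
    for perm in permutations(names_a):
        mapping = dict(zip(names_b, perm))  # one candidate relabelling of B
        matches = sum(a == mapping[b] for a, b in zip(labels_a, labels_b))
        best = max(best, matches)
    return best / len(labels_a)

# Hypothetical cluster labels for eight areas from two runs with K=4.
from_variables = [1, 1, 2, 2, 3, 3, 4, 4]
from_components = [3, 3, 1, 1, 4, 4, 2, 1]

print(best_agreement(from_variables, from_components))
```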

Slide 43

Slide 43 text

VISUALISATION

Slide 44

Slide 44 text

VISUALISATION

Slide 45

Slide 45 text

STATE OF THE ART • Slowly moving beyond: • Idea of expert producers • General purpose classifications • There is only one correct representation • Creating classifications which are • Responsive to changes in local populations • Fit for purpose (bespoke classifications) • Open to scrutiny and verifiable by the public

Slide 46

Slide 46 text

SO WHAT DOES A FUTURE GEODEMOGRAPHIC LOOK LIKE?

Slide 47

Slide 47 text

Thanks: Any Questions?