
alexsingleton
September 15, 2009

39th RSAI:BIS, 2-4 September 2009, Limerick, Ireland

This presentation was given at the OAC User Group visualisation event at the RSS, 15th Sept 2009. For more details see www.areaclassification.org.uk.


Transcript

  1. MOVING TOWARDS REAL-TIME GEODEMOGRAPHIC SEGMENTATION
    Dr Alex D Singleton, Department of Geography and Centre for Advanced Spatial Analysis, University College London
    www.alex-singleton.com
  2. SEGMENTATIONS ARE CREATED BY CLUSTER ANALYSIS
    Input: a table with one row per area (Area1, Area2, ... Area8, ...) and one column per variable (V1, V2, ... V10, ...).
  3. OUTPUT
    Area    Cluster
    Area1   1
    Area2   1
    Area3   2
    Area4   1
    Area5   3
    Area6   3
    Area7   3
    Area8   2
    ...     ...
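The step from the input table of slide 2 to the output table of slide 3 can be sketched with k-means, the algorithm discussed on slide 7. This is a minimal pure-Python sketch, not the author's code: the values are made up and already standardised (chosen so the result mirrors the slide-3 table), and, to keep the run deterministic, it starts from a farthest-first choice of centroids instead of random ones, which is a swap-in.

```python
import math


def farthest_first(rows, k):
    """Deterministic start: the first row, then repeatedly the row farthest
    from every centroid chosen so far (a swap-in for random starts)."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        far = max(rows, key=lambda r: min(math.dist(r, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(rows, k, iters=100):
    """Lloyd's k-means; returns a 1-based cluster label for each row (area)."""
    centroids = farthest_first(rows, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each area joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(r, centroids[c]))
                      for r in rows]
        if new_labels == labels:
            break  # converged: no area changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its member areas.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return [lab + 1 for lab in labels]


# Made-up, already standardised values (hypothetical): one row per area, columns V1..V3.
areas = [f"Area{i}" for i in range(1, 9)]
data = [
    [0.1, 0.2, 0.1], [0.2, 0.1, 0.2], [0.9, 0.8, 0.9], [0.1, 0.1, 0.2],
    [0.5, 0.9, 0.1], [0.6, 0.8, 0.2], [0.5, 0.8, 0.1], [0.8, 0.9, 0.8],
]
for area, cluster in zip(areas, kmeans(data, k=3)):
    print(area, cluster)
```

A real classification would use many more areas and variables; only the shape of the input and output carries over.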
  4. NEED FOR REAL-TIME GEODEMOGRAPHICS
    • Current classifications are created using static data sources.
    • The rate and scale of current population change are making large surveys (the census) increasingly redundant.
    • Significant hidden value in transactional data.
    • Data are increasingly available in near real time.
    • Application-specific (bespoke) classifications have demonstrated utility.
  5. REAL-TIME FEEDS OF DATA
    • Involve integration of large and possibly disparate databases
    • Common protocol
      • XML: e.g. UK Neighbourhood Statistics API
    • Formal
      • e.g. doctor registrations; HE data; census data
    • Informal
      • Is there any value in other non-traditional data sources?
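Integrating feeds over a common protocol can be sketched as parsing each XML feed into one variable keyed by area code, then joining the feeds on that code. The XML layout, variable names and area codes below are hypothetical and do not follow the actual Neighbourhood Statistics API schema; this is only a sketch of the join under those assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed layout: NOT the real Neighbourhood Statistics schema.
FEED_A = """<dataset variable="unemployment_rate">
  <area code="A001" value="4.2"/>
  <area code="A002" value="7.9"/>
  <area code="A003" value="5.1"/>
</dataset>"""
FEED_B = """<dataset variable="doctor_registrations">
  <area code="A002" value="1830"/>
  <area code="A001" value="2410"/>
</dataset>"""


def parse_feed(xml_text):
    """Return (variable name, {area code: value}) for one feed."""
    root = ET.fromstring(xml_text)
    values = {a.get("code"): float(a.get("value")) for a in root.iter("area")}
    return root.get("variable"), values


def merge_feeds(*feeds):
    """Join feeds on area code, keeping only areas present in every feed."""
    parsed = [parse_feed(f) for f in feeds]
    names = [name for name, _ in parsed]
    common = set.intersection(*(set(values) for _, values in parsed))
    table = {code: [values[code] for _, values in parsed] for code in sorted(common)}
    return names, table


names, table = merge_feeds(FEED_A, FEED_B)
print(names)  # column order of the merged table
print(table)  # A003 is dropped because it is missing from FEED_B
```

The inner join is the simplest choice; a live system would also need a policy for areas that appear in only some feeds.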
  6. ONLINE SPECIFICATION INPUTS
    • Usability
    • Expert vs. non-expert users
      • Non-expert users: pre-selection of variables and weighting for a specific application
      • Expert users: selection of any variable and any weighting
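The two kinds of user could share one representation: a mapping from variable to weight, applied to the data before clustering. Non-expert users pick a pre-built mapping; expert users supply their own. The preset names, variables and weights below are hypothetical. Because multiplying a column by a weight w multiplies its contribution to squared Euclidean distance by w², the weights control how strongly each variable drives the clustering.

```python
# Hypothetical presets for non-expert users: variable -> weight.
PRESETS = {
    "retail": {"age_65_plus": 1.0, "car_ownership": 2.0, "students": 0.5},
    "health": {"age_65_plus": 2.0, "car_ownership": 0.5, "students": 0.5},
}


def build_spec(preset=None, custom_weights=None):
    """Non-experts pick a preset; experts pass any {variable: weight} mapping."""
    if (preset is None) == (custom_weights is None):
        raise ValueError("give exactly one of preset or custom_weights")
    return dict(PRESETS[preset] if preset is not None else custom_weights)


def apply_weights(rows, variables, spec):
    """Keep only the selected variables, each multiplied by its weight."""
    idx = [variables.index(v) for v in spec]
    weights = list(spec.values())
    return [[row[i] * w for i, w in zip(idx, weights)] for row in rows]


variables = ["age_65_plus", "car_ownership", "students"]
rows = [[0.2, 0.5, 0.1], [0.4, 0.3, 0.6]]  # made-up standardised values
print(apply_weights(rows, variables, build_spec(preset="retail")))
print(apply_weights(rows, variables, build_spec(custom_weights={"students": 3.0})))
```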
  7. CLUSTERING
    • K-means algorithm: unstable, because the initial start conditions affect the results
    • Measured by the within-cluster sum of squares or R²
    Figure: k-means clustering, K=4.
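Because the k-means result depends on its starting centroids, a common remedy is to run it from several random starts and keep the run with the lowest within-cluster sum of squares (WSS), reporting R² = 1 - WSS/TSS, where TSS is the total sum of squares about the grand mean. This is a minimal pure-Python sketch under those assumptions; the number of starts and the seed are arbitrary choices, not values from the talk.

```python
import math
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]


def wss_and_r2(rows, labels, k):
    """Within-cluster sum of squares (WSS) and R^2 = 1 - WSS / TSS."""
    grand = mean(rows)
    tss = sum(sq_dist(r, grand) for r in rows)
    wss = 0.0
    for c in range(k):
        members = [r for r, lab in zip(rows, labels) if lab == c]
        if members:
            centre = mean(members)
            wss += sum(sq_dist(r, centre) for r in members)
    return wss, 1 - wss / tss


def kmeans_once(rows, k, rng, iters=100):
    """One Lloyd's k-means run starting from k randomly chosen rows."""
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: math.dist(r, centroids[c])) for r in rows]
        if new == labels:
            break  # converged
        labels = new
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels


def best_of_starts(rows, k, n_starts=20, seed=0):
    """Repeat k-means from different random starts; keep the lowest-WSS run."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        labels = kmeans_once(rows, k, rng)
        wss, r2 = wss_and_r2(rows, labels, k)
        if best is None or wss < best[1]:
            best = (labels, wss, r2)
    return best  # (labels, WSS, R^2)


rows = [[0.0], [2.0], [10.0], [12.0]]  # toy 1-D data
print(wss_and_r2(rows, [0, 0, 1, 1], 2))
print(best_of_starts(rows, 2))
```

Keeping the lowest WSS across starts reduces, but does not remove, the dependence on initial conditions.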
  8. CLUSTERING
    • Alternative algorithms?
      • PAM (Partitioning Around Medoids) tries to minimise the sum of distances of the objects to their cluster centres (medoids).
      • CLARA (Clustering LARge Applications) draws multiple samples of the dataset, applies PAM to each sample and returns the best result.
      • GA (Genetic Algorithm) is inspired by models of biological evolution and produces results through a breeding procedure.
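PAM and CLARA can be sketched in a few lines of pure Python. Assumptions: this simplified PAM starts from the first k points (the original PAM uses a greedy BUILD step) and then repeatedly applies the single best medoid/non-medoid swap while it lowers the total distance; the CLARA sketch runs that PAM on random samples and scores each candidate set of medoids on the full dataset. Sample sizes and counts are illustrative defaults, not values from the talk. The genetic-algorithm option is not sketched.

```python
import math
import random


def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def pam(points, k):
    """Simplified PAM: start from the first k points, then apply the best
    medoid/non-medoid swap for as long as it lowers the total cost."""
    medoids = list(points[:k])
    cost = total_cost(points, medoids)
    while True:
        best_cost, best_medoids = cost, None
        for i in range(k):
            for p in points:
                if p in medoids:
                    continue
                trial = medoids[:i] + [p] + medoids[i + 1:]
                c = total_cost(points, trial)
                if c < best_cost:
                    best_cost, best_medoids = c, trial
        if best_medoids is None:
            return medoids, cost  # no improving swap left
        cost, medoids = best_cost, best_medoids


def clara(points, k, n_samples=5, sample_size=40, seed=0):
    """Run PAM on random samples; keep the medoids best on the FULL data."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_samples):
        sample = rng.sample(points, min(sample_size, len(points)))
        medoids, _ = pam(sample, k)
        cost = total_cost(points, medoids)
        if best is None or cost < best[1]:
            best = (medoids, cost)
    return best


points = [(0,), (1,), (2,), (10,), (11,), (12,)]  # toy 1-D data
print(pam(points, 2))
print(clara(points, 2, n_samples=3, sample_size=4))
```

Scoring on the full dataset is what lets CLARA compare samples fairly, while PAM itself only ever runs on the smaller samples.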
  9. CLUSTERING
    Figure: k-means result for the 41 OAC variables vs. k-means result for 26 OAC principal components (K=4); the two results are ~99% similar.
    Or... refine k-means.
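The slide does not say how the ~99% similarity was measured. One common choice is the Rand index: the share of area pairs on which two clusterings agree about whether the pair falls in the same cluster. It ignores cluster numbering, which matters because two k-means runs can give the same clusters different labels. A minimal sketch; `rand_index` is a hypothetical helper name, not taken from the talk.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Share of area pairs on which two clusterings agree (same vs. different cluster)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must cover the same areas")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Same grouping under different cluster numbers scores 1.0.
print(rand_index([1, 1, 2, 1, 3, 3, 3, 2], [2, 2, 3, 2, 1, 1, 1, 3]))
```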
  10. STATE OF THE ART
    • Slowly moving beyond:
      • The idea of expert producers
      • General-purpose classifications
      • The idea that there is only one correct representation
    • Creating classifications which are:
      • Responsive to changes in local populations
      • Fit for purpose (bespoke classifications)
      • Open to scrutiny and verifiable by the public