PopChange - Creating access to UK wide census data

Dr Nick Bearman | @nickbearmanuk | 18/11/16
Improving Access to UK-wide Census Data
Nick Bearman, Clear Mapping Co, University of Liverpool, @nickbearmanuk

Why is it important to make Census data more accessible? (& how)
http://www.telegraph.co.uk/news/uknews/8371197/Missing-questions-on-2011-Census-baffle-public.html

The Census provides a whole range of very useful data
Hours of unpaid care
http://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/healthcaresystem/articles/2011censusanalysisunpaidcareinenglandandwales2011andcomparisonwith2001/2013-02-15

The Census provides a whole range of very useful data
Hours of unpaid care
http://www.neighbourhood.statistics.gov.uk/HTMLDocs/dvc128/wrapper.html

The Census provides a whole range of very useful data
Google Image Search – Census outputs UK:
https://www.google.com/search?safe=off&site=&tbm=isch&source=hp&biw=1265&bih=918&q=census+questionnaire+uk&oq=census+questionnaire+uk

BUT Census data is hard to access
https://photosleuth.wordpress.com/category/derbyshire/page/3/

BUT Census data is hard to access
Current Data – Casweb: UK, 1971 - 2001

Infuse – 2001 and 2011, England and Wales only

Scotland 2001 & 2011

Archive Data
• Enumeration District (ED) boundaries for 1971 not available (only centroids)
• ESRC-funded project to extract 1971, 1981 and 1991 data from a COBOL database; all data now available for 1971-2011
• Pre-1971 data not systematically available in digital form
https://www.flickr.com/photos/woolamaloo_gazette/6238597508/

We wanted to make Census data easier to access
• Particularly to look at how areas change over time
• If the data is easier to access, more people will use it
What’s the problem?

We wanted to make Census data easier to access
• Census data is made available in Output Areas (~100 households)
• Comparing these small areas over time is difficult, as they change

We wanted to make Census data easier to access
NW Swindon Jan 2004 / NW Swindon Jan 2011
http://www.ordnancesurvey.co.uk/blog/2011/04/3974/

Blue = 1991 Enumeration districts; Red = 2011 Output Areas

So these comparisons can be tricky to do
Particularly if you want to go from 1971/81/91 (enumeration districts) to 2001/11 (output areas)
Blue = 1991 Enumeration districts; Red = 2011 Output Areas

Who is interested in these comparisons?

Who is interested in these comparisons?
https://www.ons.gov.uk/census/2011census/2011censusbenefits/howothersusecensusdata

Who is interested in these comparisons?
Sector                       %     N (of 76)
Academic study / research    64%   49
Schools                      4%    3
Central government           4%    3
Private sector               7%    5
Third sector                 3%    2
Personal use                 5%    4
Local government             13%   10
users, 20161115, n = 76

How did we achieve this, and make Census data more useful?

How did we achieve this, and make Census data more useful?
• Converted most variables for all years
• To a 1km grid across Great Britain

How did we achieve this, and make Census data more useful?
We also:
• Created an online resource to do comparisons

How did we achieve this, and make Census data more useful?
We also:
• Created an online resource to do comparisons
• Limited the comparisons & variables

[Table: comparing 2001 and 2011 categories. Labels in slide order: Fairly Good, Good, Not Good, Good, Fair, Bad, OK, Use Caution, Bad]

How did we achieve this, and make Census data more useful?
We also:
• Created an online resource to do comparisons
• Limited the comparisons & variables
• Simple interface and output
• But option to get data if you wish

How did we achieve this, and make Census data more useful?
With Census data that is easier to use, we hope:
• More people can make use of the data
• More people can look at change over time for small areas
• We can highlight how useful GIS can be
• We can show how important easy-to-use data is

Moving on from ‘Why?’ to ‘How?’
• Academic
• Technical

Academic – How did we get from EDs/OAs to 1km grid?
Postcode centroids, Output Areas and 1km grid cells

Academic – Using Postcode Density
1. Generate a postcode intensity grid using kernel estimation – 1km cells
2. Overlay the grid from step 1 with source zones (e.g., Output Areas (OAs)), giving intersection zones (OAG)
3. Compute populations (OAG_Estimate) for each OAG zone with:
   • WtArea = Wt × OAG_Area;
   • WtAreaSum = WtArea summed by OA;
   • OAG_Estimate = WtArea / WtAreaSum × OAPop
4. Aggregate OAG_Estimate values by grid cell
Population is then allocated to 1km grids, based on postcode densities (i.e. more postcodes -> more people)
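
The weighting in steps 3 and 4 above can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's R code: the function name `allocate_to_grid`, the record fields (`oa`, `cell`, `wt`, `area`) and the sample figures are all hypothetical, and each record stands for one OAG piece produced by the overlay in step 2.

```python
from collections import defaultdict


def allocate_to_grid(pieces, oa_pop):
    """Allocate Output Area populations to grid cells by postcode weight.

    pieces: list of dicts, one per OAG piece (hypothetical layout):
        oa   - source Output Area ID
        cell - 1km grid cell ID
        wt   - postcode intensity of the cell (Wt)
        area - area of the piece (OAG_Area)
    oa_pop: dict mapping Output Area ID to its population (OAPop)
    """
    # WtAreaSum = WtArea summed by OA, where WtArea = Wt x OAG_Area
    wt_area_sum = defaultdict(float)
    for p in pieces:
        wt_area_sum[p["oa"]] += p["wt"] * p["area"]

    # OAG_Estimate = WtArea / WtAreaSum x OAPop, aggregated by grid cell
    cell_pop = defaultdict(float)
    for p in pieces:
        total = wt_area_sum[p["oa"]]
        if total == 0:
            # Assumption of this sketch: an OA with no postcode weight is
            # skipped; a real workflow would need a fallback rule here.
            continue
        wt_area = p["wt"] * p["area"]
        cell_pop[p["cell"]] += wt_area / total * oa_pop[p["oa"]]
    return dict(cell_pop)


# Hypothetical example: OA "A" straddles two cells, OA "B" sits inside one.
pieces = [
    {"oa": "A", "cell": "c1", "wt": 2.0, "area": 0.5},
    {"oa": "A", "cell": "c2", "wt": 1.0, "area": 0.5},
    {"oa": "B", "cell": "c2", "wt": 1.0, "area": 0.25},
]
oa_pop = {"A": 300, "B": 100}
grid = allocate_to_grid(pieces, oa_pop)
```

In this example OA "A" sends twice as many people to cell c1 as to c2, because c1 has twice the postcode intensity for the same area. Each OA's population is conserved whenever it has at least one piece with non-zero weight.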

Academic – Using Postcode Density
This then allows us to compare variables over time using the 1km grid
- For a wide range of variables
- For 2011, 2001, 1991, 1981 and 1971
http://gis.stackexchange.com/questions/20127/creating-a-raster-of-the-residuals-of-a-regression-between-two-rasters

Academic – Did it work?
• There was a gridded output for 1971 based on counts, but this is now missing
• From 1981 onwards there is no gridded output for GB, which is why we need this resource
• There is a gridded resource for Northern Ireland for 1971 – 2011, so we can compare

Academic – Did it work?
• Generate 1km grids from Small Area (SA) data using postcode centroids to determine variations in population density within SAs.
• Use NI Census Grid Square resource (available since 1971) to assess accuracy of estimates for grid cells.
NI total population: 1,810,863
Small Areas: n = 4537; minimum 98; maximum 3075; mean 399.13
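
The accuracy check described above amounts to comparing estimated and observed counts cell by cell. The sketch below shows one way to do that in Python; the function name `compare_grids`, the cell IDs and the counts are hypothetical, and the mean absolute error is an illustrative summary rather than the measure used in the project.

```python
def compare_grids(estimated, observed):
    """Return per-cell differences (estimated - observed) and their mean absolute error.

    Both arguments are dicts mapping a grid cell ID to a person count.
    A cell missing from one grid is treated as having a count of zero.
    """
    cells = sorted(set(estimated) | set(observed))
    diffs = {c: estimated.get(c, 0.0) - observed.get(c, 0.0) for c in cells}
    mae = sum(abs(d) for d in diffs.values()) / len(diffs)
    return diffs, mae


# Hypothetical counts for four 1km cells
estimated = {"c1": 120.0, "c2": 80.0, "c3": 10.0}
observed = {"c1": 100.0, "c2": 95.0, "c4": 5.0}
diffs, mae = compare_grids(estimated, observed)
```

Positive differences mark cells where the estimate is too high, negative ones where it is too low, which is the quantity mapped as "Estimated – observed".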

Total persons / HA by Small Area

Total persons by 1km grid cells

Estimated total persons by 1km grid cells (>= 1 person)

Estimated total persons by 1km grid cells (>= 25 persons)

Estimated – observed total persons by 1km grid cells

by 1km grid cells
Total persons / HA by Output Area

Estimated total persons by 1km grid cells (>= 25 persons)
Estimated total persons by 1km grid cells (>= 1 person)

Refinement to surface modelling procedure
• Once the grids are generated, a further stage is to smooth cell values
• If this is not done, some cells within larger source zones will have identical values, which is not desirable
• The amount of smoothing differs between variables; spatially more ‘noisy’ variables such as LLTI (limiting long-term illness) will change more (proportionately) after smoothing than those which are more continuous (e.g., ethnicity)
• All grids are smoothed using a 3 by 3 cell smoothing filter so that adjacent cells completely within source zones have different values (but the sum of the grouped cells remains the same so the total population is unchanged)
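
The property that smoothing leaves the total population unchanged can be illustrated with a small Python sketch. This is an assumption-laden example, not the filter used in the project: the function name `smooth_3x3` and the sample grid are hypothetical, and the scheme shown (each cell's value shared equally across its 3 by 3 neighbourhood, clipped at the grid edge) is just one simple way to keep the grid total fixed.

```python
def smooth_3x3(grid):
    """Share each cell's value equally across its 3x3 neighbourhood.

    grid is a list of equal-length rows of numbers. Neighbourhoods are
    clipped at the grid edge, so every value is redistributed in full and
    the grid total is preserved.
    """
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Cells in the 3x3 window around (r, c) that fall inside the grid
            neighbours = [
                (rr, cc)
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
            ]
            share = grid[r][c] / len(neighbours)
            for rr, cc in neighbours:
                out[rr][cc] += share
    return out


# Hypothetical grid: two rows of identical values next to an empty row
grid = [
    [90.0, 90.0, 90.0],
    [90.0, 90.0, 90.0],
    [0.0, 0.0, 0.0],
]
smoothed = smooth_3x3(grid)
total_before = sum(sum(row) for row in grid)
total_after = sum(sum(row) for row in smoothed)
```

After smoothing, neighbouring cells that started with identical values no longer match in this example, while `total_before` and `total_after` agree up to floating-point rounding.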

Technical – how did we create the grids and the online interface?

Technical – how did we create the grids and the online interface?
• Allocation of populations to grid cells and smoothing completed in R/RStudio
• Version controlled with Git
• R script available on GitHub https://github.com/nickbearman/popchange

Why open source?
• Ensures code is available to anyone who wishes to access it
• Allows future collaboration
• We have plans for expansion of the tools through another (potential) project

Platforms
• PostgreSQL
• Clojure (Java-based)
• QGIS Backend (Python)
• Visualisation calculations
• File conversion: GeoTIFF → SHP & MapInfo TAB

Practicals
• Workbook practicals for University of Liverpool
• RStudio / Markdown
• QGIS

• Half day workshop
• Talk on PopChange and practical session
• University of Liverpool in London
• 33 Finsbury Square, London EC2A 1AG
• Date: Mon 6th Feb
• Email list / Twitter for details!
as workshop in London

PopChange as a Developing project
• Please give us your feedback
• Feedback form / in person
• Help us help you to use the data!
http://popchange.liverpool.ac.uk
