Slide 1

Slide 1 text

Dr Nick Bearman | @nickbearmanuk | 18/11/16
Improving Access to UK-wide Census Data
Nick Bearman
Clear Mapping Co
University of Liverpool
@nickbearmanuk

Slide 2

Slide 2 text

Why is it important to make Census data more accessible? (& how)
http://www.telegraph.co.uk/news/uknews/8371197/Missing-questions-on-2011-Census-baffle-public.html

Slide 3

Slide 3 text

The Census provides a whole range of very useful data
Hours of unpaid care
http://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/healthcaresystem/articles/2011censusanalysisunpaidcareinenglandandwales2011andcomparisonwith2001/2013-02-15

Slide 4

Slide 4 text

The Census provides a whole range of very useful data
Hours of unpaid care
http://www.neighbourhood.statistics.gov.uk/HTMLDocs/dvc128/wrapper.html
http://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/healthcaresystem/articles/2011censusanalysisunpaidcareinenglandandwales2011andcomparisonwith2001/2013-02-15

Slide 5

Slide 5 text

The Census provides a whole range of very useful data
Google Image Search – Census outputs UK:
https://www.google.com/search?safe=off&site=&tbm=isch&source=hp&biw=1265&bih=918&q=census+questionnaire+uk&oq=census+questionnaire+uk

Slide 6

Slide 6 text

BUT Census data is hard to access
https://photosleuth.wordpress.com/category/derbyshire/page/3/

Slide 7

Slide 7 text

BUT Census data is hard to access
Current Data – Casweb: UK, 1971 - 2001

Slide 8

Slide 8 text

BUT Census data is hard to access
Current Data – Casweb: UK, 1971 - 2001

Slide 9

Slide 9 text

BUT Census data is hard to access
Current Data – Casweb: UK, 1971 - 2001

Slide 10

Slide 10 text

Infuse – 2001 and 2011, England and Wales only

Slide 11

Slide 11 text

Infuse – 2001 and 2011, England and Wales only

Slide 12

Slide 12 text

Infuse – 2001 and 2011, England and Wales only

Slide 13

Slide 13 text

Scotland 2001 & 2011

Slide 14

Slide 14 text

Scotland 2001 & 2011

Slide 15

Slide 15 text

Scotland 2001 & 2011

Slide 16

Slide 16 text

BUT Census data is hard to access

Slide 17

Slide 17 text

Archive Data
• ED boundaries for 1971 not available (only centroids)
• ESRC-funded project to extract 1971, 1981 and 1991 data from a COBOL database; all data now available for 1971-2011
• Pre-1971 data not systematically available in digital form
https://www.flickr.com/photos/woolamaloo_gazette/6238597508/

Slide 18

Slide 18 text

We wanted to make Census data easier to access
• Particularly to look at how areas change over time
• If the data is easier to access, more people will use it
What’s the problem?

Slide 19

Slide 19 text

We wanted to make Census data easier to access
• Census data is made available in Output Areas (~100 households)
• Comparing these small areas over time is difficult, as they change

Slide 20

Slide 20 text

We wanted to make Census data easier to access
• Census data is made available in Output Areas (~100 households)
• Comparing these small areas over time is difficult, as they change
NW Swindon Jan 2004 | NW Swindon Jan 2011
http://www.ordnancesurvey.co.uk/blog/2011/04/3974/

Slide 21

Slide 21 text

Blue = 1991 Enumeration districts; Red = 2011 Output Areas

Slide 22

Slide 22 text

So these comparisons can be tricky to do
Particularly if you want to go from 1971/81/91 (enumeration districts) to 2001/11 (output areas)
Blue = 1991 Enumeration districts; Red = 2011 Output Areas

Slide 23

Slide 23 text

Who is interested in these comparisons?

Slide 24

Slide 24 text

Who is interested in these comparisons?
https://www.ons.gov.uk/census/2011census/2011censusbenefits/howothersusecensusdata

Slide 25

Slide 25 text

Who is interested in these comparisons?

Sector                      %     N
Academic study / research   64%   49
Local government            13%   10
Private sector              7%    5
Personal use                5%    4
Schools                     4%    3
Central government          4%    3
Third sector                3%    2
Total                       100%  76

users, 20161115, n = 76

Slide 26

Slide 26 text

How did we achieve this, and make Census data more useful?

Slide 27

Slide 27 text

How did we achieve this, and make Census data more useful?
• Converted most variables for all years
• To a 1km grid across Great Britain

Slide 28

Slide 28 text

How did we achieve this, and make Census data more useful?
We also:
• Created an online resource to do comparisons

Slide 29

Slide 29 text


Slide 30

Slide 30 text


Slide 31

Slide 31 text


Slide 32

Slide 32 text

How did we achieve this, and make Census data more useful?
We also:
• Created an online resource to do comparisons
• Limited the comparisons & variables

Slide 33

Slide 33 text


Slide 34

Slide 34 text

2001 2011 Fairly Good Good Not Good Good Fair Bad OK Use Caution Bad

Slide 35

Slide 35 text

How did we achieve this, and make Census data more useful?
We also:
• Created an online resource to do comparisons
• Limited the comparisons & variables
• Simple interface and output
• But option to get data if you wish

Slide 36

Slide 36 text


Slide 37

Slide 37 text

How did we achieve this, and make Census data more useful?
With Census data that is easier to use, we hope:
• More people can make use of the data
• More people can look at change over time for small areas
• We can highlight how useful GIS can be
• We can show how important easy-to-use data is

Slide 38

Slide 38 text

Moving on from ‘Why?’ to ‘How?’
• Academic
• Technical

Slide 39

Slide 39 text

Academic – How did we get from EDs/OAs to 1km grid?
Postcode centroids, Output Areas and 1km grid cells

Slide 40

Slide 40 text

Academic – Using Postcode Density
1. Generate a postcode intensity grid (1km cells) using kernel estimation
2. Overlay the grid from step 1 with the source zones (e.g., Output Areas (OAs)), giving OAG zones
3. Compute populations (OAG_Estimate) for each OAG zone with:
• WtArea = Wt × OAG_Area;
• WtAreaSum = WtArea summed by OA;
• OAG_Estimate = (WtArea / WtAreaSum) × OAPop
4. Aggregate OAG_Estimate values by grid cell
Population is then allocated to 1km grids, based on postcode densities (i.e. more postcodes -> more people)
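Steps 3 and 4 of the postcode-density method can be sketched in a few lines of Python. This is only an illustration of the arithmetic on the slide, not the project's R code: the function name `allocate_to_grid`, the tuple layout and the toy data are all hypothetical, and raising an error for an Output Area with zero total weight is an assumption, since the slides do not say how that case is handled.

```python
from collections import defaultdict


def allocate_to_grid(pieces, oa_pop):
    """Allocate Output Area populations to grid cells.

    pieces: iterable of (oa_id, cell_id, wt, area) tuples, one per OAG zone
            (the intersection of an Output Area with a grid cell).
    oa_pop: dict mapping oa_id -> OAPop.
    Returns a dict mapping cell_id -> estimated population.
    """
    pieces = list(pieces)

    # Step 3: WtArea = Wt x OAG_Area, and WtAreaSum = WtArea summed by OA.
    wt_area_sum = defaultdict(float)
    for oa_id, _cell_id, wt, area in pieces:
        wt_area_sum[oa_id] += wt * area

    grid = defaultdict(float)
    for oa_id, cell_id, wt, area in pieces:
        if wt_area_sum[oa_id] == 0:
            # Assumption: the source does not describe zero-weight OAs.
            raise ValueError(f"no postcode weight in Output Area {oa_id}")
        # Step 3: OAG_Estimate = (WtArea / WtAreaSum) x OAPop.
        oag_estimate = (wt * area) / wt_area_sum[oa_id] * oa_pop[oa_id]
        # Step 4: aggregate OAG_Estimate values by grid cell.
        grid[cell_id] += oag_estimate
    return dict(grid)


# Hypothetical toy data: two Output Areas overlapping two grid cells.
pieces = [
    ("OA1", "A", 2.0, 0.5),  # dense postcodes, small overlap
    ("OA1", "B", 1.0, 1.0),  # sparser postcodes, larger overlap
    ("OA2", "B", 1.0, 1.0),
]
oa_pop = {"OA1": 300, "OA2": 100}
estimates = allocate_to_grid(pieces, oa_pop)
print(estimates)
```

Because each Output Area's weights are normalised by WtAreaSum, the grid total equals the total of the source populations (400 in the toy data).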

Slide 41

Slide 41 text

Academic – Using Postcode Density
This then allows us to compare variables over time using the 1km grid
- For a wide range of variables
- For 2011, 2001, 1991, 1981 and 1971
http://gis.stackexchange.com/questions/20127/creating-a-raster-of-the-residuals-of-a-regression-between-two-rasters

Slide 42

Slide 42 text

Academic – Did it work?
• There was a gridded output for 1971 based on counts, but this is now missing
• From 1981 onwards there is no gridded output for GB, which is why we need this resource
• There is a gridded resource for Northern Ireland for 1971 – 2011, so we can compare

Slide 43

Slide 43 text

Academic – Did it work?
• Generate 1km grids from Small Area (SA) data using postcode centroids to determine variations in population density within SAs.
• Use NI Census Grid Square resource (available since 1971) to assess accuracy of estimates for grid cells.
NI total population: 1,810,863
Small Areas: n = 4537; minimum 98; maximum 3075; mean 399.13
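A check of this kind amounts to comparing two grids cell by cell. The sketch below computes estimated minus observed persons per cell. The function names and the toy values are hypothetical, and the mean absolute difference is an assumed summary measure, not one reported in the slides.

```python
def grid_differences(estimated, observed):
    """Return estimated minus observed persons for every grid cell.

    Both arguments are dicts mapping cell_id -> persons; a cell missing
    from one grid is treated as holding zero persons (an assumption).
    """
    cells = set(estimated) | set(observed)
    return {
        cell: estimated.get(cell, 0.0) - observed.get(cell, 0.0)
        for cell in cells
    }


def mean_absolute_difference(differences):
    """Average size of the per-cell error, ignoring its sign."""
    return sum(abs(d) for d in differences.values()) / len(differences)


# Hypothetical toy grids, not real Northern Ireland figures.
estimated = {"A": 150.0, "B": 250.0}
observed = {"A": 140.0, "B": 255.0, "C": 5.0}
differences = grid_differences(estimated, observed)
print(differences, mean_absolute_difference(differences))
```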

Slide 44

Slide 44 text

Total persons / HA by Small Area

Slide 45

Slide 45 text

Total persons by 1km grid cells

Slide 46

Slide 46 text

Estimated total persons by 1km grid cells (>= 1 person)

Slide 47

Slide 47 text

Estimated total persons by 1km grid cells (>= 25 persons)

Slide 48

Slide 48 text

Estimated – observed total persons by 1km grid cells

Slide 49

Slide 49 text

Estimated total persons by 1km grid cells | Total persons / HA by Output Area

Slide 50

Slide 50 text

Estimated total persons by 1km grid cells (>= 25 persons) | Estimated total persons by 1km grid cells (>= 1 person)

Slide 51

Slide 51 text

Refinement to surface modelling procedure
• Once the grids are generated, a further stage is to smooth cell values
• If this is not done, some cells within larger source zones will have identical values, which is not desirable
• The amount of smoothing differs between variables; spatially more ‘noisy’ variables such as LLTI (limiting long-term illness) will change more (proportionately) after smoothing than those which are more continuous (e.g., ethnicity)
• All grids are smoothed using a 3 by 3 cell smoothing filter so that adjacent cells completely within source zones have different values (but the sum of the grouped cells remains the same so the total population is unchanged)
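One way to get this behaviour is a 3 by 3 mean filter followed by a rescale within each source zone. The sketch below assumes that approach. The slides do not specify the filter weights, the edge handling or the rescaling step, so the function name, the use of a plain mean and the toy grid are all hypothetical, and values are assumed to be non-negative counts.

```python
from collections import defaultdict


def smooth_preserving_zone_sums(grid, zones):
    """Smooth a grid with a 3x3 mean filter, keeping each zone's total.

    grid:  list of rows of non-negative cell values.
    zones: list of rows of zone labels, same shape as grid.
    """
    rows, cols = len(grid), len(grid[0])
    smoothed = [[0.0] * cols for _ in range(rows)]

    # 3x3 mean filter; edge cells use only the neighbours that exist.
    for r in range(rows):
        for c in range(cols):
            window = [
                grid[i][j]
                for i in range(max(0, r - 1), min(rows, r + 2))
                for j in range(max(0, c - 1), min(cols, c + 2))
            ]
            smoothed[r][c] = sum(window) / len(window)

    # Rescale within each zone so the zone total is unchanged.
    before = defaultdict(float)
    after = defaultdict(float)
    for r in range(rows):
        for c in range(cols):
            before[zones[r][c]] += grid[r][c]
            after[zones[r][c]] += smoothed[r][c]
    for r in range(rows):
        for c in range(cols):
            zone = zones[r][c]
            if after[zone] > 0:
                smoothed[r][c] *= before[zone] / after[zone]
    return smoothed


# Hypothetical toy grid: zone "a" starts with four identical cells.
grid = [[10, 10, 0], [10, 10, 0], [0, 0, 40]]
zones = [["a", "a", "b"], ["a", "a", "b"], ["b", "b", "b"]]
smoothed = smooth_preserving_zone_sums(grid, zones)
for row in smoothed:
    print([round(value, 2) for value in row])
```

In the toy grid the four cells of zone "a" no longer share one value after smoothing, while each zone still sums to 40.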

Slide 52

Slide 52 text

Technical – how did we create the grids and the online interface?

Slide 53

Slide 53 text

Technical – how did we create the grids and the online interface?
• Allocation of populations to grid cells and smoothing completed in R/RStudio
• Version controlled with Git
• R script available on GitHub https://github.com/nickbearman/popchange

Slide 54

Slide 54 text

Why open source?
• Ensures code is available to anyone who wishes to access it
• Allows future collaboration
• We have plans for expansion of the tools through another (potential) project

Slide 55

Slide 55 text


Slide 56

Slide 56 text

Platforms
• PostgreSQL
• Clojure (Java-based)
• QGIS Backend (Python)
• Visualisation calculations
• File conversion
• GeoTIFF → SHP & MapInfo TAB

Slide 57

Slide 57 text

Practicals
• Workbook practicals for University of Liverpool
• RStudio / Markdown
• QGIS

Slide 58

Slide 58 text

• Half day workshop
• Talk on PopChange and practical session
• University of Liverpool in London, 33 Finsbury Square, London EC2A 1AG
• Date: Mon 6th Feb
• Email list / Twitter for details!
as workshop in London

Slide 59

Slide 59 text

PopChange as a Developing project
• Please give us your feedback
• Feedback form / in person
• Help us help you to use the data!
http://popchange.liverpool.ac.uk