PopChange - Creating access to UK wide census data

nickbearman
November 18, 2016

Presentation given on PopChange and Open Data at ODI Lunchtime Seminars

Transcript

  1. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Improving Access to UK-wide Census Data
    Nick Bearman, Clear Mapping Co / University of Liverpool, @nickbearmanuk
  2. Why is it important to make Census data more accessible? (& how)
    http://www.telegraph.co.uk/news/uknews/8371197/Missing-questions-on-2011-Census-baffle-public.html
  3. The Census provides a whole range of very useful data
    Hours of unpaid care
    http://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/healthcaresystem/articles/2011censusanalysisunpaidcareinenglandandwales2011andcomparisonwith2001/2013-02-15
  4. The Census provides a whole range of very useful data
    Hours of unpaid care
    http://www.neighbourhood.statistics.gov.uk/HTMLDocs/dvc128/wrapper.html
    http://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/healthcaresystem/articles/2011censusanalysisunpaidcareinenglandandwales2011andcomparisonwith2001/2013-02-15
  5. The Census provides a whole range of very useful data
    Google Image Search – Census outputs UK:
    https://www.google.com/search?safe=off&site=&tbm=isch&source=hp&biw=1265&bih=918&q=census+questionnaire+uk&oq=census+questionnaire+uk
  6. BUT Census data is hard to access
    https://photosleuth.wordpress.com/category/derbyshire/page/3/
  7. BUT Census data is hard to access
    Current Data – Casweb: UK, 1971 - 2001
  8. BUT Census data is hard to access
    Current Data – Casweb: UK, 1971 - 2001
  9. BUT Census data is hard to access
    Current Data – Casweb: UK, 1971 - 2001
  10. Archive Data
    ED boundaries for 1971 not available (only centroids)
    ESRC-funded project to extract 1971, 1981 and 1991 data from a COBOL database; all data now available for 1971-2011
    Pre-1971 data is not systematically available in digital form
    https://www.flickr.com/photos/woolamaloo_gazette/6238597508/
  11. We wanted to make Census data easier to access
    Particularly to look at how areas change over time
    If the data is easier to access, more people will use it
    What’s the problem?
  12. We wanted to make Census data easier to access
    Census data is made available in Output Areas (~100 households)
    Comparing these small areas over time is difficult, as they change
  13. We wanted to make Census data easier to access
    Census data is made available in Output Areas (~100 households)
    Comparing these small areas over time is difficult, as they change
    NW Swindon Jan 2004 / NW Swindon Jan 2011
    http://www.ordnancesurvey.co.uk/blog/2011/04/3974/
  14. Blue = 1991 Enumeration districts; Red = 2011 Output Areas
  15. So these comparisons can be tricky to do
    Particularly if you want to go from 1971/81/91 (enumeration districts) to 2001/11 (output areas)
    Blue = 1991 Enumeration districts; Red = 2011 Output Areas
  16. Who is interested in these comparisons?
    https://www.ons.gov.uk/census/2011census/2011censusbenefits/howothersusecensusdata
  17. Who is interested in these comparisons?
    Sector                       %     N
    Academic study / research    64%   49
    Schools                      4%    3
    Central government           4%    3
    Private sector               7%    5
    Third sector                 3%    2
    Personal use                 5%    4
    Local government             13%   10
    users, 20161115, n = 76
  18. How did we achieve this, and make Census data more useful?
  19. How did we achieve this, and make Census data more useful?
    • Converted most variables for all years
    • To a 1km grid across Great Britain
  20. How did we achieve this, and make Census data more useful?
    We also:
    • Created an online resource to do comparisons
  21. How did we achieve this, and make Census data more useful?
    We also:
    • Created an online resource to do comparisons
    • Limited the comparisons & variables
  22. 2001: Fairly Good / Good / Not Good
    2011: Good / Fair / Bad
    OK / Use Caution / Bad
  23. How did we achieve this, and make Census data more useful?
    We also:
    • Created an online resource to do comparisons
    • Limited the comparisons & variables
    • Simple interface and output
    • But option to get data if you wish
  24. How did we achieve this, and make Census data more useful?
    With Census data that is easier to use, we hope:
    • More people can make use of the data
    • More people can look at change over time for small areas
    • We can highlight how useful GIS can be
    • We can show how important easy-to-use data is
  25. Moving on from ‘Why?’ to ‘How?’
    Academic
    Technical
  26. Academic – How did we get from EDs/OAs to 1km grid?
    Postcode centroids, Output Areas and 1km grid cells
  27. Academic – Using Postcode Density
    1. Generate a postcode intensity grid using kernel estimation – 1km cells
    2. Overlay the grid from step 1 with source zones (e.g., Output Areas (OAs)), giving OAG zones
    3. Compute populations (OAG_Estimate) for each OAG zone with:
       • WtArea = Wt × OAG_Area
       • WtAreaSum = WtArea summed by OA
       • OAG_Estimate = WtArea / WtAreaSum × OAPop
    4. Aggregate OAG_Estimate values by grid cell
    Population is then allocated to 1km grid cells based on postcode densities (i.e. more postcodes -> more people)
  28. Academic – Using Postcode Density
    This then allows us to compare variables over time using the 1km grid
    - For a wide range of variables
    - For 2011, 2001, 1991, 1981 and 1971
    http://gis.stackexchange.com/questions/20127/creating-a-raster-of-the-residuals-of-a-regression-between-two-rasters
  29. Academic – Did it work?
    • There was a gridded output for 1971 based on counts, but this is now missing
    • From 1981 onwards there is no gridded output for GB, which is why we need this resource
    • There is a gridded resource for Northern Ireland for 1971 – 2011, so we can compare
  30. Academic – Did it work?
    Generate 1km grids from Small Area (SA) data using postcode centroids to determine variations in population density within SAs.
    Use NI Census Grid Square resource (available since 1971) to assess accuracy of estimates for grid cells.
    NI total population: 1,810,863
    Small Areas: n = 4537, minimum 98, maximum 3075, mean 399.13
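One way to make such an accuracy check concrete is to compare the estimated cell counts with the reference grid-square counts cell by cell. The slides do not say which error measures were used, so the measures below (mean error, mean absolute error, RMSE), the function name `compare_with_reference` and the sample numbers are all assumptions for illustration.

```python
def compare_with_reference(estimated, reference):
    """Summarise per-cell differences between two {cell_id: count} dicts.

    Cells missing from one side are treated as zero on that side.
    """
    cells = sorted(set(estimated) | set(reference))
    if not cells:
        raise ValueError("no grid cells to compare")
    errors = [estimated.get(c, 0.0) - reference.get(c, 0.0) for c in cells]
    n = len(errors)
    return {
        "cells": n,
        "mean_error": sum(errors) / n,                    # bias
        "mean_abs_error": sum(abs(e) for e in errors) / n,
        "rmse": (sum(e * e for e in errors) / n) ** 0.5,
    }


# Invented numbers, purely to show the call; not results from the project.
estimated = {"c1": 10.0, "c2": 20.0}
reference = {"c1": 12.0, "c2": 18.0}
summary = compare_with_reference(estimated, reference)
```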
  31. Estimated total persons by 1km grid cells
    Total persons / HA by Output Area
  32. Estimated total persons by 1km grid cells (>= 25 persons)
    Estimated total persons by 1km grid cells (>= 1 person)
  33. Refinement to surface modelling procedure
    • Once the grids are generated, a further stage is to smooth cell values
    • If this is not done, some cells within larger source zones will have identical values, which is not desirable
    • The amount of smoothing will differ between variables; spatially more ‘noisy’ variables such as LLTI will change more (proportionately) after smoothing than those which are more continuous (e.g., ethnicity)
    • All grids are smoothed using a 3 by 3 cell smoothing filter so that adjacent cells completely within source zones have different values (but the sum of the grouped cells remains the same so the total population is unchanged)
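A 3 by 3 smoothing pass that leaves the total unchanged can be sketched as below. The slides do not give the filter weights or say how the sum is preserved, so this example assumes a plain 3 by 3 mean filter followed by rescaling the whole grid back to its original total; the function name `smooth_3x3` is also an assumption.

```python
def smooth_3x3(grid):
    """Apply a 3x3 mean filter to a 2D list, then rescale to keep the total.

    Edge and corner cells are averaged over the neighbours that exist.
    """
    rows, cols = len(grid), len(grid[0])
    smoothed = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            smoothed[r][c] = sum(window) / len(window)

    total_before = sum(sum(row) for row in grid)
    total_after = sum(sum(row) for row in smoothed)
    if total_after == 0:
        return smoothed  # an all-zero grid stays all zero
    # Assumed way of keeping the population total unchanged.
    scale = total_before / total_after
    return [[value * scale for value in row] for row in smoothed]


# A single populated cell is spread over its neighbours; the total stays at 9.
example = smooth_3x3([[0, 0, 0], [0, 9, 0], [0, 0, 0]])
```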
  34. Technical – how did we create the grids and the online interface?
  35. Technical – how did we create the grids and the online interface?
    • Allocation of populations to grid cells and smoothing completed in R/RStudio
    • Version controlled with Git
    • R script available on GitHub https://github.com/nickbearman/popchange
  36. Why open source?
    • Ensures code is available to anyone who wishes to access it
    • Allows future collaboration
    • We have plans for expansion of the tools through another (potential) project
  37. Platforms
    • PostgreSQL
    • Clojure (Java-based)
    • QGIS Backend (Python)
    • Visualisation calculations
    • File conversion
    • GeoTIFF → SHP & MapInfo TAB
  38. Practicals
    • Workbook practicals for University of Liverpool
    • RStudio / Markdown
    • QGIS
  39. • Half day workshop
    • Talk on PopChange and practical session
    • University of Liverpool in London
    • 33 Finsbury Square, London EC2A 1AG
    • Date: Mon 6th Feb
    • Email list / Twitter for details!
    as workshop in London
  40. PopChange as a Developing project
    • Please give us your feedback
    • Feedback form / in person
    • Help us help you to use the data!
    http://popchange.liverpool.ac.uk