PopChange - Creating access to UK wide census data

nickbearman
November 18, 2016

Presentation given on PopChange and Open Data at ODI Lunchtime Seminars

Transcript

  1. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Improving Access to UK-wide
    Census Data
    Nick Bearman
    Clear Mapping Co
    University of Liverpool
    @nickbearmanuk

  2. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Why is it important to make Census data
    more accessible? (& how)
    http://www.telegraph.co.uk/news/uknews/8371197/Missing-questions-on-2011-Census-baffle-public.html

  3. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    The Census provides a whole range of
    very useful data
    Hours of unpaid care
    http://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/healthcaresystem/articles/2011censusanalysisunpaidcareinenglandandwales2011andcomparisonwith2001/2013-02-15

  4. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    http://www.neighbourhood.statistics.gov.uk/HTMLDocs/dvc128/wrapper.html
    The Census provides a whole range of
    very useful data
    Hours of unpaid care
    http://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/healthcaresystem/articles/2011censusanalysisunpaidcareinenglandandwales2011andcomparisonwith2001/2013-02-15

  5. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    The Census provides a whole range of
    very useful data
    Google Image Search – Census outputs UK: https://www.google.com/search?safe=off&site=&tbm=isch&source=hp&biw=1265&bih=918&q=census+questionnaire+uk&oq=census+questionnaire+uk

  6. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    BUT Census data is hard to access
    https://photosleuth.wordpress.com/category/derbyshire/page/3/

  7. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    BUT Census data is hard to access
    Current Data – Casweb: UK, 1971 - 2001

  8. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    BUT Census data is hard to access
    Current Data – Casweb: UK, 1971 - 2001

  9. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    BUT Census data is hard to access
    Current Data – Casweb: UK, 1971 - 2001

  10. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Infuse – 2001 and 2011, England and Wales only

  11. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Infuse – 2001 and 2011, England and Wales only

  12. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Infuse – 2001 and 2011, England and Wales only

  13. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Scotland 2001 & 2011

  14. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Scotland 2001 & 2011

  15. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Scotland 2001 & 2011

  16. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    BUT Census data is hard to access

  17. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Archive Data
    ED boundaries for 1971 not available
    (only centroids)
    ESRC-funded project to extract 1971,
    1981 and 1991 data from a COBOL
    database; all data now available for
    1971-2011
    Pre-1971 data is not systematically
    available in digital form
    https://www.flickr.com/photos/woolamaloo_gazette/6238597508/

  18. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    We wanted to make
    Census data easier to access
    Particularly to look at how areas change over
    time
    If the data is easier to access, more people
    will use it
    What’s the problem?

  19. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Census data is made available in Output Areas
    ~ 100 households
    Comparing these small areas over time is difficult,
    as they change
    We wanted to make
    Census data easier to access

  20. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Census data is made available in Output Areas
    ~ 100 households
    Comparing these small areas over time is difficult,
    as they change
    We wanted to make
    Census data easier to access
    NW Swindon Jan 2004 NW Swindon Jan 2011
    http://www.ordnancesurvey.co.uk/blog/2011/04/3974/

  21. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Blue = 1991 Enumeration districts; Red = 2011 Output Areas

  22. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    So these comparisons can
    be tricky to do
    Particularly if you
    want to go from
    1971/81/91
    (enumeration districts)
    to 2001/11
    (output areas)
    Blue = 1991 Enumeration districts; Red = 2011 Output Areas

  23. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Who is interested in these comparisons?

  24. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Who is interested in these comparisons?
    https://www.ons.gov.uk/census/2011census/2011censusbenefits/howothersusecensusdata

  25. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Who is interested in these comparisons?
    Sector                       %      N = 76
    Academic study / research    64%    49
    Schools                      4%     3
    Central government           4%     3
    Private sector               7%     5
    Third sector                 3%     2
    Personal use                 5%     4
    Local government             13%    10
    users, 20161115, n = 76

  26. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    How did we achieve this,
    and make Census data more useful?

  27. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    How did we achieve this,
    and make Census data more useful?
    • Converted most variables for all years
    • To a 1km grid across Great Britain

  28. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    How did we achieve this,
    and make Census data more useful?
    We also:
    • Created an online resource to do comparisons

  29. Dr Nick Bearman | @nickbearmanuk | 18/11/16

  30. Dr Nick Bearman | @nickbearmanuk | 18/11/16

  31. Dr Nick Bearman | @nickbearmanuk | 18/11/16

  32. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    How did we achieve this,
    and make Census data more useful?
    We also:
    • Created an online resource to do comparisons
    • Limited the comparisons & variables

  33. Dr Nick Bearman | @nickbearmanuk | 18/11/16

  34. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    2001 2011
    Fairly Good
    Good
    Not Good
    Good
    Fair
    Bad
    OK
    Use Caution
    Bad

  35. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    How did we achieve this,
    and make Census data more useful?
    We also:
    • Created an online resource to do comparisons
    • Limited the comparisons & variables
    • Simple interface and output
    • But option to get data if you wish

  36. Dr Nick Bearman | @nickbearmanuk | 18/11/16

  37. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    How did we achieve this,
    and make Census data more useful?
    With Census data that is easier to use, we hope:
    • More people can make use of the data
    • More people can look at change over time for
    small areas
    • We can highlight how useful GIS can be
    • We can show how important ease of use of data is

  38. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Moving on from ‘Why?’ to ‘How?’
    Academic
    Technical

  39. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Academic – How did we get from EDs/OAs to 1km grid?
    Postcode centroids, Output Areas and 1km grid cells
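
One way to relate postcode centroids to 1km grid cells is to estimate a postcode intensity value at each cell centre with a kernel. The sketch below is illustrative only: the Gaussian kernel, the 1000 m bandwidth, the function name `postcode_intensity` and the sample coordinates are all assumptions, not details taken from the project.

```python
import math


def postcode_intensity(centroids, x_min, y_min, n_cols, n_rows,
                       cell_size=1000.0, bandwidth=1000.0):
    """Gaussian kernel estimate of postcode intensity at each grid cell centre.

    centroids: list of (x, y) postcode centroid coordinates in metres.
    Returns a list of rows (row 0 is the southernmost), one value per cell.
    Kernel shape and bandwidth are assumptions made for this sketch.
    """
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    grid = [[0.0] * n_cols for _ in range(n_rows)]
    for row in range(n_rows):
        cy = y_min + (row + 0.5) * cell_size  # cell centre, northing
        for col in range(n_cols):
            cx = x_min + (col + 0.5) * cell_size  # cell centre, easting
            total = 0.0
            for px, py in centroids:
                d2 = (px - cx) ** 2 + (py - cy) ** 2
                total += math.exp(-d2 / (2.0 * bandwidth ** 2))
            grid[row][col] = norm * total
    return grid


# Hypothetical centroids: two close together in the south-west cell,
# one in the north-east cell of a 3 x 3 km area.
centroids = [(500.0, 500.0), (600.0, 400.0), (2500.0, 2500.0)]
weights = postcode_intensity(centroids, 0.0, 0.0, n_cols=3, n_rows=3)
```

Cells near more postcode centroids receive larger values, so here the south-west cell gets the highest weight.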

  40. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Academic – Using Postcode Density
    1. Generate a postcode intensity grid using kernel estimation (1km cells)
    2. Overlay the grid from step 1 with source zones (e.g. Output Areas (OAs)), giving OAG zones
    3. Compute populations (OAG_Estimate) for each OAG zone with:
    • WtArea = Wt × OAG_Area
    • WtAreaSum = WtArea summed by OA
    • OAG_Estimate = (WtArea / WtAreaSum) × OAPop
    4. Aggregate OAG_Estimate values by grid cell
    Population is then allocated to 1km grid cells based on postcode densities
    (i.e. more postcodes -> more people)
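
Steps 3 and 4 can be sketched in a few lines of Python. This is a minimal illustration, not the project's R script: the overlay (step 2) is assumed to have been done already, the record layout (`oa`, `cell`, `wt`, `area`) and the function name `allocate_to_grid` are hypothetical, and the area-only fallback for an OA with zero total weight is an assumption added so that no population is lost.

```python
from collections import defaultdict


def allocate_to_grid(pieces, oa_pop):
    """Allocate OA populations to grid cells using postcode-weighted areas.

    pieces: one dict per OAG zone (an OA / grid-cell intersection) with keys
            'oa', 'cell', 'wt' (postcode intensity) and 'area'.
    oa_pop: dict mapping OA id to its census population (OAPop).
    """
    # Step 3a: WtArea = Wt x OAG_Area
    wt_area = [p["wt"] * p["area"] for p in pieces]

    # Step 3b: WtAreaSum = WtArea summed by OA (plain area kept for fallback)
    wt_area_sum = defaultdict(float)
    area_sum = defaultdict(float)
    for p, wa in zip(pieces, wt_area):
        wt_area_sum[p["oa"]] += wa
        area_sum[p["oa"]] += p["area"]

    # Step 3c and step 4: OAG_Estimate, aggregated by grid cell
    cell_totals = defaultdict(float)
    for p, wa in zip(pieces, wt_area):
        if wt_area_sum[p["oa"]] > 0:
            share = wa / wt_area_sum[p["oa"]]
        else:
            # Assumption: fall back to simple area weighting
            share = p["area"] / area_sum[p["oa"]]
        cell_totals[p["cell"]] += share * oa_pop[p["oa"]]
    return dict(cell_totals)


# Made-up example: OA1 straddles two cells, OA2 lies inside one cell.
pieces = [
    {"oa": "OA1", "cell": (0, 0), "wt": 3.0, "area": 0.5},
    {"oa": "OA1", "cell": (1, 0), "wt": 1.0, "area": 0.5},
    {"oa": "OA2", "cell": (1, 0), "wt": 2.0, "area": 1.0},
]
oa_pop = {"OA1": 200, "OA2": 100}
grid_pop = allocate_to_grid(pieces, oa_pop)
```

In this example OA1's 200 people are split 150 / 50 between its two cells because the first piece has three times the postcode weight, and the grid total still equals the 300 people in the source zones.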

  41. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    This then allows us to compare variables over time using
    the 1km grid
    - For a wide range of variables
    - For 2011, 2001, 1991, 1981 and 1971
    http://gis.stackexchange.com/questions/20127/creating-a-raster-of-the-residuals-of-a-regression-between-two-rasters
    Academic – Using Postcode Density
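
Once two census years sit on the same 1km grid, change over time reduces to cell-by-cell arithmetic. A minimal sketch, assuming each grid is a list of rows with matching shape; the function name `grid_change` and the values are made up for illustration and are not census figures.

```python
def grid_change(earlier, later):
    """Return the cell-by-cell change (later minus earlier) for two grids."""
    same_shape = len(earlier) == len(later) and all(
        len(a) == len(b) for a, b in zip(earlier, later)
    )
    if not same_shape:
        raise ValueError("grids must have the same shape")
    return [
        [b - a for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(earlier, later)
    ]


# Illustrative 2 x 2 grids of total persons per 1km cell
pop_1971 = [[120.0, 0.0], [45.0, 300.0]]
pop_2011 = [[150.0, 10.0], [40.0, 280.0]]
change = grid_change(pop_1971, pop_2011)
```

Positive cells gained population between the two years and negative cells lost it.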

  42. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Academic – Did it work?
    • There was a gridded output for 1971 based on counts, but
    this is now missing
    • From 1981 onwards there is no gridded output for GB, which is
    why we need this resource
    • There is a gridded resource for Northern Ireland for 1971 –
    2011, so we can compare

  43. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Academic – Did it work?
    Generate 1km grids from Small Area (SA) data using postcode
    centroids to determine variations in population density within SAs.
    Use NI Census Grid Square resource (available since 1971) to
    assess accuracy of estimates for grid cells.
    NI total population: 1,810,863
    Small Areas:
    n 4537
    Minimum 98
    Maximum 3075
    Mean 399.13
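
A comparison of estimated grid values against an observed grid such as the NI Grid Square resource can be summarised with simple error statistics. The sketch below is one possible approach: the choice of mean error, mean absolute error and RMSE is an assumption (the slides do not say which measures were used), `grid_errors` is a hypothetical name, and the numbers are invented.

```python
import math


def grid_errors(estimated, observed):
    """Summarise estimated minus observed differences over all grid cells."""
    diffs = [
        e - o
        for est_row, obs_row in zip(estimated, observed)
        for e, o in zip(est_row, obs_row)
    ]
    n = len(diffs)
    return {
        "mean_error": sum(diffs) / n,  # bias: over- or under-estimation
        "mean_abs_error": sum(abs(d) for d in diffs) / n,
        "rmse": math.sqrt(sum(d * d for d in diffs) / n),
    }


# Invented 2 x 2 example: totals match, but two cells are off by 10 people.
estimated = [[100.0, 50.0], [0.0, 250.0]]
observed = [[110.0, 40.0], [0.0, 250.0]]
errors = grid_errors(estimated, observed)
```

Because the allocation preserves each source zone's total, the mean error tends towards zero, so the absolute and squared measures are the more informative ones.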

  44. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Total persons / HA by Small Area

  45. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Total persons by 1km grid cells

  46. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Estimated total persons by 1km grid cells (>= 1 person)

  47. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Estimated total persons by 1km grid cells (>= 25 persons)

  48. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Estimated – observed total persons by 1km grid cells

  49. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Estimated total persons by 1km grid
    cells
    Total persons / HA by Output Area

  50. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Estimated total persons by 1km
    grid cells (>= 25 persons)
    Estimated total persons by 1km
    grid cells (>= 1 person)

  51. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Refinement to surface modelling procedure
    • Once the grids are generated, a further stage is to smooth cell values
    • If this is not done, some cells within larger source zones will have identical values,
    which is not desirable
    • The amount of smoothing will be different for some variables than for others; spatially
    more ‘noisy’ variables such as LLTI will change more (proportionately) after smoothing than
    those which are more continuous (e.g., ethnicity)
    • All grids are smoothed using a 3 by 3
    cell smoothing filter so that adjacent cells
    completely within source zones have different
    values (but the sum of the grouped cells remains
    the same so the total population is unchanged)
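
The smoothing step can be sketched as a 3 by 3 mean filter followed by a rescale that keeps the overall total unchanged. This is an illustration under stated assumptions: the slides do not give the filter weights, the edge handling or the exact way the sum is preserved, so the equal weights, the use of in-bounds neighbours only, the global rescale and the name `smooth_3x3` are all choices made for this example.

```python
def smooth_3x3(grid):
    """Apply a 3 x 3 mean filter, then rescale so the grid total is unchanged."""
    n_rows, n_cols = len(grid), len(grid[0])
    smoothed = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            # Average the cell and its neighbours, ignoring cells off the grid
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(n_rows, r + 2))
                for cc in range(max(0, c - 1), min(n_cols, c + 2))
            ]
            smoothed[r][c] = sum(window) / len(window)

    total_before = sum(sum(row) for row in grid)
    total_after = sum(sum(row) for row in smoothed)
    if total_after > 0:
        # Assumption: a single global scale factor restores the total
        scale = total_before / total_after
        smoothed = [[v * scale for v in row] for row in smoothed]
    return smoothed


# Invented grid: a large zone with identical cell values next to a dense cell
grid = [
    [30.0, 30.0, 30.0],
    [30.0, 30.0, 30.0],
    [0.0, 0.0, 90.0],
]
smoothed = smooth_3x3(grid)
```

After smoothing, the dense cell is lower, its empty neighbours are no longer zero, and the total of 270 people is preserved.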

  52. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Technical – how did we create the grids
    and the online interface?

  53. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    • Allocation of populations to grid cells and smoothing
    completed in R/RStudio
    • Version controlled with Git
    • R script available on GitHub
    https://github.com/nickbearman/popchange
    Technical – how did we create the grids
    and the online interface?

  54. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Why open source?
    • Ensures code is available to anyone who wishes to
    access it
    • Allows future collaboration
    • We have plans for expansion of the tools through
    another (potential) project

  55. Dr Nick Bearman | @nickbearmanuk | 18/11/16

  56. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Platforms

    PostgreSQL

    Clojure (Java based)

    QGIS Backend (Python)

    Visualisation calculations

    File conversion

    GeoTIFF → SHP & MapInfo TAB

  57. Dr Nick Bearman | @nickbearmanuk | 18/11/16
    Practicals

    Workbook practicals for University of Liverpool

    RStudio / Markdown

    QGIS

  58. Dr Nick Bearman | @nickbearmanuk | 18/11/16

    Half day workshop

    Talk on PopChange and practical session

    University of Liverpool in London

    33 Finsbury Square, London EC2A 1AG

    Date

    Mon 6th Feb

    Email list / Twitter for details!
    as workshop in London

  59. Dr Nick Bearman | @nickbearmanuk | 18/11/16

    Please give us your feedback

    Feedback form / in person

    Help us help you to use the data!
    PopChange as a Developing project
    http://popchange.liverpool.ac.uk
