GIS technology stack
• Esri
  § Editing and data processing in ArcGIS
  § Best tools for large-scale data production
• NHGIS
  § Currently disseminates only shapefiles
  § Moving to PostGIS back-end to facilitate additional data formats
• TerraPop
  § Currently disseminates only shapefiles
  § Uses PostGIS back-end, so easy to extend to additional data formats
NHGIS
• Started in 2001 with a major grant from the National Science Foundation
• Received two additional NSF grants and two NICHD (NIH) grants
• First MPC project to create and disseminate GIS data
• NHGIS also disseminates aggregate census data to link with GIS data
Initial GIS data
• Census tracts
  § 1910-2000
  § Constructed from TIGER/Line 2000 data and scanned census tract maps
• Counties
  § 1790-2000
  § Constructed from TIGER/Line 2000 data, scanned census maps, Thorndale & Dollarhide’s Map Guide to the US Federal Censuses 1790-1920, and other sources
Current work
• Historic place and county subdivision points
  § County subdivisions back to 1930
  § Places back to 1790
  § Using TIGER, GNIS, and scanned census maps
  § Create new summary data from 100% census microdata
Current work (continued)
• Conflation
  § Aligning historic census tract and county boundaries with 2010 TIGER data
IPUMS-USA
• Integrated – consistent codes, labels and docs
• Public – anonymized, downloadable
• Microdata – individual-level
• Series – pooled data over time and place
Public Use Microdata Areas (PUMAs)
• Smallest geographic unit identified in microdata
• Minimum population = 100,000
• Delineated by states (not Census Bureau) after each decennial census
Limitations
• Delineation rules change between censuses, so PUMA boundaries are inconsistent over time
• We created consistent PUMAs covering two periods
  § 1980-2000, through visual inspection
  § 2000-2010, through an automated algorithm
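The slides do not describe the automated algorithm used for the 2000-2010 consistent PUMAs. As a rough sketch of one way such a grouping could work, the code below assumes a precomputed list of overlapping unit pairs (a hypothetical input, for example from a polygon overlay with slivers filtered out) and merges every unit that overlaps another into the same group, so each group covers the same territory in both years. The unit names and the function `consistent_groups` are illustrative, not part of any IPUMS tool.

```python
def consistent_groups(overlaps):
    """Group units from two vintages into connected components of overlap.

    `overlaps` is an iterable of (unit_2000, unit_2010) pairs that share
    territory. This is an assumed input format, not the project's own.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Tag each unit with its year so identical names in both years stay distinct.
    for old_unit, new_unit in overlaps:
        union(("2000", old_unit), ("2010", new_unit))

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical overlaps: P2 straddles Q1 and Q2, so P1, P2, Q1, Q2 merge.
example_overlaps = [("P1", "Q1"), ("P2", "Q1"), ("P2", "Q2"), ("P3", "Q3")]
example_groups = consistent_groups(example_overlaps)
```

In practice the overlap list would need a minimum area or population threshold; otherwise tiny boundary slivers would chain many units into one very large group.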
Location-Based Integration
[Diagram: three linked data formats: Microdata, Area-level data, Rasters. Area-level data: summarized environmental and population characteristics for administrative districts]
Microdata (county codes)
County ID
G01001
G01003
G01005
G01007

Area-level data
County ID   Mean Ann. Precip.   Median HH Income
G01001      768                 50,500
G01003      589                 48,500
G01005      867                 51,000
G01007      701                 50,750
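The county code is what ties the two tables together. A minimal sketch of that linkage follows, using the four county rows from the slide; the microdata records, field names, and the helper `attach_area_data` are hypothetical and only illustrate the join.

```python
# Area-level table keyed by county code (values taken from the slide's table).
area_data = {
    "G01001": {"mean_ann_precip": 768, "median_hh_income": 50500},
    "G01003": {"mean_ann_precip": 589, "median_hh_income": 48500},
    "G01005": {"mean_ann_precip": 867, "median_hh_income": 51000},
    "G01007": {"mean_ann_precip": 701, "median_hh_income": 50750},
}

# Hypothetical person records; the county code is their only link to the table.
microdata = [
    {"person_id": 1, "age": 34, "county_id": "G01001"},
    {"person_id": 2, "age": 61, "county_id": "G01005"},
    {"person_id": 3, "age": 8, "county_id": "G01001"},
]


def attach_area_data(records, area_table):
    """Return copies of the records with the matching county attributes added."""
    linked = []
    for record in records:
        # Records whose code is missing from the table are kept unchanged.
        attributes = area_table.get(record["county_id"], {})
        linked.append({**record, **attributes})
    return linked


linked_records = attach_area_data(microdata, area_data)
```

Every person in county G01001 receives the same contextual values, which is why the set of codes in the boundary data has to match the codes used in the census data.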
Boundaries are Key
• Linkages across data formats rely on administrative unit boundaries
  § Containers for summarizing raster data to area-level data
  § Containers for distributing area-level data to raster cells
  § Codes link area-level and summarized raster data to microdata
• Sets of units and codes must match census data
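The two "container" roles can be sketched on a toy grid. The example assumes the boundaries have already been rasterized so that each cell holds the code of the unit containing it; the grids, the totals, and the even-split rule in `distribute_evenly` are illustrative assumptions, not TerraPop's actual procedure.

```python
from collections import defaultdict

# Hypothetical zone raster: each cell holds the code of the unit that contains it.
zones = [
    ["A", "A", "B", "B"],
    ["A", "A", "B", "B"],
    ["C", "C", "C", "B"],
]

# Hypothetical value raster on the same grid (for example, precipitation).
values = [
    [10, 20, 30, 40],
    [30, 40, 50, 60],
    [5, 10, 15, 20],
]


def zonal_mean(zone_grid, value_grid):
    """Summarize raster cells to area-level data: mean cell value per unit."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for zone_row, value_row in zip(zone_grid, value_grid):
        for zone, value in zip(zone_row, value_row):
            sums[zone] += value
            counts[zone] += 1
    return {zone: sums[zone] / counts[zone] for zone in sums}


def distribute_evenly(zone_grid, area_totals):
    """Distribute area-level totals to raster cells, split evenly within each unit."""
    counts = defaultdict(int)
    for zone_row in zone_grid:
        for zone in zone_row:
            counts[zone] += 1
    return [[area_totals[zone] / counts[zone] for zone in zone_row]
            for zone_row in zone_grid]


unit_means = zonal_mean(zones, values)
cell_counts = distribute_evenly(zones, {"A": 400, "B": 500, "C": 90})
```

An even split preserves each unit's total but ignores variation inside the unit; weighting by an ancillary raster would be a natural refinement.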
Terra Populus – GIS data
• Create a set of first and second administrative level boundaries that is as ‘authoritative’ as possible
• Covering the most recent census back to roughly the 1960s
• Disseminate freely
GIS data – non-IPUMS countries
Afghanistan, Albania, Algeria, Angola, Azerbaijan, Bahrain, Bangladesh, Belgium, Benin, Bhutan
Bosnia and Herzegovina, Botswana, Bulgaria, Burundi, Central African Republic, Chad, Comoros, Croatia, Cyprus, Czech Republic
Denmark, Djibouti, Dominican Republic, Equatorial Guinea, Eritrea, Estonia, Ethiopia, Finland, Gabon, Gambia
Georgia, Guatemala, Guinea Bissau, Guyana, Honduras, Hong Kong, Ivory Coast, Japan, Kazakhstan, Kuwait
Laos, Latvia, Lebanon, Lesotho, Liberia, Libya, Lithuania, Macedonia, Madagascar, Mauritania
Mauritius, Moldova, Montenegro, Mozambique, Myanmar, Namibia, Nepal, New Zealand, Niger, Nigeria
North Korea, Norway, Oman, Papua New Guinea, Paraguay, Poland, Qatar, Republic of Congo, Reunion, Russia
Saudi Arabia, Serbia, Singapore, Slovakia, South Korea, Sri Lanka, Swaziland, Sweden, Syria, Taiwan
Tajikistan, Timor Leste, Togo, Trinidad and Tobago, Tunisia, Turkmenistan, Ukraine, United Arab Emirates, Yemen, Zambia
Zimbabwe