Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Geo databases: an intro to PostGIS

Geo databases: an intro to PostGIS

An introduction to PostGIS: the geospatial extension to PostgreSQL.

Kashif Rasul

April 28, 2012
Tweet

Other Decks in Programming

Transcript

  1. Geo databases: an intro to PostGIS Dr. Kashif Rasul @krasul

    SpacialDB This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 United States License.
  2. Installation: OS X • Install PostrgeSQL server ☺ brew install

    postgresql • Install Geos, Proj, Gdal, and Numpy ☺ pip install numpy ☺ brew install geos proj gdal • Install PostGIS extension ☺ brew install postgis
  3. Create a DB • Init. & start PostgreSQL ☺ initdb

    /usr/local/var/postgres ☺ pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres/server.log start • create a DB ☺ createdb nyc • and connect to it ☺ psql -h localhost nyc psql (9.1.3) Type "help" for help. nyc=#
  4. Load spatial data • Get the data from bit.ly/postgis-data •

    Import it via shp2pgsql: ☺ shp2pgsql -s 26918 -g the_geom -I -D nyc_census_blocks.shp public.nyc_census_blocks | psql -h localhost nyc ... • Import the rest: streets, neigh. and subway station !les
  5. What are the names of all the neigh. in NYC?

    nyc=# SELECT name FROM nyc_neighborhoods; name ----------------------------- Bensonhurst East Village West Village Throggs Neck Wakefield-Williamsbridge ...
  6. What are the names of all the neigh. in Brooklyn?

    nyc=# SELECT name FROM nyc_neighborhoods WHERE boroname = 'Brooklyn'; name -------------------------- Bensonhurst Bay Ridge Boerum Hill ...
  7. What is the average number of people living in each

    housing unit in NYC? nyc=# SELECT Sum(popn_total)/Sum(hous_total) AS popn_per_house FROM nyc_census_blocks; popn_per_house -------------------- 2.6503540522400804 (1 row)
  8. What is the average number of letters in the names

    of all the neigh. in NYC, reported by borough? nyc=# SELECT boroname, avg(char_length(name)), stddev(char_length(name)) FROM nyc_neighborhoods GROUP BY boroname; boroname | avg | stddev ---------------+---------------------+-------------------- Brooklyn | 11.7391304347826087 | 3.9105613559407395 Manhattan | 11.8214285714285714 | 4.3123729948325257 The Bronx | 12.0416666666666667 | 3.6651017740975152 Queens | 11.6666666666666667 | 5.0057438272815975 Staten Island | 12.2916666666666667 | 5.2043390480959474 (5 rows)
  9. For each borough, what % of the pop. is white?

    nyc=# SELECT boroname, 100 * Sum(popn_white)/Sum(popn_total) AS white_pct FROM nyc_census_blocks GROUP BY boroname; boroname | white_pct ---------------+--------------------- Brooklyn | 41.2005552206888663 The Bronx | 29.8655310846808990 Manhattan | 54.3594013771837665 Queens | 44.0806610271290794 Staten Island | 77.5968611401579346 (5 rows)
  10. Geometries! nyc=# SELECT * FROM geometry_columns; f_table_catalog | f_table_schema |

    f_table_name | f_geometry_column | coord_dimension | srid | type -----------------+----------------+---------------------+-------------------+-----------------+-------+----------------- nyc | public | nyc_census_blocks | the_geom | 2 | 26918 | MULTIPOLYGON nyc | public | nyc_streets | the_geom | 2 | 26918 | MULTILINESTRING nyc | public | nyc_neighborhoods | the_geom | 2 | 26918 | MULTIPOLYGON nyc | public | nyc_subway_stations | the_geom | 2 | 26918 | POINT
  11. Points: 2D, 3D or 4D nyc=# SELECT name, ST_AsText(the_geom) FROM

    nyc_subway_stations LIMIT 1; name | st_astext --------------+------------------------------------------ Cortlandt St | POINT(583521.854408956 4507077.86259909) (1 row)
  12. Other types • Linestrings • Polygons • Collections: • MultiPoints,

    MultiLines, MultiPolygons • GeometryCollection
  13. What is the area of the ‘West Village’ neigh.? nyc=#

    SELECT ST_Area(the_geom) FROM nyc_neighborhoods WHERE name = 'West Village'; st_area ----------------- 1044614.5296486 (1 row)
  14. What is the total length of streets (in km) in

    NYC? nyc=# SELECT Sum(ST_Length(the_geom)) / 1000 FROM nyc_streets; ?column? --------------- 10418.9047172 (1 row)
  15. What is the length of streets in NYC, summarized by

    type? nyc=# SELECT type, Sum(ST_Length(the_geom)) AS length FROM nyc_streets GROUP BY type ORDER BY length DESC; type | length -------------------------------------------------- residential | 8629870.33786606 motorway | 403622.478126363 tertiary | 360394.879051303 motorway_link | 294261.419479668 ...
  16. Spatial relationships • Ability to compare relationships between geometries: •

    equality • intersects, crosses, disjoint, etc. • touches • within, contains, etc. • distance, distance within, etc.
  17. nyc=# SELECT name, the_geom, ST_AsText(the_geom) FROM nyc_subway_stations WHERE name =

    'Broad St'; name | the_geom | st_astext ----------+----------------------------------------------------+------------------------------------------ Broad St | 0101000020266900000EEBD4CF27CF2141BC17D69516315141 | POINT(583571.905921312 4506714.34119218) (1 row)
  18. Approximately how many people live on (within 50m of) ‘Broad

    St’? nyc=# SELECT Sum(popn_total) FROM nyc_census_blocks WHERE ST_DWithin( the_geom, '0101000020266900000EEBD4CF27CF2141BC17D69516315141', 50); sum ----- 679 (1 row)
  19. Spatial joins nyc=# SELECT subways.name AS subway_name, neighborhoods.name AS neighborhood_name,

    neighborhoods.boroname AS borough FROM nyc_neighborhoods AS neighborhoods JOIN nyc_subway_stations AS subways ON ST_Contains(neighborhoods.the_geom, subways.the_geom) WHERE subways.name = 'Broad St'; subway_name | neighborhood_name | borough -------------+--------------------+----------- Broad St | Financial District | Manhattan (1 row)
  20. What is the pop. & racial make- up of the

    neigh. of Manhattan? nyc=# SELECT neighborhoods.name AS neighborhood_name, Sum(census.popn_total) AS population, Round(100.0 * Sum(census.popn_white) / Sum(census.popn_total),1) AS white_pct, Round(100.0 * Sum(census.popn_black) / Sum(census.popn_total),1) AS black_pct FROM nyc_neighborhoods AS neighborhoods JOIN nyc_census_blocks AS census ON ST_Intersects(neighborhoods.the_geom, census.the_geom) WHERE neighborhoods.boroname = 'Manhattan' GROUP BY neighborhoods.name ORDER BY black_pct DESC;
  21. neighborhood_name | population | white_pct | black_pct ---------------------+------------+-----------+----------- Harlem |

    125501 | 5.7 | 80.5 East Harlem | 62279 | 20.2 | 46.2 Hamilton Heights | 71133 | 14.6 | 41.1 Yorkville | 57800 | 31.2 | 33.3 Morningside Heights | 41499 | 50.2 | 24.8 Washington Heights | 187198 | 26.9 | 16.3 Inwood | 50922 | 29.3 | 14.9 Upper West Side | 212499 | 73.3 | 10.4 Central Park | 49284 | 77.8 | 10.4 Clinton | 26347 | 64.6 | 10.3 ...
  22. After 9/11, the ‘Battery Park’ neigh. was o" limits for

    several days. How many people had to be evacuated? nyc=# SELECT Sum(popn_total) FROM nyc_neighborhoods AS n JOIN nyc_census_blocks AS c ON ST_Intersects(n.the_geom, c.the_geom) WHERE n.name = 'Battery Park'; sum ------ 9928 (1 row)
  23. Spacial indexing • Spacial indexing is what makes it possible

    to use a spatial DB for large data • Without it one would need to sequentially scan all of the data • This is especially true when joining two large tables, since without indexing it would be extremely slow
  24. && operator nyc=# SELECT Sum(popn_total) FROM nyc_neighborhoods neighborhoods JOIN nyc_census_blocks

    blocks ON neighborhoods.the_geom && blocks.the_geom WHERE neighborhoods.name = 'West Village'; sum ------- 50325 (1 row)
  25. ...and with !ltering nyc=# SELECT Sum(popn_total) FROM nyc_neighborhoods neighborhoods JOIN

    nyc_census_blocks blocks ON ST_Intersects(neighborhoods.the_geom, blocks.the_geom) WHERE neighborhoods.name = 'West Village'; sum ------- 27141 (1 row)
  26. Projections • Many types of projections to map the geom.

    of the Earth onto #at paper or screen • SRID encapsulates this information • We will need to know a data’s SRID when importing • We will also need to re-project and transform form one system to another
  27. Comparing nyc=# SELECT ST_Equals( ST_GeomFromText('POINT(0 0)', 4326), ST_GeomFromText('POINT(0 0)', 26918)

    ); ERROR: Operation on mixed SRID geometries CONTEXT: SQL function "st_equals" statement 1
  28. Transforming nyc=# SELECT ST_AsText(ST_Transform(the_geom,4326)) FROM nyc_subway_stations WHERE name = 'Broad

    St'; st_astext ------------------------------------------- POINT(-74.0106714688735 40.7071048155841) (1 row)
  29. Geography • Used when data is represented in Spherical coordinates

    • Useful for calculating distances over the sphere • But only a small number of native functions for geography
  30. Casting nyc=# CREATE TABLE nyc_subway_stations_geog AS SELECT Geography(ST_Transform(the_geom,4326)) AS geog,

    name, routes FROM nyc_subway_stations; nyc=# CREATE INDEX nyc_subway_stations_geog_gix ON nyc_subway_stations_geog USING GIST(geog);
  31. Geometry construction • Functions that take geometries as input and

    return other geometries as output: • Centroids, Point on surface • Bu"er • Intersection, Union, etc.
  32. Further topics • Validity: important for polygons which de!ne area

    or have holes etc. • Tuning PostgreSQL for spatial queries • Raster data: geocoded images etc. • PostGIS topology: faces, edges & nodes
  33. SpacialDB • One click provisioning of a PostGIS enabled DB

    • Running on amazon, rackspace etc. • Pay for the total amount of storage with us • Comes with an API to query from mobile or web apps
  34. References M A N N I N G Regina O.

    Obe Leo S. Hsu FOREWORD BY PAUL RAMSEY IN ACTION