Geospatial Analysis Made Easy with meza

Slide 1

Slide 1 text

GEOSPATIAL ANALYSIS MADE EASY WITH MEZA GeoPython — Basel, Switzerland — May 10, 2017 by Reuben Cummings @reubano

Slide 2

Slide 2 text

WHO AM I?
Managing Director, Nerevu Development
Founder of Arusha Coders
Author of several popular Python packages

Slide 3

Slide 3 text

MEZA (github.com/reubano/meza)

Slide 4

Slide 4 text

MEZA OVERVIEW
[Diagram labels: input, readers, records, converters, output]

Slide 7

Slide 7 text

MEZA INPUT/OUTPUT
Input Formats: CSV/TSV, DBF, GeoJSON, HTML, JSON, MDB, SQLITE, XLS(X), YAML
Output Formats: Array, CSV, GeoJSON, JSON

Slide 8

Slide 8 text

MT. KILIMANJARO (MOSHI, TANZANIA)
Photo Credit: Reuben Cummings

Slide 9

Slide 9 text

UHURU_PEAK.GEOJSON
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {
        "peak": "uhuru",
        "id": 10
      },
      "geometry": {

Slide 10

Slide 10 text

UHURU_PEAK.GEOJSON
        "type": "Point",
        "coordinates": [37.350666, -3.066465]
      }
    }
  ]
}

Slide 11

Slide 11 text

KIBO_PEAK.GEOJSON
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {
        "peak": "kibo",
        "id": 11
      },
      "geometry": {

Slide 12

Slide 12 text

KIBO_PEAK.GEOJSON
        "type": "Point",
        "coordinates": [37.353333, -3.075833]
      }
    }
  ]
}

Slide 13

Slide 13 text

MEZA DEMO

Slide 14

Slide 14 text

MEZA DEMO (READERS)
>>> from meza import io
>>>
>>> records = io.read('kibo_peak.geojson')
>>> next(records)
{'id': 11,
 'lat': Decimal('-3.075833'),
 'lon': Decimal('37.353333'),
 'peak': 'kibo',
 'type': 'Point'}
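To make the record shape above concrete, here is a minimal standard-library sketch of how a Point FeatureCollection could be flattened into records of that kind. It is an illustration only, not meza's implementation: the helper name `flatten_points` is hypothetical, and coordinates stay as floats where the demo output shows `Decimal` values.

```python
import json

# Same content as kibo_peak.geojson from the slides, inlined for the sketch.
KIBO = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature",
   "properties": {"peak": "kibo", "id": 11},
   "geometry": {"type": "Point", "coordinates": [37.353333, -3.075833]}}
]}
"""


def flatten_points(collection):
    """Yield one flat dict per Point feature (hypothetical helper)."""
    for feature in collection["features"]:
        geometry = feature["geometry"]
        if geometry["type"] != "Point":
            continue  # this sketch only handles Point geometries
        lon, lat = geometry["coordinates"]  # GeoJSON order is [lon, lat]
        record = dict(feature.get("properties") or {})
        record.update({"type": geometry["type"], "lon": lon, "lat": lat})
        yield record


records = flatten_points(json.loads(KIBO))
first = next(records)
print(first)
```

Because `flatten_points` is a generator, records are produced one at a time, which mirrors the `next(records)` call in the demo.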

Slide 15

Slide 15 text

CHALLENGE #1 MERGING

Slide 17

Slide 17 text

MEZA DEMO (MERGING)
>>> from meza import convert as cv
>>>
>>> paths = (
...     'uhuru_peak.geojson',
...     'kibo_peak.geojson')
>>>
>>> records = io.join(*paths)
>>> geojson = cv.records2geojson(records)
>>> io.write('meza_peaks.geojson', geojson)

Slide 18

Slide 18 text

MEZA_PEAKS.GEOJSON
{
  "type": "FeatureCollection",
  "bbox": [37.350666, -3.075833, 37.353333, -3.066465],
  "features": [
    {

Slide 19

Slide 19 text

MEZA_PEAKS.GEOJSON
      "type": "Feature",
      "id": 10,
      "geometry": {
        "type": "Point",
        "coordinates": [37.350666, -3.066465]
      },
      "properties": {

Slide 20

Slide 20 text

MEZA_PEAKS.GEOJSON
        "id": 10,
        "peak": "uhuru"
      }
    },
    {
      "type": "Feature",
      "id": 11,
      "geometry": {
        "type": "Point",

Slide 21

Slide 21 text

MEZA_PEAKS.GEOJSON
        "coordinates": [37.353333, -3.075833]
      },
      "properties": {
        "id": 11,
        "peak": "kibo"
      }
    }

Slide 22

Slide 22 text

MEZA_PEAKS.GEOJSON
  ],
  "crs": {
    "type": "name",
    "properties": {
      "name": "urn:ogc:def:crs:OGC:1.3:CRS84"
    }
  }
}
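The merged file above carries a `bbox` that spans both peaks. As a rough standard-library sketch of that step (not meza's code; `point_feature` and `merge_collections` are hypothetical helpers, and only Point geometries are handled), the features can be concatenated and the bounding box taken from the minimum and maximum coordinates:

```python
def point_feature(_id, peak, lon, lat):
    """Build a GeoJSON Point feature (hypothetical helper)."""
    return {
        "type": "Feature",
        "id": _id,
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"id": _id, "peak": peak},
    }


def merge_collections(*collections):
    """Concatenate Point FeatureCollections and compute their bbox."""
    features = [f for c in collections for f in c["features"]]
    lons = [f["geometry"]["coordinates"][0] for f in features]
    lats = [f["geometry"]["coordinates"][1] for f in features]
    return {
        "type": "FeatureCollection",
        # GeoJSON bbox order: [west, south, east, north]
        "bbox": [min(lons), min(lats), max(lons), max(lats)],
        "features": features,
    }


uhuru = {
    "type": "FeatureCollection",
    "features": [point_feature(10, "uhuru", 37.350666, -3.066465)],
}
kibo = {
    "type": "FeatureCollection",
    "features": [point_feature(11, "kibo", 37.353333, -3.075833)],
}

merged = merge_collections(uhuru, kibo)
print(merged["bbox"])
```

With the two peaks from the slides, the computed box matches the `bbox` values shown in meza_peaks.geojson.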

Slide 23

Slide 23 text

MEZA DEMO (MERGING)
>>> records = io.read('meza_peaks.geojson')
>>> csv = cv.records2csv(records)
>>> io.write('meza_peaks.csv', csv)

$ pip install --user csvkit
$ csvlook meza_peaks.csv
| id | type  | lat     | lon     | peak  |
| -- | ----- | ------- | ------- | ----- |
| 10 | Point | -3.066… | 37.350… | uhuru |
| 11 | Point | -3.075… | 37.353… | kibo  |

Slide 24

Slide 24 text

CHALLENGE #2 SPLIT BY ID

Slide 26

Slide 26 text

MEZA DEMO (SPLIT BY ID)
>>> from meza import process as pr
>>>
>>> records = io.read('meza_peaks.geojson')
>>> groups = pr.group(records, 'id')
>>> name = 'peak_{}.geojson'
>>>
>>> for _id, _records in groups:
...     f = cv.records2geojson(_records)
...     io.write(name.format(_id), f)

Slide 27

Slide 27 text

MEZA DEMO (SPLIT BY ID)
$ ls peak_*
peak_10.geojson peak_11.geojson
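The split above amounts to grouping records by a key and naming one output file per group. A small standard-library sketch of that idea follows; `group_by` is a hypothetical stand-in and is not claimed to match how `pr.group` is implemented. The sketch only plans the file names rather than writing files.

```python
from itertools import groupby
from operator import itemgetter

# Records shaped like the ones read from meza_peaks.geojson in the slides.
records = [
    {"id": 10, "peak": "uhuru", "lon": 37.350666, "lat": -3.066465},
    {"id": 11, "peak": "kibo", "lon": 37.353333, "lat": -3.075833},
]


def group_by(records, key):
    """Yield (key value, list of records) pairs (hypothetical helper)."""
    keyfunc = itemgetter(key)
    # groupby only merges adjacent items, so sort by the key first
    for value, grouped in groupby(sorted(records, key=keyfunc), key=keyfunc):
        yield value, list(grouped)


name = "peak_{}.geojson"
planned = {name.format(_id): group for _id, group in group_by(records, "id")}
print(sorted(planned))
```

Sorting before `groupby` loads all records into memory, which is fine for two peaks but is a trade-off to keep in mind for large inputs.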

Slide 28

Slide 28 text

CHALLENGE #3 EXTRACT BY ID

Slide 30

Slide 30 text

MEZA DEMO (EXTRACT BY ID)
>>> records = io.read('peaks.geojson')
>>> groups = pr.group(records, 'id')
>>> group = next(
...     g for g in groups if g[0] == 11)
>>>

Slide 31

Slide 31 text

MEZA DEMO (EXTRACT BY ID)
>>> records = io.read('peaks.geojson')
>>> groups = pr.group(records, 'id')
>>> group = next(
...     g for g in groups if g[0] == 11)
>>>
>>> geojson = cv.records2csv(group[1])
>>> io.write('id_11_peaks.csv', geojson)

Slide 32

Slide 32 text

MEZA DEMO (EXTRACT BY ID)
$ csvlook id_11_peaks.csv
| id | type  | lat     | lon     | peak |
| -- | ----- | ------- | ------- | ---- |
| 11 | Point | -3.076… | 37.353… | kibo |
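For comparison, the same extraction can be sketched with only the standard library: filter the records lazily by id, then serialize the matches as CSV. This is an illustrative alternative, not what meza does internally; the record values are copied from the earlier slides and the column order is chosen to match the csvlook output.

```python
import csv
from io import StringIO

records = [
    {"id": 10, "type": "Point", "lat": -3.066465, "lon": 37.350666, "peak": "uhuru"},
    {"id": 11, "type": "Point", "lat": -3.075833, "lon": 37.353333, "peak": "kibo"},
]

# Generator expression: nothing is filtered until the writer consumes it.
selected = (r for r in records if r["id"] == 11)

buffer = StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=["id", "type", "lat", "lon", "peak"],
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(selected)
print(buffer.getvalue())
```

Writing to a `StringIO` buffer keeps the sketch free of side effects; swapping in an open file handle would produce a file like id_11_peaks.csv.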

Slide 33

Slide 33 text

MEZA DEMO
BUT WAIT, THERE'S MORE!

Slide 34

Slide 34 text

CHALLENGE #4 EXTRACT BY ID V2

Slide 36

Slide 36 text

MEZA DEMO (EXTRACT BY ID V2)
>>> from urllib.request import urlopen
>>>
>>> BASE = 'https://raw.githubusercontent.com'
>>> REPO = 'drei01/geojson-world-cities'
>>> path = '{}/{}/master/cities.geojson'
>>> url = path.format(BASE, REPO)
>>> f = urlopen(url)
>>> records = io.read_geojson(f)

Slide 37

Slide 37 text

MEZA DEMO (EXTRACT BY ID V2)
>>> next(records)
{'NAME': 'TORSHAVN',
 'id': None,
 'lat': Decimal('62.015167236328125'),
 'lon': Decimal('-6.758638858795166'),
 'pos': 0,
 'type': 'Polygon'}

Slide 38

Slide 38 text

MEZA DEMO (EXTRACT BY ID V2)
>>> clean = (
...     r for r in records if r.get('NAME'))
>>>
>>> splits = pr.split(
...     clean, 'NAME', chunksize=1024)
>>>
>>> b_splits = (
...     s for s in splits if 'BASE' in s[1])
>>>
>>> name = 'base_cities.csv'

Slide 39

Slide 39 text

MEZA DEMO (EXTRACT BY ID V2)
>>> for pos, split in enumerate(b_splits):
...     f = cv.records2csv(
...         split[0], skip_header=pos)
...
...     io.write(name, f, mode='ab+')
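The loop above passes `skip_header=pos`, which appears to rely on `pos` being 0 (falsy) for the first chunk only, so the CSV header is written once while later chunks are appended without it. The following standard-library sketch shows that append pattern under stated assumptions: the `chunks` helper, the sample city names, and the chunk size of 2 are all made up for illustration and do not reproduce `pr.split`.

```python
import csv
import os
import tempfile
from itertools import islice


def chunks(iterable, size):
    """Yield lists of up to `size` items (hypothetical helper)."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Made-up sample data; the real demo streams a world-cities GeoJSON file.
names = ["BASEL", "TORSHAVN", "MATSUBASE", "BASEL", "ARUSHA"]
cities = ({"NAME": n, "pos": i} for i, n in enumerate(names))
matches = (c for c in cities if "BASE" in c["NAME"])

path = os.path.join(tempfile.mkdtemp(), "base_cities.csv")

for pos, chunk in enumerate(chunks(matches, 2)):
    # Append mode, so each chunk is added to what is already in the file.
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["NAME", "pos"])
        if not pos:  # pos is 0 only for the first chunk: write the header once
            writer.writeheader()
        writer.writerows(chunk)

with open(path, newline="") as f:
    rows = list(csv.reader(f))
print(rows)
```

Processing in chunks keeps memory use bounded, since only one chunk of matching records is held at a time.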

Slide 40

Slide 40 text

MEZA DEMO (EXTRACT BY ID)
$ csvstat base_cities.csv | tail -n12
  6. "NAME"
        Unique values: 4
        Most common values: BASEL (102x)
                            KABASELE-PANIA (23x)
                            MATSUBASE (17x)
                            WABBASEKA (10x)

Row count: 152

Slide 41

Slide 41 text

MEZA DEMO (EXTRACT BY ID)
$ csvcut -c NAME,lon,lat base_cities.csv \
    | csvlook --max-rows 3
| NAME  | lon    | lat     |
| ----- | ------ | ------- |
| BASEL | 7.549… | 47.544… |
| BASEL | 7.544… | 47.545… |
| BASEL | 7.539… | 47.547… |
| ...   | ...    | ...     |

Slide 42

Slide 42 text

Reuben Cummings @reubano THANKS!