Geospatial Analysis Made Easy with meza

This talk, given at GeoPython, shows how to use meza to analyze GeoJSON data. The code used in the talk is available at https://gist.github.com/reubano/5ba3a3b850fe4c1e5ee497f325111ba0.

Reuben Cummings

May 10, 2017

Transcript

  1. GEOSPATIAL ANALYSIS MADE
    EASY WITH MEZA
    GeoPython — Basel, Switzerland — May 10, 2017
    by Reuben Cummings
    @reubano

  2. WHO AM I?
    Managing Director, Nerevu Development
    Founder of Arusha Coders
    Author of several popular Python packages

  3. MEZA (GITHUB.COM/REUBANO/MEZA)

  4. MEZA OVERVIEW
    input → readers → records → converters → output

  7. MEZA INPUT/OUTPUT
    Input Formats: CSV/TSV, DBF, GeoJSON, HTML, JSON, MDB, SQLITE, XLS(X), YAML
    Output Formats: Array, CSV, GeoJSON, JSON

  8. MT. KILIMANJARO (MOSHI, TANZANIA)
    Photo Credit: Reuben Cummings

  9. UHURU_PEAK.GEOJSON
    {
      "type": "FeatureCollection",
      "features": [
        {
          "type": "Feature",
          "properties": {
            "peak": "uhuru",
            "id": 10
          },
          "geometry": {

  10. UHURU_PEAK.GEOJSON
            "type": "Point",
            "coordinates": [
              37.350666,
              -3.066465
            ]
          }
        }
      ]
    }

  11. KIBO_PEAK.GEOJSON
    {
      "type": "FeatureCollection",
      "features": [
        {
          "type": "Feature",
          "properties": {
            "peak": "kibo",
            "id": 11
          },
          "geometry": {

  12. KIBO_PEAK.GEOJSON
            "type": "Point",
            "coordinates": [
              37.353333,
              -3.075833
            ]
          }
        }
      ]
    }

  13. MEZA DEMO

  14. MEZA DEMO (READERS)
    >>> from meza import io
    >>>
    >>> records = io.read('kibo_peak.geojson')
    >>> next(records)
    {'id': 11,
     'lat': Decimal('-3.075833'),
     'lon': Decimal('37.353333'),
     'peak': 'kibo',
     'type': 'Point'}
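
The reader output above suggests that each GeoJSON Point feature becomes one flat record whose keys combine the feature's properties with `type`, `lon`, and `lat`. The following is a rough plain-Python sketch of that idea, not meza's implementation: `flatten_points` is a hypothetical helper written only for illustration, and it assumes every feature is a Point.

```python
import json
from decimal import Decimal

# Inline copy of kibo_peak.geojson so the sketch needs no files.
GEOJSON = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"peak": "kibo", "id": 11},
      "geometry": {"type": "Point", "coordinates": [37.353333, -3.075833]}
    }
  ]
}
"""


def flatten_points(collection):
    """Yield one flat dict per Point feature (hypothetical helper, not meza code)."""
    for feature in collection["features"]:
        geometry = feature["geometry"]
        lon, lat = geometry["coordinates"][:2]  # GeoJSON stores lon before lat
        record = dict(feature["properties"])
        record.update(type=geometry["type"], lon=lon, lat=lat)
        yield record


# parse_float=Decimal keeps the coordinates exact instead of binary floats.
collection = json.loads(GEOJSON, parse_float=Decimal)
records = flatten_points(collection)
first = next(records)
print(first)
```

Like the `records` object in the demo, the generator is lazy: nothing is flattened until `next` is called.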

  15. CHALLENGE #1 MERGING

  17. MEZA DEMO (MERGING)
    >>> from meza import convert as cv
    >>>
    >>> paths = (
    ...     'uhuru_peak.geojson',
    ...     'kibo_peak.geojson')
    >>>
    >>> records = io.join(*paths)
    >>> geojson = cv.records2geojson(records)
    >>> io.write('meza_peaks.geojson', geojson)

  18. MEZA_PEAKS.GEOJSON
    {
      "type": "FeatureCollection",
      "bbox": [
        37.350666,
        -3.075833,
        37.353333,
        -3.066465
      ],
      "features": [
        {

  19. MEZA_PEAKS.GEOJSON
          "type": "Feature",
          "id": 10,
          "geometry": {
            "type": "Point",
            "coordinates": [
              37.350666,
              -3.066465
            ]
          },
          "properties": {

  20. MEZA_PEAKS.GEOJSON
            "id": 10,
            "peak": "uhuru"
          }
        },
        {
          "type": "Feature",
          "id": 11,
          "geometry": {
            "type": "Point",

  21. MEZA_PEAKS.GEOJSON
            "coordinates": [
              37.353333,
              -3.075833
            ]
          },
          "properties": {
            "id": 11,
            "peak": "kibo"
          }
        }

  22. MEZA_PEAKS.GEOJSON
      ],
      "crs": {
        "type": "name",
        "properties": {
          "name": "urn:ogc:def:crs:OGC:1.3:CRS84"
        }
      }
    }
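
The `bbox` member in the merged file is consistent with the minimum and maximum of the two peaks' coordinates, ordered west, south, east, north. A minimal sketch of that calculation, assuming 2D (lon, lat) points; `bounding_box` is a hypothetical helper for illustration and not part of meza.

```python
def bounding_box(points):
    """Return [min_lon, min_lat, max_lon, max_lat] for (lon, lat) pairs."""
    lons, lats = zip(*points)  # split the pairs into two parallel tuples
    return [min(lons), min(lats), max(lons), max(lats)]


# Coordinates of the two peaks from uhuru_peak.geojson and kibo_peak.geojson.
peaks = [
    (37.350666, -3.066465),  # uhuru
    (37.353333, -3.075833),  # kibo
]
bbox = bounding_box(peaks)
print(bbox)
```

With these two points the result matches the four `bbox` values shown in `meza_peaks.geojson`.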

  23. MEZA DEMO (MERGING)
    >>> records = io.read('meza_peaks.geojson')
    >>> csv = cv.records2csv(records)
    >>> io.write('meza_peaks.csv', csv)
    $ pip install --user csvkit
    $ csvlook meza_peaks.csv
    | id | type  | lat     | lon     | peak  |
    | -- | ----- | ------- | ------- | ----- |
    | 10 | Point | -3.066… | 37.350… | uhuru |
    | 11 | Point | -3.075… | 37.353… | kibo  |

  24. CHALLENGE #2 SPLIT BY ID

  26. MEZA DEMO (SPLIT BY ID)
    >>> from meza import process as pr
    >>>
    >>> records = io.read('meza_peaks.geojson')
    >>> groups = pr.group(records, 'id')
    >>> name = 'peak_{}.geojson'
    >>>
    >>> for _id, _records in groups:
    ...     f = cv.records2geojson(_records)
    ...     io.write(name.format(_id), f)

  27. MEZA DEMO (SPLIT BY ID)
    $ ls peak_*
    peak_10.geojson peak_11.geojson
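
The split-by-id step relies on grouping records by a key and deriving one output filename per group. The sketch below shows the same idea using only the standard library. It is an illustration under stated assumptions, not a description of how `pr.group` works internally: `group_by` is a hypothetical helper, and it sorts the records first, which loads them all into memory.

```python
from itertools import groupby
from operator import itemgetter


def group_by(records, key):
    """Yield (value, members) pairs for each distinct value of `key`."""
    keyfunc = itemgetter(key)
    # groupby only merges adjacent items, so sort by the same key first.
    ordered = sorted(records, key=keyfunc)
    for value, members in groupby(ordered, key=keyfunc):
        yield value, list(members)


records = [
    {"id": 10, "peak": "uhuru"},
    {"id": 11, "peak": "kibo"},
]
groups = dict(group_by(records, "id"))

# One filename per group, using the same template as the demo.
names = {_id: "peak_{}.geojson".format(_id) for _id in groups}
print(names)
```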

  28. CHALLENGE #3 EXTRACT BY ID

  30. MEZA DEMO (EXTRACT BY ID)
    >>> records = io.read('peaks.geojson')
    >>> groups = pr.group(records, 'id')
    >>> group = next(
    ...     g for g in groups if g[0] == 11)

  31. MEZA DEMO (EXTRACT BY ID)
    >>> records = io.read('peaks.geojson')
    >>> groups = pr.group(records, 'id')
    >>> group = next(
    ...     g for g in groups if g[0] == 11)
    >>>
    >>> csv = cv.records2csv(group[1])
    >>> io.write('id_11_peaks.csv', csv)

  32. MEZA DEMO (EXTRACT BY ID)
    $ csvlook id_11_peaks.csv
    | id | type  | lat     | lon     | peak |
    | -- | ----- | ------- | ------- | ---- |
    | 11 | Point | -3.076… | 37.353… | kibo |

  33. MEZA DEMO
    BUT WAIT,
    THERE'S MORE!

  34. CHALLENGE #4 EXTRACT BY ID V2

  36. MEZA DEMO (EXTRACT BY ID V2)
    >>> from urllib.request import urlopen
    >>>
    >>> BASE = 'https://raw.githubusercontent.com'
    >>> REPO = 'drei01/geojson-world-cities'
    >>> path = '{}/{}/master/cities.geojson'
    >>> url = path.format(BASE, REPO)
    >>> f = urlopen(url)
    >>> records = io.read_geojson(f)

  37. MEZA DEMO (EXTRACT BY ID V2)
    >>> next(records)
    {'NAME': 'TORSHAVN',
     'id': None,
     'lat': Decimal('62.015167236328125'),
     'lon': Decimal('-6.758638858795166'),
     'pos': 0,
     'type': 'Polygon'}

  38. MEZA DEMO (EXTRACT BY ID V2)
    >>> clean = (
    ...     r for r in records if r.get('NAME'))
    >>>
    >>> splits = pr.split(
    ...     clean, 'NAME', chunksize=1024)
    >>>
    >>> b_splits = (
    ...     s for s in splits if 'BASE' in s[1])
    >>>
    >>> name = 'base_cities.csv'

  39. MEZA DEMO (EXTRACT BY ID V2)
    >>> for pos, split in enumerate(b_splits):
    ...     f = cv.records2csv(
    ...         split[0], skip_header=pos)
    ...
    ...     io.write(name, f, mode='ab+')
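
The loop above passes `skip_header=pos`, so the header is kept only when `pos` is 0 (the first chunk) and skipped for every later chunk appended to the same file. The sketch below reproduces that trick with the standard `csv` module. It is an illustration only: `chunk_to_csv` is a hypothetical helper, the sample rows are made up for the example, and a `StringIO` buffer stands in for a file opened in append mode.

```python
import csv
from io import StringIO


def chunk_to_csv(rows, fieldnames, skip_header=False):
    """Serialize one chunk of records to CSV text (hypothetical helper)."""
    buffer = StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    if not skip_header:
        writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample chunks; each inner list plays the role of one split.
chunks = [
    [{"NAME": "BASEL", "pos": 0}],
    [{"NAME": "MATSUBASE", "pos": 0}],
]

output = StringIO()  # stands in for a file opened in append mode
for pos, chunk in enumerate(chunks):
    # pos is 0 (falsy) for the first chunk only, so the header appears once.
    output.write(chunk_to_csv(chunk, ["NAME", "pos"], skip_header=pos))

result = output.getvalue()
print(result)
```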

  40. MEZA DEMO (EXTRACT BY ID)
    $ csvstat base_cities.csv | tail -n12
    6. "NAME"
      Unique values: 4
      Most common values: BASEL (102x)
                          KABASELE-PANIA (23x)
                          MATSUBASE (17x)
                          WABBASEKA (10x)
    Row count: 152

  41. MEZA DEMO (EXTRACT BY ID)
    $ csvcut -c NAME,lon,lat base_cities.csv \
    | csvlook --max-rows 3
    | NAME | lon | lat |
    | ----- | ------ | ------- |
    | BASEL | 7.549… | 47.544… |
    | BASEL | 7.544… | 47.545… |
    | BASEL | 7.539… | 47.547… |
    | ... | ... | ... |

  42. Reuben Cummings
    @reubano
    THANKS!
