How to analyze and visualize geo-data with the Elastic Stack

Elastic Co
June 26, 2017

Talk by Thomas Neirynck and Brandon Kobel at Code PaLOUsa on June 8, 2017.

Transcript

  1. What is the Elastic Stack?
     • Store and search data with Elasticsearch
     • Move data into Elasticsearch with
       − Logstash
       − Beats
     • Visualize data and administer the stack with Kibana
  2. What is the Elastic Stack used for?
     • Document search
       − Support for multiple languages
     • Log analytics
       − Server logs, application usage, time-based data
     • System monitoring
       − Real-time health watches
  3. What will we do in this presentation?
     Build an application to analyze traffic accident data in NYC.
     • Full round-trip
       − Ingest data into Elasticsearch with Logstash
       − Build a Kibana application to generate insights
     • Pay attention to geo-features across the stack
     • … and enrich the analytical experience with machine learning
  4. The data source
     • NYC traffic accident data
       − https://opendata.cityofnewyork.us/
       − More than 1,000,000 traffic incidents since July 2012
     • Tabular format
       − Fields indicate where and when an incident happened, the number of injuries and fatalities, and the types of vehicles involved
  5. … but Elasticsearch requires JSON documents
     • Each document must conform to a `mapping`.
       − Field values have to correspond to a datatype (date, numbers, text, ...)
       − The mapping determines how values are indexed at ingest-time (and this affects whether and how they can be searched at query-time)
  6. Field datatypes for geo-data
     • geo_point
       − Several representations
         • Numeric: object with lat/lon keys, or an array of two numbers in [lon, lat] order
         • String: "lat,lon" string, or a geohash
       − Supported by Kibana
     • geo_shape
       − Simple Features data model (point, line, polygon, collections), envelopes, circles
       − Not supported by Kibana
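
The geo_point representations listed above differ mainly in coordinate order: per the Elasticsearch geo_point documentation, the object form uses explicit lat/lon keys, the array form is [lon, lat] (GeoJSON order), and the string form is "lat,lon". The following is a minimal Python sketch, assuming only those four input shapes, that normalizes each to a (lat, lon) pair. The function names are hypothetical and this is not Elasticsearch's own parser; the geohash branch decodes to the center of the geohash cell.

```python
# Hypothetical helpers: normalize geo_point input formats to (lat, lon).
# Illustration only; not Elasticsearch's parser.

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet


def decode_geohash(gh: str) -> tuple[float, float]:
    """Return the (lat, lon) center of the geohash cell."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # geohash bits alternate, starting with longitude
    for ch in gh.lower():
        val = _BASE32.index(ch)
        for shift in range(4, -1, -1):  # 5 bits per character, most significant first
            bit = (val >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2


def to_lat_lon(value) -> tuple[float, float]:
    """Normalize an object, [lon, lat] array, "lat,lon" string or geohash."""
    if isinstance(value, dict):                # {"lat": ..., "lon": ...}
        return float(value["lat"]), float(value["lon"])
    if isinstance(value, (list, tuple)):       # [lon, lat]  (GeoJSON order)
        lon, lat = value
        return float(lat), float(lon)
    if isinstance(value, str):
        if "," in value:                       # "lat,lon"
            lat, lon = value.split(",")
            return float(lat), float(lon)
        return decode_geohash(value)           # geohash string
    raise TypeError(f"unsupported geo_point value: {value!r}")
```

For example, `to_lat_lon([-73.825516, 40.753])` and `to_lat_lon("40.753,-73.825516")` describe the same point, which is why mixing up the array and string orders is a common ingestion bug.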
  7. Using Logstash for ingestion
     • What:
       − Transform data (e.g. tabular format → JSON document)
       − Ensure field values conform to the mapping
       − Store documents in Elasticsearch
     • `Pipeline`
       − The data source is a stream of events
       − A series of steps (filters) transforms these events
       − The configuration of this pipeline is programmable
  8. Excerpt from logstash.conf:

         ...
         filter {
           csv {
             columns => ["date","time","borough","zip_code","latitude","longitude", ...]
           }
           ...
           # If the event contains latitude and longitude
           if [latitude] and [longitude] {
             mutate { convert => { "latitude" => "float" } }
             mutate { convert => { "longitude" => "float" } }
             mutate { rename => { "latitude" => "[coords][lat]" } }
             mutate { rename => { "longitude" => "[coords][lon]" } }
           }
         }

     Mapping for the coords field:

         "properties" : {
           "coords" : { "type" : "geo_point" }
         }

     Resulting document:

         "_source": {
           ...
           "coords": { "lon": -73.825516, "lat": 40.753 },
           ...
         }

     Run the pipeline:

         > cat sourcedata.csv | /path/to/logstash/bin/logstash -f logstash.conf
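
To show what the csv and mutate filters in the configuration above do to one event, here is a minimal Python sketch that roughly mirrors them: name the columns, convert latitude/longitude to floats, and move them under a `coords` object that matches the geo_point mapping. It is not Logstash and does not reproduce its exact handling of empty fields; the column list is truncated exactly where the slide elides it, and the sample row (apart from the coordinates, which come from the slide's `_source` example) is hypothetical.

```python
import csv
import io
import json

# Hypothetical column subset: the slide elides the remaining columns with "...".
COLUMNS = ["date", "time", "borough", "zip_code", "latitude", "longitude"]


def row_to_document(row):
    """Name the CSV columns, then nest lat/lon under "coords" as the filter does."""
    event = dict(zip(COLUMNS, row))
    # Counterpart of `if [latitude] and [longitude]`: only rows with both values.
    if event.get("latitude") and event.get("longitude"):
        # convert => "float", then rename into [coords][lat] / [coords][lon]
        event["coords"] = {
            "lat": float(event.pop("latitude")),
            "lon": float(event.pop("longitude")),
        }
    return event


def csv_to_documents(text):
    """Turn CSV text (one incident per line) into JSON-ready dicts."""
    return [row_to_document(row) for row in csv.reader(io.StringIO(text)) if row]


if __name__ == "__main__":
    # Hypothetical sample row; coordinates taken from the slide's _source example.
    sample = "07/01/2012,10:05,QUEENS,11368,40.753,-73.825516\n"
    for doc in csv_to_documents(sample):
        print(json.dumps(doc))
```

The separate `mutate` blocks in the Logstash version matter: within a single `mutate`, Logstash applies its operations in a fixed internal order, so splitting them keeps "convert, then rename" explicit.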
  9. Kibana is a window into the Elastic Stack
     • Index patterns
       − Point Kibana at one or more indices in Elasticsearch that share the same mappings
       − Manage
         • Time-based values
         • Formatting of values for display
         • Scripted fields for calculating values at query-time (as opposed to Logstash transformations at ingest-time)
  10. Kibana Visualizations
      • Use the Elasticsearch _search API
        − REST API with a JSON-based query language
        − Can aggregate results (similar to “group by” in SQL)
      • Kibana Visualizations display the results of aggregations, not the values of individual documents
        − This scales better
        − Different datatypes have different types of roll-up, e.g.
          • seconds, minutes, hours for date values
          • ranges for number values
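
The aggregation idea above can be sketched in Python: a request body a visualization might send to `_search` (with `"size": 0`, so only buckets come back), plus local stand-ins that compute the same buckets over in-memory documents. The `terms` and `range` aggregation syntax follows the Elasticsearch documentation; the field names `borough` and `persons_injured` are hypothetical, and a `terms` aggregation on `borough` assumes it is mapped as a keyword field. The local functions only illustrate the semantics and are not how Elasticsearch computes them.

```python
from collections import Counter

# Request body a visualization might send to _search (field names are hypothetical).
SEARCH_BODY = {
    "size": 0,  # no individual hits, only aggregation buckets
    "aggs": {
        "by_borough": {"terms": {"field": "borough"}},
        "injuries": {
            "range": {
                "field": "persons_injured",
                "ranges": [{"to": 1}, {"from": 1, "to": 5}, {"from": 5}],
            }
        },
    },
}


def terms_buckets(docs, field):
    """Local stand-in for a terms aggregation: count documents per value."""
    counts = Counter(d[field] for d in docs if field in d)
    return [{"key": k, "doc_count": n} for k, n in counts.most_common()]


def range_buckets(docs, field, ranges):
    """Local stand-in for a range aggregation ("from" inclusive, "to" exclusive)."""
    out = []
    for r in ranges:
        lo, hi = r.get("from", float("-inf")), r.get("to", float("inf"))
        n = sum(1 for d in docs if field in d and lo <= d[field] < hi)
        out.append({**r, "doc_count": n})
    return out
```

However many documents match, the response size depends only on the number of buckets, which is why aggregating scales better than shipping individual documents to the browser.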
  11. Coordinate Map Visualization
      • Uses the “geohash” grid aggregation
        − A geohash is a string hash of a location, with a notion of precision/scale
        − Each geohash corresponds to a grid-cell area on the earth
          • dng18: ±1.5 mile error
          • dng18e8w: ±50 ft error
      • Uses geo-centroid positioning
        − The weighted center of the locations of all results in each geohash grid cell
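
The geohash grid plus centroid positioning described above can be sketched in Python. The encoder is the standard geohash algorithm (alternately bisect longitude and latitude, emit 5 bits per base-32 character), so longer hashes such as `dng18e8w` are refinements of shorter prefixes such as `dng18`. The bucketing function and its names are hypothetical, and the centroid here is a plain arithmetic mean of the points in each cell; that is a simplification and may not match Elasticsearch's internal `geo_centroid` computation in every case (for example near the antimeridian).

```python
from collections import defaultdict

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet


def encode_geohash(lat, lon, precision):
    """Standard geohash encoding of (lat, lon) to `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, ch, is_lon = [], 0, 0, True  # bits alternate, longitude first
    while len(chars) < precision:
        if is_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                ch, lon_lo = (ch << 1) | 1, mid
            else:
                ch, lon_hi = ch << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                ch, lat_lo = (ch << 1) | 1, mid
            else:
                ch, lat_hi = ch << 1, mid
        is_lon = not is_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[ch])
            bits, ch = 0, 0
    return "".join(chars)


def geohash_grid(points, precision):
    """Bucket (lat, lon) points by geohash cell; report count and centroid."""
    cells = defaultdict(list)
    for lat, lon in points:
        cells[encode_geohash(lat, lon, precision)].append((lat, lon))
    return {
        key: {
            "doc_count": len(pts),
            # Simplified centroid: arithmetic mean of the points in the cell.
            "centroid": (
                sum(p[0] for p in pts) / len(pts),
                sum(p[1] for p in pts) / len(pts),
            ),
        }
        for key, pts in cells.items()
    }
```

Placing each marker at the centroid rather than at the cell center keeps markers close to where the incidents actually cluster, especially at coarse precisions.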
  12. Detour: the Elastic Geo Service and X-Pack
      • Default map service and data used by Kibana
        − Road map image service
        − Example boundary data (world countries and US states)
      • Requires an X-Pack install for access to all zoom levels
      • Link outside services to Kibana
        − Image services
          • TMS: http://my.map.service/{z}/{x}/{y}.png
          • OGC-WMS
        − GeoJSON boundary data
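
The `{z}/{x}/{y}` placeholders in a TMS-style URL template like the one above identify a single map tile. As a minimal sketch, assuming the common Web Mercator "slippy map" tiling with y counted from the top (strict TMS counts y from the bottom, so some services need y flipped to `2**z - 1 - y`), the following Python computes the tile containing a point and fills in the template. The helper names are hypothetical and `my.map.service` is the slide's placeholder host.

```python
import math

TEMPLATE = "http://my.map.service/{z}/{x}/{y}.png"  # placeholder URL from the slide


def tile_xy(lat, lon, z):
    """Web Mercator tile indices (x, y) containing the point at zoom level z."""
    n = 2 ** z  # number of tiles along each axis at this zoom level
    x = math.floor((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so lon = 180 and extreme latitudes stay within [0, n - 1].
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_url(lat, lon, z, template=TEMPLATE):
    """Fill the {z}/{x}/{y} placeholders for the tile containing (lat, lon)."""
    x, y = tile_xy(lat, lon, z)
    return template.format(z=z, x=x, y=y)
```

Each zoom level doubles the tile count per axis, which is why limiting the available zoom levels directly limits how much detail a map can show.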
  13. Region Map Visualization
      • Create choropleth maps
      • Inner join of the results of a “terms” aggregation with reference shape data
      • Link a custom data service in config/kibana.yml (requires CORS support):

            regionmap:
              layers:
                - name: "NYC Boroughs (self-hosted)"
                  url: "http://localhost/region_map/data/nyc_boroughs.json"
                  fields:
                    - name: "name"
                      description: "Borough Name"
                - name: "NYC Council districts (self-hosted)"
                  url: "http://localhost/region_map/data/nyc_councildistricts.json"
                  fields:
                    - name: "CounsilDist"
                      description: "District #"
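
The inner join described above can be illustrated with a small Python sketch: bucket keys from a terms aggregation are matched against one property of each GeoJSON feature (the `name` field configured for the boroughs layer), and anything without a match on both sides is dropped. The function name is hypothetical, it assumes exact string matching of keys, and it is not Kibana's implementation; it only shows why bucket keys must match the property values in the shape file for a region to be colored.

```python
def join_region_map(buckets, geojson, join_field):
    """Inner-join terms-aggregation buckets to GeoJSON features on join_field.

    Features without a matching bucket, and buckets without a matching
    feature, are dropped (inner join semantics).
    """
    counts = {b["key"]: b["doc_count"] for b in buckets}
    joined = []
    for feature in geojson["features"]:
        key = feature["properties"].get(join_field)
        if key in counts:
            # Copy the feature and attach the metric used to color the region.
            props = {**feature["properties"], "doc_count": counts[key]}
            joined.append({**feature, "properties": props})
    return joined
```

If, for instance, the aggregated field stores borough names in upper case while the shape file uses mixed case, nothing matches under this exact-match assumption, so the join field's values are worth checking before wiring up a custom layer.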
  14. Elastic Cloud: hosted Elasticsearch and Kibana
      • Latest versions of Elasticsearch and Kibana
      • One-click scaling and upgrading; no downtime
      • Built-in security (auth, encryption, role-based access)
      • Option for dedicated SLA support and X-Pack
      • The only offering created and managed by Elastic
      • Free Kibana and backups every 30 minutes