How to analyze and visualize geo-data with the Elastic Stack

Elastic Co
June 26, 2017

Talk by Thomas Neirynck and Brandon Kobel at Code PaLOUsa on June 8, 2017.

Transcript

  1. What is the Elastic Stack?

    • Store and search data with Elasticsearch
    • Move data into Elasticsearch with
      − Logstash
      − Beats
    • Visualize data and administer the stack with Kibana
  2. What is the Elastic Stack used for?

    • Document search
      − Support for multiple languages
    • Log analytics
      − Server logs, application usage, time-based data
    • System monitoring
      − Real-time health watches
  3. What will we do in this presentation?

    • Full round-trip
      − Ingest data into Elasticsearch with Logstash
      − Build a Kibana application to generate insights
    • Pay attention to geo-features across the stack
    • … and enrich the analytical experience with machine learning

    Build an application to analyze traffic accident data in NYC
  4. The data source

    • NYC traffic accident data
      − https://opendata.cityofnewyork.us/
      − More than 1,000,000 traffic incidents since July 2012
    • Tabular format
      − Fields indicate where and when, number of injuries and fatalities, type of vehicles involved
  5. … but Elasticsearch requires JSON documents

    • Each document must conform to a `mapping`.
      − Field values have to correspond to a datatype (date, number, text, ...)
      − The mapping determines how values are indexed at ingest-time (and this affects whether and how they can be searched at query-time)
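To make the relationship between a mapping and a document concrete, here is a minimal sketch in Python. The field names (`timestamp`, `borough`, `persons_injured`) are hypothetical; only `coords` with type `geo_point` comes from the slides. The typeless `mappings` layout shown here is the one used by recent Elasticsearch versions; older versions, including those current at the time of the talk, nest the properties under a type name.

```python
import json

# Hypothetical index-creation body. Only "coords" / "geo_point" is taken
# from the slides; the other field names are assumptions for illustration.
index_body = {
    "mappings": {
        "properties": {
            "timestamp": {"type": "date"},
            "borough": {"type": "keyword"},
            "persons_injured": {"type": "integer"},
            "coords": {"type": "geo_point"},
        }
    }
}

# A document whose field values match the datatypes declared above.
document = {
    "timestamp": "2017-06-08T14:30:00",
    "borough": "QUEENS",
    "persons_injured": 2,
    "coords": {"lat": 40.753, "lon": -73.825516},
}

# Every field in the document has a declared datatype in the mapping.
declared = index_body["mappings"]["properties"]
undeclared_fields = [name for name in document if name not in declared]

print(json.dumps(index_body, indent=2))
print(json.dumps(document, indent=2))
print("fields without a declared datatype:", undeclared_fields)
```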
  6. Field datatypes for geo-data

    • geo_point
      − Several representations
        • Numeric: object with lon/lat keys, array with two numbers ([lon, lat])
        • String: "lat,lon" string, geohash
      − Supported by Kibana
    • geo_shape
      − Simple Feature data model (point, line, polygon, collections), envelopes, circles
      − Not supported by Kibana
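The geo_point representations listed above can be sketched in Python for a single location. The `geohash_encode` helper is a hypothetical name and implements the standard public geohash algorithm locally; it is not an Elasticsearch API. Note the ordering difference: the array form is `[lon, lat]`, while the string form is `"lat,lon"`.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=8):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits are interleaved, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)

# The sample coordinates that appear later in the deck.
lat, lon = 40.753, -73.825516

representations = {
    "object": {"lat": lat, "lon": lon},
    "array": [lon, lat],            # longitude first
    "string": f"{lat},{lon}",       # latitude first
    "geohash": geohash_encode(lat, lon),
}

for name, value in representations.items():
    print(name, value)
```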
  7. Using Logstash for ingestion

    • What:
      − Transform data (e.g. tabular format → JSON document)
      − Ensure field values conform to the mapping
      − Store documents in Elasticsearch
    • `Pipeline`
      − Data source is a stream of events
      − Series of steps to transform these events (filters)
      − Configuration of this pipeline is programmable
  8. ...

    filter {
      csv {
        columns => ["date","time","borough","zip_code","latitude","longitude", ...]
      }
      ...
      # If the event contains latitude and longitude
      if [latitude] and [longitude] {
        mutate { convert => {"latitude" => "float"} }
        mutate { convert => {"longitude" => "float"} }
        mutate { rename => {"latitude" => "[coords][lat]"} }
        mutate { rename => {"longitude" => "[coords][lon]"} }
      }
    }

    "properties" : {
      "coords" : { "type" : "geo_point" },

    "_source": {
      ...
      "coords": { "lon": -73.825516, "lat": 40.753 },
      ...
    }

    > cat sourcedata.csv | /path/to/logstash/bin/logstash -f logstash.conf
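For readers who do not know Logstash, the filter logic above can be approximated in plain Python. This is only a loose analogue, not how Logstash itself works: the `transform` function name and the two sample rows are made up, the column list uses only the six names visible on the slide (the rest is elided there), and Python's truthiness test for empty strings is not claimed to match Logstash's conditional semantics.

```python
import csv
import io

# Only the column names visible on the slide; the original list is longer.
COLUMNS = ["date", "time", "borough", "zip_code", "latitude", "longitude"]

def transform(event):
    """Convert latitude/longitude to floats and nest them under 'coords'."""
    if event.get("latitude") and event.get("longitude"):
        event["coords"] = {
            "lat": float(event.pop("latitude")),
            "lon": float(event.pop("longitude")),
        }
    return event

# Made-up sample rows; the second one has no coordinates.
sample = (
    "06/08/2017,14:30,QUEENS,11354,40.753,-73.825516\n"
    "06/08/2017,15:00,BROOKLYN,11201,,\n"
)

reader = csv.DictReader(io.StringIO(sample), fieldnames=COLUMNS)
events = [transform(dict(row)) for row in reader]

for event in events:
    print(event)
```

The first event ends up with a nested `coords` object that fits a `geo_point` field; the second is passed through without one.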
  9. Kibana is a window into the Elastic Stack

    • Index Patterns
      − Point Kibana to one or more indices in Elasticsearch that share the same mappings
      − Manage
        • Time-based values
        • Formatting of values for display
        • Scripted fields for calculating values at query-time (as opposed to Logstash transformations at ingest-time)
  10. Kibana Visualizations

    • Use the Elasticsearch _search API
      − REST API with a JSON-based query language
      − Can aggregate results (similar to “group by” in SQL)

    Kibana Visualizations display the result of aggregations, not the values of individual documents
      − This scales better
      − Different datatypes have different types of roll-up, e.g.
        • seconds, minutes, hours for date values
        • ranges for number values
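As an illustration of the kind of request a visualization might issue, the sketch below builds a `_search` body in Python that rolls documents up by month and sums a numeric field per bucket. The field names `timestamp` and `persons_injured` are hypothetical. The `calendar_interval` parameter is the one used by recent Elasticsearch versions; older versions, including those current at the time of the talk, use `interval` instead.

```python
import json

search_body = {
    "size": 0,  # return aggregation results only, no individual documents
    "aggs": {
        "accidents_per_month": {
            # Roll-up for date values: one bucket per calendar month.
            "date_histogram": {
                "field": "timestamp",
                "calendar_interval": "month",
            },
            "aggs": {
                # Per-bucket metric, comparable to SUM(...) with GROUP BY in SQL.
                "injured": {"sum": {"field": "persons_injured"}}
            },
        }
    },
}

# This JSON would be sent as the body of a request to <index>/_search.
print(json.dumps(search_body, indent=2))
```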
  11. Coordinate Map Visualization

    • Uses the “geohash” grid aggregation
      − String hash of a location, with a notion of precision/scale
      − Corresponds to a grid-cell area on the earth
        • dng18: ± 1.5 mile error
        • dng18e8w: ± 50 ft error
    • Uses geo-centroid positioning
      − Weighted center of the locations of all the results in the geohash grid cell
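The link between geohash length and cell size can be worked out directly: each character contributes 5 bits, and the bits alternate between longitude and latitude, starting with longitude. The sketch below (function names are hypothetical, and the kilometres-per-degree constant is an approximation) computes the cell dimensions for a given precision.

```python
import math

def geohash_cell_size_degrees(precision):
    """Return (latitude span, longitude span) in degrees of one geohash cell."""
    total_bits = 5 * precision
    lon_bits = math.ceil(total_bits / 2)  # longitude receives the first bit
    lat_bits = total_bits // 2
    return 180.0 / (2 ** lat_bits), 360.0 / (2 ** lon_bits)

def geohash_cell_size_km(precision, at_latitude=0.0):
    """Approximate (height, width) of a cell in kilometres at a given latitude."""
    lat_deg, lon_deg = geohash_cell_size_degrees(precision)
    km_per_degree = 111.32  # approximate length of one degree at the equator
    height = lat_deg * km_per_degree
    width = lon_deg * km_per_degree * math.cos(math.radians(at_latitude))
    return height, width

for precision in (5, 8):
    height, width = geohash_cell_size_km(precision)
    print(precision, round(height, 4), round(width, 4))
```

At precision 5 a cell is about 4.9 km on a side at the equator, so a point is known to within roughly half of that, which is in line with the ± 1.5 mile figure for a five-character hash such as dng18.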
  12. Detour: the Elastic Geo Service and X-Pack

    • Default map service and data used by Kibana
      − Road map image service
      − Example boundary data (world countries and US states)
    • Requires X-Pack install for access to all zoom levels
    • Link outside services to Kibana
      − Image services
        • TMS: http://my.map.service/{z}/{x}/{y}.png
        • OGC-WMS
      − GeoJSON boundary data
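To show how a `{z}/{x}/{y}` URL template resolves to a concrete tile, here is a small Python sketch using the common Web Mercator "slippy map" tile formula. It assumes the XYZ convention with the y axis counted from the top; a service that follows the strict TMS convention counts y from the bottom, so this is an assumption about the service, not something the slides state. The `tile_url` function name is hypothetical.

```python
import math

def tile_url(template, lat, lon, zoom):
    """Fill a {z}/{x}/{y} template with the tile that contains (lat, lon)."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return template.format(z=zoom, x=x, y=y)

TEMPLATE = "http://my.map.service/{z}/{x}/{y}.png"

print(tile_url(TEMPLATE, 0.0, 0.0, 0))
print(tile_url(TEMPLATE, 40.753, -73.825516, 12))
```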
  13. Region Map Visualization

    • Create choropleth maps
    • Inner join of results of “terms” aggregation with reference shape data
    • Link custom data service in config/kibana.yml (requires CORS support)

    regionmap:
      layers:
        - name: "NYC Boroughs (self-hosted)"
          url: "http://localhost/region_map/data/nyc_boroughs.json"
          fields:
            - name: "name"
              description: "Borough Name"
        - name: "NYC Council districts (self-hosted)"
          url: "http://localhost/region_map/data/nyc_councildistricts.json"
          fields:
            - name: "CounsilDist"
              description: "District #"
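The inner join behind a region map can be sketched in a few lines of Python. The bucket shape (`key`, `doc_count`) follows what a terms aggregation returns; the bucket values and the tiny GeoJSON collection are made up for illustration, and `join_buckets_to_features` is a hypothetical helper, not Kibana code.

```python
def join_buckets_to_features(buckets, geojson, join_field):
    """Keep only the regions that appear in both the buckets and the shapes."""
    counts = {bucket["key"]: bucket["doc_count"] for bucket in buckets}
    joined = []
    for feature in geojson["features"]:
        key = feature["properties"].get(join_field)
        if key in counts:
            joined.append({
                "key": key,
                "doc_count": counts[key],
                "geometry": feature["geometry"],
            })
    return joined

# Made-up terms-aggregation buckets.
buckets = [
    {"key": "Queens", "doc_count": 120},
    {"key": "Brooklyn", "doc_count": 95},
    {"key": "Unknown", "doc_count": 7},   # no matching shape
]

# Made-up reference shapes, joined on the "name" property.
boroughs = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Queens"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Brooklyn"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Bronx"}, "geometry": None},  # no bucket
    ],
}

joined = join_buckets_to_features(buckets, boroughs, "name")
for row in joined:
    print(row["key"], row["doc_count"])
```

Because the join is an inner join, a bucket without a shape ("Unknown") and a shape without a bucket ("Bronx") both drop out of the result.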
  14. Elastic Cloud: Hosted Elasticsearch and Kibana

    • Latest versions of Elasticsearch and Kibana
    • One-click scaling and upgrading; no downtime
    • Built-in security (auth, encryption, role-access)
    • Option for dedicated SLA support and X-Pack
    • The only offering created and managed by Elastic
    • Free Kibana and backups every 30 minutes