Slide 1

Slide 1 text

How to analyze and visualize geo-data with the Elastic Stack
Thomas Neirynck, Brandon Kobel

Slide 2

Slide 2 text

What is the Elastic Stack?
• Store and search data with Elasticsearch
• Move data into Elasticsearch with
  − Logstash
  − Beats
• Visualize data and administer the stack with Kibana

Slide 3

Slide 3 text

What is the Elastic Stack used for?
• Document search
  − Support for multiple languages
• Log analytics
  − Server logs, application usage, time-based data
• System monitoring
  − Real-time health watches

Slide 4

Slide 4 text

What will we do in this presentation?
Build an application to analyze traffic accident data in NYC
• Full round-trip
  − Ingest data into Elasticsearch with Logstash
  − Build a Kibana application to generate insights
• Pay attention to geo-features across the stack
• … and enrich the analytical experience with machine learning

Slide 5

Slide 5 text

The data source
• NYC traffic accident data
  − https://opendata.cityofnewyork.us/
  − 1,000,000+ traffic incidents, since July 2012
• Tabular format
  − Fields indicate where and when, number of injuries and fatalities, type of vehicles involved

Slide 6

Slide 6 text

… but Elasticsearch requires JSON documents
• Each document must conform to a `mapping`.
  − Field values have to correspond to a datatype (date, number, text, ...)
  − The mapping determines how values are indexed at ingest-time (and this impacts if and how they can be searched for at query-time)
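The step from a tabular row to a typed JSON document can be sketched as follows. This is a minimal illustration, not the presenters' code: the two-line CSV sample and the column names (`date`, `borough`, `persons_injured`) are assumptions chosen for the example.

```python
import csv
import io
import json

# Hypothetical CSV sample; the column names are illustrative only.
SAMPLE = "date,borough,persons_injured\n07/01/2012,QUEENS,2\n"


def row_to_document(row):
    """Turn one CSV row (a dict of strings) into a JSON-ready dict with typed values."""
    return {
        "date": row["date"],  # kept as a string; a date mapping would parse it at index time
        "borough": row["borough"],  # text value
        "persons_injured": int(row["persons_injured"]),  # numeric value
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
documents = [row_to_document(r) for r in rows]
print(json.dumps(documents[0]))
```

Every value in a CSV row starts out as a string, so the conversion to numbers has to happen before indexing if the field is mapped as a numeric type.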

Slide 7

Slide 7 text

Field datatypes for geo-data
• geo_point
  − Several representations
    ● Numeric: object with lat/lon keys, array of two numbers ([lon, lat])
    ● String: "lat,lon" string, geohash
  − Supported by Kibana
• geo_shape
  − Simple Feature data model (point, line, polygon, collections), envelopes, circles
  − Not supported by Kibana
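The geo_point representations differ in coordinate order, which is easy to get wrong. The sketch below builds three of them for the same location; the helper name `geo_point_forms` is hypothetical and only serves the illustration.

```python
def geo_point_forms(lat, lon):
    """Return the same location in three geo_point representations."""
    return {
        "object": {"lat": lat, "lon": lon},  # explicit keys, order does not matter
        "array": [lon, lat],  # array form: longitude first
        "string": f"{lat},{lon}",  # string form: latitude first
    }


forms = geo_point_forms(40.753, -73.825516)
for name, value in forms.items():
    print(name, value)
```

The object form is the least error-prone because the keys carry the meaning; the array and string forms use opposite orders.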

Slide 8

Slide 8 text

Using Logstash for ingestion
• What:
  − Transform data (e.g. tabular format → JSON document)
  − Ensure field values conform to the mapping
  − Store documents in Elasticsearch
• `Pipeline`
  − Data source is a stream of events
  − Series of steps to transform these events (filters)
  − Configuration of this pipe is programmable

Slide 9

Slide 9 text

Logstash configuration (logstash.conf, excerpt):

    ...
    filter {
      csv {
        columns => ["date","time","borough","zip_code","latitude","longitude", ...]
      }
      ...
      # If the event contains latitude and longitude
      if [latitude] and [longitude] {
        mutate { convert => {"latitude" => "float"} }
        mutate { convert => {"longitude" => "float"} }
        mutate { rename => {"latitude" => "[coords][lat]"} }
        mutate { rename => {"longitude" => "[coords][lon]"} }
      }
    }
    ...

Mapping (excerpt):

    "properties" : {
      "coords" : { "type" : "geo_point" },
      ...
    }

Resulting document (excerpt):

    "_source": {
      ...
      "coords": { "lon": -73.825516, "lat": 40.753 },
      ...
    }

Running the pipeline:

    > cat sourcedata.csv | /path/to/logstash/bin/logstash -f logstash.conf
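To make the effect of the convert and rename steps concrete, here is a plain-Python approximation of what they do to a single event. It is not how Logstash works internally; the function name `nest_coordinates` is hypothetical, and treating empty strings as missing is an assumption of this sketch.

```python
def nest_coordinates(event):
    """Mimic the filter: convert latitude/longitude to floats and nest them under 'coords'."""
    event = dict(event)  # work on a copy, leave the input untouched
    lat = event.get("latitude")
    lon = event.get("longitude")
    # Assumption: missing or empty values mean the event has no usable location.
    if lat and lon:
        del event["latitude"]
        del event["longitude"]
        event["coords"] = {"lat": float(lat), "lon": float(lon)}
    return event


with_location = nest_coordinates(
    {"borough": "QUEENS", "latitude": "40.753", "longitude": "-73.825516"}
)
without_location = nest_coordinates({"borough": "QUEENS"})
print(with_location)
print(without_location)
```

Events without coordinates pass through unchanged, so they can still be indexed; they simply do not appear on maps.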

Slide 10

Slide 10 text

Kibana is the window into the Elastic Stack
• Index Patterns
  − Point Kibana to one or more indices in Elasticsearch that share the same mappings
  − Manage
    ● Time-based values
    ● Formatting of values for display
    ● Scripted fields for calculating values at query-time (as opposed to Logstash transformations at ingest-time)

Slide 11

Slide 11 text

Kibana Visualizations
• Use the Elasticsearch _search API
  − REST API with a JSON-based query language
  − Can aggregate results (similar to “group by” in SQL)
• Kibana Visualizations display the result of aggregations, not the values of individual documents
  − This scales better
  − Different datatypes have different types of roll-up, e.g.
    ● seconds, minutes, hours for date values
    ● ranges for number values
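The "group by" analogy can be sketched in a few lines. The first part builds the JSON body of a `terms` aggregation request as a Python dict; the second part computes the same roll-up locally on three made-up documents. The field name `borough`, the aggregation name `by_borough`, and the sample documents are assumptions for illustration, and no request is actually sent.

```python
import json
from collections import Counter


def terms_aggregation_body(field, agg_name):
    """Build a _search body that returns only bucket counts, no individual hits."""
    return {
        "size": 0,  # suppress individual documents, keep only the aggregation
        "aggs": {agg_name: {"terms": {"field": field}}},
    }


body = terms_aggregation_body("borough", "by_borough")
print(json.dumps(body))

# Local equivalent of the roll-up, on hypothetical documents.
docs = [{"borough": "QUEENS"}, {"borough": "BROOKLYN"}, {"borough": "QUEENS"}]
local_buckets = Counter(d["borough"] for d in docs)
print(local_buckets)
```

Because only one bucket per distinct value comes back, the response size depends on the number of groups rather than the number of documents.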

Slide 12

Slide 12 text

Coordinate Map Visualization
• Uses the “geohash” grid aggregation
  − String hash of a location, with a notion of precision/scale
  − Corresponds to a grid-cell area on the earth
    ● dng18: ± 1.5 mile error
    ● dng18e8w: ± 50 ft error
• Uses geo-centroid positioning
  − Weighted center of the locations of all the results in the geohash grid cell
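The two ideas on this slide, bucketing by geohash cell and placing a marker at the centroid of each cell, can be sketched in plain Python. This is an illustrative local computation using the standard geohash encoding, not the Elasticsearch implementation; the sample points and the simple unweighted mean are assumptions of the sketch.

```python
from collections import defaultdict

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=5):
    """Encode a location as a geohash string of the given length."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def grid_centroids(points, precision=5):
    """Group (lat, lon) points by geohash cell and average their positions."""
    cells = defaultdict(list)
    for lat, lon in points:
        cells[geohash_encode(lat, lon, precision)].append((lat, lon))
    return {
        cell: {
            "count": len(pts),
            "lat": sum(p[0] for p in pts) / len(pts),
            "lon": sum(p[1] for p in pts) / len(pts),
        }
        for cell, pts in cells.items()
    }


# Hypothetical sample points in New York City.
points = [(40.753, -73.825516), (40.758, -73.985)]
coarse = grid_centroids(points, precision=3)
fine = grid_centroids(points, precision=6)
print(coarse)
print(len(fine))
```

A shorter geohash means a larger cell: at precision 3 both sample points share one cell and one marker, while at precision 6 they fall into separate cells.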

Slide 13

Slide 13 text

Detour: the Elastic Geo Service and X-Pack
• Default map service and data used by Kibana
  − Road map image service
  − Example boundary data (world countries and US states)
• Requires X-Pack install for access to all zoom levels
• Link outside services to Kibana
  − Image services
    ● TMS: http://my.map.service/{z}/{x}/{y}.png
    ● OGC-WMS
  − GeoJSON boundary data
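The {z}/{x}/{y} placeholders in the tile URL template are filled in per map tile. The sketch below shows how a client might compute the tile covering a location, assuming the common Web Mercator "slippy map" numbering with y counted from the top; a strict TMS server counts y from the bottom, so the row index may need flipping. The function names are hypothetical and no clamping is done at the edges of the map.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) tile indices containing a location at a zoom level."""
    n = 2 ** zoom  # number of tiles along each axis
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def tile_url(template, lon, lat, zoom):
    """Fill a {z}/{x}/{y} URL template for the tile containing the location."""
    x, y = lonlat_to_tile(lon, lat, zoom)
    return template.format(z=zoom, x=x, y=y)


url = tile_url("http://my.map.service/{z}/{x}/{y}.png", -73.825516, 40.753, 1)
print(url)
```

Each zoom level doubles the number of tiles along each axis, which is why access to higher zoom levels is a meaningful limit for a tile service.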

Slide 14

Slide 14 text

Region Map Visualization
• Create choropleth maps
• Inner join of results of “terms” aggregation with reference shape data
• Link custom data-service in config/kibana.yml (requires CORS support)

    regionmap:
      layers:
        - name: "NYC Boroughs (self-hosted)"
          url: "http://localhost/region_map/data/nyc_boroughs.json"
          fields:
            - name: "name"
              description: "Borough Name"
        - name: "NYC Council districts (self-hosted)"
          url: "http://localhost/region_map/data/nyc_councildistricts.json"
          fields:
            - name: "CounsilDist"
              description: "District #"
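The inner join behind a choropleth can be sketched as a dictionary lookup: each shape is matched to an aggregation bucket by the value of its join field. The bucket counts and the two-feature GeoJSON below are made-up sample data, and `join_buckets_to_features` is a hypothetical helper, not Kibana's code.

```python
def join_buckets_to_features(buckets, geojson, join_field):
    """Inner join: keep only shapes whose join-field value matches a bucket key."""
    counts = {b["key"]: b["doc_count"] for b in buckets}
    joined = []
    for feature in geojson["features"]:
        key = feature["properties"].get(join_field)
        if key in counts:
            joined.append(
                {"key": key, "doc_count": counts[key], "geometry": feature["geometry"]}
            )
    return joined


# Hypothetical "terms" aggregation buckets (key and doc_count per value).
buckets = [{"key": "Queens", "doc_count": 120}, {"key": "Bronx", "doc_count": 80}]

# Hypothetical reference shapes; geometries are left empty for brevity.
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Queens"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Staten Island"}, "geometry": None},
    ],
}

joined = join_buckets_to_features(buckets, geojson, "name")
print(joined)
```

Because the join is by exact value, the field values in the indexed documents must match the property values in the shape file, including case and spelling.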

Slide 15

Slide 15 text

Kibana - Dashboard Demo

Slide 16

Slide 16 text

Elastic Cloud: Hosted Elasticsearch and Kibana
• Latest versions of Elasticsearch and Kibana
• One-click scaling and upgrading; no downtime
• Built-in security (auth, encryption, role-access)
• Option for dedicated SLA support and X-Pack
• The only offering created and managed by Elastic
• Free Kibana and backups every 30 minutes

Slide 17

Slide 17 text

Questions?
@elastic
www.elastic.co