How Elasticsearch is SPARKing Our Geospatial Analysis: An Esri Story

Adam Mollenkopf, Real-Time GIS Capability Lead, Esri (@amollenkopf)

Esri: Geographic Information System (GIS)
• Environmental Systems Research Institute (ESRI) was founded in 1969.
• Esri develops GIS software.
• Global company with over 350,000 user organizations worldwide.
  - Headquarters in Redlands, CA.
  - 80 Esri distributors worldwide.

How Elasticsearch is SPARKing our Geospatial Analysis: agenda
• Use Cases
• Real-Time Ingestion
• Streaming Analytics
• Storage & Search
• Visualization
• Batch Analytics

Spatiotemporal Observation Data: data-in-motion use cases
[Diagram: Ingestion, Streaming Analytics, Batch Analytics, Spatiotemporal Storage & Search, Visualization (Desktop, Web, Device)]
• Moving Objects:
  - Aircraft, Drones, Trucks, Cars, Railways, Vessels, People, …
• Sensor Networks:
  - Weather Stations, Road Traffic, Gas & Electric Utility Networks, Environmental Sensors, …


Ingestion of high velocity spatiotemporal data

Ingestion of high velocity spatiotemporal data
[Diagram: spatiotemporal observation data entering Ingestion]
• Requirements:
  - Sustain a single-node ingestion throughput of at least tens of thousands of events per second.
  - Achieve near-linear scalability of throughput when adding nodes.
  - Gracefully handle bursty data.

Apache Kafka: publish-subscribe messaging rethought as a distributed commit log
• Fast - a single broker can handle hundreds of MBs of reads and writes per second.
• Scalable - data streams are partitioned and spread over a cluster of machines.
• Durable - messages are persisted to disk and replicated within the cluster.
• Distributed - cluster-centric design that offers strong durability and fault-tolerance guarantees.
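To make the "Scalable" point concrete, the sketch below assigns each event to a partition by hashing a message key, so that all observations for one moving object land in the same partition and keep their order there. The key choice (an object id), the partition count, and the use of CRC32 are assumptions for illustration only; Kafka's own partitioner uses a different hash function.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed partition count, not taken from the talk


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def spread(events, num_partitions: int = NUM_PARTITIONS):
    """Group (key, payload) events by partition, preserving arrival order per partition."""
    partitions = defaultdict(list)
    for key, payload in events:
        partitions[partition_for(key, num_partitions)].append((key, payload))
    return dict(partitions)


# Hypothetical observations keyed by object id.
events = [("truck-7", "obs-1"), ("vessel-2", "obs-1"), ("truck-7", "obs-2")]
assignment = spread(events)
for partition, items in sorted(assignment.items()):
    print(partition, items)
```

Because the hash is stable, adding consumers per partition scales reads without reordering any single object's observations.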

Apache Spark: a fast and general engine for large-scale data processing
• Unified big data processing:
  - Write streaming jobs the same way you write batch jobs.
  - Can combine streaming with batch and interactive queries.
• Spark apps can be written in Java, Scala, Python, and R.

Ingestion of high velocity spatiotemporal data: 1 node benchmark
Hardware: c4.2xlarge (Windows 2012 Server R2): 8 vCPU, 15 GiB, 100GB SSD, 1,000 Mbps EBS
Ingestion                  1 node
Spark Streaming w/ Kafka   132k

Ingestion of high velocity spatiotemporal data: 2 node benchmark
Hardware: c4.2xlarge (Windows 2012 Server R2): 8 vCPU, 15 GiB, 100GB SSD, 1,000 Mbps EBS
Ingestion                  1 node   2 node
Spark Streaming w/ Kafka   132k     282k
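The "near linear scalability" requirement can be checked against the two benchmark figures with a little arithmetic. The sketch below computes speedup and per-node efficiency; it assumes the 132k and 282k figures are events per second, which the slides imply through the stated requirement but do not label.

```python
def scaling_report(throughputs):
    """Return (nodes, speedup vs. 1 node, efficiency per node) for each cluster size."""
    base = throughputs[1]
    report = []
    for nodes in sorted(throughputs):
        speedup = throughputs[nodes] / base
        report.append((nodes, round(speedup, 2), round(speedup / nodes, 2)))
    return report


# Ingestion throughput from the benchmark slides (assumed unit: events per second).
ingest = {1: 132_000, 2: 282_000}

for nodes, speedup, efficiency in scaling_report(ingest):
    print(f"{nodes} node(s): speedup {speedup}x, efficiency {efficiency}")
```

An efficiency close to 1.0 means the second node contributes about as much throughput as the first, which is what the requirement asks for.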

Streaming Analytics on high velocity & volume spatiotemporal data

Streaming Analytics of high velocity & volume spatiotemporal data
[Diagram: spatiotemporal observation data, Ingestion, Streaming Analytics]
• Configure the flow of events:
  - the filtering and analytic steps to perform,
  - what ingestion stream(s) to apply them to,
  - and where to send the results.

Streaming Analytics of high velocity & volume spatiotemporal data
• Configure the flow of events:
  - the filtering and analytic steps to perform,
  - what ingestion stream(s) to apply them to,
  - and where to send the results.
The configured flow becomes a DAG (Directed Acyclic Graph):
    KafkaUtils.createStream(ssc, …)
      .map( event => SlidingTimeWindow.tumble(event, …) )
      .map( event => Aggregator.spatialAggregation(event, …) )
      .map( event => MapService.density(event, …) )
      .saveToEs(…)

Streaming Analytics of high velocity & volume spatiotemporal data
• Run continuous analytics on high velocity spatiotemporal data-in-motion.
[Figure: Spatial Aggregation with a Sliding Time Window, 30 meter cells]
[Figure: Spatial Aggregation, 200 meter cells]
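As a rough illustration of windowed spatial aggregation, the sketch below counts events per grid cell within fixed (tumbling) time windows, once with 30 meter cells and once with 200 meter cells. It is not the Esri implementation: it assumes events are already projected to planar coordinates in meters, uses tumbling rather than overlapping windows, and all function names and sample values are hypothetical.

```python
from collections import Counter


def window_start(timestamp_s: float, window_s: int) -> int:
    """Start time of the tumbling window that contains the timestamp."""
    return int(timestamp_s // window_s) * window_s


def cell_of(x_m: float, y_m: float, cell_size_m: float):
    """Column and row index of the square grid cell containing the point."""
    return (int(x_m // cell_size_m), int(y_m // cell_size_m))


def aggregate(events, window_s: int, cell_size_m: float) -> Counter:
    """Count events per (window start, grid cell)."""
    counts = Counter()
    for timestamp_s, x_m, y_m in events:
        key = (window_start(timestamp_s, window_s), cell_of(x_m, y_m, cell_size_m))
        counts[key] += 1
    return counts


# Made-up events: (timestamp in seconds, x in meters, y in meters).
events = [(0.5, 10.0, 10.0), (3.0, 25.0, 5.0), (4.0, 45.0, 10.0), (12.0, 10.0, 10.0)]

fine = aggregate(events, window_s=10, cell_size_m=30)     # 30 meter cells
coarse = aggregate(events, window_s=10, cell_size_m=200)  # 200 meter cells
print(dict(fine))
print(dict(coarse))
```

Larger cells merge neighboring counts, which is the trade-off between the 30 meter and 200 meter renderings: less spatial detail in exchange for fewer, denser cells.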

GIS Tools for Hadoop: http://esri.github.io/gis-tools-for-hadoop/
• Esri Geometry API for Java:
  - Geometry objects: points, lines, polygons.
  - Spatial relations: intersects, touches, overlaps, …
  - Spatial operations: buffer, cut, union, …
• Spatial Framework for Hadoop
  - Includes Spatial UDFs (User Defined Functions).
• GeoProcessing Tools for Hadoop
Ch. 8: Geospatial & Temporal Data Analysis
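To show what a spatial relation computes, here is a minimal point-in-polygon containment test using the classic ray-casting technique. This is a plain Python stand-in and not the Esri Geometry API; it assumes a simple polygon given as a list of planar (x, y) vertices, and its handling of points exactly on an edge is not specified.

```python
def point_in_polygon(point, polygon) -> bool:
    """Ray casting: count how many polygon edges a rightward ray from the point crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical 4 x 4 square zone and two test points.
zone = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon((2, 2), zone))
print(point_in_polygon((5, 2), zone))
```

A production geometry library additionally handles edge cases such as holes, multipart shapes, and coordinate systems, which is the reason to build on one rather than on a sketch like this.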

Storage & Search of high volume spatiotemporal data

Storage & Search of high volume spatiotemporal data
[Diagram: Ingestion, Streaming Analytics, Spatiotemporal Storage & Search]
• Requirements:
  - Sustain a single-node write throughput of at least tens of thousands of events per second.
  - Achieve growth in volume capacity & write throughput when adding nodes.

Elasticsearch: search & analyze data in real time
• Distributed, scalable, and highly available.
• Simple, yet sophisticated, RESTful API.
• Real-time full-text search, structured search, and analytic capabilities.
• Can easily combine Geolocation with search and analytic capabilities.
• Spark Elasticsearch Connector:
  - https://github.com/elastic/elasticsearch-hadoop (org.elasticsearch.spark.rdd.EsSpark)
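The Spark Elasticsearch Connector takes care of batching writes itself. For orientation only, the sketch below builds the kind of newline-delimited request body that Elasticsearch's bulk API accepts: an action line followed by the document source, one pair per document. The index name, document ids, and field names are hypothetical, and older Elasticsearch versions also expect a document type in the action line.

```python
import json


def bulk_body(index: str, docs) -> str:
    """Build a newline-delimited bulk body: one action line, then one source line, per doc."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


# Hypothetical observations; "location" is meant to be mapped as a geo_point field.
docs = [
    ("vessel-2:0", {"track_id": "vessel-2", "timestamp": 0,
                    "location": {"lat": 51.95, "lon": 4.14}}),
    ("vessel-2:1", {"track_id": "vessel-2", "timestamp": 10,
                    "location": {"lat": 51.96, "lon": 4.15}}),
]
body = bulk_body("observations", docs)
print(body, end="")
```

Sending many documents per request in this way is what makes write rates of tens of thousands of events per second plausible, since per-request overhead is amortized.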

Storage & Search of high volume spatiotemporal data: 5 Node Elasticsearch Cluster Write Throughput
Hardware: c4.2xlarge (Windows 2012 Server R2): 8 vCPU, 15 GiB, 100GB SSD, 1,000 Mbps EBS
Storage          1 node   2 node   3 node   4 node   5 node
Elasticsearch    106k     143k     192k     224k     249k
Ingest           1 node   2 node
Spark + Kafka    132k     282k
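Putting the storage and ingest figures side by side invites two simple questions: how much throughput does each added Elasticsearch node contribute, and how many nodes are needed to keep up with a given ingest rate? The sketch below answers both from the benchmark numbers, assuming all figures are in the same unit (events per second) and were measured under comparable conditions, which the slides do not state explicitly.

```python
# Write throughput by cluster size, from the benchmark slide (assumed events per second).
es_write = {1: 106_000, 2: 143_000, 3: 192_000, 4: 224_000, 5: 249_000}
# Ingest throughput by cluster size, from the same slide.
ingest = {1: 132_000, 2: 282_000}


def marginal_gains(throughputs):
    """Throughput added by each additional node, keyed by the resulting cluster size."""
    sizes = sorted(throughputs)
    return {b: throughputs[b] - throughputs[a] for a, b in zip(sizes, sizes[1:])}


def smallest_cluster_for(rate, throughputs):
    """Smallest measured cluster size that sustains the rate, or None if none does."""
    for nodes in sorted(throughputs):
        if throughputs[nodes] >= rate:
            return nodes
    return None


print(marginal_gains(es_write))
print(smallest_cluster_for(ingest[1], es_write))
print(smallest_cluster_for(ingest[2], es_write))
```

Under these assumptions, two storage nodes cover the one-node ingest rate, while none of the measured cluster sizes reaches the two-node ingest rate, so storage rather than ingestion would be the component to size first.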

Searching high volume spatiotemporal data
• Efficiently access and search a large volume of spatiotemporal data.
  - Query by any combination of id, time, space, and attributes.
• Elasticsearch can easily combine Geolocation with structured & full-text search.
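A query that combines id, time, space, and attributes can be expressed in Elasticsearch's query DSL as a bool query whose filter clauses must all match. The sketch below only builds such a request body as a Python dictionary; the field names (track_id, timestamp, location, and the attribute fields) are hypothetical, and the bool/filter form shown is the one used by Elasticsearch 2.0 and later.

```python
import json


def build_query(track_id, start, end, bbox, attribute_terms):
    """Combine id, time range, bounding box, and attribute filters into one bool query."""
    top, left, bottom, right = bbox
    filters = [
        {"term": {"track_id": track_id}},                        # id
        {"range": {"timestamp": {"gte": start, "lte": end}}},    # time
        {"geo_bounding_box": {"location": {                      # space
            "top_left": {"lat": top, "lon": left},
            "bottom_right": {"lat": bottom, "lon": right},
        }}},
    ]
    # One exact-match filter per attribute.
    filters += [{"term": {field: value}} for field, value in attribute_terms.items()]
    return {"query": {"bool": {"filter": filters}}}


query = build_query(
    track_id="vessel-2",
    start="2016-01-01T00:00:00Z",
    end="2016-01-02T00:00:00Z",
    bbox=(52.0, 4.0, 51.9, 4.3),  # top, left, bottom, right (made-up values)
    attribute_terms={"vessel_type": "tanker"},
)
print(json.dumps(query, indent=2))
```

Dropping any clause from the filter list yields a broader query, which is how "any combination" of the four criteria is obtained from one template.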

Searching high volume spatiotemporal data
• Geolocation search is made possible via spatial field types:
  - geo_point: a latitude-longitude pair
    - can calculate distance; used for sorting and relevance.
    - can be filtered by geo_bounding_box, geo_distance, or geo_distance_range.
    - can be aggregated into a grid to display on a map; uses Geohash.
  - geo_shape: complex shapes including polygon and polyline
    - used purely for filtering; expressed as GeoJSON.
  - For more info see: https://www.elastic.co/guide/en/elasticsearch/guide/current/geoloc.html
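Grid aggregation of geo_point values relies on Geohash, which interleaves longitude and latitude bits and writes them as base-32 characters, so that points sharing a prefix fall in the same cell. Elasticsearch performs this bucketing server-side (its geohash_grid aggregation); the sketch below reimplements the standard encoding in plain Python purely to show the idea, with made-up sample points.

```python
from collections import Counter

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # Geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 5) -> str:
    """Encode a latitude-longitude pair as a Geohash string of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, bit_count = [], 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = (bits << 1) | int(bit)
        bit_count += 1
        use_lon = not use_lon
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def grid_counts(points, precision: int) -> Counter:
    """Count points per Geohash cell, similar in spirit to a grid aggregation."""
    return Counter(geohash_encode(lat, lon, precision) for lat, lon in points)


points = [(57.64911, 10.40744), (57.64912, 10.40745), (51.95, 4.14)]
print(geohash_encode(57.64911, 10.40744, 5))  # u4pru
print(grid_counts(points, precision=3))
```

Lower precision means shorter hashes and larger cells, so the same data can be bucketed coarsely for a zoomed-out map and finely for a zoomed-in one.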

Visualization of high velocity & volume spatiotemporal data

Visualization of high velocity & volume spatiotemporal data
[Diagram: Ingestion, Streaming Analytics, Spatiotemporal Storage & Search, Visualization (Desktop, Web, Device)]
• ArcGIS API for JavaScript
  - A lightweight way to embed maps in web apps.
  - Renders any Map or Feature Service compliant source: https://www.esri.com/library/whitepapers/pdfs/geoservices-rest-spec.pdf

Visualization of high velocity & volume spatiotemporal data
• Render with the ability to do aggregation
  - Aggregations are calculated at various levels of detail and are specific to each user session.
  - When zoomed in, raw observations are returned and rendered.

Visualization of high velocity & volume spatiotemporal data

Batch Analytics of high velocity & volume spatiotemporal data

Batch Analytics of high volume spatiotemporal data
[Diagram: Ingestion, Streaming Analytics, Batch Analytics, Spatiotemporal Storage & Search, Visualization (Desktop, Web, Device)]

Batch Analytics of high volume spatiotemporal data

Port of Rotterdam: vessel and port usage behavioral analytics (courtesy of Frank Cremer)
• 8th largest port in the world.
• Largest port in Europe.

Port of Rotterdam: vessel and port usage behavioral analytics
• Polyline Track Batch Analytic Tool
• Speed Batch Analytic Tool
• Line Crosses Batch Analytic Tool
• Density Batch Analytic Tool
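The slides do not show how the batch tools are implemented. As one plausible reading of what a speed analytic computes, the sketch below derives per-segment speeds along a track from consecutive timestamped positions, using the haversine great-circle distance. The function names, the track format, and the sample coordinates are all hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2) -> float:
    """Great-circle distance in meters between two latitude-longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def segment_speeds(track):
    """Speed in m/s for each consecutive pair of (timestamp_s, lat, lon) observations."""
    speeds = []
    for (t1, lat1, lon1), (t2, lat2, lon2) in zip(track, track[1:]):
        dt = t2 - t1
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        speeds.append(haversine_m(lat1, lon1, lat2, lon2) / dt)
    return speeds


# Made-up vessel track: (timestamp in seconds, latitude, longitude).
track = [(0, 51.950, 4.140), (60, 51.951, 4.143), (120, 51.952, 4.147)]
for speed in segment_speeds(track):
    print(f"{speed:.1f} m/s")
```

Joining the same ordered observations into a line gives the polyline track, so both analytics can share one pass over each object's time-sorted positions.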

Port of Rotterdam: polyline track analytics

Port of Rotterdam: density analytics

Port of Rotterdam: dredging prioritization
[Figure: D, d, and Δ at each (lat, lon). Where is Δ ≃ 0?]

How Elasticsearch is SPARKing our Geospatial Analysis: summary
• When working with high velocity & volume spatiotemporal data, we have found the best technology selections are as follows:
  - Real-Time Ingestion = Spark Streaming + Kafka.
  - Streaming Analytics = Spark Streaming + GIS Tools for Hadoop.
  - Storage & Search = Elasticsearch + Spark Elasticsearch Connector.
  - Visualization = ArcGIS API for JavaScript.
  - Batch Analytics = Spark Elasticsearch Connector + Spark Core + GIS Tools for Hadoop.
• GIS Tools for Hadoop
  - Can be used as a basis to add spatial geometries, relations, and operators to Spark: http://esri.github.io/gis-tools-for-hadoop/

Q & A
Thank you!
