Elastic Stream Processing for the Internet of Things

Christoph Hochreiner, Michael Vögler, Stefan Schulte, Schahram Dustdar

Motivation

Motivational Scenario: Challenges
• Raw sensor data must not be exported to other countries for legal reasons
• The analysis algorithm must not be hosted outside the company's premises
• The system needs to be configurable
• Data should be processed close to the sensor to reduce latency

Motivational Scenario: Solution
Decompose the stream processing system into microservices

Requirements
• Inherent hybrid cloud support to consider legal and business-related regulations
• Reconfiguration at runtime
• Computational resource elasticity
• Cost efficiency

Related Systems
Systems compared: System S, Storm, Spark, Cloud Data Flow, Stream Cloud, Distributed Storm, Cloud Dataflow, AWS IOT
Marks per criterion, in the order they appear on the slide:
• Hybrid Cloud Support: ✔ ✔ ✔ ✔
• Reconfiguration at Runtime: (✔)
• Resource Elasticity at Runtime: (✔) ✔ ✔ (✔) ✔ ✔
• Cost Efficiency: (✔) (✔)

System Design

Stream Processing Topology
[Figure: stream processing topology; labels: IoT Device, Processing Operation, User]

System Design
[Figure: system design; labels: Operator Node, Processing Node]

Evaluation

Evaluation Scenario
Objective: Analyze individual taxi rides, which are composed of location-based time series
[Figure: processing topology; labels: Data Transfer, Data Aggregation, Processing operation, Distance, Speed, Average Speed, Aggregation, Analysis, Monitor]
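The Distance, Speed, and Average Speed operations named in the scenario can be illustrated with a minimal sketch. It assumes each time-series item carries a timestamp in seconds plus latitude and longitude, and it uses the haversine great-circle distance. The `Point` type, the function names, and the choice of haversine are illustrative assumptions and are not taken from the slides.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used by the haversine formula


@dataclass
class Point:
    """One item of a location-based time series (hypothetical layout)."""
    t: float    # timestamp in seconds
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def segment_distances(points):
    """Distance operation: one distance per pair of consecutive points."""
    return [haversine_km(a, b) for a, b in zip(points, points[1:])]


def segment_speeds(points):
    """Speed operation: km/h per segment, skipping zero-length time steps."""
    speeds = []
    for a, b in zip(points, points[1:]):
        dt_hours = (b.t - a.t) / 3600.0
        if dt_hours > 0:
            speeds.append(haversine_km(a, b) / dt_hours)
    return speeds


def analyze_ride(points):
    """Aggregation step: total distance and average speed of one ride."""
    total_km = sum(segment_distances(points))
    duration_h = (points[-1].t - points[0].t) / 3600.0 if len(points) > 1 else 0.0
    avg_kmh = total_km / duration_h if duration_h > 0 else 0.0
    return {"distance_km": total_km, "average_speed_kmh": avg_kmh}


ride = [Point(0, 0.0, 0.0), Point(60, 0.0, 0.01), Point(120, 0.0, 0.02)]
report = analyze_ride(ride)
```

In a deployed topology each of these functions would run as a separate processing operation connected by message queues; here they are plain function calls to keep the sketch self-contained.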

Evaluation Preliminaries: Service Level Agreement
Report generation time: The maximal processing duration, measured from the moment the last time-series item was posted until the analysis is finished, is 60 seconds.
Node granularity: Each Operator Node and Processing Node is represented by a virtual machine.
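The SLA definition can be made concrete with a small sketch that measures report generation time per ride and derives an adherence percentage. The tuple layout and the function names are assumptions for illustration; only the 60-second limit comes from the slide.

```python
SLA_SECONDS = 60.0  # maximal report generation time stated in the SLA


def report_generation_time(last_item_posted_at: float, analysis_finished_at: float) -> float:
    """Seconds between the last posted time-series item and the finished analysis."""
    return analysis_finished_at - last_item_posted_at


def sla_adherence(rides) -> float:
    """Percentage of rides whose report was generated within the SLA.

    `rides` is a list of (last_item_posted_at, analysis_finished_at) tuples,
    a hypothetical layout chosen for this sketch.
    """
    if not rides:
        raise ValueError("at least one ride is required")
    within = sum(1 for posted, finished in rides
                 if report_generation_time(posted, finished) <= SLA_SECONDS)
    return 100.0 * within / len(rides)


# Four example rides with report generation times of 30, 70, 60 and 100 seconds.
example_rides = [(0, 30), (10, 80), (100, 160), (200, 300)]
adherence = sla_adherence(example_rides)
```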

Evaluation Preliminaries: Resource Provisioning Approaches
Elastic provisioning: Threshold-based resource allocation based on the CPU load of the Processing Nodes as well as the load on the incoming message queue.
Under-provisioning: Fixed provisioning of a set of Processing Nodes that just fails to comply with the SLA.
Over-provisioning: Fixed provisioning of a minimal set of Processing Nodes that yields 100 % SLA compliance.
Node allocation for baselines: Node assignment for the baselines is identical to that of the elastic scenario.
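A threshold-based allocation rule of the kind described for elastic provisioning might look like the following sketch. All threshold values, the `Thresholds` type, and the one-node-at-a-time policy are assumptions made for illustration; the slides do not state the actual thresholds or scaling steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical thresholds; the real values are not given in the slides."""
    cpu_high: float = 0.8    # scale up above this average CPU load
    cpu_low: float = 0.3     # scale down below this average CPU load
    queue_high: int = 1000   # scale up above this many queued messages
    queue_low: int = 100     # scale down only below this many queued messages


def scaling_decision(cpu_loads, queue_length, thresholds=Thresholds(), min_nodes=1):
    """Return +1 (add a Processing Node), -1 (remove one) or 0 (keep as is).

    `cpu_loads` holds one CPU load value in [0, 1] per running Processing Node.
    """
    if not cpu_loads:
        return 1  # nothing is running yet, so start a node
    avg_cpu = sum(cpu_loads) / len(cpu_loads)
    # Either signal alone is enough to scale up.
    if avg_cpu > thresholds.cpu_high or queue_length > thresholds.queue_high:
        return 1
    # Scale down only when both signals are low and a spare node exists.
    if (avg_cpu < thresholds.cpu_low
            and queue_length < thresholds.queue_low
            and len(cpu_loads) > min_nodes):
        return -1
    return 0
```

Because such a rule reacts only after a threshold has been crossed, new capacity arrives late when load rises quickly, which is one way reactive scaling can lead to delayed reports.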

Evaluation Results

                                  Elastic        Under-         Over-
                                  provisioning   provisioning   provisioning
Cost for Processing Nodes         2160.66        1855           2665
Total Makespan (sec)              6653           6975           6655
Average Report Generation (sec)   77             355            35
Total Delays                      21             75             0
SLA Adherence (%)                 28             0              100

Observations for elastic provisioning:
• About 20 % cost reduction compared to over-provisioning
• Total makespan is similar to that of the over-provisioning scenario and about 5 % shorter than in the under-provisioning scenario
• Average report generation is 4.3 times faster than in the under-provisioning scenario and only about 2 times slower than in the over-provisioning one
• Average report generation duration (77 sec) is slightly above the 60-second SLA
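The comparisons can be rechecked from the values in the results table. The sketch below recomputes them from the rounded table entries, so the results only approximate the figures quoted on the slides, which may have been derived from unrounded measurements. The dictionary layout and helper name are illustrative.

```python
# Values copied from the results table (cost unit as given on the slide).
results = {
    "elastic": {"cost": 2160.66, "makespan_s": 6653, "avg_report_s": 77},
    "under":   {"cost": 1855.0,  "makespan_s": 6975, "avg_report_s": 355},
    "over":    {"cost": 2665.0,  "makespan_s": 6655, "avg_report_s": 35},
}
SLA_SECONDS = 60


def relative_reduction(value: float, baseline: float) -> float:
    """Reduction of `value` relative to `baseline`, in percent."""
    return 100.0 * (baseline - value) / baseline


elastic, under, over = results["elastic"], results["under"], results["over"]

# Cost saving of elastic provisioning versus over-provisioning, in percent.
cost_saving_pct = relative_reduction(elastic["cost"], over["cost"])
# Makespan reduction versus under-provisioning, in percent.
makespan_gain_pct = relative_reduction(elastic["makespan_s"], under["makespan_s"])
# How many times faster reports are generated than with under-provisioning.
speedup_vs_under = under["avg_report_s"] / elastic["avg_report_s"]
# How many times slower reports are generated than with over-provisioning.
slowdown_vs_over = elastic["avg_report_s"] / over["avg_report_s"]
# Seconds by which the average report generation exceeds the SLA.
sla_overshoot_s = elastic["avg_report_s"] - SLA_SECONDS

print(f"cost saving vs. over-provisioning: {cost_saving_pct:.1f} %")
print(f"makespan reduction vs. under-provisioning: {makespan_gain_pct:.1f} %")
print(f"report generation speedup vs. under-provisioning: {speedup_vs_under:.1f}x")
print(f"report generation slowdown vs. over-provisioning: {slowdown_vs_over:.1f}x")
print(f"average overshoot of the SLA: {sla_overshoot_s} s")
```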

Conclusion

Requirements Revisited
• ✔ Inherent hybrid cloud support to consider legal and business-related regulations
• ✔ Reconfiguration at runtime
• ✔ Computational resource elasticity
• ✔ Cost efficiency

Lessons Learned
• Threshold-based resource allocation can lead to delays and impact the QoS negatively
• VM-based provisioning causes delays due to the long startup duration
• Redundant infrastructure of Operator Nodes causes high computational resource requirements

Outlook
• Investigate predictive scheduling approaches
• Implement a more lightweight system design
• Pool the Operator Node infrastructure to reduce the computational overhead
https://github.com/chochreiner/VISP-Runtime

Q & A Christoph Hochreiner [email protected]

Backup Slides

Elastic Resource Provisioning

\[
\min \sum_{p \in P} p_i + \sum_{p \in P} p_i^{BTU} + u \cdot N + d \cdot N
\]

p_i: specific Processing Node
p_i^{BTU}: remaining BTU for a specific Processing Node
u: upscaling decision variable
d: downscaling decision variable
