A serverless data pipeline for Insurance Telematics

Build a serverless data pipeline for Insurance Telematics and sleep at night!
HPC for Industry Workshop, Milan, 21/05/2019

“How hard is it to ingest hundreds of millions of messages per day and get real-time insights?”

On-premise solution
• Devices send streaming data (GPS, acceleration)
• Kafka buffers data in pub/sub topics to be consumed
• Spark jobs filter, aggregate or transform data
• HDFS cluster stores data for analytics or further processing

Image credits: https://www.edureka.co/blog/hadoop-ecosystem

Hello! I am Francesco Lerro
Delivering highly available solutions with millions of interactions since 2005
In love with cloud computing and buffalo mozzarella
@flerro

A factory that develops data-intensive solutions, applications or components, pursuing the following goals:
• preserve, extract value from, and enrich existing data
• push forward service innovation and digital transformation
• investigate future technology scenarios for Insurance
We are an InsurTech inside

Data Science: Data mining, Big Data, Machine Learning, Image processing, Natural Language Processing
Computer Science: SW Engineering, Data Engineering, Big Data architectures, High Perf Computing, IoT & Mobile, Signal processing
#WeAreHiring Data Scientists and SW Engineers

• 10,000,000 vehicles with UnipolSai insurance
• 4,000,000 black boxes installed on vehicles
• 150,000,000 events produced daily
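To put the daily volume in perspective, the figures above can be turned into average rates. This is only back-of-the-envelope arithmetic on the slide's numbers: it assumes events are spread evenly over the day, which real traffic (with its usage peaks) will not be.

```python
# Figures taken from the slide; everything else is derived arithmetic.
EVENTS_PER_DAY = 150_000_000
BLACK_BOXES = 4_000_000
SECONDS_PER_DAY = 24 * 60 * 60


def average_events_per_second(events_per_day: int) -> float:
    """Average ingestion rate, assuming a perfectly even spread over the day."""
    return events_per_day / SECONDS_PER_DAY


def average_events_per_box(events_per_day: int, boxes: int) -> float:
    """Average number of events each black box produces per day."""
    return events_per_day / boxes


avg_rate = average_events_per_second(EVENTS_PER_DAY)
per_box = average_events_per_box(EVENTS_PER_DAY, BLACK_BOXES)
print(f"{avg_rate:.0f} events/s on average, {per_box:.1f} events per box per day")
# prints: 1736 events/s on average, 37.5 events per box per day
```

The average hides the peaks, so the ingestion layer has to be sized (or has to scale) for bursts well above this rate.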

Limits of on-premise
• Complex system with high operational costs
• Data ingestion delays during usage peaks
• Data analysis jobs may take too long
• Analyzing ingested data is not always easy

Serverless is a Paradigm Shift
• Automated high availability
• Flexible scalability
• Pay for what you use
• Focus on your business

AWS IoT Core
• Managed platform to handle IoT devices
• Provides a rule engine to build an “IoT application”
• Routes incoming messages to other AWS services (Kinesis, Lambda, ...)
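As an illustration of what a routing rule might look like, here is a minimal sketch that only builds the rule payload as a plain dictionary. The topic filter, delivery stream name and role ARN are hypothetical placeholders, not values from the talk; the dictionary follows the shape accepted by the `topicRulePayload` argument of boto3's `iot` client `create_topic_rule` call, which is not invoked here.

```python
def build_topic_rule_payload(topic_filter: str, delivery_stream: str, role_arn: str) -> dict:
    """Build a rule that forwards every message on a topic filter to Firehose.

    All argument values are supplied by the caller; nothing here talks to AWS.
    """
    return {
        # IoT SQL statement selecting the whole message from matching topics.
        "sql": f"SELECT * FROM '{topic_filter}'",
        "awsIotSqlVersion": "2016-03-23",
        "ruleDisabled": False,
        "actions": [
            {
                "firehose": {
                    "deliveryStreamName": delivery_stream,
                    "roleArn": role_arn,
                    # Newline separator so each message lands on its own line.
                    "separator": "\n",
                }
            }
        ],
    }


# Hypothetical names, for illustration only.
payload = build_topic_rule_payload(
    topic_filter="telematics/+/events",
    delivery_stream="telematics-raw-events",
    role_arn="arn:aws:iam::123456789012:role/example-iot-to-firehose",
)
```

With real credentials, a payload like this could be passed as `topicRulePayload` to `create_topic_rule`, together with a `ruleName`.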

Amazon Kinesis Firehose
• Managed service to stream data to storage services
• Available destinations include S3, Redshift, Splunk
• Max data delay to storage is 60 seconds
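The bounded delay comes from buffering: records are accumulated and delivered as a batch once the buffer is large enough or old enough. The sketch below is a simplified, hypothetical model of that idea in plain Python, not Firehose's actual implementation; in particular it only checks the age when a new record arrives, whereas a real service flushes on a timer.

```python
from typing import List, Optional


class DeliveryBuffer:
    """Toy size-or-age buffer; class and parameter names are illustrative."""

    def __init__(self, max_bytes: int, max_age_seconds: float) -> None:
        self.max_bytes = max_bytes
        self.max_age_seconds = max_age_seconds
        self._records: List[bytes] = []
        self._size = 0
        self._opened_at: Optional[float] = None

    def add(self, record: bytes, now: float) -> Optional[List[bytes]]:
        """Store a record; return the flushed batch if a threshold was hit."""
        if self._opened_at is None:
            self._opened_at = now  # the buffer's age starts at its first record
        self._records.append(record)
        self._size += len(record)
        too_big = self._size >= self.max_bytes
        too_old = now - self._opened_at >= self.max_age_seconds
        if too_big or too_old:
            return self.flush()
        return None

    def flush(self) -> List[bytes]:
        """Hand over the current batch and reset the buffer."""
        batch, self._records = self._records, []
        self._size = 0
        self._opened_at = None
        return batch


buffer = DeliveryBuffer(max_bytes=1024, max_age_seconds=60)
first = buffer.add(b"a", now=0.0)    # buffered, nothing delivered yet
second = buffer.add(b"b", now=30.0)  # still buffered
third = buffer.add(b"c", now=60.0)   # age threshold reached, batch delivered
```

Smaller thresholds reduce the delay but produce more, smaller objects in storage, which is part of the cost tuning trade-off.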

Amazon S3
• Managed blob storage service available via API
• High durability and availability
• Can trigger AWS Lambda on data change
• Supports data lifecycle management automation

AWS Lambda
• Easy to write lightweight processing functions
• Triggered by events from other AWS components
• Many supported runtimes (Node, Python, Java, ...)
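A lightweight function triggered by an S3 "object created" notification could look like the sketch below. The bucket name, object key and processing step are hypothetical; the event layout (`Records[].s3.bucket.name` and `Records[].s3.object.key`, with URL-encoded keys) is the one S3 uses for its notifications.

```python
import json
from urllib.parse import unquote_plus


def extract_s3_objects(event: dict) -> list:
    """Return (bucket, key) pairs for every record in an S3 notification."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded (spaces become '+'), so decode them.
        objects.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return objects


def handler(event, context):
    """Lambda entry point: here it only logs which objects were delivered."""
    objects = extract_s3_objects(event)
    for bucket, key in objects:
        # Placeholder for real work (filter, aggregate or transform the data).
        print(json.dumps({"bucket": bucket, "key": key}))
    return {"processed": len(objects)}


# Hand-written sample event with only the fields the handler reads.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "telematics-raw"},
                "object": {"key": "2019/05/21/events+batch.json"},
            }
        }
    ]
}
result = handler(sample_event, None)
```

Keeping the parsing in a separate function makes it possible to test the logic locally, without deploying anything.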

Serverless data ingestion
• Devices send streaming data (GPS, acceleration)
• AWS IoT + Amazon Kinesis
• Amazon S3
• AWS Lambda

Amazon Athena
• Interactive query service for structured data on S3
• SQL expression support
• No data preparation or ETL needed, just schema definition
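"Just schema definition" means declaring an external table over files that already sit in S3. The sketch below assembles such a statement as a string; the database, table, column names and bucket are hypothetical, and it assumes the events are stored as one JSON object per line, which the talk does not specify.

```python
def build_create_table_ddl(database: str, table: str, columns: list, s3_location: str) -> str:
    """Build an Athena CREATE EXTERNAL TABLE statement for JSON data on S3."""
    column_sql = ",\n  ".join(f"`{name}` {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"  {column_sql}\n"
        ")\n"
        "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'\n"
        f"LOCATION '{s3_location}'"
    )


# Hypothetical schema for a telematics event.
ddl = build_create_table_ddl(
    database="telematics",
    table="events",
    columns=[
        ("device_id", "string"),
        ("ts", "bigint"),
        ("latitude", "double"),
        ("longitude", "double"),
        ("acceleration", "double"),
    ],
    s3_location="s3://example-telematics-bucket/events/",
)

# Once the table exists, analysis is plain SQL over the files in S3.
example_query = (
    "SELECT device_id, max(acceleration) AS peak_acceleration\n"
    "FROM telematics.events\n"
    "GROUP BY device_id"
)
print(ddl)
```

No data is moved or converted by this statement; the table is only metadata pointing at the S3 location.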

Serverless solution benefits
• Unlimited and reliable storage, managed by AWS
• Smaller units of computation are easier to reason about when building a data analysis/transformation pipeline
• Elastic data ingestion, data always delivered on time
• No maintenance or upgrade costs

Serverless solution limits
• Lambda functions have execution time, memory and max concurrency limits
• Tuning Kinesis for cost-effectiveness can be tricky
• Storing data on S3 with no lifecycle management can be expensive in the long run
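For the storage cost point, a lifecycle rule can move older data to a cheaper storage class and eventually delete it. The sketch below only builds the configuration as a dictionary; the prefix and the day counts are hypothetical choices, not recommendations from the talk. The structure matches what boto3's S3 client accepts as `LifecycleConfiguration` in `put_bucket_lifecycle_configuration`, which is not called here.

```python
def build_lifecycle_configuration(prefix: str, archive_after_days: int, expire_after_days: int) -> dict:
    """Archive objects under a prefix after some days, then expire them later."""
    if archive_after_days >= expire_after_days:
        raise ValueError("objects must be archived before they expire")
    return {
        "Rules": [
            {
                "ID": f"archive-then-expire-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Move objects to a colder, cheaper storage class.
                "Transitions": [{"Days": archive_after_days, "StorageClass": "GLACIER"}],
                # Delete objects once they are no longer needed.
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


# Hypothetical retention policy: archive after 30 days, delete after a year.
lifecycle = build_lifecycle_configuration("events/", archive_after_days=30, expire_after_days=365)
```

The right thresholds depend on how long the raw events are actually queried and on any retention obligations.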

Credits: Forrest Brazeal

Thanks! Any questions?
@flerro
We are hiring Data Scientists and SW Engineers
Presentation template by SlidesCarnival
