Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!

© 2016 MapR Technologies
Tugdual Grall, Technical Evangelist, @tgrall
OOP-2016, Feb 04, 2016

{“about” : “me”}
Tugdual “Tug” Grall
• MapR: Technical Evangelist
• MongoDB: Technical Evangelist
• Couchbase: Technical Evangelist
• eXo: CTO
• Oracle: Developer/Product Manager, mainly Java/SOA
• Developer in consulting firms
• Web: @tgrall, http://tgrall.github.io, tgrall
• NantesJUG co-founder
• Pet project: http://www.resultri.com

Big Data & Hadoop in Production

Data Hub
[Diagram labels: Customer DB, Customer DB, Logs, …, Hadoop, NoSQL]
Choose the best “connector”:
• File
• Sqoop
• ETL
• …
Use the aggregated data:
• In your applications
• To update other systems
• As an Open Data API
• …

Financial Services
[Diagram labels: Fraud detection; Personalized offers; Fraud investigation tool; Fraud investigator; Fraud model; Recommendations table; Clickstream analysis; Online transactions; MapR Distribution for Hadoop; Analytics; Real-time Operational Applications; Interactive marketer]

Fault Tolerance
hardware, software, developer?

Lambda Architecture to the rescue (λ)

A little bit of history…
• Defined by Nathan Marz
• ex BackType, Twitter
• in a new Startup
• Creator of …
  – Storm
  – Cascalog
  – ElephantDB

Lambda Architecture Requirements
• Fault-tolerant against both hardware failures and human errors
• Supports a variety of use cases, including low-latency querying as well as updates
• Linear scale-out capabilities
• Extensible, so that the system stays manageable and can accommodate new features easily

Lambda Architecture
[Diagram: the NEW DATA STREAM feeds the BATCH LAYER (IMMUTABLE MASTER DATA, PRECOMPUTE VIEWS, BATCH RECOMPUTE) and the SPEED LAYER (PROCESS STREAM, INCREMENT VIEWS). The SERVING LAYER holds the BATCH VIEWS (View 1, View 2, View N), the speed layer holds the REAL-TIME VIEWS (View 1, View 2, View N), and a QUERY MERGEs both.]

Data Ingestion
All data entering the system is dispatched to both:
• the batch layer
• the speed layer
[Diagram labels: NEW DATA STREAM, BATCH LAYER, SPEED LAYER]
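The fan-out on this slide can be sketched in a few lines of Python. This is only an illustration, not code from the talk: `Dispatcher`, `batch_sink`, and `speed_sink` are hypothetical names, and two plain lists stand in for the master dataset and the stream.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]


class Dispatcher:
    """Sends every incoming event to both the batch and the speed layer."""

    def __init__(self, batch_sink: Callable[[Event], None],
                 speed_sink: Callable[[Event], None]) -> None:
        self.batch_sink = batch_sink
        self.speed_sink = speed_sink

    def ingest(self, event: Event) -> None:
        # Same event, two destinations: durable storage and the live stream.
        self.batch_sink(event)
        self.speed_sink(event)


# Hypothetical stand-ins: a list for the master dataset, a list for the stream.
master_data: List[Event] = []
stream: List[Event] = []

dispatcher = Dispatcher(master_data.append, stream.append)
dispatcher.ingest({"airport": "MUC", "flight": "EY123", "action": "take-off"})
```

In a real deployment the two sinks would typically be a distributed file system and a message stream; the point here is only that neither layer sees data the other misses.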

Batch Layer
The batch layer is responsible for:
• managing the master dataset, an immutable, append-only set of raw data
• pre-computing arbitrary query functions, called batch views
[Diagram labels: BATCH LAYER, IMMUTABLE MASTER DATA, PRECOMPUTE VIEWS, BATCH RECOMPUTE, BATCH VIEWS (View 1, View 2, View N)]
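One way to picture these two responsibilities is an append-only store plus a pure function that is recomputed over all of it. The sketch below assumes that framing; `MasterDataset`, `append`, `recompute`, and `take_off_count` are hypothetical names, and the record shape borrows from the flight events used elsewhere in the deck.

```python
from typing import Callable, List, Tuple, TypeVar

Record = Tuple[str, str, str, str]  # (timestamp, airport, flight, action)
V = TypeVar("V")


class MasterDataset:
    """Append-only store: records are added, never updated or deleted."""

    def __init__(self) -> None:
        self._records: List[Record] = []

    def append(self, record: Record) -> None:
        self._records.append(record)

    def recompute(self, view_fn: Callable[[Tuple[Record, ...]], V]) -> V:
        # A batch view is a function of *all* the data; pass an immutable
        # snapshot so the view function cannot modify the raw records.
        return view_fn(tuple(self._records))


def take_off_count(records: Tuple[Record, ...]) -> int:
    """Example batch view: total number of take-offs recorded so far."""
    return sum(1 for _, _, _, action in records if action == "take-off")


master = MasterDataset()
master.append(("2016-02-04T10:00:00", "MUC", "EY123", "take-off"))
master.append(("2016-02-04T10:09:00", "LHR", "LH17", "landing"))
batch_view = master.recompute(take_off_count)
```

Because the raw records are never changed, a buggy view function can be fixed and simply run again over the same data.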

Speed Layer
• The speed layer serves requests that are subject to low-latency requirements
• Using fast, incremental algorithms, it deals with recent data only
[Diagram labels: SPEED LAYER, PROCESS STREAM, INCREMENT VIEWS, REAL-TIME VIEWS (View 1, View 2, View N)]
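"Incremental" here means updating a view per event instead of rescanning history. A minimal sketch under that assumption follows; `RealTimeView`, `on_event`, and `reset` are hypothetical names, and flight EY124 is made up for the example.

```python
from collections import Counter
from typing import Dict


class RealTimeView:
    """Keeps a running take-off count per airport, one event at a time."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def on_event(self, event: Dict[str, str]) -> None:
        # Incremental update: constant work per event, no rescan of old data.
        if event["action"] == "take-off":
            self.counts[event["airport"]] += 1

    def reset(self) -> None:
        # Assumption: once a batch run has absorbed these events, the
        # real-time view can be discarded and rebuilt from newer data only.
        self.counts.clear()


view = RealTimeView()
view.on_event({"airport": "MUC", "flight": "EY123", "action": "take-off"})
view.on_event({"airport": "LHR", "flight": "LH17", "action": "landing"})
view.on_event({"airport": "MUC", "flight": "EY124", "action": "take-off"})
```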

Serving Layer
• The serving layer indexes batch views so that they can be queried ad hoc with low latency
[Diagram labels: SERVING LAYER, QUERY, MERGE, BATCH VIEWS (View 1, View 2, View N), REAL-TIME VIEWS (View 1, View 2, View N)]
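The MERGE step in the diagram can be sketched as a query that combines both views. This assumes additive counts and a real-time view that only covers events the last batch run has not yet absorbed; `merged_query` is a hypothetical name and the real-time numbers are invented for illustration.

```python
from typing import Dict


def merged_query(batch_view: Dict[str, int],
                 realtime_view: Dict[str, int],
                 key: str) -> int:
    """Answers a query by combining the batch view with the real-time view."""
    # Assumption: the views hold additive counts and the real-time view only
    # covers events the last batch run has not absorbed yet, so nothing is
    # counted twice.
    return batch_view.get(key, 0) + realtime_view.get(key, 0)


batch_view = {"AMS": 69, "CDG": 44}   # precomputed by the batch layer
realtime_view = {"AMS": 2, "BRU": 1}  # increments since the last batch run

ams_total = merged_query(batch_view, realtime_view, "AMS")
```

Views that are not simple sums (distinct counts, averages) need a merge function of their own; addition is just the easiest case to show.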

Lambda Architecture: Compensate
[Diagram labels: Batch, time, not absorbed, now]

Lambda Architecture: Immutable Data + Views
http://openflights.org

Immutable master dataset:
timestamp            airport  flight  action
2016-02-04T10:00:00  MUC      EY123   take-off
2016-02-04T10:05:00  BRU      SAS45   take-off
2016-02-04T10:07:00  AMS      BA99    take-off
2016-02-04T10:09:00  LHR      LH17    landing
2016-02-04T10:10:00  CDG      AF03    landing
2016-02-04T10:10:00  FCO      AZ501   take-off

Views computed from the immutable master dataset:
air-borne: 2307
air-borne per airline:
  airline  planes
  AF       59
  AZ       23
  BA       167
  EY       19
  LH       201
  SAS      28
airport load:
  airport  planes
  AMS      69
  CDG      44
  BRU      31
  FCO      10
  HEL      17
  LHR      101
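To make the relationship between events and views concrete, here is a sketch that derives two of the views from the six sample events only. The figures on the slide come from a larger dataset that is not shown, so the results differ. The rule that the airline code is the leading letters of the flight number is an assumption, and `airline_of` and `airborne_flights` are hypothetical names.

```python
from collections import Counter
from itertools import takewhile
from typing import List, Set, Tuple

Event = Tuple[str, str, str, str]  # (timestamp, airport, flight, action)

MASTER_DATA: List[Event] = [
    ("2016-02-04T10:00:00", "MUC", "EY123", "take-off"),
    ("2016-02-04T10:05:00", "BRU", "SAS45", "take-off"),
    ("2016-02-04T10:07:00", "AMS", "BA99", "take-off"),
    ("2016-02-04T10:09:00", "LHR", "LH17", "landing"),
    ("2016-02-04T10:10:00", "CDG", "AF03", "landing"),
    ("2016-02-04T10:10:00", "FCO", "AZ501", "take-off"),
]


def airline_of(flight: str) -> str:
    # Assumption: the airline code is the leading run of letters.
    return "".join(takewhile(str.isalpha, flight))


def airborne_flights(events: List[Event]) -> Set[str]:
    """Replays all events in time order and returns flights still in the air."""
    in_air: Set[str] = set()
    for _, _, flight, action in sorted(events):
        if action == "take-off":
            in_air.add(flight)
        elif action == "landing":
            in_air.discard(flight)  # no-op if the take-off is not in the data
    return in_air


in_air = airborne_flights(MASTER_DATA)
airborne_total = len(in_air)
airborne_per_airline = Counter(airline_of(f) for f in in_air)
```

Both views are plain functions of the event list, so they can be dropped and recomputed at any time without touching the raw data.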


Batch Layer: View Generation
[Diagram: Events → “Raw” Storage (Master Data) → Processing → Aggregated Data (View 1, View 2)]

Apache Spark
• Cluster computing platform
• Extends “MapReduce” with:
  – Streaming
  – Interactive Analytics
• Runs in memory

Spark components
• Spark SQL
• Spark Streaming (Streaming)
• MLlib (Machine Learning)
• GraphX (Graph Computation)
• Spark Core (General execution engine)
• Mesos
• Hadoop YARN
• Distributed File System (HDFS, MapR-FS, S3, …)

Spark Jobs
Driver Program (application):
    sc = new SparkContext
    rDD = sc.textFile("hdfs:// …")
    rDD.map
[Diagram: the Driver Program talks to the Cluster Manager and to two Workers, each running an Executor with two Tasks]

Spark Resilient Distributed Datasets “RDD”
[Diagram: sc.textFile loads the Sensor RDD as four partitions spread over three worker (W) executors: P4; P1 and P3; P2]
P1: 8213034705, 95, 2.927373, jake7870, 0……
P2: 8213034705, 115, 2.943484, Davidbresler2, 1….
P3: 8213034705, 100, 2.951285, gladimacowgirl, 58…
P4: 8213034705, 117, 2.998947, daysrus, 95….

Spark Resilient Distributed Datasets
[Diagram: RDD → Transformation filter() → newRDD → Action count() → Value]

Transformations
• Process an RDD, return an RDD
• Examples:
  – map(): one value => another value
  – mapToPair(): one value => a tuple
  – filter(): filters values/tuples on a given condition
  – groupByKey(): groups values by key
  – reduceByKey(): aggregates values by key
  – join(), cogroup(), …: join RDDs

Actions
• Process an RDD, return a value
• Examples:
  – count(): counts the number of items in the dataset
  – first(): returns the first entry
  – take(n): returns an array of the first n elements
  – foreach(): applies a function to each element
  – collect(): returns all elements
  – saveAsTextFile(): saves each element to files
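The split between transformations and actions can be mimicked in plain Python, with no Spark involved. In the sketch below lazy iterators play the role of RDDs, `reduce_by_key` is a hypothetical stand-in for aggregating values by key, and `sorted(...)` plays the role of an action such as collect(). The input lines, including flight EY124, are made up.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

lines: List[str] = [
    "MUC,EY123,take-off",
    "BRU,SAS45,take-off",
    "LHR,LH17,landing",
    "MUC,EY124,take-off",
]


def reduce_by_key(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Lazy stand-in for aggregating values by key; runs only when consumed."""
    totals: Dict[str, int] = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0) + value
    yield from totals.items()


# "Transformations": each step wraps the previous one and returns a new lazy
# iterator; no line has been parsed at this point.
fields = map(lambda line: line.split(","), lines)
take_offs = filter(lambda f: f[2] == "take-off", fields)
pairs = map(lambda f: (f[0], 1), take_offs)
per_airport = reduce_by_key(pairs)

# "Action": consuming the pipeline forces the work and yields a plain value.
take_offs_per_airport = sorted(per_airport)
```

The analogy is loose: Python iterators are single-use and run on one machine, whereas an RDD can be recomputed and is spread over executors.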

Serving Layer: Aggregated Data
• Views are stored in a Read/Write database:
  – Apache HBase
  – MapR-DB (Binary & JSON)
  – Cassandra
  – MongoDB
  – Elasticsearch
  – …

Serving Layer
[Diagram labels: Events, Processing, Aggregated Batch View, Real Time View, Query - SQL, Dataviz, Query/Visualisation, SQL]

What is Spark Streaming?
• Enables scalable, high-throughput, fault-tolerant stream processing of live data
• Extension of the core Spark API
[Diagram labels: Data Sources, Data Sinks]

Spark Streaming Architecture
• Divide the data stream into batches of X seconds (micro-batching)
• The result is called a DStream: a sequence of RDDs
[Diagram: input data stream → Spark Streaming → DStream of RDD batches, one per batch interval: data from time 0 to 1 becomes RDD @ time 1, data from time 1 to 2 becomes RDD @ time 2, data from time 2 to 3 becomes RDD @ time 3]
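Micro-batching itself is simple to sketch without Spark: assign each event to a window by its arrival time. The function below is a plain-Python illustration under that assumption; `micro_batches` and the sample events are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

TimedEvent = Tuple[float, str]  # (arrival time in seconds, payload)


def micro_batches(events: List[TimedEvent],
                  batch_interval: float) -> Dict[int, List[str]]:
    """Groups events into consecutive windows of `batch_interval` seconds."""
    batches: Dict[int, List[str]] = defaultdict(list)
    for arrival, payload in events:
        # Window n covers the half-open range
        # [n * batch_interval, (n + 1) * batch_interval).
        batches[int(arrival // batch_interval)].append(payload)
    return dict(batches)


events = [(0.2, "a"), (0.9, "b"), (1.1, "c"), (2.5, "d"), (2.7, "e")]
batches = micro_batches(events, batch_interval=1.0)
```

Window 0 here holds the data from time 0 to 1, which the diagram labels "RDD @ time 1"; each window would then be processed as one small batch.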

What are Apache Kafka & MapR Streams?
• Publish/Subscribe messaging
• Fast
• Scalable
• Durable
• Distributed
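The publish/subscribe idea can be illustrated with a toy in-memory log in which every subscriber keeps its own read position. This is not the Kafka or MapR Streams API; `Topic`, `publish`, and `poll` are hypothetical names, and the sketch leaves out durability and distribution entirely.

```python
from typing import Dict, List


class Topic:
    """In-memory append-only log with one read offset per subscriber."""

    def __init__(self) -> None:
        self._log: List[str] = []
        self._offsets: Dict[str, int] = {}

    def publish(self, message: str) -> None:
        self._log.append(message)

    def poll(self, subscriber: str) -> List[str]:
        # Each subscriber reads from its own offset, so a slow batch consumer
        # and a fast speed-layer consumer do not interfere with each other.
        start = self._offsets.get(subscriber, 0)
        messages = self._log[start:]
        self._offsets[subscriber] = len(self._log)
        return messages


topic = Topic()
topic.publish("EY123 take-off")
speed_first = topic.poll("speed-layer")
topic.publish("LH17 landing")
speed_second = topic.poll("speed-layer")
batch_all = topic.poll("batch-layer")
```

Independent offsets are what let the same stream feed both the batch layer and the speed layer.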

[Lambda Architecture diagram, annotated with technologies: NoSQL, Distributed File System, NoSQL, Streams]

Lambda Architecture in Action
[Diagram labels: Geolocation (×4); Real-time stream; Online alerts; Batch processing (MapReduce); Tax reduction reporting; Shortest path graph algorithm (Titan on MapR-DB); Route optimization; . . .]

Lambda Architecture
• Fault-tolerant
• Use the batch layer to precompute queries over complex or large datasets
• Use the speed layer to deal with “near real time” use cases
• Linear scale-out capabilities
• When errors occur:
  – Recompute data from the master dataset when needed
