Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Chaudhri at Big Data Spain 2017

Boost Hadoop and Spark with in-memory technologies
Akmal Chaudhri, GridGain Systems

Agenda
• Introduction to Apache Ignite
• Hadoop Acceleration
• Spark Acceleration
• Demos
• Q&A

Big Data Spain 2017

Apache Ignite in one slide
• Memory-centric platform
  – that is strongly consistent
  – and highly-available
  – with powerful SQL
  – key-value and processing APIs
• Designed for
  – Performance
  – Scalability

Apache Ignite
• Data source agnostic
• Fully fledged compute engine and durable storage
• OLAP and OLTP
• Fully ACID transactions across memory and disk
• In-memory SQL support
• Early ML libraries
• Growing community

Hadoop Acceleration
• In-memory Hadoop execution
• Alternative job tracker
  – Faster MapReduce
• Built on Ignite File System (IGFS)
• Secondary file system
  – Read-through and write-through

Ignite In-Memory File System
• Distributed in-memory file system
• Implements HDFS API
• Can be transparently plugged into Hadoop or Spark deployments
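Because IGFS implements the HDFS API, client code can reach it through the ordinary Hadoop `FileSystem` classes. The following is a minimal sketch, not taken from the slides. It assumes the Ignite Hadoop Accelerator jars are on the classpath and that the `fs.igfs.impl` and `fs.AbstractFileSystem.igfs.impl` properties map to the `IgniteHadoopFileSystem` classes as described in the Ignite Hadoop Accelerator documentation (verify against your Ignite version). The object name `IgfsListing` and the URI `igfs://igfs@localhost/` are illustrative assumptions.

```scala
import java.net.URI

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object IgfsListing {
  // Builds a Hadoop configuration that maps the igfs:// scheme to Ignite's
  // file system implementations (property and class names as documented for
  // the Ignite Hadoop Accelerator; check them against your version).
  def igfsConf(): Configuration = {
    val conf = new Configuration()
    conf.set("fs.igfs.impl", "org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem")
    conf.set("fs.AbstractFileSystem.igfs.impl", "org.apache.ignite.hadoop.fs.v2.IgniteHadoopFileSystem")
    conf
  }

  def main(args: Array[String]): Unit = {
    // Assumption: an IGFS instance named "igfs" is reachable on localhost.
    val fs = FileSystem.get(new URI("igfs://igfs@localhost/"), igfsConf())
    try {
      // Standard Hadoop FileSystem call; nothing here is IGFS-specific.
      fs.listStatus(new Path("/")).foreach(status => println(status.getPath))
    } finally {
      fs.close()
    }
  }
}
```

In a real deployment the same two properties would normally live in `core-site.xml`, so existing Hadoop or Spark jobs only need to switch their paths to the `igfs://` scheme.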

MapReduce
• Parallelize processing of data in HDFS
• Eliminate Hadoop JobTracker and TaskTracker overhead
• Low-latency distributed processing
• Minimal configuration change
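As an illustration of the "minimal configuration change" point, the sketch below sets the two job-level properties that, according to the Ignite Hadoop Accelerator documentation, route a MapReduce job to Ignite instead of the standard framework. It is an assumption-laden example rather than the presenter's code: the object name `IgniteMapReduceConf` is hypothetical, and `localhost:11211` is assumed to be the address of a running Ignite node.

```scala
import org.apache.hadoop.conf.Configuration

object IgniteMapReduceConf {
  // Returns a Hadoop configuration that submits jobs to Ignite's MapReduce engine.
  // jobTracker is assumed to be the host:port of an Ignite node.
  def apply(jobTracker: String = "localhost:11211"): Configuration = {
    val conf = new Configuration()
    // Select the Ignite framework instead of "yarn" or "local".
    conf.set("mapreduce.framework.name", "ignite")
    // Point job submission at the Ignite node.
    conf.set("mapreduce.jobtracker.address", jobTracker)
    conf
  }
}
```

An existing job would then be created with `org.apache.hadoop.mapreduce.Job.getInstance(IgniteMapReduceConf(), "word count")`, with the mapper and reducer classes left unchanged. The same properties can equally be placed in `mapred-site.xml`.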

Spark Acceleration
• Long-running applications
  – Passing state between jobs
• Disk file system
  – Convert RDDs to disk files and back
• Share RDDs in-memory
  – Native Spark API
  – Native Spark transformations

Ignite for Spark
• Spark RDD abstraction
• Shared in-memory view on data across different Spark jobs, workers or applications
• Implemented as a view over a distributed Ignite cache

IgniteContext
• Main entry-point to Spark-Ignite integration
• SparkContext plus either one of
  – IgniteConfiguration()
  – Path to XML configuration file
• Optional Boolean client argument
  – true => Shared deployment
  – false => Embedded deployment

IgniteContext examples

    // Option 1: pass a closure that creates the IgniteConfiguration
    val igniteContext = new IgniteContext(sparkContext, () => new IgniteConfiguration())

    // Option 2: pass the path to an XML configuration file
    val igniteContext = new IgniteContext(sparkContext, "examples/config/spark/example-shared-rdd.xml")
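The examples above leave out the optional Boolean argument described on the IgniteContext slide. A minimal sketch of passing it explicitly follows. It is not from the slides: the object name `IgniteContextModes` and the helper `create` are hypothetical, and the Boolean is passed positionally because the parameter name may differ between Ignite versions. The Spark master is assumed to be supplied by `spark-submit`.

```scala
import org.apache.ignite.configuration.IgniteConfiguration
import org.apache.ignite.spark.IgniteContext
import org.apache.spark.{SparkConf, SparkContext}

object IgniteContextModes {
  // shared = true  => shared deployment: Spark connects to an existing Ignite cluster
  // shared = false => embedded deployment: Ignite nodes are started inside the Spark processes
  def create(sc: SparkContext, shared: Boolean): IgniteContext =
    new IgniteContext(sc, () => new IgniteConfiguration(), shared)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("IgniteContextModes"))
    val ic = create(sc, shared = false)
    try {
      println("Ignite instance: " + ic.ignite().name())
    } finally {
      // Stop the embedded Ignite nodes together with the context.
      ic.close(true)
      sc.stop()
    }
  }
}
```

Omitting the argument gives the default, which per the slide's mapping corresponds to `true`, the shared deployment.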

IgniteRDD
• Implementation of Spark RDD representing a live view of an Ignite cache
• Mutable (unlike native RDDs)
  – All changes in Ignite cache will be visible to RDD users immediately
• Provides partitioning information to Spark executor
• Provides affinity information to Spark so that RDD computations can use data locality
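To make the "live view" behaviour concrete, the sketch below counts an IgniteRDD, writes one entry directly through the Ignite cache API, and counts again. It is an illustrative assumption rather than material from the talk: the object name `LiveViewSketch`, the helper `countAroundPut`, and the key `-1` are hypothetical, and the cache name `sharedRDD` is borrowed from the deck's own examples. It assumes a reachable Ignite cluster and a Spark master supplied by `spark-submit`.

```scala
import org.apache.ignite.configuration.IgniteConfiguration
import org.apache.ignite.spark.{IgniteContext, IgniteRDD}
import org.apache.spark.{SparkConf, SparkContext}

object LiveViewSketch {
  // Returns the RDD's element count before and after a direct cache put.
  def countAroundPut(ic: IgniteContext, cacheName: String, key: Int, value: Int): (Long, Long) = {
    val rdd: IgniteRDD[Int, Int] = ic.fromCache(cacheName)
    val before = rdd.count()
    // Modify the cache through the Ignite API, bypassing Spark entirely.
    ic.ignite().getOrCreateCache[Int, Int](cacheName).put(key, value)
    // The same RDD instance is queried again; no reload step is involved.
    val after = rdd.count()
    (before, after)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LiveViewSketch"))
    val ic = new IgniteContext(sc, () => new IgniteConfiguration())
    val (before, after) = countAroundPut(ic, "sharedRDD", -1, -1)
    println("Count before put: " + before + ", after put: " + after)
    ic.close()
    sc.stop()
  }
}
```

If the key was not already present, the second count should be one higher than the first, which is the mutability the slide contrasts with native, immutable RDDs.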

Write to Ignite
• Ignite caches operate on key-value pairs
• Spark tuple RDD for key-value pairs and savePairs method
  – RDD partitioning, store values in parallel if possible
• Value-only RDD and saveValues method
  – IgniteRDD generates a unique affinity-local key for each value stored into the cache
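The deck's own write example covers `savePairs`; the value-only path can be sketched as follows. This is a hedged illustration, not the presenter's code: the object name `SaveValuesSketch`, the helper `saveAll`, and the cache name `valuesCache` are hypothetical, and the key type `AnyRef` is a placeholder chosen because the keys are generated by IgniteRDD and never read here. It assumes a reachable Ignite cluster and a Spark master supplied by `spark-submit`.

```scala
import org.apache.ignite.configuration.IgniteConfiguration
import org.apache.ignite.spark.{IgniteContext, IgniteRDD}
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object SaveValuesSketch {
  // Stores every value of the given RDD in the named cache and returns the cache view.
  // Keys are generated by IgniteRDD, so the caller supplies values only.
  def saveAll(ic: IgniteContext, cacheName: String, values: RDD[String]): IgniteRDD[AnyRef, String] = {
    val target: IgniteRDD[AnyRef, String] = ic.fromCache(cacheName)
    target.saveValues(values)
    target
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SaveValuesSketch"))
    val ic = new IgniteContext(sc, () => new IgniteConfiguration())
    val stored = saveAll(ic, "valuesCache", sc.parallelize(Seq("alpha", "beta", "gamma")))
    // Read the values back and ignore the generated keys.
    stored.map(_._2).collect().foreach(println)
    ic.close()
    sc.stop()
  }
}
```

`saveValues` suits data that has no natural key; when the key matters for later lookups, `savePairs` is the better fit.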

Write code example

    import org.apache.ignite.spark.{IgniteContext, IgniteRDD}
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setAppName("SparkIgniteWriter")
    val sc = new SparkContext(conf)
    val ic = new IgniteContext(sc, "examples/config/spark/example-shared-rdd.xml")
    val sharedRDD: IgniteRDD[Int, Int] = ic.fromCache("sharedRDD")
    // Store the pairs (1, 1) through (100000, 100000), written from 10 partitions
    sharedRDD.savePairs(sc.parallelize(1 to 100000, 10).map(i => (i, i)))

Read from Ignite
• IgniteRDD is a live view of an Ignite cache
  – No need to explicitly load data to Spark application from Ignite
  – All RDD methods are available to use right away after an instance of IgniteRDD is created

Read code example

    import org.apache.ignite.spark.{IgniteContext, IgniteRDD}
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setAppName("SparkIgniteReader")
    val sc = new SparkContext(conf)
    val ic = new IgniteContext(sc, "examples/config/spark/example-shared-rdd.xml")
    val sharedRDD: IgniteRDD[Int, Int] = ic.fromCache("sharedRDD")
    // Keep only the entries whose value exceeds 50000
    val greaterThanFiftyThousand = sharedRDD.filter(_._2 > 50000)
    println("The count is " + greaterThanFiftyThousand.count())

Demos

Any Questions?
Thank you for joining us. Follow the conversation.
http://ignite.apache.org
