Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

> whoami • Solutions Architect @Hazelcast • Hang out with awesome people • @gamussa in internetz • Please follow me on Twitter, I’m very interesting ©

Slide 3

Slide 3 text

Agenda • Refreshing knowledge on Java 8 Streams • Distribute and Conquer • Distributed Data • Distributed Streams • How we did all this

Slide 4

Slide 4 text

Java 8 Streams

Slide 5

Slide 5 text

Java 8 Streams… • An abstraction that represents a sequence of elements • Not a data structure • Conveys elements from a source through a pipeline of operations • Operations don’t modify the source
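A minimal sketch of the points above, using only the JDK. The class and method names (`StreamBasics`, `shortUpperCase`) are made up for illustration. The pipeline reads from a source list, passes elements through two intermediate operations, and produces a new list while the source stays as it was.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamBasics {
    // Builds a pipeline over the source list and returns a new list;
    // the source itself is left untouched.
    static List<String> shortUpperCase(List<String> source) {
        return source.stream()                  // the source
                .filter(w -> w.length() <= 5)   // intermediate operation (lazy)
                .map(String::toUpperCase)       // intermediate operation (lazy)
                .collect(Collectors.toList());  // terminal operation triggers the work
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("war", "and", "peace", "distributed");
        System.out.println(shortUpperCase(words)); // [WAR, AND, PEACE]
        System.out.println(words);                 // [war, and, peace, distributed]
    }
}
```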

Slide 6

Slide 6 text

Why should I care about the Stream API? • You’re a Java developer

Slide 7

Slide 7 text

What does a regular Java developer think about Scala? advanced

Slide 8

Slide 8 text

Why should I care about the Stream API? • You’re a Java developer • Many Java developers know Java • It’s all about data processing

Slide 9

Slide 9 text

java.util.stream operations • map(), flatMap(), filter() • reduce(), collect() • sorted()
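The operations listed above can be combined into a small word count. This is a sketch under assumptions: the class `WordCount` and its methods are hypothetical names, and the tokenizing rule (split on non-word characters, lower-case everything) is a simplification chosen for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordCount {
    // flatMap + map + filter + collect: count occurrences of each word.
    static Map<String, Long> count(Stream<String> lines) {
        return lines
                .flatMap(line -> Arrays.stream(line.split("\\W+")))
                .map(String::toLowerCase)
                .filter(w -> !w.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // sorted + map + collect: the n most frequent words, ties broken alphabetically.
    static List<String> top(Map<String, Long> counts, int n) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // reduce: total number of words seen.
    static long total(Map<String, Long> counts) {
        return counts.values().stream().reduce(0L, Long::sum);
    }

    public static void main(String[] args) {
        Map<String, Long> counts = count(Stream.of("War and peace", "peace and war and love"));
        System.out.println(top(counts, 2)); // [and, peace]
        System.out.println(total(counts));  // 8
    }
}
```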

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

Problem • One does not simply put all Big Data in one machine

Slide 14

Slide 14 text

Problem • Data doesn’t fit on just one machine

Slide 15

Slide 15 text

Problem • One does not simply put all Big Data in one machine • Data is too important to keep it on only one machine

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

CACHES

Slide 18

Slide 18 text

Replication or Sharding? http://book.mixu.net/distsys/single-page.html

Slide 19

Slide 19 text

Solution • Use Distributed Map aka IMap

Slide 20

Slide 20 text

What’s Hazelcast IMDG? • In-memory Data Grid • Apache v2 Licensed • Distributed • Caches (IMap, JCache) • Java Collections (IList, ISet, IQueue) • Messaging (Topic, RingBuffer) • Computation (ExecutorService, M-R)

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

Green Primary Green Backup Green Shard
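To make the primary / backup / shard idea concrete, here is a toy in-process sketch. It is not Hazelcast code and not Hazelcast's actual partitioning algorithm: `ShardedMap`, the round-robin member assignment, and the "next member holds the backup" rule are all assumptions made for illustration. Each key hashes to a fixed partition (sharding), and each entry is written to a primary and to a backup on a different member (replication).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ShardedMap {
    private final int partitionCount;
    // One primary store and one backup store per simulated member.
    private final List<Map<String, String>> primaries = new ArrayList<>();
    private final List<Map<String, String>> backups = new ArrayList<>();

    ShardedMap(int memberCount, int partitionCount) {
        if (memberCount < 2) {
            throw new IllegalArgumentException("need at least 2 members to hold a backup");
        }
        this.partitionCount = partitionCount;
        for (int i = 0; i < memberCount; i++) {
            primaries.add(new HashMap<>());
            backups.add(new HashMap<>());
        }
    }

    // Sharding: a key always hashes to the same partition.
    int partitionOf(String key) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    // Assumed assignment: partitions are spread round-robin over members.
    int primaryMember(String key) {
        return partitionOf(key) % primaries.size();
    }

    // Replication: the backup copy lives on the next member, never on the primary.
    int backupMember(String key) {
        return (primaryMember(key) + 1) % primaries.size();
    }

    void put(String key, String value) {
        primaries.get(primaryMember(key)).put(key, value);
        backups.get(backupMember(key)).put(key, value);
    }

    // Reads go to the primary; if that member is "down", fall back to the backup.
    String get(String key, int downMember) {
        int primary = primaryMember(key);
        return primary != downMember
                ? primaries.get(primary).get(key)
                : backups.get(backupMember(key)).get(key);
    }
}
```

For example, after `new ShardedMap(3, 271).put("pierre", "bezukhov")`, a `get` still returns the value when the key's primary member is passed as `downMember`, because the backup sits on another member.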

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

Problem • Lambda serialization

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

Solution • Serializable versions of the functional interfaces • Introducing DistributedStream
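A sketch of the problem and the fix using plain JDK serialization. `SerializableFunction` and the helper methods are hypothetical names for this example, not the interfaces shipped with Jet. A lambda targeting `java.util.function.Function` cannot be serialized, while the same lambda targeting an interface that also extends `Serializable` can be written out, read back, and invoked.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class SerializableLambdas {
    // Hypothetical interface: a Function that is also Serializable.
    interface SerializableFunction<T, R> extends Function<T, R>, Serializable { }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    // True if the object survives Java serialization.
    static boolean canSerialize(Object o) {
        try {
            serialize(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes and deserializes the value, as if it had been sent to another node.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T value) {
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(serialize(value)))) {
            return (T) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Same lambda body, different target types.
    static Function<String, Integer> plain() {
        return s -> s.length();
    }

    static SerializableFunction<String, Integer> serializable() {
        return s -> s.length();
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(plain()));                     // false
        System.out.println(canSerialize(serializable()));              // true
        System.out.println(roundTrip(serializable()).apply("peace"));  // 5
    }
}
```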

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

Jet Streams

Slide 32

Slide 32 text

No content

Slide 33

Slide 33 text

What’s Hazelcast Jet? • General-purpose distributed data processing framework • Uses a Directed Acyclic Graph (DAG) to model data flow • Built on top of Hazelcast IMDG • Comparable to Apache Spark or Apache Flink
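To illustrate what modelling data flow as a DAG means, here is a toy sketch in plain Java. It does not use the Jet API; `MiniDag` and the vertex names are assumptions for the example. Vertices stand for processing steps, edges for data flowing between them, and a topological sort (Kahn's algorithm) yields an order in which every step runs after its inputs. A cycle is rejected because it would not be a DAG.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MiniDag {
    // vertex name -> names of downstream vertices
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    MiniDag vertex(String name) {
        edges.putIfAbsent(name, new ArrayList<>());
        return this;
    }

    MiniDag edge(String from, String to) {
        vertex(from).vertex(to);
        edges.get(from).add(to);
        return this;
    }

    // Kahn's algorithm: order vertices so that every edge points forward.
    // Throws if the graph contains a cycle, i.e. is not a DAG.
    List<String> executionOrder() {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        edges.keySet().forEach(v -> inDegree.put(v, 0));
        edges.values().forEach(targets ->
                targets.forEach(t -> inDegree.merge(t, 1, Integer::sum)));

        Deque<String> ready = new ArrayDeque<>();
        inDegree.forEach((v, d) -> { if (d == 0) ready.add(v); });

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.remove();
            order.add(v);
            for (String t : edges.get(v)) {
                // A vertex becomes ready once all of its inputs have been ordered.
                if (inDegree.merge(t, -1, Integer::sum) == 0) {
                    ready.add(t);
                }
            }
        }
        if (order.size() != edges.size()) {
            throw new IllegalStateException("cycle detected: not a DAG");
        }
        return order;
    }

    public static void main(String[] args) {
        MiniDag wordCount = new MiniDag()
                .edge("source", "tokenize")
                .edge("tokenize", "count")
                .edge("count", "sink");
        System.out.println(wordCount.executionOrder()); // [source, tokenize, count, sink]
    }
}
```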

Slide 34

Slide 34 text

No content

Slide 35

Slide 35 text

DAG

Slide 36

Slide 36 text

Job Execution

Slide 37

Slide 37 text

No content

Slide 38

Slide 38 text

Future (It’s bright!) • Memory module for processing big data • Higher level streaming and batching APIs • Reactive Streams • Distributed Classloading • Integrations (HDFS/Yarn/Mesos)

Slide 39

Slide 39 text

Your fuel, our Jet Engine • Public release – Feb 7th. • Developer Preview today - yay! • http://hazelcast.org/jet-signup • Send me a note [email protected] • Follow @hazelcast and @gamussa (duh!!) • Your questions #hazelcast #hazelcastjet

Slide 40

Slide 40 text

Conclusion • The Java Stream API provides a very wide range of data processing tools • War and Peace is a Big (a lot of data) Book! • Now we’re pretty sure that Andrew and Pierre are the main characters

Slide 41

Slide 41 text

No content