Slide 1

Slide 1 text

Riding Jet Streams @gAmUssA @hazelcast #jfokus #hazelcastjet http://bit.ly/streams_jfokus2017

Slide 2

Slide 2 text

> whoami
Solutions Architect @Hazelcast
Developer Advocate @Hazelcast
@gamussa in internetz
Please, follow me on Twitter
I’m very interesting ©

Slide 3

Slide 3 text

Agenda
Quick refresh on Java 8 Streams
Distribute and Conquer
Distributed Data
Distributed Streams
How we did all this

Slide 4

Slide 4 text

Example: Word Count
Given a map where keys are line numbers and values are lines, find how many times each word occurs.

Slide 5

Slide 5 text

What needs to be done?
Iterate through all the lines
Split each line into words
Update the running total of counts with each new word

Slide 6

Slide 6 text

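fillMapWithData("war_and_peace_eng.txt", source);   // load the book: line number -> line
for (String line : source.values()) {                // iterate through all the lines
    for (String word : PATTERN.split(line)) {        // split the line into words
        if (word.length() >= 5) {                    // count only words of 5+ characters
            count.compute(cleanWord(word).toLowerCase(),
                          (w, c) -> c == null ? 1 : c + 1);   // update the running total
        }
    }
}
System.out.println(count.get("andrew"));             // print the result
Iterate through all the lines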

Slide 7

Slide 7 text

Same code as Slide 6, now highlighting: Split the line into words

Slide 8

Slide 8 text

Same code as Slide 6, now highlighting: Update running total of counts with new word

Slide 9

Slide 9 text

Same code as Slide 6, now highlighting: Print the result
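For readers who want to run the loop from Slides 6 to 9, here is a minimal self-contained sketch. The slides don't show `fillMapWithData`, `PATTERN`, or `cleanWord`, so the versions below are assumptions: `PATTERN` splits on whitespace, the hypothetical `cleanWord` strips non-letter characters, and a two-line in-memory map stands in for the book file.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class WordCount {
    // Assumption: words are separated by runs of whitespace (the talk's PATTERN is not shown).
    static final Pattern PATTERN = Pattern.compile("\\s+");

    // Hypothetical stand-in for the talk's cleanWord(): removes every non-letter character.
    static String cleanWord(String word) {
        return word.replaceAll("[^\\p{L}]", "");
    }

    // Same loop as the slides: iterate lines, split into words, update running totals.
    static Map<String, Integer> countWords(Map<Integer, String> source) {
        Map<String, Integer> count = new HashMap<>();
        for (String line : source.values()) {
            for (String word : PATTERN.split(line)) {
                if (word.length() >= 5) { // the length check runs before cleaning, as on the slide
                    count.compute(cleanWord(word).toLowerCase(),
                                  (w, c) -> c == null ? 1 : c + 1);
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // In-memory stand-in for fillMapWithData("war_and_peace_eng.txt", source).
        Map<Integer, String> source = new HashMap<>();
        source.put(1, "Prince Andrew looked at Pierre.");
        source.put(2, "Pierre smiled at Andrew, and Andrew nodded.");
        System.out.println(countWords(source).get("andrew")); // prints 3
    }
}
```

Note that the length filter sees the raw token, so trailing punctuation counts toward the five characters, exactly as in the slide code.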

Slide 10

Slide 10 text

java.util.stream

Slide 11

Slide 11 text

Java 8 Streams…
An abstraction that represents a sequence of elements
Is not a data structure
Conveys elements from a source through a pipeline of operations
An operation doesn’t modify its source
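To make the last two points concrete, a small sketch: the pipeline below reads elements from `source`, produces a new list, and leaves `source` untouched. The class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamDoesNotModifySource {
    // Reads elements from `source` through a pipeline and collects them into a new list.
    static List<String> upperCased(List<String> source) {
        return source.stream()                      // the source of elements
                .map(String::toUpperCase)           // intermediate operation
                .collect(Collectors.toList());      // terminal operation builds a new list
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("andrew", "pierre", "natasha");
        List<String> result = upperCased(source);
        System.out.println(source); // prints [andrew, pierre, natasha]
        System.out.println(result); // prints [ANDREW, PIERRE, NATASHA]
    }
}
```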

Slide 12

Slide 12 text

Why should I care about the Stream API?
You’re a Java developer

Slide 13

Slide 13 text

What does a regular Java developer think about Scala? Advanced.

Slide 14

Slide 14 text

Why should I care about the Stream API?
You’re a Java developer
Many Java developers know Java
It’s all about data processing

Slide 15

Slide 15 text

java.util.stream
map(), flatMap(), filter(): intermediate operations
reduce(), collect(): terminal operations
sorted(), distinct(): stateful intermediate (blocking) operations
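A sketch of the word count from Slide 6 rewritten with the operations listed above: `flatMap`, `filter`, and `map` as intermediate steps and `collect` as the terminal step. The whitespace `PATTERN` and the letters-only cleanup are assumptions standing in for the talk's unshown helpers; the second method exercises the stateful `distinct()` and `sorted()`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class StreamWordCount {
    // Assumption: words are separated by runs of whitespace (the talk's PATTERN is not shown).
    static final Pattern PATTERN = Pattern.compile("\\s+");

    // Word count built from intermediate operations (flatMap, filter, map) and one terminal operation (collect).
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                .flatMap(PATTERN::splitAsStream)                             // line -> words
                .filter(word -> word.length() >= 5)                          // keep words of 5+ characters
                .map(word -> word.replaceAll("[^\\p{L}]", "").toLowerCase()) // assumed cleanup
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    // distinct() must remember every word it has seen; sorted() must buffer all words before emitting any.
    static List<String> distinctSorted(List<String> lines) {
        return lines.stream()
                .flatMap(PATTERN::splitAsStream)
                .map(String::toLowerCase)
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Prince Andrew looked at Pierre.", "Pierre smiled at Andrew");
        System.out.println(countWords(lines).get("andrew"));            // prints 2
        System.out.println(distinctSorted(Arrays.asList("b a", "a c"))); // prints [a, b, c]
    }
}
```

Because `sorted()` cannot emit its first element until it has seen all of them, it acts as a barrier in the pipeline, which is why these operations are called blocking.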

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

Why would one need a cluster?
One does not simply fit all Big Data on one machine

Slide 20

Slide 20 text

Problem: data doesn’t fit on just one machine

Slide 21

Slide 21 text

Why would one need a cluster?
One does not simply put all Big Data on one machine
Data is too important to keep it on only one machine

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

Replication or Sharding? http://book.mixu.net/distsys/single-page.html

Slide 25

Slide 25 text

Other Requirements
Easy to use
Simple API
Embeddable
Cloud native

Slide 26

Slide 26 text

What’s Hazelcast IMDG?
In-memory Data Grid
Apache v2 licensed
Distributed caches (IMap, JCache)
Java collections (IList, ISet, IQueue)
Messaging (Topic, RingBuffer)
Computation (ExecutorService, M-R)

Slide 27

Slide 27 text

1,900 stars on GitHub
100% open source
134 contributors

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

Diagram labels: Green Primary, Green Backup, Green Shard
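A simplified sketch of how keys can be sharded into partitions, each with a primary and a backup copy on different members. The 271-partition count matches Hazelcast's default, but the rest is an assumption for illustration: Hazelcast hashes the serialized key and uses its own assignment algorithm, whereas this sketch uses `hashCode()`, round-robin primary ownership, and a backup on the next member.

```java
public class PartitionTable {
    // Hazelcast's default partition count is 271; everything else here is a simplified assumption.
    static final int PARTITION_COUNT = 271;

    // Assumption: hash the key with hashCode(); floorMod keeps the id in [0, PARTITION_COUNT).
    static int partitionId(Object key) {
        return Math.floorMod(key.hashCode(), PARTITION_COUNT);
    }

    // Primary copy: partitions are spread round-robin across members.
    static int primaryOwner(int partitionId, int memberCount) {
        return partitionId % memberCount;
    }

    // Backup copy: the next member, so primary and backup never share a member (needs 2+ members).
    static int backupOwner(int partitionId, int memberCount) {
        return (primaryOwner(partitionId, memberCount) + 1) % memberCount;
    }

    public static void main(String[] args) {
        int members = 3;
        for (String key : new String[] {"andrew", "pierre", "natasha"}) {
            int p = partitionId(key);
            System.out.printf("%s -> partition %d, primary on member %d, backup on member %d%n",
                    key, p, primaryOwner(p, members), backupOwner(p, members));
        }
    }
}
```

Keeping the backup on a different member is what lets the cluster survive losing the primary's machine, which is the point of replicating each shard.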

Slide 32

Slide 32 text

No content

Slide 33

Slide 33 text

What’s the problem?
Use IMap.values().stream()?
Or IMap.entrySet().stream()?
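As we understand it, the problem with a plain `java.util.stream` pipeline over `IMap.values()` is that the pipeline runs on the calling member, so every value must first be copied to it over the network. The distribute-and-conquer alternative sends the computation to the data and ships back only small partial results. The sketch below is a stdlib-only simulation: plain `HashMap`s stand in for each member's local shard, and no Hazelcast API is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LocalAggregation {
    // Runs "on a member": counts words in that member's local shard only.
    // Just this small partial result would need to cross the network.
    static Map<String, Long> partialCount(Map<Integer, String> shard) {
        Map<String, Long> partial = new HashMap<>();
        for (String line : shard.values()) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    partial.merge(word, 1L, Long::sum);
                }
            }
        }
        return partial;
    }

    // Runs "on the caller": merges the per-member partial counts into the final result.
    static Map<String, Long> combine(List<Map<String, Long>> partials) {
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((word, n) -> total.merge(word, n, Long::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical two-member cluster: each HashMap is one member's shard of line number -> line.
        Map<Integer, String> member1 = new HashMap<>();
        member1.put(1, "andrew pierre");
        Map<Integer, String> member2 = new HashMap<>();
        member2.put(2, "pierre pierre");

        List<Map<String, Long>> partials = new ArrayList<>();
        for (Map<Integer, String> shard : Arrays.asList(member1, member2)) {
            partials.add(partialCount(shard));
        }
        System.out.println(combine(partials).get("pierre")); // prints 3
    }
}
```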

Slide 34

Slide 34 text

Problem: data doesn’t fit on just one machine

Slide 35

Slide 35 text

No content

Slide 36

Slide 36 text

No content

Slide 37

Slide 37 text

EASY (actually, not)!
Implement serializable versions of the interfaces
Introducing DistributedStream
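Why serializable interfaces? To run a pipeline stage on another member, its lambda has to be sent over the wire, and a Java lambda is serializable only if its target interface is. A minimal sketch with plain JDK serialization; `SerializableFunction` is a hypothetical name, not the Jet API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.Function;

public class SerializableLambdas {
    // Hypothetical interface: a Function that is also Serializable, so lambdas
    // targeting it can be serialized and sent to another member.
    interface SerializableFunction<T, R> extends Function<T, R>, Serializable { }

    // Returns true if JDK serialization accepts the object, false if it is not serializable.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Function<String, Integer> plain = s -> s.length();
        SerializableFunction<String, Integer> shippable = s -> s.length();
        System.out.println(canSerialize(plain));     // prints false
        System.out.println(canSerialize(shippable)); // prints true
    }
}
```

The lambda bodies are identical; only the target type decides whether the compiler makes the lambda serializable.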

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

Jet Streams

Slide 41

Slide 41 text

jet.hazelcast.org

Slide 42

Slide 42 text

What’s Hazelcast Jet?
General-purpose distributed data processing framework
Based on a Directed Acyclic Graph (DAG) to model data flow
Built on top of Hazelcast IMDG
Comparable to Apache Spark or Apache Flink

Slide 43

Slide 43 text

No content

Slide 44

Slide 44 text

DAG (diagram: a SOURCE and a SINK connected through four vertices)
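What "acyclic" buys: data only flows forward, so the vertices can always be ordered with each vertex after all of its upstream vertices, and a cycle can be rejected up front. A minimal sketch using Kahn's algorithm; the `Dag` class and the vertex names are hypothetical, unrelated to Jet's own DAG API, and this does not describe how Jet schedules its vertices.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class Dag {
    // Adjacency list: vertex -> downstream vertices, kept in insertion order.
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    Dag vertex(String name) {
        edges.putIfAbsent(name, new ArrayList<>());
        return this;
    }

    Dag edge(String from, String to) {
        vertex(from);
        vertex(to);
        edges.get(from).add(to);
        return this;
    }

    // Kahn's algorithm: orders vertices so each comes after all of its upstream vertices.
    // Throws if a cycle exists, because then no such order is possible.
    List<String> topologicalOrder() {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String v : edges.keySet()) {
            inDegree.put(v, 0);
        }
        for (List<String> targets : edges.values()) {
            for (String t : targets) {
                inDegree.merge(t, 1, Integer::sum);
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        inDegree.forEach((v, d) -> {
            if (d == 0) {
                ready.add(v);
            }
        });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.poll();
            order.add(v);
            for (String t : edges.get(v)) {
                if (inDegree.merge(t, -1, Integer::sum) == 0) {
                    ready.add(t);
                }
            }
        }
        if (order.size() != edges.size()) {
            throw new IllegalStateException("graph contains a cycle, so it is not a DAG");
        }
        return order;
    }

    public static void main(String[] args) {
        // Hypothetical word-count-shaped graph: source -> vertices -> sink.
        Dag dag = new Dag()
                .edge("source", "tokenize")
                .edge("tokenize", "accumulate")
                .edge("accumulate", "combine")
                .edge("combine", "sink");
        System.out.println(dag.topologicalOrder()); // prints [source, tokenize, accumulate, combine, sink]
    }
}
```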

Slide 45

Slide 45 text

No content

Slide 46

Slide 46 text

Benchmarks: compared to Spark, Flink, and Hadoop doing word count on a cluster of 9 nodes with 40 cores each

Slide 47

Slide 47 text

No content

Slide 48

Slide 48 text

Future (It’s bright!)
Processing guarantees for stream processing
Streaming features (windowing, triggering)
Higher-level streaming and batching APIs
Integration with additional Hazelcast structures (ICache, IQueue, ...)

Slide 49

Slide 49 text

Future (It’s bright!)
Event sourcing / CQRS
Off-heap memory support
RxJava
More connectors to additional sources (JMS, JDBC, ...)

Slide 50

Slide 50 text

Grab while it’s hot!
jet.hazelcast.org: documentation
hazelcast/hazelcast-jet: source on GitHub
http://bit.ly/streams_jfokus2017: presentation materials

Slide 51

Slide 51 text

Conclusion
The Java Stream API provides a very wide range of data processing tools
War and Peace is a Big (a lot of data) Book!
Now we’re pretty sure that Andrew and Pierre are the main characters

Slide 52

Slide 52 text

No content
