[NYJavaSig] Riding The Distributed Streams
Viktor Gamov
February 03, 2017
Technology
Presentation on Hazelcast and Distributed Streams.
Presented at NYJavaSig.
Transcript
> whoami • Solutions Architect @Hazelcast • Hang out with awesome people • @gamussa on the internets • Please follow me on Twitter, I'm very interesting ©
Agenda • Refreshing knowledge of Java 8 Streams • Distribute and Conquer • Distributed Data • Distributed Streams • How we did all this
Java 8 Streams
Java 8 Streams… • An abstraction that represents a sequence of elements • Not a data structure • Conveys elements from a source through a pipeline of operations • Operations don't modify the source
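The points on this slide can be sketched in a few lines of plain Java. This is a minimal illustration, not code from the talk: the class name `StreamBasics`, the helper `upperCaseStartingWith`, and the sample names are assumptions made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamBasics {

    // Returns the upper-cased names that start with the given prefix.
    // The input list is only read; the pipeline collects into a new list.
    static List<String> upperCaseStartingWith(List<String> names, String prefix) {
        return names.stream()                       // source
                .filter(n -> n.startsWith(prefix))  // intermediate operation (lazy)
                .map(String::toUpperCase)           // intermediate operation (lazy)
                .collect(Collectors.toList());      // terminal operation starts the work
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Andrew", "Pierre", "Natasha", "Anna");
        System.out.println(upperCaseStartingWith(names, "A")); // [ANDREW, ANNA]
        System.out.println(names); // the source list is unchanged
    }
}
```

The stream holds no elements of its own. It pulls them from `names` when the terminal operation runs and leaves the list as it was.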
Why should I care about the Stream API? • You're a Java developer
What does a regular Java developer think about Scala? advanced
Why should I care about the Stream API? • You're a Java developer • Many Java developers know Java • It's all about data processing
java.util.stream operations • map(), flatMap(), filter() • reduce(), collect() • sorted()
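As a rough sketch of how the listed operations combine, here is a small word count in plain Java. The class `WordCount`, its method names, and the sample sentences are assumptions for illustration only; the talk's own demo code is not part of this transcript.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordCount {

    // Splits lines into lower-cased words and counts each word.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split("\\W+"))) // one line -> many words
                .filter(word -> !word.isEmpty())                    // drop empty tokens
                .map(String::toLowerCase)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // Returns the n most frequent words, most frequent first; ties are broken alphabetically.
    static List<String> topWords(Map<String, Long> counts, int n) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Total number of counted words, computed with reduce().
    static long totalWords(Map<String, Long> counts) {
        return counts.values().stream().reduce(0L, Long::sum);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Andrew met Pierre", "Pierre met Natasha", "Pierre left");
        Map<String, Long> counts = countWords(lines);
        System.out.println(topWords(counts, 2)); // [pierre, met]
        System.out.println(totalWords(counts));  // 8
    }
}
```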
Problem • One does not simply put all Big Data in one machine
Problem • Data doesn't fit on just one machine
Problem • One does not simply put all Big Data in one machine • Data is too important to keep on only one machine
CACHES
Replication or Sharding? http://book.mixu.net/distsys/single-page.html
Solution • Use Distributed Map aka IMap
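To make the sharding and replication idea concrete without pulling in a library, here is a toy, single-process model in plain Java. It is not Hazelcast's IMap and does not reproduce its partitioning scheme. The class `ToyShardedMap`, the "owner is hash modulo node count" rule, and the "backup lives on the next node" rule are all simplifying assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ToyShardedMap {
    private final int nodeCount;
    private final List<Map<String, String>> primary = new ArrayList<>();
    private final List<Map<String, String>> backup = new ArrayList<>();
    private final boolean[] down;

    public ToyShardedMap(int nodeCount) {
        if (nodeCount < 2) {
            throw new IllegalArgumentException("need at least 2 nodes to keep a backup");
        }
        this.nodeCount = nodeCount;
        this.down = new boolean[nodeCount];
        for (int i = 0; i < nodeCount; i++) {
            primary.add(new HashMap<>());
            backup.add(new HashMap<>());
        }
    }

    // Sharding: each key is owned by exactly one node (assumed rule: hash modulo node count).
    int ownerOf(String key) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    // Replication: the backup copy lives on the next node (assumed rule).
    int backupOf(String key) {
        return (ownerOf(key) + 1) % nodeCount;
    }

    public void put(String key, String value) {
        primary.get(ownerOf(key)).put(key, value);
        backup.get(backupOf(key)).put(key, value);
    }

    // Reads go to the owner; if the owner is down, fall back to the backup node.
    public String get(String key) {
        int owner = ownerOf(key);
        if (!down[owner]) {
            return primary.get(owner).get(key);
        }
        int b = backupOf(key);
        return down[b] ? null : backup.get(b).get(key);
    }

    // Simulates losing a node together with everything it stored.
    public void fail(int node) {
        down[node] = true;
        primary.get(node).clear();
        backup.get(node).clear();
    }
}
```

With three nodes, a value written with `put("pierre", "bezukhov")` is still readable after `fail(ownerOf("pierre"))`, because the backup copy sits on a different node. A real data grid also has to rebalance data and create fresh backups when membership changes, which this sketch leaves out.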
What’s Hazelcast IMDG? • In-memory Data Grid • Apache v2 Licensed • Distributed • Caches (IMap, JCache) • Java Collections (IList, ISet, IQueue) • Messaging (Topic, RingBuffer) • Computation (ExecutorService, M-R)
Diagram labels: Green Primary, Green Backup, Green Shard
Problem • Lambda serialization
Solution • Serializable versions of the functional interfaces • Introducing DistributedStream
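The lambda serialization problem and the "serializable interface" fix can be reproduced with the JDK alone. The sketch below is an assumption about the general technique, not the DistributedStream source: `SerializableFunction` is a hypothetical name, and the round trip stays inside one JVM instead of crossing a network.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class SerializableLambdas {

    // Hypothetical name: a Function whose lambdas are also Serializable.
    interface SerializableFunction<T, R> extends Function<T, R>, Serializable { }

    static byte[] toBytes(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    // True if standard Java serialization accepts the object.
    static boolean canSerialize(Object o) {
        try {
            toBytes(o);
            return true;
        } catch (IOException e) { // includes NotSerializableException
            return false;
        }
    }

    // A lambda typed as plain Function is not serializable.
    static boolean plainLambdaSerializes() {
        Function<String, Integer> plain = s -> s.length();
        return canSerialize(plain);
    }

    // Serializes a lambda, reads it back, and applies the copy to the input.
    @SuppressWarnings("unchecked")
    static int roundTripLength(String input) {
        SerializableFunction<String, Integer> portable = s -> s.length();
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(toBytes(portable)))) {
            Function<String, Integer> copy = (Function<String, Integer>) in.readObject();
            return copy.apply(input);
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(plainLambdaSerializes());      // false
        System.out.println(roundTripLength("Hazelcast")); // 9
    }
}
```

The only difference between the two lambdas is their target type. Declaring the stream operations against serializable variants of the functional interfaces is what lets the same lambda syntax be shipped to other nodes, provided the classes that define the lambdas are available there.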
Jet Streams
What’s Hazelcast Jet? • General purpose distributed data processing framework • Based on a Directed Acyclic Graph to model data flow • Built on top of Hazelcast IMDG • Comparable to Apache Spark or Apache Flink
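To show what "a directed acyclic graph models the data flow" means structurally, here is a toy DAG in plain Java. It is not the Hazelcast Jet API: `ToyDag`, its methods, and the vertex names are hypothetical. It only checks that the graph is acyclic and lists the vertices so that every producer comes before its consumers, whereas a real engine would run the vertices concurrently across the cluster.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ToyDag {
    private final Map<String, List<String>> edges = new LinkedHashMap<>();
    private final Map<String, Integer> inDegree = new LinkedHashMap<>();

    // Registers a processing step (vertex) if it is not known yet.
    public ToyDag vertex(String name) {
        edges.putIfAbsent(name, new ArrayList<>());
        inDegree.putIfAbsent(name, 0);
        return this;
    }

    // Declares that data flows from one vertex to another.
    public ToyDag edge(String from, String to) {
        vertex(from).vertex(to);
        edges.get(from).add(to);
        inDegree.merge(to, 1, Integer::sum);
        return this;
    }

    // Kahn's algorithm: orders vertices so every edge points forward.
    // Throws if the graph contains a cycle, i.e. it is not a DAG.
    public List<String> dataFlowOrder() {
        Map<String, Integer> remaining = new LinkedHashMap<>(inDegree);
        Deque<String> ready = new ArrayDeque<>();
        remaining.forEach((v, d) -> { if (d == 0) ready.add(v); });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.poll();
            order.add(v);
            for (String next : edges.get(v)) {
                if (remaining.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != edges.size()) {
            throw new IllegalStateException("graph has a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        ToyDag dag = new ToyDag()
                .edge("source", "tokenize")
                .edge("tokenize", "count")
                .edge("count", "sink");
        System.out.println(dag.dataFlowOrder()); // [source, tokenize, count, sink]
    }
}
```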
DAG
Job Execution
Future (It’s bright!) • Memory module for processing big data • Higher-level streaming and batching APIs • Reactive Streams • Distributed Classloading • Integrations (HDFS/YARN/Mesos)
Your fuel, our Jet Engine • Public release: Feb 7th • Developer Preview today, yay! • http://hazelcast.org/jet-signup • Send me a note: [email protected] • Follow @hazelcast and @gamussa (duh!!) • Your questions: #hazelcast #hazelcastjet
Conclusion • The Java Stream API provides a very wide range of data processing tools • War and Peace is a big (a lot of data) book! • Now we’re pretty sure that Andrew and Pierre are the main characters