[NYJavaSig] Riding The Distributed Streams
Viktor Gamov
February 03, 2017
Presentation on Hazelcast and Distributed Streams.
Presented at NYJavaSig.
Transcript
> whoami
• Solutions Architect @Hazelcast
• Hang out with awesome people
• @gamussa in internetz
Please follow me on Twitter, I'm very interesting ©
Agenda
• Refreshing knowledge on Java 8 Streams
• Distribute and Conquer
• Distributed Data
• Distributed Streams
• How we did all this
Java 8 Streams
Java 8 Streams…
• An abstraction that represents a sequence of elements
• Is not a data structure
• Conveys elements from a source through a pipeline of operations
• Operations don't modify the source
Why should I care about the Stream API?
• You're a Java developer
What does a regular Java developer think about Scala?
Advanced
Why should I care about the Stream API?
• You're a Java developer
• Many Java developers know Java
• It's all about data processing
java.util.stream operations
• map(), flatMap(), filter()
• reduce(), collect()
• sorted()
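The operations listed above can be combined into one pipeline. The following is a minimal sketch, not code from the talk: the class name StreamOpsSketch, its methods, and the sample lines are made up for illustration. It counts words in a few lines of text and returns the most frequent ones.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical example class; only java.util.stream calls from the JDK are used.
class StreamOpsSketch {

    // Returns the `limit` most frequent words, most frequent first (ties broken alphabetically).
    static List<String> topWords(List<String> lines, int limit) {
        Map<String, Long> counts = lines.stream()
                .map(String::toLowerCase)                            // map: normalize case
                .flatMap(line -> Arrays.stream(line.split("\\W+")))  // flatMap: one line -> many words
                .filter(word -> !word.isEmpty())                     // filter: drop empty tokens
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        Comparator<Map.Entry<String, Long>> byCountDesc =
                Map.Entry.comparingByValue(Comparator.reverseOrder());
        Comparator<Map.Entry<String, Long>> byWord = Map.Entry.comparingByKey();

        return counts.entrySet().stream()
                .sorted(byCountDesc.thenComparing(byWord))           // sorted: order by frequency
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());                       // collect: gather the result
    }

    // reduce: fold all word lengths into a single total.
    static int totalChars(List<String> words) {
        return words.stream().map(String::length).reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Pierre met Andrew",
                "Andrew and Pierre talked",
                "Andrew left");
        System.out.println(topWords(lines, 2)); // prints [andrew, pierre]
    }
}
```

The source list is left untouched: every step produces a new stream or a new collection.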
Problem
• One does not simply put all Big Data in one machine
Problem
• Data doesn't fit on just one machine
Problem
• One does not simply put all Big Data in one machine
• Data is too important to keep it on only one machine
CACHES
Replication or Sharding?
http://book.mixu.net/distsys/single-page.html
Solution
• Use a Distributed Map, aka IMap
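One way to picture how a distributed map combines sharding and replication is sketched below. This is an illustration under stated assumptions, not Hazelcast's actual partitioning code: the class ShardingSketch, its methods, and the round-robin member assignment are all hypothetical. Each key hashes to a partition, each partition has a primary member, and a backup copy lives on a different member.

```java
// Hypothetical sketch of key -> partition -> member routing; not the Hazelcast implementation.
class ShardingSketch {

    // Sharding: a key always maps to the same partition.
    static int partitionOf(Object key, int partitionCount) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    // Assumed assignment: partitions are spread round-robin over the members.
    static int primaryMember(int partition, int memberCount) {
        return partition % memberCount;
    }

    // Replication: the backup copy is placed on the next member,
    // so with two or more members it never shares a machine with the primary.
    static int backupMember(int partition, int memberCount) {
        return (partition + 1) % memberCount;
    }

    public static void main(String[] args) {
        int partitions = 271; // assumed partition count for this sketch
        int members = 3;
        int p = partitionOf("pierre", partitions);
        System.out.println("partition=" + p
                + " primary=" + primaryMember(p, members)
                + " backup=" + backupMember(p, members));
    }
}
```

If the member holding the primary copy fails, the backup copy on another member can take over, which covers the "too important for one machine" problem, while sharding covers "doesn't fit on one machine".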
What’s Hazelcast IMDG?
• In-memory Data Grid
• Apache v2 Licensed
• Distributed
• Caches (IMap, JCache)
• Java Collections (IList, ISet, IQueue)
• Messaging (Topic, RingBuffer)
• Computation (ExecutorService, M-R)
[Diagram legend: Primary, Backup, Shard]
Problem
• Lambda serialization
Solution
• Serializable versions of the interfaces
• Introducing DistributedStream
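The idea behind a "serializable version of the interfaces" can be shown with the plain JDK. The sketch below is an assumption-laden illustration, not the DistributedStream source: the interface SerializableFunction and the class SerializableLambdaSketch are hypothetical names. A lambda assigned to a plain java.util.function.Function cannot be written to an ObjectOutputStream, but the same lambda assigned to an interface that also extends Serializable can be sent over the wire and called on the other side.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

// Hypothetical sketch; names are illustrative and not taken from Hazelcast.
class SerializableLambdaSketch {

    // Same shape as java.util.function.Function, plus Serializable.
    interface SerializableFunction<T, R> extends Function<T, R>, Serializable {
    }

    static byte[] toBytes(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // A lambda typed as a plain Function is not serializable.
    static boolean plainLambdaSerializes() {
        Function<String, Integer> plain = s -> s.length();
        try {
            toBytes(plain);
            return true;
        } catch (IOException e) { // NotSerializableException is an IOException
            return false;
        }
    }

    // The same lambda typed as SerializableFunction survives a round trip.
    static int roundTripLength(String input) {
        SerializableFunction<String, Integer> length = s -> s.length();
        try {
            @SuppressWarnings("unchecked")
            Function<String, Integer> copy = (Function<String, Integer>) fromBytes(toBytes(length));
            return copy.apply(input);
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(plainLambdaSerializes());    // prints false
        System.out.println(roundTripLength("stream"));  // prints 6
    }
}
```

Note that deserialization here happens in the same JVM. On a real cluster the receiving member also needs the class that defines the lambda on its classpath.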
Jet Streams
What’s Hazelcast Jet?
• General purpose distributed data processing framework
• Based on a Directed Acyclic Graph (DAG) to model data flow
• Built on top of Hazelcast IMDG
• Comparable to Apache Spark or Apache Flink
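To make the DAG idea concrete, here is a small sketch of a data flow modeled as vertices and edges. It does not use the Jet API: the class DagSketch and its methods are hypothetical, and the vertex names are made up. It only shows that a pipeline such as source, tokenize, count, sink can be stored as a graph and that an acyclic graph always yields a valid processing order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a data-flow DAG; not the Hazelcast Jet API.
class DagSketch {
    // Each vertex maps to the vertices it sends data to.
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    DagSketch vertex(String name) {
        edges.putIfAbsent(name, new ArrayList<>());
        return this;
    }

    DagSketch edge(String from, String to) {
        vertex(from);
        vertex(to);
        edges.get(from).add(to);
        return this;
    }

    // Kahn's algorithm: repeatedly emit vertices that have no unprocessed inputs.
    List<String> executionOrder() {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String v : edges.keySet()) {
            inDegree.put(v, 0);
        }
        for (List<String> targets : edges.values()) {
            for (String t : targets) {
                inDegree.merge(t, 1, Integer::sum);
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        inDegree.forEach((v, d) -> {
            if (d == 0) {
                ready.add(v);
            }
        });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.remove();
            order.add(v);
            for (String t : edges.get(v)) {
                if (inDegree.merge(t, -1, Integer::sum) == 0) {
                    ready.add(t);
                }
            }
        }
        if (order.size() != edges.size()) {
            throw new IllegalStateException("graph contains a cycle, so it is not a DAG");
        }
        return order;
    }

    public static void main(String[] args) {
        DagSketch dag = new DagSketch()
                .edge("source", "tokenize")
                .edge("tokenize", "count")
                .edge("count", "sink");
        System.out.println(dag.executionOrder()); // prints [source, tokenize, count, sink]
    }
}
```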
DAG
Job Execution
Future (It’s bright!)
• Memory module for processing big data
• Higher level streaming and batching APIs
• Reactive Streams
• Distributed Classloading
• Integrations (HDFS/Yarn/Mesos)
Your fuel, our Jet Engine
• Public release: Feb 7th
• Developer Preview today - yay!
• http://hazelcast.org/jet-signup
• Send me a note [email protected]
• Follow @hazelcast and @gamussa (duh!!)
• Your questions #hazelcast #hazelcastjet
Conclusion
• The Java Stream API provides a very wide range of data processing tools
• War and Peace is a Big (a lot of data) Book!
• Now we’re pretty sure that Andrew and Pierre are the main characters