Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
[NYJavaSig] Riding The Distributed Streams
Search
Viktor Gamov
February 03, 2017
Technology
1
200
[NYJavaSig] Riding The Distributed Streams
Presentation on Hazelcast and Distributed Streams.
Presented on NYJavaSig
Viktor Gamov
February 03, 2017
Tweet
Share
More Decks by Viktor Gamov
See All by Viktor Gamov
Processing Streaming Data with KSQL
vikgamov
4
390
[VirtualJUG] Apache Kafka — A Streaming Data Platform
vikgamov
3
380
[SF JUG] Apache Kafka — A Streaming Data Platform
vikgamov
4
87
[OracleCode NYC-2018] Apache Kafka A Streaming Data Platform
vikgamov
1
170
[OracleCode NYC-2018] Rethinking Stream Processing with KStreams and KSQL
vikgamov
2
230
[JBreak-2018] Это кто там твитить про #jbreak?
vikgamov
0
220
[DevNexus-2018] Apache Kafka A Streaming Data Platform
vikgamov
2
290
[DataSciCon] Divide, Distribute and Conquer: Stream v. Batch
vikgamov
0
110
[Philly JUG] Divide, Distribute and Conquer: Stream v. Batch
vikgamov
0
480
Other Decks in Technology
See All in Technology
グローバルなコンパウンド戦略を支えるモジュラーモノリスとドメイン駆動設計
kawauso
2
2.7k
単一Kubernetesクラスタで実現する AI/ML 向けクラウドサービス
pfn
PRO
1
300
AIと自動化がもたらす業務効率化の実例: 反社チェック等の調査・業務プロセス自動化
enpipi
0
680
Bedrock のコスト監視設計
fohte
2
180
ソフトウェア開発現代史: 55%が変化に備えていない現実 ─ AI支援型開発時代のReboot Japan #agilejapan
takabow
7
4.4k
SRE視点で振り返るメルカリのアーキテクチャ変遷と普遍的な考え
foostan
1
210
機密情報の漏洩を防げ! Webフロントエンド開発で意識すべき漏洩パターンとその対策
mizdra
PRO
10
3.6k
レビュー負債を解消する ― CodeRabbitが支えるAI駆動開発
moongift
PRO
0
430
大規模プロダクトで実践するAI活用の仕組みづくり
k1tikurisu
4
1.6k
ABEMAのCM配信を支えるスケーラブルな分散カウンタの実装
hono0130
4
990
レガシーで硬直したテーブル設計から変更容易で柔軟なテーブル設計にする
red_frasco
2
210
旧から新へ: 大規模ウェブクローラの Perl から Go への移行 / YAPC::Fukuoka 2025
motemen
3
1.1k
Featured
See All Featured
Fireside Chat
paigeccino
41
3.7k
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
234
17k
GraphQLの誤解/rethinking-graphql
sonatard
73
11k
A designer walks into a library…
pauljervisheath
210
24k
Java REST API Framework Comparison - PWX 2021
mraible
34
9k
Embracing the Ebb and Flow
colly
88
4.9k
Helping Users Find Their Own Way: Creating Modern Search Experiences
danielanewman
31
2.9k
Become a Pro
speakerdeck
PRO
29
5.6k
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
52
5.7k
Into the Great Unknown - MozCon
thekraken
40
2.2k
How STYLIGHT went responsive
nonsquared
100
5.9k
Scaling GitHub
holman
463
140k
Transcript
None
> whoami • Solutions Architect @Hazelcast • Hang out with
awesome people • @gamussa in internetz Please, follow me in Twitter I’m very interesting ©
Agenda • Refreshing knowledge on Java 8 Streams • Distribute
and Conquer • Distributed Data • Distributed Streams • How we did all this
Java 8 Streams
Java 8 Streams… • An abstraction represents a sequence of
elements • Is not a data structure • Convey elements from a source through a pipeline of operations • Operation doesn’t modify a source
Why I should care about Stream API? • You’re Java
developer
What does regular Java developer think about Scala? advanced
Why I should care about Stream API? • You’re Java
developer • Many Java developers know Java • It’s all about data processing
java.util.stream operations • map(), flatMap(), filter() • reduce(), collect() •
sorted()
None
None
None
Problem • One does not simply put all Big Data
in one machine
Problem • Data doesn’t fit just one machine
Problem • One does not simply put all Big Data
in one machine • Data is too important to have it only one machine
None
CACHES
Replication on Sharding? http://book.mixu.net/distsys/single-page.html
Solution • Use Distributed Map aka IMap
What’s Hazelcast IMDG? • In-memory Data Grid • Apache v2
Licensed • Distributed • Caches (IMap, JCache) • Java Collections (IList, ISet, IQueue) • Messaging (Topic, RingBuffer) • Computation (ExecutorService, M-R)
None
None
None
Green Primary Green Backup Green Shard
None
Problem • Lambda serialization 26
27
Solution • serializable version of the interfaces • Introducing DistributedStream
28
29
None
31 Jet Streams
None
What’s Hazelcast Jet? • General purpose distributed data processing framework
• Based on Direct Acyclic Graph to model data flow • Built on top of Hazelcast IMDG • Comparable to Apache Spark or Apache Flink 33
None
DAG 35
Job Execution 36
None
Future (It’s bright!) • Memory module for processing big data
• Higher level streaming and batching APIs • Reactive Streams • Distributed Classloading • Integrations (HDFS/Yarn/Mesos)
Your fuel, our Jet Engine • Public release – Feb
7th. • Developer Preview today - yay! • http://hazelcast.org/jet-signup • Send me a note
[email protected]
• Follow @hazelcast and @gamussa (duh!!) • Your questions #hazelcast #hazelcastjet
Conclusion • Java Stream API provides very white range of
data processing tools • War And Piece – is a Big (a lot of data) Book! • Now we’re pretty sure that Andrew and Pierre are the main characters
None