[NYJavaSig] Riding The Distributed Streams
Viktor Gamov
February 03, 2017
Technology
Presentation on Hazelcast and Distributed Streams.
Presented at NYJavaSig
Transcript
> whoami
• Solutions Architect @Hazelcast
• Hang out with awesome people
• @gamussa in internetz
Please follow me on Twitter, I’m very interesting ©
Agenda
• Refreshing knowledge on Java 8 Streams
• Distribute and Conquer
• Distributed Data
• Distributed Streams
• How we did all this
Java 8 Streams
Java 8 Streams…
• An abstraction that represents a sequence of elements
• Is not a data structure
• Conveys elements from a source through a pipeline of operations
• Operations don’t modify the source
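To make the bullets above concrete, here is a minimal sketch of a pipeline over an in-memory list. The class and method names (`StreamBasics`, `shoutLongWords`) are made up for illustration and do not come from the talk; only the `java.util.stream` calls are standard JDK API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamBasics {

    // Pipeline: source -> filter -> map -> collect.
    // The input list is only read; the result is a brand-new list.
    public static List<String> shoutLongWords(List<String> words) {
        return words.stream()
                .filter(w -> w.length() > 3)    // intermediate: keep words longer than 3 chars
                .map(String::toUpperCase)       // intermediate: transform each remaining element
                .collect(Collectors.toList());  // terminal: this is what actually runs the pipeline
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("jet", "stream", "map", "hazelcast");
        List<String> result = shoutLongWords(source);
        System.out.println(source); // [jet, stream, map, hazelcast] (unchanged)
        System.out.println(result); // [STREAM, HAZELCAST]
    }
}
```

The stream itself holds no elements; it only describes how to pull them from `source`, which is why the original list prints unchanged.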
Why should I care about the Stream API?
• You’re a Java developer
What does a regular Java developer think about Scala? advanced
Why should I care about the Stream API?
• You’re a Java developer
• Many Java developers know Java
• It’s all about data processing
java.util.stream operations
• map(), flatMap(), filter()
• reduce(), collect()
• sorted()
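The operations listed above combine naturally into a word-frequency count. The sketch below is an assumption about how such a pipeline might look, not the code shown in the talk; `WordFrequency` and `topWords` are hypothetical names, and the sample sentences are invented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordFrequency {

    // Returns the `limit` most frequent words, most frequent first.
    // Ties are broken alphabetically so the result is deterministic.
    public static List<String> topWords(List<String> lines, int limit) {
        Map<String, Long> counts = lines.stream()
                .flatMap(line -> Arrays.stream(line.split("\\W+"))) // one line -> many words
                .map(String::toLowerCase)                           // normalize case
                .filter(word -> !word.isEmpty())                    // drop empty tokens from split
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Pierre met Andrew",
                "Andrew and Pierre talked",
                "Pierre left");
        System.out.println(topWords(lines, 2)); // [pierre, andrew]
    }
}
```

Everything here runs on one JVM over one in-memory list, which is exactly the limitation the following slides address.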
Problem
• One does not simply put all Big Data in one machine
Problem
• Data doesn’t fit on just one machine
Problem
• One does not simply put all Big Data in one machine
• Data is too important to keep on only one machine
CACHES
Replication or Sharding? http://book.mixu.net/distsys/single-page.html
Solution • Use Distributed Map aka IMap
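As a rough illustration of how a distributed map can combine sharding with replication, the sketch below spreads keys over several simulated nodes and keeps one backup copy on a neighbouring node. This is a toy model in plain Java: `TinyPartitionedMap` and all of its methods are hypothetical, and the hash-modulo placement is an assumption made for brevity, not a description of how Hazelcast's IMap assigns partitions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TinyPartitionedMap<K, V> {
    // One primary store and one backup store per simulated node.
    private final List<Map<K, V>> primaries = new ArrayList<>();
    private final List<Map<K, V>> backups = new ArrayList<>();

    public TinyPartitionedMap(int nodeCount) {
        if (nodeCount < 2) {
            throw new IllegalArgumentException("need at least 2 nodes to hold a backup");
        }
        for (int i = 0; i < nodeCount; i++) {
            primaries.add(new HashMap<>());
            backups.add(new HashMap<>());
        }
    }

    // Sharding: the key's hash decides which node owns it (toy placement rule).
    public int ownerOf(K key) {
        return Math.floorMod(key.hashCode(), primaries.size());
    }

    // Replication: the backup copy lives on the next node, never on the owner.
    public int backupOf(K key) {
        return (ownerOf(key) + 1) % primaries.size();
    }

    public void put(K key, V value) {
        primaries.get(ownerOf(key)).put(key, value);
        backups.get(backupOf(key)).put(key, value);
    }

    public V get(K key) {
        return primaries.get(ownerOf(key)).get(key);
    }

    // Simulates reading after one node is lost: fall back to the backup copy.
    public V getAfterNodeLoss(K key, int lostNode) {
        if (ownerOf(key) != lostNode) {
            return get(key);
        }
        return backups.get(backupOf(key)).get(key);
    }

    public static void main(String[] args) {
        TinyPartitionedMap<String, Integer> map = new TinyPartitionedMap<>(3);
        map.put("pierre", 1);
        int owner = map.ownerOf("pierre");
        System.out.println(map.get("pierre"));                     // 1
        System.out.println(map.getAfterNodeLoss("pierre", owner)); // 1, served from the backup
    }
}
```

The point of the sketch is only that each entry has one owner and one copy elsewhere, so losing a single node loses no data.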
What’s Hazelcast IMDG?
• In-memory Data Grid
• Apache v2 Licensed
• Distributed
• Caches (IMap, JCache)
• Java Collections (IList, ISet, IQueue)
• Messaging (Topic, RingBuffer)
• Computation (ExecutorService, M-R)
Green Primary Green Backup Green Shard
Problem
• Lambda serialization
Solution
• Serializable versions of the interfaces
• Introducing DistributedStream
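The problem and its fix can be reproduced with the JDK alone. In the sketch below, `SerializableFunction` is a hypothetical stand-in for "a serializable version of the interfaces"; it is not claimed to match the names or signatures Hazelcast uses. A lambda typed as plain `java.util.function.Function` cannot be written to an `ObjectOutputStream`, while the same lambda typed as an interface that also extends `Serializable` can.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class SerializableLambdaDemo {

    // Hypothetical interface: same contract as Function, plus Serializable.
    public interface SerializableFunction<T, R> extends Function<T, R>, Serializable {}

    static byte[] toBytes(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static <T> T fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (T) in.readObject();
        }
    }

    // A lambda behind the plain Function type is not serializable.
    public static boolean plainLambdaIsSerializable() {
        Function<String, Integer> plain = s -> s.length();
        try {
            toBytes(plain);
            return true;
        } catch (IOException e) { // NotSerializableException is an IOException
            return false;
        }
    }

    // The same lambda behind SerializableFunction survives a byte round trip,
    // standing in for "ship the function to another node and run it there".
    public static int roundTripLength(String input) throws IOException, ClassNotFoundException {
        SerializableFunction<String, Integer> portable = s -> s.length();
        SerializableFunction<String, Integer> copy = fromBytes(toBytes(portable));
        return copy.apply(input);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(plainLambdaIsSerializable());  // false
        System.out.println(roundTripLength("hazelcast")); // 9
    }
}
```

In a real cluster the receiving JVM also needs the class that defined the lambda on its classpath, which is presumably why the deck later lists distributed classloading as future work.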
Jet Streams
What’s Hazelcast Jet?
• General purpose distributed data processing framework
• Based on Directed Acyclic Graph to model data flow
• Built on top of Hazelcast IMDG
• Comparable to Apache Spark or Apache Flink
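To show what modelling data flow as a directed acyclic graph means in practice, the sketch below builds a tiny graph of named processing steps and derives an order in which they can run using Kahn's topological sort. `TinyDag`, `edge` and `executionOrder` are hypothetical names for this illustration; this is not Hazelcast Jet's API, and a real engine would run vertices in parallel rather than one after another.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TinyDag {
    // Adjacency list: vertex name -> names of downstream vertices.
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    public TinyDag vertex(String name) {
        edges.putIfAbsent(name, new ArrayList<>());
        return this;
    }

    // Declares that data flows from `from` to `to`.
    public TinyDag edge(String from, String to) {
        vertex(from);
        vertex(to);
        edges.get(from).add(to);
        return this;
    }

    // Kahn's algorithm: repeatedly emit a vertex whose inputs are all satisfied.
    public List<String> executionOrder() {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String v : edges.keySet()) {
            inDegree.put(v, 0);
        }
        for (List<String> targets : edges.values()) {
            for (String t : targets) {
                inDegree.merge(t, 1, Integer::sum);
            }
        }

        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.remove();
            order.add(v);
            for (String t : edges.get(v)) {
                if (inDegree.merge(t, -1, Integer::sum) == 0) {
                    ready.add(t);
                }
            }
        }
        if (order.size() != edges.size()) {
            throw new IllegalStateException("graph has a cycle, so it is not a DAG");
        }
        return order;
    }

    public static void main(String[] args) {
        TinyDag dag = new TinyDag()
                .edge("source", "tokenize")
                .edge("tokenize", "count")
                .edge("count", "sink");
        System.out.println(dag.executionOrder()); // [source, tokenize, count, sink]
    }
}
```

The acyclic requirement matters because a cycle would leave some vertex waiting forever on its own output, which the sketch reports as an error.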
DAG
Job Execution
Future (It’s bright!)
• Memory module for processing big data
• Higher level streaming and batching APIs
• Reactive Streams
• Distributed Classloading
• Integrations (HDFS/Yarn/Mesos)
Your fuel, our Jet Engine
• Public release: Feb 7th
• Developer Preview today - yay!
• http://hazelcast.org/jet-signup
• Send me a note [email protected]
• Follow @hazelcast and @gamussa (duh!!)
• Your questions #hazelcast #hazelcastjet
Conclusion
• The Java Stream API provides a very wide range of data processing tools
• War and Peace is a big (a lot of data) book!
• Now we’re pretty sure that Andrew and Pierre are the main characters