[NYJavaSig] Riding The Distributed Streams
Viktor Gamov
February 03, 2017
A presentation on Hazelcast and distributed streams, given at NYJavaSig.
Transcript
> whoami
• Solutions Architect @Hazelcast
• Hang out with awesome people
• @gamussa on the internetz
Please follow me on Twitter, I'm very interesting ©
Agenda
• Refreshing knowledge on Java 8 Streams
• Distribute and Conquer
• Distributed Data
• Distributed Streams
• How we did all this
Java 8 Streams
Java 8 Streams…
• An abstraction that represents a sequence of elements
• Not a data structure
• Conveys elements from a source through a pipeline of operations
• Operations don't modify the source
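The points above can be illustrated with a minimal sketch. The class name `StreamBasics`, the method name, and the sample names are assumptions made for this example, not code from the talk. The pipeline reads from a list, passes elements through two intermediate operations, and collects them into a new list; the source list stays as it was.

```java
import java.util.List;
import java.util.stream.Collectors;

class StreamBasics {
    // Builds a new list from the source; the source list is never modified.
    static List<String> upperCaseShortNames(List<String> source) {
        return source.stream()                        // the list is the source
                .filter(name -> name.length() <= 6)   // intermediate: keep short names
                .map(String::toUpperCase)             // intermediate: transform each element
                .collect(Collectors.toList());        // terminal: gather results into a new list
    }
}
```

Calling `StreamBasics.upperCaseShortNames(...)` on a list leaves that list unchanged, because the stream only conveys elements; it does not store or mutate them.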
Why should I care about the Stream API?
• You're a Java developer
What does a regular Java developer think about Scala? Advanced
Why should I care about the Stream API?
• You're a Java developer
• Many Java developers know Java
• It's all about data processing
java.util.stream operations
• map(), flatMap(), filter()
• reduce(), collect()
• sorted()
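As a rough illustration of how the operations listed above combine, here is a small word-count sketch. The class `WordStats` and its method names are hypothetical and chosen only for this example; every stream call is from the standard `java.util.stream` API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class WordStats {
    // Splits lines into lower-case words and counts how often each word occurs.
    static Map<String, Long> wordCounts(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split("\\W+"))) // one line becomes many words
                .filter(word -> !word.isEmpty())                    // drop empty tokens
                .map(String::toLowerCase)                           // normalize case
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // Returns the n most frequent words; ties are broken alphabetically.
    static List<String> topWords(Map<String, Long> counts, int n) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Adds up all per-word counts with reduce().
    static long totalWords(Map<String, Long> counts) {
        return counts.values().stream().reduce(0L, Long::sum);
    }
}
```

Feeding the lines of a book into `wordCounts` and then asking for `topWords` is one way to see which names dominate the text.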
Problem
• One does not simply put all Big Data in one machine
Problem
• Data doesn't fit on just one machine
Problem
• One does not simply put all Big Data in one machine
• Data is too important to keep on only one machine
CACHES
Replication or Sharding? http://book.mixu.net/distsys/single-page.html
Solution • Use Distributed Map aka IMap
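To make the idea of a distributed map more concrete, the following is a toy sketch of sharding with one backup copy. It is not Hazelcast's implementation and does not use the Hazelcast API: `ShardedMap` and all of its methods are hypothetical, and plain `HashMap` instances stand in for cluster members. Keys are hashed to a partition, each partition has a primary member, and a copy is written to a different member as a backup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ShardedMap {
    private final int partitionCount;
    private final int memberCount;
    // One HashMap per simulated member stands in for that member's memory.
    private final List<Map<String, String>> members = new ArrayList<>();

    ShardedMap(int partitionCount, int memberCount) {
        if (partitionCount < 1 || memberCount < 2) {
            throw new IllegalArgumentException("need at least 1 partition and 2 members");
        }
        this.partitionCount = partitionCount;
        this.memberCount = memberCount;
        for (int i = 0; i < memberCount; i++) {
            members.add(new HashMap<>());
        }
    }

    // Sharding: the key's hash decides which partition owns it.
    int partitionOf(String key) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    // Assumed placement rule for this sketch: partitions are spread round-robin.
    int primaryOf(int partition) {
        return partition % memberCount;
    }

    // Replication: the backup always lives on a different member than the primary.
    int backupOf(int partition) {
        return (primaryOf(partition) + 1) % memberCount;
    }

    void put(String key, String value) {
        int partition = partitionOf(key);
        members.get(primaryOf(partition)).put(key, value);
        members.get(backupOf(partition)).put(key, value);
    }

    String get(String key) {
        return members.get(primaryOf(partitionOf(key))).get(key);
    }

    // What a read would return if the primary member were lost.
    String getFromBackup(String key) {
        return members.get(backupOf(partitionOf(key))).get(key);
    }
}
```

For example, `new ShardedMap(271, 3)` simulates three members sharing 271 partitions. A real data grid also has to handle membership changes, partition migration, and network failures, none of which this sketch attempts.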
What's Hazelcast IMDG?
• In-memory Data Grid
• Apache v2 Licensed
• Distributed
• Caches (IMap, JCache)
• Java Collections (IList, ISet, IQueue)
• Messaging (Topic, RingBuffer)
• Computation (ExecutorService, M-R)
[Diagram labels: Green Primary, Green Backup, Green Shard]
Problem
• Lambda serialization
Solution
• Serializable versions of the interfaces
• Introducing DistributedStream
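A minimal sketch of what a "serializable version" of a functional interface can look like in plain Java follows. The names `SerializableLambdas` and `SerializableFunction` are assumptions made for this example and are not taken from the Hazelcast code base. A lambda whose target type extends `Serializable` can be written with standard Java serialization, while the same lambda typed as a plain `Function` cannot.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

class SerializableLambdas {
    // Hypothetical interface: a Function that can also be serialized.
    interface SerializableFunction<T, R> extends Function<T, R>, Serializable { }

    static byte[] toBytes(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // Serializes a lambda, reads it back, and applies the copy to the input.
    @SuppressWarnings("unchecked")
    static int lengthViaRoundTrip(String input) {
        SerializableFunction<String, Integer> length = String::length;
        try {
            Function<String, Integer> copy = (Function<String, Integer>) fromBytes(toBytes(length));
            return copy.apply(input);
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // A lambda typed as a plain Function is not Serializable, so writing it fails.
    static boolean plainLambdaSerializes() {
        Function<String, Integer> plain = s -> s.length();
        try {
            toBytes(plain);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

In this sketch the round trip happens inside one JVM. Sending a lambda to another machine additionally requires the class that defines the lambda to be available there.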
Jet Streams
What's Hazelcast Jet?
• General-purpose distributed data processing framework
• Uses a Directed Acyclic Graph (DAG) to model data flow
• Built on top of Hazelcast IMDG
• Comparable to Apache Spark or Apache Flink
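The idea of modelling data flow as a directed acyclic graph can be sketched without any framework. The class `TinyDag` below is hypothetical and is not the Hazelcast Jet API; it only shows that vertices (processing steps) connected by edges (data flow) can be ordered so that every step comes after the steps feeding it, here using Kahn's algorithm.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TinyDag {
    // Adjacency list: vertex name -> names of the vertices it sends data to.
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    TinyDag vertex(String name) {
        edges.putIfAbsent(name, new ArrayList<>());
        return this;
    }

    TinyDag edge(String from, String to) {
        vertex(from);
        vertex(to);
        edges.get(from).add(to);
        return this;
    }

    // Kahn's algorithm: repeatedly emit vertices that have no unprocessed inputs.
    List<String> topologicalOrder() {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String v : edges.keySet()) {
            inDegree.put(v, 0);
        }
        for (List<String> targets : edges.values()) {
            for (String target : targets) {
                inDegree.merge(target, 1, Integer::sum);
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : inDegree.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.remove();
            order.add(v);
            for (String target : edges.get(v)) {
                if (inDegree.merge(target, -1, Integer::sum) == 0) {
                    ready.add(target);
                }
            }
        }
        if (order.size() != edges.size()) {
            throw new IllegalStateException("graph has a cycle, so it is not a DAG");
        }
        return order;
    }
}
```

A word-count job could be described as `new TinyDag().edge("source", "tokenize").edge("tokenize", "count").edge("count", "sink")`. A real engine would run the vertices in parallel across machines rather than one after another.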
DAG
Job Execution
Future (It's bright!)
• Memory module for processing big data
• Higher level streaming and batching APIs
• Reactive Streams
• Distributed Classloading
• Integrations (HDFS/Yarn/Mesos)
Your fuel, our Jet Engine
• Public release: Feb 7th.
• Developer Preview today - yay!
• http://hazelcast.org/jet-signup
• Send me a note [email protected]
• Follow @hazelcast and @gamussa (duh!!)
• Your questions #hazelcast #hazelcastjet
Conclusion
• The Java Stream API provides a very wide range of data processing tools
• War and Peace is a Big (a lot of data) Book!
• Now we're pretty sure that Andrew and Pierre are the main characters