[NYJavaSig] Riding The Distributed Streams
Viktor Gamov
February 03, 2017
A presentation on Hazelcast and distributed streams.
Presented at NYJavaSig.
Transcript
> whoami • Solutions Architect @Hazelcast • Hang out with awesome people • @gamussa on the internetz • Please follow me on Twitter, I’m very interesting ©
Agenda • Refreshing knowledge on Java 8 Streams • Distribute and Conquer • Distributed Data • Distributed Streams • How we did all this
Java 8 Streams
Java 8 Streams… • An abstraction that represents a sequence of elements • Not a data structure • Conveys elements from a source through a pipeline of operations • Operations don’t modify the source
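The points on this slide can be illustrated with a minimal sketch. The class and method names below are made up for illustration and are not from the talk. The sketch shows a list acting as the source, two intermediate operations forming the pipeline, and a terminal operation producing a new list while the source stays unchanged.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class, used only to illustrate the slide.
class StreamBasics {

    // Conveys elements from the source list through a pipeline and
    // returns a new list; the source list is never modified.
    static List<String> upperCaseLongWords(List<String> source) {
        return source.stream()                 // the list is the source
                .filter(w -> w.length() > 3)   // intermediate operation (lazy)
                .map(String::toUpperCase)      // intermediate operation (lazy)
                .collect(Collectors.toList()); // terminal operation runs the pipeline
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("war", "and", "peace", "pierre");
        System.out.println(upperCaseLongWords(words)); // prints [PEACE, PIERRE]
        System.out.println(words);                     // prints [war, and, peace, pierre]
    }
}
```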
Why should I care about the Stream API? • You’re a Java developer
What does a regular Java developer think about Scala? advanced
Why should I care about the Stream API? • You’re a Java developer • Many Java developers know Java • It’s all about data processing
java.util.stream operations • map(), flatMap(), filter() • reduce(), collect() • sorted()
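As a rough illustration of the operations listed above, the sketch below counts word frequencies, sums word lengths, and sorts a list. It uses only `java.util.stream` from the JDK. The class name, method names, and sample data are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class, used only to illustrate the listed operations.
class StreamOperations {

    // flatMap + map + filter + collect: count how often each word occurs.
    static Map<String, Long> wordCounts(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split("\\s+"))) // one line -> many words
                .map(String::toLowerCase)                           // normalize case
                .filter(word -> !word.isEmpty())                    // drop empty tokens
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // map + reduce: total number of characters across all words.
    static int totalLength(List<String> words) {
        return words.stream().map(String::length).reduce(0, Integer::sum);
    }

    // sorted + collect: natural-order copy of the input.
    static List<String> sortedCopy(List<String> words) {
        return words.stream().sorted().collect(Collectors.toList());
    }
}
```

Calling `StreamOperations.wordCounts(Arrays.asList("Pierre met Andrew", "Andrew met Natasha"))` would map `andrew` and `met` to 2 and the other two names to 1.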
Problem • One does not simply put all Big Data on one machine
Problem • Data doesn’t fit on just one machine
Problem • One does not simply put all Big Data on one machine • Data is too important to keep on only one machine
CACHES
Replication or Sharding? http://book.mixu.net/distsys/single-page.html
Solution • Use Distributed Map aka IMap
What’s Hazelcast IMDG? • In-memory Data Grid • Apache v2 Licensed • Distributed • Caches (IMap, JCache) • Java Collections (IList, ISet, IQueue) • Messaging (Topic, RingBuffer) • Computation (ExecutorService, M-R)
Green Primary Green Backup Green Shard
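The primary, backup, and shard labels above suggest the general idea of hash partitioning with one backup copy per shard. The sketch below is a simplified illustration of that idea in plain Java. It is not Hazelcast's actual partitioning algorithm, and the class name, member names, and partition count are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of sharding with one backup; not Hazelcast code.
class PartitionSketch {
    private final int partitionCount;
    private final List<String> members;

    PartitionSketch(int partitionCount, List<String> members) {
        this.partitionCount = partitionCount;
        this.members = members;
    }

    // A key always hashes to the same partition (shard).
    int partitionOf(Object key) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    // The member that holds the primary copy of the key's partition.
    String primaryOwner(Object key) {
        return members.get(partitionOf(key) % members.size());
    }

    // The next member in the list holds the backup copy, so with two or
    // more members the primary and the backup live on different machines.
    String backupOwner(Object key) {
        return members.get((partitionOf(key) + 1) % members.size());
    }

    public static void main(String[] args) {
        PartitionSketch grid = new PartitionSketch(
                12, Arrays.asList("member-A", "member-B", "member-C"));
        System.out.println(grid.primaryOwner(7)); // prints member-B
        System.out.println(grid.backupOwner(7));  // prints member-C
    }
}
```

If the primary owner of a shard fails, the backup copy on another member can take over, which addresses the "data is too important" problem from the earlier slide.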
Problem • Lambda serialization
Solution • Serializable versions of the functional interfaces • Introducing DistributedStream
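The lambda serialization problem and the "serializable version of the interfaces" idea can be sketched with the JDK alone. A lambda assigned to a plain `java.util.function.Function` is not serializable, while the same lambda assigned to an interface that also extends `Serializable` is. The `SerializableFunction` name and the helper methods below are hypothetical and are not taken from Hazelcast's API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

// Hypothetical sketch; the names here are illustrative only.
class SerializableLambdas {

    // A "serializable version" of java.util.function.Function.
    interface SerializableFunction<T, R> extends Function<T, R>, Serializable {}

    static Function<String, Integer> plainLength() {
        return s -> s.length();        // target type is not Serializable
    }

    static SerializableFunction<String, Integer> serializableLength() {
        return s -> s.length();        // target type is Serializable
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    // Returns false when Java serialization rejects the object.
    static boolean isSerializable(Object o) {
        try {
            serialize(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Writes the function to bytes and reads it back, as if it had been
    // shipped to another node.
    @SuppressWarnings("unchecked")
    static <T, R> SerializableFunction<T, R> roundTrip(SerializableFunction<T, R> f) {
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(serialize(f)))) {
            return (SerializableFunction<T, R>) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The round trip here happens inside one JVM. Sending a lambda to another machine also requires the defining class to be available there, which this sketch does not address.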
Jet Streams
What’s Hazelcast Jet? • General purpose distributed data processing framework • Based on a Directed Acyclic Graph (DAG) to model data flow • Built on top of Hazelcast IMDG • Comparable to Apache Spark or Apache Flink
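To make the idea of modeling data flow as a directed acyclic graph concrete, here is a small plain-Java sketch. It is not the Hazelcast Jet API; the class, method, and vertex names are assumptions. Vertices stand for processing steps, edges for data flowing between them, and a topological sort gives an order in which every step comes after its inputs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a data-flow DAG; not Hazelcast Jet code.
class DagSketch {
    // vertex name -> names of downstream vertices
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    DagSketch vertex(String name) {
        edges.putIfAbsent(name, new ArrayList<>());
        return this;
    }

    DagSketch edge(String from, String to) {
        vertex(from);
        vertex(to);
        edges.get(from).add(to);
        return this;
    }

    // Kahn's algorithm: repeatedly emit a vertex with no unprocessed inputs.
    List<String> topologicalOrder() {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String v : edges.keySet()) {
            inDegree.put(v, 0);
        }
        for (List<String> targets : edges.values()) {
            for (String t : targets) {
                inDegree.merge(t, 1, Integer::sum);
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.poll();
            order.add(v);
            for (String t : edges.get(v)) {
                if (inDegree.merge(t, -1, Integer::sum) == 0) {
                    ready.add(t);
                }
            }
        }
        if (order.size() != edges.size()) {
            throw new IllegalStateException("graph has a cycle, so it is not a DAG");
        }
        return order;
    }
}
```

A word-count style flow could be described as `new DagSketch().edge("source", "tokenize").edge("tokenize", "count").edge("count", "sink")`.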
DAG
Job Execution
Future (It’s bright!) • Memory module for processing big data • Higher-level streaming and batching APIs • Reactive Streams • Distributed Classloading • Integrations (HDFS/YARN/Mesos)
Your fuel, our Jet Engine • Public release: Feb 7th • Developer Preview today, yay! • http://hazelcast.org/jet-signup • Send me a note [email protected] • Follow @hazelcast and @gamussa (duh!!) • Your questions #hazelcast #hazelcastjet
Conclusion • The Java Stream API provides a very wide range of data processing tools • War and Peace is a big book (a lot of data)! • Now we’re pretty sure that Andrew and Pierre are the main characters