[Reactive Summit] Riding the Jet Streams

Java 8 introduced the Stream API as a modern, functional, and very powerful tool for processing collections of data. One of the main benefits of the Stream API is that it hides the details of iteration over the underlying data set, which allows processing to run in parallel within a single JVM on the fork/join framework.
I will talk about a Stream API implementation that enables parallel processing across many machines and many JVMs.
You will learn how to use the API you already know from a single JVM to process massive data sets across large clusters.
Along with an explanation of the implementation's internals, I will introduce the general design behind stream processing with DAG (directed acyclic graph) engines and show how an actor-based implementation can provide in-memory performance while still building on a widely known API such as Java Streams.
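As a rough illustration of the point about hidden iteration, the sketch below runs the same pipeline sequentially and in parallel inside one JVM. The class and method names (`StreamBasics`, `sumOfEvenSquares`) are made up for this example and do not come from the talk; only the standard `java.util.stream` API is used.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical example class; not taken from the talk.
final class StreamBasics {

    // Sequential pipeline: the stream decides how to iterate, the caller only
    // describes what to do with each element.
    static int sumOfEvenSquares(List<Integer> numbers) {
        return numbers.stream()
                .filter(n -> n % 2 == 0)   // keep even numbers
                .mapToInt(n -> n * n)      // square them
                .sum();
    }

    // Same pipeline, but parallelStream() lets the runtime split the work
    // across threads of the common fork/join pool.
    static int sumOfEvenSquaresParallel(List<Integer> numbers) {
        return numbers.parallelStream()
                .filter(n -> n % 2 == 0)
                .mapToInt(n -> n * n)
                .sum();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(sumOfEvenSquares(numbers));         // 56
        System.out.println(sumOfEvenSquaresParallel(numbers)); // 56
        System.out.println(numbers); // the source list is not modified
    }
}
```

Because the pipeline never mentions loops or indices, switching from `stream()` to `parallelStream()` is the only change needed here; the talk applies the same idea to execution across several JVMs.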

Viktor Gamov

October 04, 2016

  1. > whoami • Solutions Architect @Hazelcast • Hang out with Smart Guys • @gamussa on the internetz • Go follow me on Twitter, I’m very interesting ©
  2. Agenda • Brief intro to Java 8 Streams • Going Distributed • Data Distribution • Distributed Streams • Beyond Stream API: DAG-based compute engine
  3. Java 8 Streams… • An abstraction that represents a sequence of elements • Not a data structure • Convey elements from a source through a pipeline of operations • Operations don’t modify the source
  4. Why should I care about the Stream API? • You’re a Java developer • Many Java developers know Java • It’s all about data processing
  6. What’s Hazelcast? • Open-source IMDG • Distributed • Caching (IMap, JCache) • Java Collections (IList, ISet, IQueue) • Messaging (Topic, RingBuffer) • Computing (ExecutorService, M-R)
  8. Hazelcast is a… • Path to Microservices • Highly scalable In-Memory Data Grid • Simple and configurable backbone • One jar to «rule them all»
  11. What’s Hazelcast Jet? • General purpose distributed data processing framework • Based on a Directed Acyclic Graph to model data flow • Built on top of Hazelcast • Comparable to Apache Spark or Apache Flink
  14. Conclusions • Distributed processing frameworks have proprietary APIs • But still: many Java developers know Java • Streams bring the expressiveness and conciseness of functional and declarative languages to good old Java
  15. Join the Community • Join Developer Preview program • http://bit.ly/gime_jet • Or shoot me an email [email protected] • Follow @hazelcast and @gamussa • Tweet your questions with #hazelcast #hazelcastjet
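
The idea from the Hazelcast Jet slide of modeling data flow as a directed acyclic graph can be sketched in plain Java. This is not Hazelcast Jet's API: `TinyDag`, `edge`, and `executionOrder` are hypothetical names invented for this illustration, and a real engine would run vertices concurrently across the cluster rather than one after another. The sketch only shows how processing steps become vertices, how edges describe where data flows, and how a valid ordering is derived (here with Kahn's topological sort).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, minimal DAG model; not Hazelcast Jet's API.
final class TinyDag {
    // vertex name -> names of downstream vertices
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    TinyDag vertex(String name) {
        edges.putIfAbsent(name, new ArrayList<>());
        return this;
    }

    // Data flows from 'from' to 'to'; both vertices are created if missing.
    TinyDag edge(String from, String to) {
        vertex(from);
        vertex(to);
        edges.get(from).add(to);
        return this;
    }

    // Kahn's algorithm: repeatedly emit vertices that have no unprocessed inputs.
    List<String> executionOrder() {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String v : edges.keySet()) {
            inDegree.put(v, 0);
        }
        for (List<String> targets : edges.values()) {
            for (String t : targets) {
                inDegree.merge(t, 1, Integer::sum);
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.poll();
            order.add(v);
            for (String t : edges.get(v)) {
                if (inDegree.merge(t, -1, Integer::sum) == 0) {
                    ready.add(t);
                }
            }
        }
        if (order.size() != edges.size()) {
            throw new IllegalStateException("graph contains a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        // A word-count style flow, chosen only as an example.
        TinyDag dag = new TinyDag()
                .edge("source", "tokenize")
                .edge("tokenize", "count")
                .edge("count", "sink");
        System.out.println(dag.executionOrder()); // [source, tokenize, count, sink]
    }
}
```

A stream pipeline such as filter, map, and reduce maps naturally onto a graph of this shape, which is one way to read the agenda item "Beyond Stream API: DAG-based compute engine".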