[Reactive Summit] Riding the Jet Streams

Java 8 introduced the Stream API as a modern, functional, and powerful tool for processing collections of data. One of its main benefits is that it hides the details of iterating over the underlying data set, which allows the work to run in parallel within a single JVM on top of the fork/join framework.
I will talk about a Stream API implementation that enables parallel processing across many machines and many JVMs.
You will learn how to use the same API you already know from a single JVM to process massive data sets across large clusters.
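
For reference, the single-JVM starting point can be sketched as follows. This is a minimal illustration that uses only `java.util.stream` from the JDK; the class name `WordCount` and the sample input are assumptions made for the example and do not come from the talk.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Illustrative class name; only JDK classes are used.
class WordCount {
    // Counts word occurrences. The pipeline states what to compute and leaves
    // the iteration strategy (sequential or fork/join parallel) to the library.
    static Map<String, Long> count(List<String> lines, boolean parallel) {
        return (parallel ? lines.parallelStream() : lines.stream())
                .flatMap(line -> Arrays.stream(line.split("\\s+"))) // one line -> many words
                .map(String::toLowerCase)                            // normalize case
                .filter(word -> !word.isEmpty())                     // drop empty tokens
                .collect(Collectors.groupingBy(
                        word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("riding the jet streams", "the jet");
        System.out.println(count(lines, false));
        System.out.println(count(lines, true));
    }
}
```

Switching between `stream()` and `parallelStream()` changes how the work is scheduled but not the pipeline itself, which is the property a distributed implementation of the same API builds on.
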
While explaining the internals of the implementation, I will introduce the general design behind stream processing with DAG (directed acyclic graph) engines and show how an actor-based implementation can provide in-memory performance while still leveraging a widely known API such as the Java Stream API.
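
To make the DAG idea concrete, here is a small sketch of a data-flow graph whose vertices are processing steps and whose edges say which step feeds which. It is not the Hazelcast Jet API: the class `TinyDag`, its methods, and the vertex names are hypothetical, and the ordering uses Kahn's algorithm for topological sorting as a stand-in for whatever planning a real engine performs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, illustration-only model of a processing DAG.
class TinyDag {
    // Adjacency list: vertex name -> names of downstream vertices.
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    void vertex(String name) {
        edges.putIfAbsent(name, new ArrayList<>());
    }

    void edge(String from, String to) {
        vertex(from);
        vertex(to);
        edges.get(from).add(to);
    }

    // Kahn's algorithm: repeatedly emit a vertex with no unprocessed inputs.
    List<String> executionOrder() {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String v : edges.keySet()) {
            inDegree.put(v, 0);
        }
        for (List<String> targets : edges.values()) {
            for (String t : targets) {
                inDegree.merge(t, 1, Integer::sum);
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.remove();
            order.add(v);
            for (String t : edges.get(v)) {
                if (inDegree.merge(t, -1, Integer::sum) == 0) {
                    ready.add(t);
                }
            }
        }
        if (order.size() != edges.size()) {
            throw new IllegalStateException("graph has a cycle, so it is not a DAG");
        }
        return order;
    }
}
```

A stream pipeline such as source, tokenize, count, sink maps naturally onto such a graph: each operation becomes a vertex, and the acyclic structure guarantees that a valid execution order exists.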


Viktor Gamov

October 04, 2016


  1. Riding Jet Streams #reactivesummit #hazelcast #java8

  2. > whoami • Solutions Architect @Hazelcast • Hang out with Smart Guys • @gamussa on the internetz • Go follow me on Twitter, I’m very interesting ©

  3. Agenda • Brief intro to Java 8 Streams • Going Distributed • Data Distribution • Distributed Streams • Beyond Stream API: DAG-based compute engine
  4. Java 8 Streams

  5. Java 8 Streams… • An abstraction that represents a sequence of elements • Is not a data structure • Conveys elements from a source through a pipeline of operations • Operations don’t modify the source

  6. Why should I care about the Stream API? • You’re a Java developer

  7. What does a regular Java developer think about Scala? advanced

  8. Why should I care about the Stream API? • You’re a Java developer • Many Java developers know Java • It’s all about data processing

  9. java.util.stream operations • map(), flatMap(), filter() • reduce(), collect() • sorted()
  10. None
  11. Problem • Data doesn’t fit on just one machine

  12. Problem • Data doesn’t fit on just one machine

  13. Solution • Use a distributed Map, aka IMap

  14. None
  15. 15

  16. CACHES

  17. Replication vs. Partitioning • http://book.mixu.net/distsys/single-page.html

  18. None
  19. What’s Hazelcast? • Open-source IMDG • Distributed • Caching (IMap, JCache) • Java Collections (IList, ISet, IQueue) • Messaging (Topic, RingBuffer) • Computing (ExecutorService, M-R)
  20. 20

  21. None
  22. None
  23. Green Primary Green Backup Green Shard

  24. Path to Microservices • Hazelcast is a… • Highly scalable In-Memory Data Grid • Simple and configurable backbone • One jar to «rule them all»

  25. Problem • Lambda serialization

  26. 26

  27. Solution • Serializable versions of the interfaces • Introducing DistributedStream

  28. 28

  29. None
  30. Jet Streams

  31. None
  32. What’s Hazelcast Jet? • General-purpose distributed data processing framework • Based on a Directed Acyclic Graph to model data flow • Built on top of Hazelcast • Comparable to Apache Spark or Apache Flink

  33. DAG

  34. Job Execution

  35. None
  36. 36

  37. 37

  38. Conclusions • Distributed processing frameworks have proprietary APIs • But still: many Java developers know Java • Streams bring the expressiveness and conciseness of functional and declarative languages to good old Java

  39. Join the Community • Join the Developer Preview program • http://bit.ly/gime_jet • Or shoot me an email: viktor@hazelcast.com • Follow @hazelcast and @gamussa • Tweet your questions with #hazelcast #hazelcastjet

  40. #reactivesummit #hazelcast #java8
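
The lambda serialization problem and its fix from slides 25 to 27 can be sketched with plain JDK serialization. The interface name `SerializableFunction` and the helper methods below are hypothetical and are not taken from Hazelcast or from the slides; the sketch only assumes the standard Java rule that a lambda is serializable when its target type extends `java.io.Serializable`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.Function;

class SerializableLambdaDemo {
    // Hypothetical "serializable version" of java.util.function.Function.
    interface SerializableFunction<T, R> extends Function<T, R>, Serializable {}

    static byte[] toBytes(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // A lambda typed as plain Function cannot be written to an object stream.
    static boolean plainLambdaIsRejected() {
        Function<Integer, Integer> plain = x -> x + 1;
        try {
            toBytes(plain);
            return false;
        } catch (NotSerializableException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The same lambda typed as SerializableFunction survives a round trip,
    // standing in here for shipping it to another JVM.
    @SuppressWarnings("unchecked")
    static int applyAfterRoundTrip(int input) {
        SerializableFunction<Integer, Integer> addOne = x -> x + 1;
        try {
            Function<Integer, Integer> copy =
                    (Function<Integer, Integer>) fromBytes(toBytes(addOne));
            return copy.apply(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the only change is the declared type of the lambda, a stream API built on such interfaces can keep the familiar method names while accepting functions that can be sent to other cluster members.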