[Reactive Summit] Riding the Jet Streams

Java 8 introduced the Stream API as a modern, functional, and very powerful tool for processing collections of data. One of the main benefits of the Stream API is that it hides the details of iteration over the underlying data set, which allows parallel processing within a single JVM using the fork/join framework.
I will talk about a Stream API implementation that enables parallel processing across many machines and many JVMs.
You will learn how to use the same API you already know from a single JVM to process massive data sets across large clusters.
Along with an explanation of the implementation’s internals, I will introduce the general design behind stream processing with DAG (directed acyclic graph) engines and show how an actor-based implementation can provide in-memory performance while still leveraging a widely known API such as the Java Streams API.

Viktor Gamov

October 04, 2016

Transcript

  1. Riding Jet Streams
    #reactivesummit
    #hazelcast
    #java8

  2. > whoami
    • Solutions Architect @Hazelcast
    • Hang out with Smart Guys
    • @gamussa on the internetz
    Go follow me on Twitter
    I’m very interesting ©

  3. Agenda
    • Brief intro to Java 8 Streams
    • Going Distributed
    • Data Distribution
    • Distributed Streams
    • Beyond Stream API: DAG-based compute engine

  4. Java 8 Streams

  5. Java 8 Streams…
    • An abstraction that represents a sequence of elements
    • Not a data structure
    • Conveys elements from a source through a pipeline of operations
    • Operations don’t modify the source
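
The points on this slide can be illustrated with a minimal sketch that uses only the JDK. The class and method names (`StreamBasics`, `longWordsUpperCased`) are made up for this example and do not come from the talk. The stream conveys elements from a list through `filter` and `map`, and the source list is left untouched.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example class, not part of the original deck.
final class StreamBasics {

    // Builds a pipeline over the list: keep words longer than three
    // characters, upper-case them, and collect the result into a new list.
    static List<String> longWordsUpperCased(List<String> words) {
        return words.stream()
                .filter(w -> w.length() > 3)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("jet", "stream", "java", "api");
        System.out.println(longWordsUpperCased(source)); // [STREAM, JAVA]
        System.out.println(source);                      // [jet, stream, java, api]
    }
}
```

The stream itself stores nothing; it is only a view over `source` that is consumed once by the terminal `collect` call.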

  6. Why should I care about the
    Stream API?
    • You’re a Java developer

  7. What does a regular Java developer think about Scala?
    advanced

  8. Why should I care about the
    Stream API?
    • You’re a Java developer
    • Many Java developers know Java
    • It’s all about data processing

  9. java.util.stream
    operations
    • map(), flatMap(), filter()
    • reduce(), collect()
    • sorted()
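
To show how the operations listed above compose, here is a small word-count sketch built only on `java.util.stream`. The class name `WordCount` and its methods are assumptions made for illustration. It uses `flatMap` and `filter` to tokenize, `collect` to count, `sorted` to order words, and `map` with `reduce` to sum line lengths.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical example class, not part of the original deck.
final class WordCount {

    // flatMap() splits each line into words, filter() drops empty tokens,
    // collect() groups equal words and counts them (TreeMap keeps keys sorted).
    static Map<String, Long> count(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    // sorted() orders the words of all lines alphabetically.
    static List<String> sortedWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                .filter(word -> !word.isEmpty())
                .sorted()
                .collect(Collectors.toList());
    }

    // map() turns each line into its length, reduce() sums the lengths.
    static int totalLength(List<String> lines) {
        return lines.stream().map(String::length).reduce(0, Integer::sum);
    }
}
```

For the input lines `"to be or"` and `"not to be"`, `count` yields `{be=2, not=1, or=1, to=2}`.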

  11. Problem
    • Data doesn’t fit on just one machine

  12. Problem
    • Data doesn’t fit on just one machine

  13. Solution
    • Use a distributed Map, aka IMap

  16. CACHES

  17. Replication v. Partitioning
    http://book.mixu.net/distsys/single-page.html
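
As a rough illustration of partitioning, the sketch below maps a key to a partition and a partition to an owning member plus one backup member. It is a simplification under stated assumptions: 271 is Hazelcast's default partition count, but the use of `hashCode()` and the round-robin member assignment are invented for this example, and the names `PartitionSketch`, `ownerOf`, and `backupOf` are hypothetical. A real data grid hashes the serialized key and manages its own partition table.

```java
// Hypothetical example class, not Hazelcast's implementation.
final class PartitionSketch {

    // Hazelcast's default partition count.
    static final int PARTITION_COUNT = 271;

    // Assumption: hashCode() stands in for the hash of the serialized key.
    // floorMod keeps the result non-negative for negative hash codes.
    static int partitionOf(Object key) {
        return Math.floorMod(key.hashCode(), PARTITION_COUNT);
    }

    // Assumption: partitions are spread over the members round-robin.
    static int ownerOf(int partitionId, int memberCount) {
        return partitionId % memberCount;
    }

    // The backup copy lives on the next member, so that with two or more
    // members a partition's primary and backup never share a machine.
    static int backupOf(int partitionId, int memberCount) {
        return (partitionId + 1) % memberCount;
    }
}
```

With partitioning, each member holds only a slice of the data; with full replication, every member would hold every entry.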

  19. What’s Hazelcast?
    • Open-source in-memory data grid (IMDG)
    • Distributed
    • Caching (IMap, JCache)
    • Java Collections (IList, ISet, IQueue)
    • Messaging (Topic, RingBuffer)
    • Computing (ExecutorService, MapReduce)

  23. Green Primary, Green Backup, Green Shard

  24. Path to Microservices
    Hazelcast is a…
    • Highly scalable In-Memory Data Grid
    • Simple and configurable backbone
    • One jar to «rule them all»

  25. Problem
    • Lambda serialization

  27. Solution
    • Serializable versions of the functional interfaces
    • Introducing DistributedStream
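
The lambda serialization problem and the "serializable interface" fix can be reproduced with the JDK alone. In the sketch below, `SerializableFunction` is a hypothetical interface defined for this example, not the interface shipped with Hazelcast Jet. A lambda typed as plain `java.util.function.Function` cannot be written to an `ObjectOutputStream`, while the same lambda typed against an interface that also extends `Serializable` survives a round trip through bytes, which is what sending it to another JVM requires.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

// Hypothetical example class, not part of the original deck.
final class SerializableLambdaSketch {

    // Assumption: a made-up serializable counterpart of java.util.function.Function.
    interface SerializableFunction<T, R> extends Function<T, R>, Serializable { }

    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        return buffer.toByteArray();
    }

    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    // Serializes the function, reads it back, and applies the copy.
    @SuppressWarnings("unchecked")
    static int shipAndApply(SerializableFunction<Integer, Integer> fn, int input) {
        try {
            Function<Integer, Integer> copy = (Function<Integer, Integer>) deserialize(serialize(fn));
            return copy.apply(input);
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Round-trips a doubling lambda through bytes and applies it to 21.
    static int demo() {
        return shipAndApply(x -> x * 2, 21);
    }

    // A lambda typed as plain Function is not Serializable and is rejected.
    static boolean plainLambdaIsRejected() {
        Function<Integer, Integer> plain = x -> x * 2;
        try {
            serialize(plain);
            return false;
        } catch (NotSerializableException e) {
            return true;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that deserializing a lambda still requires the class that defined it to be on the receiving side's classpath; only the captured arguments and a reference to the implementation method travel in the byte stream.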

  30. Jet Streams

  32. What’s Hazelcast Jet?
    • General-purpose distributed data processing framework
    • Uses a directed acyclic graph (DAG) to model data flow
    • Built on top of Hazelcast
    • Comparable to Apache Spark or Apache Flink
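
To make the DAG idea concrete, here is a minimal sketch of a data-flow graph whose vertices are processing steps and whose edges say where each step's output goes. The class `DagSketch` and its methods are invented for this example and are not the Hazelcast Jet API. The topological order (computed with Kahn's algorithm) is one valid order in which an engine could wire up the steps, and cycle detection enforces the "acyclic" part.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example class, not the Hazelcast Jet API.
final class DagSketch {

    // Adjacency list: vertex name -> names of downstream vertices.
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    DagSketch vertex(String name) {
        edges.putIfAbsent(name, new ArrayList<>());
        return this;
    }

    DagSketch edge(String from, String to) {
        vertex(from);
        vertex(to);
        edges.get(from).add(to);
        return this;
    }

    // Kahn's algorithm: repeatedly emit a vertex with no unprocessed inputs.
    List<String> topologicalOrder() {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String v : edges.keySet()) {
            inDegree.put(v, 0);
        }
        for (List<String> targets : edges.values()) {
            for (String t : targets) {
                inDegree.merge(t, 1, Integer::sum);
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.remove();
            order.add(v);
            for (String t : edges.get(v)) {
                if (inDegree.merge(t, -1, Integer::sum) == 0) {
                    ready.add(t);
                }
            }
        }
        if (order.size() != edges.size()) {
            throw new IllegalStateException("graph has a cycle, so it is not a DAG");
        }
        return order;
    }
}
```

A word-count job could be described as `new DagSketch().edge("source", "tokenize").edge("tokenize", "count").edge("count", "sink")`, which orders as source, tokenize, count, sink.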

  33. DAG

  34. Job Execution

  38. Conclusions
    • Distributed processing frameworks have proprietary APIs
    • But still: many Java developers know Java
    • Streams bring the expressiveness and conciseness
    of functional and declarative languages to good old
    Java

  39. Join the Community
    • Join the Developer Preview program
    • http://bit.ly/gime_jet
    • Or shoot me an email [email protected]
    • Follow @hazelcast and @gamussa
    • Tweet your questions with #hazelcast #hazelcastjet

  40. #reactivesummit #hazelcast
    #java8