[NYJavaSig] Riding The Distributed Streams

Viktor Gamov
February 03, 2017

Presentation on Hazelcast and distributed streams, given at NYJavaSig.

Transcript

  2. > whoami
    • Solutions Architect @Hazelcast
    • Hang out with awesome people
    • @gamussa in internetz
    Please follow me on Twitter
    I’m very interesting ©

  3. Agenda
    • Refreshing knowledge on Java 8 Streams
    • Distribute and Conquer
    • Distributed Data
    • Distributed Streams
    • How we did all this

  4. Java 8 Streams

  5. Java 8 Streams…
    • An abstraction that represents a sequence of elements
    • Not a data structure
    • Conveys elements from a source through a pipeline of operations
    • Operations don’t modify the source
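
The points above can be seen in a minimal sketch using only the JDK. The class and method names (`StreamBasics`, `longWordsUpperCased`) are made up for illustration and are not from the talk.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StreamBasics {
    // Returns the upper-cased words that are longer than three characters.
    static List<String> longWordsUpperCased(List<String> source) {
        return source.stream()                  // the list is the source
                .filter(w -> w.length() > 3)    // intermediate operation, lazy
                .map(String::toUpperCase)       // intermediate operation, lazy
                .collect(Collectors.toList());  // terminal operation runs the pipeline
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("war", "and", "peace", "pierre");
        System.out.println(longWordsUpperCased(words)); // [PEACE, PIERRE]
        System.out.println(words);                      // [war, and, peace, pierre]
    }
}
```

The second line of output shows that the source list is left untouched: the pipeline produces a new result instead of modifying its input.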

  6. Why should I care about the Stream API?
    • You’re a Java developer

  7. What does a regular Java developer think about Scala?
    advanced

  8. Why should I care about the Stream API?
    • You’re a Java developer
    • Many Java developers know Java
    • It’s all about data processing

  9. java.util.stream
    operations
    • map(), flatMap(), filter()
    • reduce(), collect()
    • sorted()
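
As a rough illustration of how these operations combine, here is a word-frequency sketch in plain Java 8. It is not taken from the slides; the class name `WordFrequency` and its methods are hypothetical, and splitting on `\W+` is a simplifying assumption about what counts as a word.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class WordFrequency {
    // flatMap() turns each line into words, filter() drops empty tokens,
    // collect() groups equal words and counts them.
    static Map<String, Long> count(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // sorted() orders entries by count (descending), ties alphabetically;
    // map() keeps only the word.
    static List<String> top(Map<String, Long> counts, int n) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // reduce() folds all counts into the total number of words.
    static long totalWords(Map<String, Long> counts) {
        return counts.values().stream().reduce(0L, Long::sum);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "War and peace", "Pierre and Andrew", "Andrew went to war");
        Map<String, Long> counts = count(lines);
        System.out.println(top(counts, 3));     // [and, andrew, war]
        System.out.println(totalWords(counts)); // 10
    }
}
```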

  13. Problem
    • One does not simply put all Big Data in one
    machine

  14. Problem
    • Data doesn’t fit on just one machine

  15. Problem
    • One does not simply put all Big Data in one
    machine
    • Data is too important to keep on only one machine

  17. CACHES

  18. Replication or Sharding?
    http://book.mixu.net/distsys/single-page.html

  19. Solution
    • Use Distributed Map aka IMap
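
To make the idea of a distributed map concrete, the following toy model combines sharding (each key hashes to one partition) with replication (each partition has a primary and a backup copy on different nodes). It is a sketch in plain Java, not Hazelcast code: `ToyDistributedMap`, the round-robin replica placement, and the use of `String.hashCode()` are all assumptions made for illustration. The only Hazelcast detail borrowed is its default partition count of 271.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: not Hazelcast code and not its real placement algorithm.
class ToyDistributedMap {
    private final int nodeCount;
    private final int partitionCount;
    private final List<Map<String, String>> nodes = new ArrayList<>(); // one store per node
    private final boolean[] down;

    ToyDistributedMap(int nodeCount, int partitionCount) {
        if (nodeCount < 2 || partitionCount < 1) {
            throw new IllegalArgumentException("need at least two nodes and one partition");
        }
        this.nodeCount = nodeCount;
        this.partitionCount = partitionCount;
        this.down = new boolean[nodeCount];
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
    }

    // Sharding: every key belongs to exactly one partition.
    int partitionOf(String key) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    // Assumed round-robin placement; primary and backup always differ.
    int primaryNode(int partition) { return partition % nodeCount; }
    int backupNode(int partition) { return (partition + 1) % nodeCount; }

    // Replication: each write goes to the primary and to the backup.
    void put(String key, String value) {
        int p = partitionOf(key);
        nodes.get(primaryNode(p)).put(key, value);
        nodes.get(backupNode(p)).put(key, value);
    }

    // Reads go to the primary, or to the backup if the primary node is down.
    String get(String key) {
        int p = partitionOf(key);
        int owner = down[primaryNode(p)] ? backupNode(p) : primaryNode(p);
        return nodes.get(owner).get(key);
    }

    // Simulates losing a node together with everything it stored.
    void crash(int node) {
        nodes.get(node).clear();
        down[node] = true;
    }

    public static void main(String[] args) {
        ToyDistributedMap map = new ToyDistributedMap(3, 271); // 271: Hazelcast's default
        map.put("pierre", "Bezukhov");
        map.put("andrew", "Bolkonsky");
        map.crash(map.primaryNode(map.partitionOf("pierre")));
        System.out.println(map.get("pierre")); // Bezukhov, served from the backup
        System.out.println(map.get("andrew")); // Bolkonsky
    }
}
```

The model survives the loss of a single node only; a real data grid also has to migrate partitions and create new backups after a failure.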

  20. What’s Hazelcast IMDG?
    • In-memory Data Grid
    • Apache v2 Licensed
    • Distributed
    • Caches (IMap, JCache)
    • Java Collections (IList, ISet, IQueue)
    • Messaging (Topic, RingBuffer)
    • Computation (ExecutorService, M-R)

  24. Diagram labels: Green Primary, Green Backup, Green Shard

  26. Problem
    • Lambda serialization

  28. Solution
    • Serializable versions of the interfaces
    • Introducing DistributedStream
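
The problem and the general shape of the solution can be reproduced with the JDK alone. In this sketch, `SerializableFunction` is a hypothetical name for "a `Function` that is also `Serializable`"; it is not claimed to be the interface the library actually ships. A lambda assigned to plain `Function` cannot be written to an `ObjectOutputStream`, while the same lambda assigned to the serializable interface can be sent as bytes and restored.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.Function;

class LambdaSerialization {
    // Hypothetical name: a Function whose lambdas are serializable.
    interface SerializableFunction<T, R> extends Function<T, R>, Serializable {}

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    // True if Java serialization accepts the object.
    static boolean canSerialize(Object o) {
        try {
            serialize(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Serializes the function and reads it back, as if it had crossed the network.
    @SuppressWarnings("unchecked")
    static <T, R> Function<T, R> roundTrip(SerializableFunction<T, R> f) {
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(serialize(f)))) {
            return (Function<T, R>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Function<String, Integer> plain = String::length;
        SerializableFunction<String, Integer> shippable = String::length;
        System.out.println(canSerialize(plain));     // false
        System.out.println(canSerialize(shippable)); // true
        System.out.println(roundTrip(shippable).apply("hazelcast")); // 9
    }
}
```

Restoring the lambda only works when the class that declared it is available on the receiving side, which is why class loading matters in a distributed setting.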

  31. Jet Streams

  33. What’s Hazelcast Jet?
    • General-purpose distributed data processing
    framework
    • Based on a Directed Acyclic Graph (DAG) to model data flow
    • Built on top of Hazelcast IMDG
    • Comparable to Apache Spark or Apache Flink
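
What "modelling data flow as a DAG" means can be sketched without any framework: vertices are processing steps, edges say whose output feeds whom, and a step can run once everything upstream of it has been scheduled. The `ToyDag` class below is a hypothetical illustration in plain Java, not the Jet API; it orders the vertices with Kahn's topological sort and rejects graphs that contain a cycle.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: not the Hazelcast Jet API.
class ToyDag {
    private final Map<String, List<String>> edges = new LinkedHashMap<>();
    private final Map<String, Integer> inDegree = new LinkedHashMap<>();

    // Registers a processing step.
    ToyDag vertex(String name) {
        edges.putIfAbsent(name, new ArrayList<>());
        inDegree.putIfAbsent(name, 0);
        return this;
    }

    // Declares that the output of 'from' flows into 'to'.
    ToyDag edge(String from, String to) {
        vertex(from).vertex(to);
        edges.get(from).add(to);
        inDegree.merge(to, 1, Integer::sum);
        return this;
    }

    // Kahn's algorithm: repeatedly emit a vertex with no unprocessed inputs.
    List<String> executionOrder() {
        Map<String, Integer> remaining = new LinkedHashMap<>(inDegree);
        Deque<String> ready = new ArrayDeque<>();
        remaining.forEach((v, d) -> { if (d == 0) ready.add(v); });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.remove();
            order.add(v);
            for (String next : edges.get(v)) {
                if (remaining.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != edges.size()) {
            throw new IllegalStateException("graph has a cycle, so it is not a DAG");
        }
        return order;
    }

    public static void main(String[] args) {
        ToyDag dag = new ToyDag()
                .edge("source", "tokenize")
                .edge("tokenize", "count")
                .edge("count", "sink");
        System.out.println(dag.executionOrder()); // [source, tokenize, count, sink]
    }
}
```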

  35. DAG

  36. Job Execution

  38. Future (It’s bright!)
    • Memory module for processing big data
    • Higher level streaming and batching APIs
    • Reactive Streams
    • Distributed Classloading
    • Integrations (HDFS/Yarn/Mesos)

  39. Your fuel, our Jet Engine
    • Public release – Feb 7th.
    • Developer Preview today - yay!
    • http://hazelcast.org/jet-signup
    • Send me a note [email protected]
    • Follow @hazelcast and @gamussa (duh!!)
    • Your questions #hazelcast #hazelcastjet

  40. Conclusion
    • The Java Stream API provides a very wide range of data
    processing tools
    • War and Peace is a big (a lot of data) book!
    • Now we’re pretty sure that Andrew and Pierre are
    the main characters
