[NYJavaSig] Riding The Distributed Streams

Viktor Gamov
February 03, 2017

Presentation on Hazelcast and Distributed Streams.
Presented at NYJavaSig.

Transcript

  1. None
  2. > whoami • Solutions Architect @Hazelcast • Hang out with awesome people • @gamussa in internetz • Please follow me on Twitter, I’m very interesting ©
  3. Agenda • Refreshing knowledge on Java 8 Streams • Distribute and Conquer • Distributed Data • Distributed Streams • How we did all this
  4. Java 8 Streams

  5. Java 8 Streams… • An abstraction that represents a sequence of elements • Not a data structure • Conveys elements from a source through a pipeline of operations • Operations don’t modify the source
  6. Why should I care about the Stream API? • You’re a Java developer
  7. What does a regular Java developer think about Scala? advanced

  8. Why should I care about the Stream API? • You’re a Java developer • Many Java developers know Java • It’s all about data processing
  9. java.util.stream operations • map(), flatMap(), filter() • reduce(), collect() • sorted()
  10. None
  11. None
  12. None
  13. Problem • One does not simply put all Big Data in one machine
  14. Problem • Data doesn’t fit on just one machine

  15. Problem • One does not simply put all Big Data in one machine • Data is too important to keep on only one machine
  16. None
  17. CACHES

  18. Replication or Sharding? http://book.mixu.net/distsys/single-page.html

  19. Solution • Use Distributed Map aka IMap

  20. What’s Hazelcast IMDG? • In-memory Data Grid • Apache v2 Licensed • Distributed • Caches (IMap, JCache) • Java Collections (IList, ISet, IQueue) • Messaging (Topic, RingBuffer) • Computation (ExecutorService, M-R)
  21. None
  22. None
  23. None
  24. Green Primary Green Backup Green Shard

  25. None
  26. Problem • Lambda serialization

  27. None

  28. Solution • Serializable versions of the interfaces • Introducing DistributedStream
  29. None

  30. None
  31. Jet Streams

  32. None
  33. What’s Hazelcast Jet? • General purpose distributed data processing framework • Models data flow as a Directed Acyclic Graph (DAG) • Built on top of Hazelcast IMDG • Comparable to Apache Spark or Apache Flink
  34. None
  35. DAG

  36. Job Execution

  37. None
  38. Future (It’s bright!) • Memory module for processing big data • Higher-level streaming and batching APIs • Reactive Streams • Distributed Classloading • Integrations (HDFS/Yarn/Mesos)
  39. Your fuel, our Jet Engine • Public release – Feb 7th. • Developer Preview today - yay! • http://hazelcast.org/jet-signup • Send me a note viktor@hazelcast.com • Follow @hazelcast and @gamussa (duh!!) • Your questions #hazelcast #hazelcastjet
  40. Conclusion • Java Stream API provides a very wide range of data processing tools • War and Peace is a big (a lot of data) book! • Now we’re pretty sure that Andrew and Pierre are the main characters
  41. None
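
The conclusion slide alludes to a word-frequency count over War and Peace. A minimal sketch of that kind of count with plain java.util.stream (map/filter-style pipeline ending in collect()) might look like the following. It runs on a single JVM and does not use Hazelcast's DistributedStream, whose API is not shown in this transcript. The class name WordCount, the methods countWords and topWords, and the sample sentence are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical example class; not taken from the talk.
public class WordCount {

    // Counts how often each lowercased word occurs in the text.
    // The input string itself is never modified by the pipeline.
    public static Map<String, Long> countWords(String text) {
        return Arrays.stream(text.toLowerCase().split("\\W+"))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // Returns the n most frequent words, most frequent first;
    // ties are broken alphabetically so the result is deterministic.
    public static List<String> topWords(String text, int n) {
        return countWords(text).entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-in sample text, not an excerpt from the novel.
        String sample = "Pierre met Andrew. Andrew spoke, and Pierre listened. Pierre left.";
        System.out.println(topWords(sample, 2)); // prints [pierre, andrew]
    }
}
```

On a single machine this works as long as the text fits in memory; the talk's point is that the same pipeline shape needs serializable lambdas and partitioned data once the source is spread across a cluster.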