Distributed Data Processing with Infinispan and Java Streams

Infinispan is a distributed in-memory key/value data store capable of accelerating data processing using Hadoop, Spark and home-grown Map/Reduce APIs. Starting with Infinispan 8, you can also use the Java 8 Stream API to process, transform and analyse the data stored in the grid, without burdening the architecture with external platforms. Processing can be applied to keys and/or values, and it uses Infinispan’s data partitioning logic to distribute operations to the nodes where the data lives so that they can be executed locally. In this talk you’ll learn about this new extension to Java 8’s Stream API for processing data in Infinispan and how it compares with existing APIs.
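
The idea of sending the operation to the data can be sketched in plain Java. This is only a simulation under stated assumptions: it does not use the Infinispan API, each "node" is a local `HashMap`, and the class `DataLocalStreams`, its methods, and the modulo-based placement rule are hypothetical names and simplifications introduced for illustration (Infinispan itself places data with a consistent hash).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DataLocalStreams {

    // Hypothetical stand-in for a cluster: each "node" is just a local map.
    public static List<Map<String, Integer>> partition(Map<String, Integer> data, int nodes) {
        List<Map<String, Integer>> owners = new ArrayList<>();
        for (int i = 0; i < nodes; i++) {
            owners.add(new HashMap<>());
        }
        for (Map.Entry<String, Integer> e : data.entrySet()) {
            // Simplified placement rule (assumption); not a real consistent hash.
            int owner = Math.floorMod(e.getKey().hashCode(), nodes);
            owners.get(owner).put(e.getKey(), e.getValue());
        }
        return owners;
    }

    // Runs the same filter/sum pipeline over each node's local values,
    // then combines the per-node partial sums at the "origin".
    public static int sumOfValuesAbove(List<Map<String, Integer>> owners, int threshold) {
        return owners.stream()
                .mapToInt(local -> local.values().stream()
                        .filter(v -> v > threshold)
                        .mapToInt(Integer::intValue)
                        .sum())
                .sum();
    }

    public static void main(String[] args) {
        Map<String, Integer> data = new HashMap<>();
        data.put("a", 4);
        data.put("b", 74);
        data.put("c", 20);
        data.put("d", 97);
        data.put("e", 118);
        List<Map<String, Integer>> owners = partition(data, 3);
        System.out.println(sumOfValuesAbove(owners, 70)); // 74 + 97 + 118 = 289
    }
}
```

The result does not depend on how the entries happen to be spread across nodes, because each partial sum covers a disjoint subset of the data and addition is associative.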

Galder Zamarreño

March 11, 2016

Transcript

  1. Moi • @ • Infinispan co-founder • JSR-107 • Scala developer since 2009 • Functional programming @galderz
  2. Clustering • Distribution mode • N copies of data in cluster • Data location defined by Consistent Hash
  3. Store & Retrieve • ConcurrentMap • JSR-107 Cache • CDI • SpringCache • Experimental Functional API
  4. Java 8 stream
    List<Integer> numbers = Arrays.asList(4, 74, 20, 97, 118, 50, 97, 34, 48);
    numbers.stream()
        .filter(i -> i > 70)                          // Returns Stream<Integer>
        .map(n -> new String(Character.toChars(n)))   // Returns Stream<String>
        .reduce("", String::concat);                  // Returns "Java"
  5. Laziness
    IntStream iterStream = IntStream.iterate(0, i -> i + 1);   // Does nothing
    IntStream.iterate(0, i -> i + 1)
        .forEach(System.out::println);                         // Runs forever :(
    IntStream.iterate(0, i -> i + 1)
        .limit(10)                                             // Returns IntStream
        .forEach(System.out::println);                         // Returns void; prints 0 to 9
  6. Topology changes • Streams processed without data loss when topology changes • Retries might happen... • Strive for idempotent lambdas • Idempotent forEach tricky...
  7. Special intermediate operations • distinct → origin + remote • limit → origin + remote • skip/peek → origin only • sorted → origin only (Memory ++)
  8. "The Streams API will internally decompose your query to leverage the multiple cores on your computer." Raoul-Gabriel Urma
  9. "Infinispan Distributed Streams API will internally decompose your query to leverage the computing power of multiple machines" Galder Zamarreño
  10. Spark/Hadoop integration • Suits Spark/Hadoop users wanting a different backend • Need to process data in real time, e.g. sliding windows • Remote access only
  11. Spark demo
    $ docker run -it --name master -h master -e "SLAVES=1" gustavonalle/infinispan-server-domain
    $ docker run --name spark-master -ti gustavonalle/spark
    $ docker exec -it spark-master /usr/local/spark/bin/spark-shell --master spark://172.17.0.3:7077 --packages org.infinispan:infinispan-spark_2.10:0.2 --conf spark.io.compression.codec=lz4
    (Diagram labels: Spark Shell, Spark, Infinispan)
  12. Which API? • Start → Java Stream API • Spark/Hadoop require management/configuration • Query API helps with deep understanding of values
  13. Summary • Infinispan... • is a distributed K/V store • expands Java Streams to run in multi-node environments • offers more options for processing data: Spark, Hadoop, etc.
  14. Credits (all from the Noun Project) • engineer by Wilson Joseph • panel by gira Park • Approve by Aha-Soft • Database sharing by YuguDesign • ram by Andrea Rizzato • Database Search by Nimal Raj • Cloud Analytics by Kevin Augustine LO • Broken Computer by Dan Hetteix • data search by Gregor Črešnar • Server by Creative Stall • Network by Creative Stall • transformation by Felipe Perucho • analytics by Roman Kovbasyuk • Server by Designify.me
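
Slide 6 recommends idempotent lambdas because work may be retried when the topology changes. A minimal sketch of why this matters, in plain Java: the retry model here is a hypothetical simplification (half of a segment is processed, then the whole segment is processed again) and is not Infinispan's actual retry mechanism. The class `IdempotentRetry` and its methods are illustrative names only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class IdempotentRetry {

    // Hypothetical retry model (assumption): the first half of the segment is
    // processed, the node "fails", and the full segment is processed again.
    static <T> void processWithOneRetry(List<T> segment, Consumer<T> action) {
        segment.subList(0, segment.size() / 2).forEach(action); // partial first attempt
        segment.forEach(action);                                 // full retry
    }

    // Non-idempotent side effect: every invocation increments the counter,
    // so elements seen twice are counted twice.
    public static int countWithIncrement(List<Integer> segment) {
        AtomicInteger counter = new AtomicInteger();
        processWithOneRetry(segment, v -> counter.incrementAndGet());
        return counter.get();
    }

    // Idempotent side effect: adding the same element to a set twice
    // leaves the set unchanged.
    public static int countWithSetAdd(List<Integer> segment) {
        Set<Integer> seen = ConcurrentHashMap.newKeySet();
        processWithOneRetry(segment, seen::add);
        return seen.size();
    }

    public static void main(String[] args) {
        List<Integer> segment = Arrays.asList(4, 74, 20, 97);
        System.out.println(countWithIncrement(segment)); // 6: two elements counted twice
        System.out.println(countWithSetAdd(segment));    // 4: repeats have no effect
    }
}
```

The set-based version only works here because the elements are distinct and identify themselves; a `forEach` whose effect cannot be keyed this way is the "tricky" case the slide refers to.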