Slide 1

Slide 1 text

Distributed Java Streams with Infinispan Galder Zamarreño Arrizabalaga 12th May 2016

Slide 2

Slide 2 text

Moi • @ • Infinispan co-founder • JSR-107 • Scala developer since 2009 • Functional programming @galderz

Slide 3

Slide 3 text

What is Infinispan?

Slide 4

Slide 4 text

Not just a NoSQL key/value data store with optional schema, available under the ASL 2.0 license

Slide 5

Slide 5 text

From local... • Store data from slow systems • Store data that is hard to compute
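Here is a minimal sketch of the "store data that is hard to compute" idea. It assumes a plain `java.util.concurrent.ConcurrentHashMap` as a local stand-in for the cache; Infinispan's cache is also exposed as a `ConcurrentMap`. The Fibonacci function and the `ComputeOnceSketch` class are hypothetical examples of an expensive computation. The point is that `computeIfAbsent` runs the computation only on a miss.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example: cache the result of an expensive computation.
class ComputeOnceSketch {
    // Local stand-in for a cache; any ConcurrentMap offers computeIfAbsent.
    private final Map<Integer, Long> cache = new ConcurrentHashMap<>();
    // Counts how many times the expensive function actually ran.
    final AtomicInteger computations = new AtomicInteger();

    // Stand-in for "data hard to compute": the n-th Fibonacci number.
    private long expensive(int n) {
        computations.incrementAndGet();
        long a = 0, b = 1;
        for (int i = 0; i < n; i++) {
            long next = a + b;
            a = b;
            b = next;
        }
        return a;
    }

    // Returns the cached value, computing it only when the key is missing.
    long get(int n) {
        return cache.computeIfAbsent(n, this::expensive);
    }

    public static void main(String[] args) {
        ComputeOnceSketch sketch = new ComputeOnceSketch();
        System.out.println(sketch.get(10));            // 55, computed
        System.out.println(sketch.get(10));            // 55, served from the map
        System.out.println(sketch.computations.get()); // 1
    }
}
```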

Slide 6

Slide 6 text

...Via Temporary... Store temporary data in for data that should survive , e.g.

Slide 7

Slide 7 text

... to data grids • Used as primary store for:

Slide 8

Slide 8 text

Embedded access mode • Application and data live in the same JVM

Slide 9

Slide 9 text

Remote access mode • Application and data separated by the network

Slide 10

Slide 10 text

Clustering • Distribution mode • N copies of data in cluster • Data location defined by Consistent Hash
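This simplified hash-ring sketch makes "data location defined by consistent hash" concrete. Nodes are placed on a ring by hashing their names, and a key's N owners are the first N distinct nodes found walking clockwise from the key's hash. It illustrates the general technique and is not Infinispan's implementation, which maps keys to a fixed number of segments and segments to owners. The `ConsistentHashSketch` class and its bit-mixing function are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeMap;

// Hypothetical consistent-hash ring: ring position -> node name.
class ConsistentHashSketch {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    ConsistentHashSketch(Collection<String> nodes) {
        for (String node : nodes) {
            ring.put(hash(node), node);
        }
    }

    // Spreads String.hashCode bits; a stand-in for a real hash function.
    static int hash(String s) {
        int h = s.hashCode();
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        h ^= (h >>> 13);
        return h;
    }

    // The owners of a key: numOwners distinct nodes, clockwise from the key's position.
    List<String> owners(String key, int numOwners) {
        List<String> result = new ArrayList<>();
        if (ring.isEmpty()) {
            return result;
        }
        int wanted = Math.min(numOwners, ring.size()); // cannot exceed cluster size
        Integer pos = ring.ceilingKey(hash(key));
        if (pos == null) {
            pos = ring.firstKey(); // wrap around the ring
        }
        while (result.size() < wanted) {
            result.add(ring.get(pos));
            pos = ring.higherKey(pos);
            if (pos == null) {
                pos = ring.firstKey();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ConsistentHashSketch ring = new ConsistentHashSketch(
                Arrays.asList("node-A", "node-B", "node-C", "node-D"));
        // With N = 2, each key is stored on two distinct nodes.
        System.out.println(ring.owners("user:42", 2));
    }
}
```

Every node computes the same owners for a key without asking anyone else. Adding a node only moves the keys that fall between it and its predecessor on the ring.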

Slide 11

Slide 11 text

Store & Retrieve • ConcurrentMap • JSR-107 Cache • CDI • SpringCache • Experimental Functional API
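Because the cache is exposed through `ConcurrentMap`, code written against that interface can use its atomic conditional operations. The sketch below is a hypothetical hit counter built from `putIfAbsent` and `replace(key, oldValue, newValue)`. It runs against `ConcurrentHashMap` as a local stand-in and assumes any other implementation honors the same atomicity contract.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.IntStream;

// Hypothetical counter that only relies on the ConcurrentMap interface.
class CounterSketch {
    // Atomically increments the value stored under key and returns the new value.
    static long increment(ConcurrentMap<String, Long> map, String key) {
        while (true) {
            Long current = map.putIfAbsent(key, 1L);
            if (current == null) {
                return 1L; // no previous value: this call created the entry
            }
            Long next = current + 1;
            if (map.replace(key, current, next)) {
                return next; // value was still `current`, so the swap succeeded
            }
            // another thread changed the value in between: read again and retry
        }
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Long> hits = new ConcurrentHashMap<>();
        IntStream.range(0, 1000).parallel().forEach(i -> increment(hits, "home"));
        System.out.println(hits.get("home")); // 1000
    }
}
```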

Slide 12

Slide 12 text

Compute • Java 8 Stream API extended to data stored in Infinispan

Slide 13

Slide 13 text

Java 8 stream
    List<Integer> numbers = Arrays.asList(
        4, 74, 20, 97, 118, 50, 97, 34, 48);
    numbers.stream()
        .filter(i -> i > 70)
        // ^ Returns Stream<Integer>
        .map(n -> new String(Character.toChars(n)))
        // ^ Returns Stream<String>
        .reduce("", String::concat);
    // Returns "Java"

Slide 14

Slide 14 text

Laziness
    IntStream iterStream = IntStream.iterate(0, i -> i + 1);
    // Does nothing: without a terminal operation no element is computed
    IntStream.iterate(0, i -> i + 1)
        .forEach(System.out::println);
    // Runs forever :(
    IntStream.iterate(0, i -> i + 1)
        .limit(10)                       // Returns IntStream
        .forEach(System.out::println);   // Returns void
    // Prints 0 to 9
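This JDK-only sketch makes the laziness visible. A `peek` stage counts how many elements are pulled from the infinite `IntStream.iterate` source. Without a terminal operation nothing is pulled. With `limit(10)` and a terminal `sum()`, only the ten elements the limit needs are generated. The `LazinessSketch` class name is illustrative.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

class LazinessSketch {
    // Builds iterate -> peek -> limit(10) and reports how many elements reached peek.
    static int elementsPulled(boolean runTerminalOp) {
        AtomicInteger pulled = new AtomicInteger();
        IntStream pipeline = IntStream.iterate(0, i -> i + 1)
                .peek(i -> pulled.incrementAndGet())
                .limit(10);
        if (runTerminalOp) {
            pipeline.sum(); // terminal operation: only now are elements generated
        }
        return pulled.get();
    }

    public static void main(String[] args) {
        System.out.println(elementsPulled(false)); // 0
        System.out.println(elementsPulled(true));  // 10
    }
}
```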

Slide 15

Slide 15 text

Distributed Streams [Diagram: map(λ) at the origin, with λ copied to each node]
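The idea of sending the lambda to the nodes that hold the data, instead of pulling the data to the caller, can be simulated in a single JVM. In this hypothetical sketch each "node" is just a list. The lambda is serialized and deserialized to mimic shipping it over the network, which is why lambdas used this way must be serializable (here via an intersection cast). In Infinispan itself the stream would come from the cache, for example `cache.entrySet().stream()`. This sketch does not use Infinispan.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical single-JVM simulation of a distributed map(λ).
class DistributedMapSketch {
    // Serializes the lambda, standing in for sending it over the network.
    static byte[] ship(Object lambda) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(lambda); // fails if the lambda is not Serializable
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // A "node" deserializes the lambda and applies it only to the data it owns.
    @SuppressWarnings("unchecked")
    static List<Integer> runOnNode(List<Integer> localData, byte[] shipped) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(shipped))) {
            Function<Integer, Integer> f = (Function<Integer, Integer>) in.readObject();
            return localData.stream().map(f).collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // The origin ships the same lambda to every node and concatenates the partial results.
    static List<Integer> distributedMap(List<List<Integer>> nodes, Function<Integer, Integer> f) {
        byte[] shipped = ship(f);
        return nodes.stream()
                .flatMap(local -> runOnNode(local, shipped).stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<Integer>> nodes = Arrays.asList(
                Arrays.asList(1, 2, 3), // data owned by node 1
                Arrays.asList(4, 5),    // node 2
                Arrays.asList(6));      // node 3
        // The intersection cast makes the compiler generate a serializable lambda.
        Function<Integer, Integer> doubler = (Function<Integer, Integer> & Serializable) n -> n * 2;
        System.out.println(distributedMap(nodes, doubler)); // [2, 4, 6, 8, 10, 12]
    }
}
```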

Slide 16

Slide 16 text

Demo • github.com/galderz/distributed-streams

Slide 17

Slide 17 text

Topology changes • Streams are processed without data loss when the topology changes • Retries might happen... • Strive for idempotent lambdas • An idempotent forEach is tricky...
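This single-JVM sketch shows why idempotency matters under retries. The hypothetical `processWithRetry` applies an action to a segment's keys and then replays the whole segment, standing in for a retry after a topology change. It assumes the replay covers the full segment. A `forEach` that increments a counter counts the replayed keys twice. One that writes a value derived only from the key ends in the same state no matter how often it runs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical simulation of a segment being processed twice.
class RetrySketch {
    static void processWithRetry(List<String> segment, Consumer<String> action) {
        segment.forEach(action); // first attempt
        segment.forEach(action); // retry after a simulated topology change
    }

    public static void main(String[] args) {
        List<String> segment = Arrays.asList("k1", "k2", "k3");

        // Not idempotent: every replayed key increments the counter again.
        AtomicInteger counter = new AtomicInteger();
        processWithRetry(segment, k -> counter.incrementAndGet());

        // Idempotent: writing the same value under the same key twice changes nothing.
        Map<String, Integer> lengths = new ConcurrentHashMap<>();
        processWithRetry(segment, k -> lengths.put(k, k.length()));

        System.out.println(counter.get());  // 6, not 3
        System.out.println(lengths.size()); // 3
    }
}
```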

Slide 18

Slide 18 text

Special intermediate operations • distinct → origin + remote • limit → origin + remote • skip/peek → origin only • sorted → origin only (Memory ++)
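This hypothetical sketch shows why `distinct` and `limit` can run in two phases while `skip` cannot. Each inner list stands in for the data one node owns. Applying `distinct` or `limit(n)` on each node first only reduces what is sent back, and the origin then applies the same operation to the combined result. Skipping `k` elements on every node would instead drop up to `k` per node, so `skip` only gives the intended result when run once over all the data at the origin. This models the slide's table, not Infinispan's internal code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of which operations can be split between nodes and origin.
class TwoPhaseOpsSketch {
    // distinct: drop local duplicates on each node, then cross-node duplicates at the origin.
    static List<Integer> distributedDistinct(List<List<Integer>> nodes) {
        return nodes.stream()
                .flatMap(local -> local.stream().distinct()) // remote phase
                .distinct()                                  // origin phase
                .collect(Collectors.toList());
    }

    // limit: each node sends at most n elements, the origin keeps n in total.
    static List<Integer> distributedLimit(List<List<Integer>> nodes, long n) {
        return nodes.stream()
                .flatMap(local -> local.stream().limit(n))   // remote phase
                .limit(n)                                    // origin phase
                .collect(Collectors.toList());
    }

    // skip: correct only at the origin, after every node's data has arrived.
    static List<Integer> originOnlySkip(List<List<Integer>> nodes, long k) {
        return nodes.stream()
                .flatMap(List::stream)
                .skip(k)
                .collect(Collectors.toList());
    }

    // Wrong on purpose: skipping k on every node drops up to k elements per node.
    static List<Integer> naiveRemoteSkip(List<List<Integer>> nodes, long k) {
        return nodes.stream()
                .flatMap(local -> local.stream().skip(k))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<Integer>> nodes = Arrays.asList(
                Arrays.asList(1, 2, 2), Arrays.asList(2, 3), Arrays.asList(3, 4));
        System.out.println(distributedDistinct(nodes)); // [1, 2, 3, 4]
        System.out.println(distributedLimit(nodes, 3)); // [1, 2, 2]
        System.out.println(originOnlySkip(nodes, 2));   // [2, 2, 3, 3, 4]
        System.out.println(naiveRemoteSkip(nodes, 2));  // [2]
    }
}
```

`sorted` falls in the same origin-only group for a related reason: a global order needs every element in one place, which is where the extra memory cost comes from.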

Slide 19

Slide 19 text

"The Streams API will internally decompose your query to leverage the multiple cores on your computer." Raoul-Gabriel Urma

Slide 20

Slide 20 text

"Infinispan Distributed Streams API will internally decompose your query to leverage the computing power of multiple machines" Galder Zamarreño Arrizabalaga

Slide 21

Slide 21 text

Spark/Hadoop integration • Try different storage • Process data in real time • Combine multiple sources including Infinispan • Complex queries: SQL + joins • Remote access only

Slide 22

Slide 22 text

Spark demo
    $ docker run -it --name master -h master -e "SLAVES=1" gustavonalle/infinispan-server-domain
    $ docker run --name spark-master -ti gustavonalle/spark
    $ docker exec -it spark-master /usr/local/spark/bin/spark-shell --master spark://172.17.0.3:7077 --packages org.infinispan:infinispan-spark_2.10:0.2 --conf spark.io.compression.codec=lz4
[Diagram: Spark Shell, Spark, Infinispan]

Slide 23

Slide 23 text

Spark demo link • Code for demo: blog.infinispan.org/2015/08/infinispan-spark-connector-01-released.html

Slide 24

Slide 24 text

Which API? • Start → Java Stream API • Spark/Hadoop require management/configuration • Query API helps with deep understanding of values

Slide 25

Slide 25 text

Summary • Infinispan... • is a distributed K/V store • extends Java Streams to run in multi-node environments • offers more options for processing data: Spark/Hadoop, etc.

Slide 26

Slide 26 text

Credits • engineer by Wilson Joseph from the Noun Project • panel by gira Park from the Noun Project • Approve by Aha-Soft from the Noun Project • Database sharing by YuguDesign from the Noun Project • ram by Andrea Rizzato from the Noun Project • Database Search by Nimal Raj from the Noun Project • Cloud Analytics by Kevin Augustine LO from the Noun Project • Broken Computer by Dan Hetteix from the Noun Project • data search by Gregor Črešnar from the Noun Project • Server by Creative Stall from the Noun Project • Network by Creative Stall from the Noun Project • transformation by Felipe Perucho from the Noun Project • analytics by Roman Kovbasyuk from the Noun Project • Server by Designify.me from the Noun Project

Slide 27

Slide 27 text

Thanks http://infinispan.org http://blog.infinispan.org @infinispan @galderz