Slide 1

Slide 1 text

Big Data in Action with Infinispan
Galder Zamarreño Arrizabalaga
27th April 2017

Slide 2

Slide 2 text

Moi
• @
• Infinispan developer & co-founder
• Lead client/server architecture
• Functional programming
@galderz #infinispan

Slide 3

Slide 3 text

Build an Infinispan-based infrastructure to store, search and process near real-time data and calculate analytics

Slide 4

Slide 4 text

Real-time data
• Real-time data is challenging
• Delays can have a big impact

Slide 5

Slide 5 text

Data growth
• Exponential data growth (smartphones, IoT, etc.)
• How to analyse it?

Slide 6

Slide 6 text

In-memory data grids (IMDG)

Slide 7

Slide 7 text

What is an IMDG?
• Distributed in-memory data
• Server "mesh"
• Peer-to-peer (P2P)
• No master/slaves
• No single bottleneck
• No single point of failure
• Commodity hardware

Slide 8

Slide 8 text

Infinispan is an IMDG
Fuse "memory" across machines into a unified data store
• NoSQL
• Extreme performance
• Linear scalability
• Fault tolerant
• Event processing
• Configurable ACID transactions
[Diagram labels: Custom Applications, Mobile Applications, Web Apps & Websites, JBoss Middleware; Infinispan; Read-through, write-through, write-behind; Databases and/or file system; Analytical Framework]
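The terms read-through and write-through on this slide can be illustrated with a minimal single-JVM sketch. It does not use Infinispan's API: the class name, the `storeReads` counter and the use of a plain `Map` as a stand-in for the database or file system are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not Infinispan's API.
class ReadWriteThroughSketch {
    private final Map<String, String> memory = new HashMap<>(); // in-memory tier
    private final Map<String, String> store;                    // stands in for a database or file system
    int storeReads = 0;                                          // counts how often the store is read

    ReadWriteThroughSketch(Map<String, String> store) {
        this.store = store;
    }

    // Read-through: a miss in memory is loaded from the store and kept in memory.
    String get(String key) {
        String cached = memory.get(key);
        if (cached != null) {
            return cached;
        }
        storeReads++;
        String loaded = store.get(key);
        if (loaded != null) {
            memory.put(key, loaded);
        }
        return loaded;
    }

    // Write-through: the store is updated synchronously, before the call returns.
    void put(String key, String value) {
        store.put(key, value);
        memory.put(key, value);
    }

    public static void main(String[] args) {
        Map<String, String> database = new HashMap<>();
        database.put("station:1", "Basel");
        ReadWriteThroughSketch cache = new ReadWriteThroughSketch(database);
        cache.get("station:1");               // first read goes to the store
        cache.get("station:1");               // second read is served from memory
        System.out.println(cache.storeReads); // prints 1
    }
}
```

Write-behind differs from write-through only in timing: the store update would be queued and applied asynchronously instead of inside `put`.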

Slide 9

Slide 9 text

Infinispan use cases
• Distributed cache: cache frequent data; transient, short-lived storage

Slide 10

Slide 10 text

Infinispan use cases
• Distributed cache: cache frequent data; transient, short-lived storage
• NoSQL database: key/value store; ACID transactions

Slide 11

Slide 11 text

Infinispan use cases
• Distributed cache: cache frequent data; transient, short-lived storage
• NoSQL database: key/value store; ACID transactions
• Event broker: listen to data changes; continuous query

Slide 12

Slide 12 text

Infinispan use cases
• Distributed cache: cache frequent data; transient, short-lived storage
• NoSQL database: key/value store; ACID transactions
• Event broker: listen to data changes; continuous query
• Data analytics: map/reduce via Java streams; Spark/Hadoop integration

Slide 13

Slide 13 text

Use examples
• Web, e-commerce
  • HTTP session
  • Shopping carts
• Database/legacy offload
  • Product catalog
  • Caching
• Telecommunications
  • Cellular billing
  • Call routing, session info
  • SMS content/notification
• Travel
  • Aggregated flight pricing
  • Flight availability
• Financial
  • Per-user portfolio data and risk analysis
  • Aggregated ticker stream
• Defence
  • Sensor network data processing and threat detection

Slide 14

Slide 14 text

Infinispan use cases
• Distributed cache: cache frequent data; transient, short-lived storage
• NoSQL database: key/value store; ACID transactions
• Event broker: listen to data changes; continuous query
• Data analytics: map/reduce via Java streams; Spark/Hadoop integration

Slide 15

Slide 15 text

Event broker
• Listen to data changes
• Continuous query

Slide 16

Slide 16 text

Continuous query
Continuous query combines complex querying with reactive data changes
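The idea of a query that stays registered and reacts as data changes can be sketched in plain Java. This is a single-JVM illustration under stated assumptions, not Infinispan's continuous query API: the class name, the event strings and the use of an `Integer` delay in minutes as the value type are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration only; not Infinispan's API.
class ContinuousQuerySketch<K, V> {
    private final Map<K, V> data = new HashMap<>();
    private final Predicate<V> query;               // the standing query
    final List<String> events = new ArrayList<>();  // what a listener would receive

    ContinuousQuerySketch(Predicate<V> query) {
        this.query = query;
    }

    // Store a value and record how the query's result set changed.
    void put(K key, V value) {
        V previous = data.put(key, value);
        boolean matchedBefore = previous != null && query.test(previous);
        boolean matchesNow = query.test(value);
        if (!matchedBefore && matchesNow) {
            events.add("joined:" + key);
        } else if (matchedBefore && matchesNow) {
            events.add("updated:" + key);
        } else if (matchedBefore) {
            events.add("left:" + key);
        }
    }

    public static void main(String[] args) {
        // Standing query: entries delayed by more than 5 minutes.
        ContinuousQuerySketch<String, Integer> cq =
            new ContinuousQuerySketch<>(delay -> delay > 5);
        cq.put("train-1", 0);   // on time: no event
        cq.put("train-1", 10);  // becomes delayed: joins the result set
        cq.put("train-1", 12);  // still delayed: updated
        cq.put("train-1", 2);   // recovers: leaves the result set
        System.out.println(cq.events); // prints [joined:train-1, updated:train-1, left:train-1]
    }
}
```

The caller never re-runs the query; it only sees entries joining, changing within, or leaving the result set, which is what makes the approach a fit for a live station board.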

Slide 17

Slide 17 text

Demo domain
[Diagram labels: Station board, stop, station, train]

Slide 18

Slide 18 text

OpenShift
• Platform-as-a-Service (PaaS)
• Public or private
• Polyglot
• Based on Docker & Kubernetes

Slide 19

Slide 19 text

Vert.x
• Tool-kit for building reactive applications on the JVM
• Event-driven & non-blocking
• Polyglot

Slide 20

Slide 20 text

Real-time demo
[Architecture diagram labels: Injector Verticle; Continuous Query Verticle; Http App Verticle; Data Grid (replication); SockJS Bridge; Real Time; JavaFX on laptop; HTTP; WebSockets]

Slide 21

Slide 21 text

Demo: real-time

Slide 22

Slide 22 text

Data analytics
• Map/reduce via Java streams
• Spark/Hadoop integration

Slide 23

Slide 23 text

Spark / Hadoop
• Powerful analytics APIs
• Combo with Infinispan backend
• Separate process management

Slide 24

Slide 24 text

distributed java streams Extended Java 8 Stream API to data stored in

Slide 25

Slide 25 text

Java 8 stream

    import java.util.Arrays;
    import java.util.List;

    List<Integer> numbers = Arrays.asList(
        4, 74, 20, 97, 118, 50, 97, 34, 48);

    numbers.stream()
        .filter(i -> i > 70)
        // ^ Returns Stream<Integer>
        .map(n -> new String(Character.toChars(n)))
        // ^ Returns Stream<String>
        .reduce("", String::concat);
    // Returns "Java"

Slide 26

Slide 26 text

Distributed streams map(λ) λ λ
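One way to picture the lambda being shipped to every node is to run the same pipeline over each node's local data and then combine the partial results. The sketch below simulates this in a single JVM with plain Java 8 streams; the class name, the three-way split of the numbers and the use of nested lists as "nodes" are assumptions, and it does not use Infinispan's distributed stream API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical single-JVM simulation; each inner list stands in for one node's local data.
class DistributedStreamSketch {

    // Run the same filter/map/reduce on every "node", then reduce the partial results.
    static String run(List<List<Integer>> nodes) {
        return nodes.stream()
            .map(local -> local.stream()
                .filter(i -> i > 70)                         // executed next to the data
                .map(n -> new String(Character.toChars(n)))  // code point to one-character string
                .reduce("", String::concat))                 // partial result per node
            .reduce("", String::concat);                     // final combination
    }

    public static void main(String[] args) {
        List<List<Integer>> nodes = Arrays.asList(
            Arrays.asList(4, 74, 20),
            Arrays.asList(97, 118, 50),
            Arrays.asList(97, 34, 48));
        System.out.println(run(nodes)); // prints "Java"
    }
}
```

The simulation keeps the nodes in a fixed order, so the concatenation is deterministic. In a real cluster the order in which partial results arrive is not something this sketch models.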

Slide 27

Slide 27 text

What is the time of the day when there is the biggest ratio of delayed trains?
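A question like this maps naturally onto a grouping collector: group the entries by hour, compute the share of delayed entries per group, and pick the hour with the largest share. The following is a minimal local sketch with Java 8 streams. The `Stop` class, its fields and the sample data are hypothetical, and the code runs on an in-memory list, not on data stored in Infinispan.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class DelayRatioSketch {

    // Hypothetical shape of one station-board entry.
    static final class Stop {
        final int hour;          // hour of day, 0-23
        final int delayMinutes;  // 0 means on time

        Stop(int hour, int delayMinutes) {
            this.hour = hour;
            this.delayMinutes = delayMinutes;
        }
    }

    // Ratio of delayed entries per hour: averaging 1 (delayed) and 0 (on time).
    static Map<Integer, Double> delayedRatioPerHour(List<Stop> stops) {
        return stops.stream().collect(Collectors.groupingBy(
            (Stop s) -> s.hour,
            TreeMap::new,
            Collectors.averagingInt((Stop s) -> s.delayMinutes > 0 ? 1 : 0)));
    }

    // Hour with the biggest ratio of delayed entries.
    static int worstHour(List<Stop> stops) {
        return delayedRatioPerHour(stops).entrySet().stream()
            .max(Map.Entry.comparingByValue())
            .map(Map.Entry::getKey)
            .orElseThrow(() -> new IllegalArgumentException("no stops"));
    }

    public static void main(String[] args) {
        List<Stop> stops = Arrays.asList(
            new Stop(8, 0), new Stop(8, 3), new Stop(8, 5),    // 2 of 3 delayed
            new Stop(12, 0), new Stop(12, 0), new Stop(12, 1), // 1 of 3 delayed
            new Stop(17, 4), new Stop(17, 0));                 // 1 of 2 delayed
        System.out.println(worstHour(stops)); // prints 8
    }
}
```

Using a ratio per hour, and not a raw count of delays, keeps busy hours from dominating just because more trains run in them.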

Slide 28

Slide 28 text

Analytics demo
[Architecture diagram labels: Injector Verticle; Analytics Verticle; Data Grid (replication); Delay Calculator Server Task (x3); Analytics; Jupyter on laptop; HTTP]

Slide 29

Slide 29 text

Demo: analytics

Slide 30

Slide 30 text

Build an Infinispan-based infrastructure to store, search and process near real-time data and calculate analytics
• Real-time data challenge

Slide 31

Slide 31 text

Build an Infinispan-based infrastructure to store, search and process near real-time data and calculate analytics
• Data growth problem
• Real-time data challenge

Slide 32

Slide 32 text

Build an Infinispan-based infrastructure to store, search and process near real-time data and calculate analytics
• Real-time data challenge
• Data growth problem
• Continuous query for real-time

Slide 33

Slide 33 text

Build an Infinispan-based infrastructure to store, search and process near real-time data and calculate analytics
• Real-time data challenge
• Data growth problem
• Continuous query for real-time
• Analysis with Java streams

Slide 34

Slide 34 text

Credits
• Approve by Aha-Soft from the Noun Project
• engineer by Wilson Joseph from the Noun Project
• transformation by Felipe Perucho from the Noun Project
• analytics by Roman Kovbasyuk from the Noun Project
• Database sharing by YuguDesign from the Noun Project
• Server by designify.me from the Noun Project

Slide 35

Slide 35 text

Thanks!
• github.com/galderz/swiss-transport-datagrid
• Branch: early17
• infinispan.org
• openshift.com
• vertx.io
@galderz #infinispan