Slide 1

Slide 1 text

PRINCIPLES AND PATTERNS FOR STREAMING DATA ANALYSIS
Voxxed Days Ticino
Galder Zamarreño Arrizabalaga
@galderz
20th October 2018

Slide 2

Slide 2 text

@GALDERZ #INFINISPAN #VDT18 2 Since 2006 ENGINEER @galderz Community Lead and Core Developer INFINISPAN CO-FOUNDER (2008) OTIS PAIR PROGRAMMING BUDDY

Slide 3

Slide 3 text

DATA IS OVERWHELMING US
EXPONENTIAL DATA GROWTH YEAR ON YEAR: smartphones, IoT devices, trillions of internet-connected devices...
REAL-TIME STREAMING DATA PROCESSING IS CHALLENGING: delays can have a big impact

Slide 4

Slide 4 text

HIGH LEVEL ARCHITECTURE

Slide 5

Slide 5 text

COLLECTION TIER

Slide 6

Slide 6 text

COMMON INTERACTION PATTERNS

Slide 7

Slide 7 text

REQUEST/RESPONSE PATTERN
[Diagram: client and server joined by a connection; request from client to server, response from server to client]

Slide 8

Slide 8 text

REQUEST/RESPONSE PATTERN
[Diagram: client and server joined by a connection; request from client to server, response from server to client]
Labels: blocking synchronous; non-blocking asynchronous

Slide 9

Slide 9 text

REQUEST/ACKNOWLEDGMENT PATTERN
[Diagram: client and server joined by a connection; request from client to server, answered with an ack]

Slide 10

Slide 10 text

PUBLISH / SUBSCRIBE PATTERN
[Diagram: producer sends msg to a broker holding topic A and topic B; consumers subscribe to topics and receive msg]
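
The publish/subscribe interaction can be sketched in a few lines. This is a minimal in-memory illustration, not any real broker's API: the `Broker` class, its `subscribe` and `publish` methods, and the message strings are all hypothetical names chosen for the example.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class Broker:
    """Hypothetical in-memory broker: routes each message to the topic's subscribers."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        # A consumer registers interest in one topic.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, msg: str) -> int:
        # The producer only knows the topic, never the consumers.
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(msg)
        return len(handlers)  # number of consumers the message was handed to


broker = Broker()
received_a: List[str] = []
received_b: List[str] = []
broker.subscribe("topic A", received_a.append)
broker.subscribe("topic B", received_b.append)

broker.publish("topic A", "msg 1")
broker.publish("topic B", "msg 2")
unrouted = broker.publish("topic C", "msg 3")  # nobody subscribed to topic C
```

The producer and the consumers share only the topic name, which is what lets either side be added or removed without the other noticing.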

Slide 11

Slide 11 text

ONE-WAY PATTERN
[Diagram: client and server joined by a connection; request from client to server, no response]

Slide 12

Slide 12 text

STREAM PATTERN
[Diagram: client and server joined by a connection; one request from the client, followed by response, response, …]

Slide 13

Slide 13 text

MESSAGE QUEUE TIER
Decoupling the collection and analysis tiers

Slide 14

Slide 14 text

WHY DECOUPLE?
Fast collection tier combined with slow analysis tier
e.g. analysis tier being more processing-intensive, or analysis tier consuming messages in batches
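
A rough sketch of the decoupling idea, assuming a plain in-process buffer stands in for the message queue tier: the collection side appends events as fast as they arrive, and the analysis side drains them later in batches. The `drain_batch` helper, the batch size of 4, and the event names are illustrative assumptions.

```python
from collections import deque
from typing import Deque, List


def drain_batch(buffer: Deque[str], batch_size: int) -> List[str]:
    """Take up to batch_size messages from the front of the buffer."""
    batch: List[str] = []
    while buffer and len(batch) < batch_size:
        batch.append(buffer.popleft())
    return batch


# Collection tier: fast, just enqueues and moves on.
buffer: Deque[str] = deque()
for i in range(10):
    buffer.append(f"event-{i}")

# Analysis tier: slower, consumes whenever it is ready, in batches.
batches: List[List[str]] = []
while buffer:
    batches.append(drain_batch(buffer, batch_size=4))

batch_sizes = [len(batch) for batch in batches]
```

Neither side waits on the other: the buffer absorbs the difference in speed, at the cost of memory (or disk, in a durable queue) while the backlog lasts.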

Slide 15

Slide 15 text

DURABLE MESSAGING
Disaster recovery
Offline consumption
Fault tolerance

Slide 16

Slide 16 text

DELIVERY SEMANTICS
At-least-once: messages not lost but might be repeated
Exactly-once: messages not lost and consumed only once
At-most-once: messages might get lost

Slide 17

Slide 17 text

BULLSHIT!

Slide 18

Slide 18 text

EXACTLY-ONCE MISLEADING OR LIE!
• Guaranteed delivery vs guaranteed processing of a message
• What if a subscriber consumes the message and then crashes?
• Guaranteed delivery and processing requires application awareness and collaboration
• So the subscriber can IDEMPOTENTLY process a message and know up to which point it has processed it
• At this point you're capable of doing at-least-once
• Also requires the consumer to acknowledge processing to the publisher
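
The points above can be illustrated with a small simulation. It is a sketch under stated assumptions, not a real messaging client: `IdempotentConsumer`, `deliver_at_least_once`, the message ids and the "lost ack" mechanism are all hypothetical. The publisher redelivers until it sees an acknowledgment, and the consumer remembers which ids it has already applied so that a redelivery does not change the result.

```python
from typing import Iterable, Set, Tuple


class IdempotentConsumer:
    """Applies each message id at most once, however often it is delivered."""

    def __init__(self) -> None:
        self.processed_ids: Set[str] = set()
        self.total = 0

    def handle(self, message_id: str, amount: int) -> bool:
        if message_id not in self.processed_ids:
            self.total += amount
            self.processed_ids.add(message_id)
        return True  # acknowledge, also for duplicates


def deliver_at_least_once(
    consumer: IdempotentConsumer,
    messages: Iterable[Tuple[str, int]],
    lose_first_ack_for: Set[str],
) -> int:
    """Redeliver each message until the publisher receives an ack."""
    deliveries = 0
    for message_id, amount in messages:
        ack_lost_once = message_id in lose_first_ack_for
        acked = False
        while not acked:
            deliveries += 1
            ack = consumer.handle(message_id, amount)
            if ack_lost_once:
                ack_lost_once = False  # ack lost in transit: publisher retries
                continue
            acked = ack
    return deliveries


consumer = IdempotentConsumer()
deliveries = deliver_at_least_once(
    consumer,
    messages=[("m1", 10), ("m2", 5), ("m3", 7)],
    lose_first_ack_for={"m2"},
)
```

Message `m2` is delivered twice but applied once. In a real system the set of processed ids and the resulting state would have to be stored together durably, otherwise a crash between the two reopens the same gap.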

Slide 19

Slide 19 text

ANALYSIS TIER

Slide 20

Slide 20 text

IN-FLIGHT ANALYSIS
Traditional RDBMS: data at rest, queried for answers
Streaming: data moved through the query
Data always in motion from the message queue tier

Slide 21

Slide 21 text

CONTINUOUS QUERY
New data that matches query pushed to client
Use cases: tracking behaviour, traffic/safety, fraud analytics...
Query constantly evaluated
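
As a plain-Python sketch of the idea (not Infinispan's continuous query API), a continuous query can be seen as a stored predicate plus a listener: every new item is evaluated as it arrives, and matches are pushed to the client. The `ContinuousQuery` class, the `delay_min` field and the sample trains other than the name format are assumptions made for the example.

```python
from typing import Any, Callable, Dict, List

Item = Dict[str, Any]


class ContinuousQuery:
    """Evaluates a fixed predicate against every new item and pushes matches."""

    def __init__(self, predicate: Callable[[Item], bool], listener: Callable[[Item], None]) -> None:
        self._predicate = predicate
        self._listener = listener

    def on_data(self, item: Item) -> None:
        # The query stays put; the data moves through it.
        if self._predicate(item):
            self._listener(item)


pushed: List[Item] = []
delayed_trains = ContinuousQuery(
    predicate=lambda train: train["delay_min"] >= 2,  # hypothetical field
    listener=pushed.append,
)

# Hypothetical incoming events.
for event in [
    {"name": "IR 1978", "delay_min": 0},
    {"name": "IC 5", "delay_min": 4},
    {"name": "S 12", "delay_min": 1},
]:
    delayed_trains.on_data(event)

pushed_names = [train["name"] for train in pushed]
```

The client never polls: it registers the query once and only hears about items that match.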

Slide 22

Slide 22 text

SLIDING WINDOW
Combines queries with time constraints
e.g. traffic information in my area for last hour
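
A minimal sketch of a time-based sliding window, assuming events arrive with non-decreasing timestamps in seconds. The `SlidingWindow` class and the sample values are hypothetical; the window covers the interval (now - width, now], so an event exactly one width old is dropped.

```python
from collections import deque
from typing import Deque, List, Tuple


class SlidingWindow:
    """Keeps only the events whose timestamp falls within the last width_s seconds."""

    def __init__(self, width_s: int) -> None:
        self.width_s = width_s
        self._events: Deque[Tuple[int, str]] = deque()  # oldest first

    def add(self, timestamp: int, value: str) -> None:
        self._events.append((timestamp, value))
        self._evict(timestamp)

    def values(self, now: int) -> List[str]:
        self._evict(now)
        return [value for _, value in self._events]

    def _evict(self, now: int) -> None:
        # Drop everything at or before the trailing edge of the window.
        while self._events and self._events[0][0] <= now - self.width_s:
            self._events.popleft()


window = SlidingWindow(width_s=3600)  # "last hour"
window.add(0, "a")
window.add(1800, "b")
window.add(4000, "c")

recent = window.values(now=4000)
later = window.values(now=5400)
```

At time 4000 the event from time 0 has already slid out of the window; by 5400 the event from 1800 is exactly one hour old and is dropped as well.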

Slide 23

Slide 23 text

DATA ACCESS TIER

Slide 24

Slide 24 text

PROTOCOLS TO SEND DATA TO CLIENTS
Protocol | Message frequency | Communication direction | Message latency | Efficiency | Fault tolerance / Reliability
Webhooks | Low | Uni-directional (server to client) | Average | Low | None
HTTP Long Polling | Average | Bi-directional | Average | Average | None
Server-sent events | High | Uni-directional | Low | High | None by default. Can be implemented.
WebSockets | High | Bi-directional | Low | High | None by default. Can be implemented.

Slide 25

Slide 25 text

APPLIED ARCHITECTURE

Slide 26

Slide 26 text

THE PLATFORM
Platform-as-a-Service (PaaS)
Platform for developing and running applications
Public or private and multi-language
OpenShift is a Kubernetes distro with extras

Slide 27

Slide 27 text

APPLIED ARCHITECTURE

Slide 28

Slide 28 text

THE GLUE
Vert.x is a toolkit for building reactive apps
On the JVM, event-driven and non-blocking
RxJava integrates with Vert.x
Great at event transformation and coordination
Works best with many sources of events (modern apps!)

Slide 29

Slide 29 text

APPLIED ARCHITECTURE

Slide 30

Slide 30 text

INFINISPAN - IN-MEMORY KEY/VALUE STORE

Slide 31

Slide 31 text

THE DATA
transport.opendata.ch + sbb.ch
{ "x":"8290840"
, "y":"47483629"
, "name":"IR 1978"
, "poly":[
    {"x":"8290840","y":"47483629",...}
  , {"x":"8290193","y":"47483647"...,"msec":"2000"
  , ...]
}

Slide 32

Slide 32 text

COMPONENT ARCHITECTURE
[Diagram: a datagrid of three infinispan pods behind the datagrid-hotrod service; an app pod containing vert.x verticles labelled main http, station boards, train positions, delayed positions and delayed trains; event bus addresses /eventbus/delayed-trains and /eventbus/delayed-positions]

Slide 33

Slide 33 text

DEMO TIME!

Slide 34

Slide 34 text

THANK YOU!
github.com/infinispan-demos/streaming-data-kubernetes
infinispan.org
redhat.com/en/technologies/jboss-middleware/data-grid
openshift.com | vertx.io