Principles and Patterns for Streaming Data Analysis

Data is overwhelming us in terms of both size and speed. How do we deal with this huge amount of real-time, streaming data? We need to combine tools, platforms, patterns, and principles to overcome this situation.

Join us in this session, where we’ll identify critical patterns and principles that enable us to achieve greater scale and response speed. We’ll give a live demo showing how an in-memory data grid like Infinispan and a platform like Kubernetes can leverage these patterns and principles to create a state-of-the-art distributed data processing architecture.

Galder Zamarreño

October 20, 2018

Transcript

  1. 1.

    PRINCIPLES AND PATTERNS FOR STREAMING DATA ANALYSIS
    Voxxed Days Ticino
    Galder Zamarreño Arrizabalaga
    @galderz
    20th October 2018
  2. 2.

    @GALDERZ #INFINISPAN #VDT18
    ENGINEER: Since 2006 (@galderz)
    INFINISPAN: Community Lead and Core Developer, CO-FOUNDER (2008)
    OTIS: PAIR PROGRAMMING BUDDY
  3. 3.

    DATA IS OVERWHELMING US
    EXPONENTIAL DATA GROWTH YEAR ON YEAR
    Smartphones, IoT devices, trillions of internet-connected devices...
    REAL-TIME STREAMING DATA PROCESSING IS CHALLENGING
    Delays can have a big impact
  4. 8.

    REQUEST/RESPONSE PATTERN
    [Diagram: client and server joined by a connection; the client sends a request, the server returns a response]
    BLOCKING SYNCHRONOUS
    NON-BLOCKING ASYNCHRONOUS
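The two request/response styles named on this slide can be contrasted in a minimal Python sketch. Everything here is an assumption made for illustration: the function names are hypothetical, and `sleep` stands in for network and server time. It is not code from the talk.

```python
import asyncio
import time


def blocking_request(payload, delay=0.01):
    """Blocking, synchronous: the caller can do nothing until the response arrives."""
    time.sleep(delay)  # stands in for network + server time
    return f"response to {payload}"


async def non_blocking_request(payload, delay=0.01):
    """Non-blocking, asynchronous: awaiting hands control to other pending requests."""
    await asyncio.sleep(delay)
    return f"response to {payload}"


def run_blocking(payloads):
    # Requests are served strictly one after another.
    return [blocking_request(p) for p in payloads]


async def _gather(payloads):
    # All requests are in flight at once; results keep the input order.
    return await asyncio.gather(*(non_blocking_request(p) for p in payloads))


def run_non_blocking(payloads):
    return asyncio.run(_gather(payloads))
```

Both runners return the same responses in the same order. The difference is that the blocking version waits for each response before sending the next request, while the asynchronous version overlaps the waits.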
  5. 10.

    PUBLISH / SUBSCRIBE PATTERN
    [Diagram: a producer sends msg to a broker holding topic A and topic B; consumers subscribe to topics and receive msg]
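The publish/subscribe diagram can be sketched as a tiny in-memory broker. This is a hedged illustration only: `Broker`, `subscribe` and `publish` are hypothetical names, and a real broker would add persistence, networking and delivery guarantees.

```python
from collections import defaultdict


class Broker:
    """Hypothetical in-memory broker: routes each message to the subscribers of its topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic name -> list of callbacks

    def subscribe(self, topic, callback):
        """Register a consumer callback for one topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, msg):
        """Deliver msg to every subscriber of topic; return how many received it."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(msg)
        return len(callbacks)
```

The producer only knows the topic name, never the consumers, which is the decoupling the pattern provides.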
  6. 14.

    WHY DECOUPLE?
    Fast collection tier combined with slow analysis tier
    e.g. analysis tier being more processing-intensive or analysis tier consuming messages in batches
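One way to picture the decoupling is a buffer that the collection tier writes to at its own pace while the analysis tier drains it in batches. The sketch below is single-threaded and uses hypothetical names (`MessageQueueTier`, `offer`, `poll_batch`); it only illustrates the idea.

```python
from collections import deque


class MessageQueueTier:
    """Hypothetical buffer sitting between a fast collection tier and a slow analysis tier."""

    def __init__(self):
        self._buffer = deque()

    def offer(self, msg):
        """Called by the collection tier; returns immediately."""
        self._buffer.append(msg)

    def poll_batch(self, max_size):
        """Called by the analysis tier; returns up to max_size messages in arrival order."""
        batch = []
        while self._buffer and len(batch) < max_size:
            batch.append(self._buffer.popleft())
        return batch

    def __len__(self):
        return len(self._buffer)
```

The collector never waits for the analyser: messages that have not been analysed yet simply stay in the buffer until the next batch is polled.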
  7. 16.

    DELIVERY SEMANTICS
    At-most-once: messages might get lost
    At-least-once: messages not lost but might be repeated
    Exactly-once: messages not lost and consumed only once
  8. 18.

    EXACTLY-ONCE MISLEADING OR LIE!
    • Guaranteed delivery vs guaranteed processing of message
    • What if a subscriber consumes the message and then it crashes?
    • Guaranteed delivery and processing requires application awareness and collaboration
    • So subscriber can IDEMPOTENTLY process a message and know up to which point it has processed it
    • At this point you're capable of doing at least once
    • Also requires consumer to acknowledge processing to publisher
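The points above can be sketched as at-least-once delivery combined with an idempotent subscriber. This is an illustrative assumption, not the talk's code: message ids, `IdempotentSubscriber` and `publish_at_least_once` are hypothetical, and a lost acknowledgement is simulated by a flag.

```python
class IdempotentSubscriber:
    """Remembers which message ids it has processed, so redeliveries have no extra effect."""

    def __init__(self):
        self.processed_ids = set()
        self.total = 0

    def handle(self, msg_id, amount):
        """Apply the message at most once per id, then acknowledge it."""
        if msg_id not in self.processed_ids:
            self.total += amount
            self.processed_ids.add(msg_id)
        return True  # acknowledgement sent back to the publisher


def publish_at_least_once(subscriber, messages, drop_first_ack_for=frozenset()):
    """Redeliver each message until an ack is seen; return the number of deliveries."""
    deliveries = 0
    for msg_id, amount in messages:
        ack_will_be_lost = msg_id in drop_first_ack_for
        acknowledged = False
        while not acknowledged:
            deliveries += 1
            ack = subscriber.handle(msg_id, amount)
            if ack_will_be_lost:
                ack_will_be_lost = False  # simulated: the ack vanished in transit
            else:
                acknowledged = ack
    return deliveries
```

With the first ack for message 1 dropped, the publisher delivers it twice, yet the subscriber's total counts it once. The effect looks like exactly-once, but only because the application cooperates.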
  9. 20.

    IN-FLIGHT ANALYSIS
    Traditional RDBMS: data at rest and query for answers
    Streaming: data moved through the query
    Data always in motion from message queue tier
  10. 21.

    CONTINUOUS QUERY
    Query constantly evaluated
    New data that matches query pushed to client
    Use cases: tracking behaviour, traffic/safety, fraud analytics...
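A continuous query can be sketched as a predicate registered with a data store and evaluated on every write, with matches pushed to a listener. `ObservableStore` and the `delay_min` field are hypothetical and chosen for illustration; this is not Infinispan's API.

```python
class ObservableStore:
    """Hypothetical key/value store that evaluates registered queries on every write."""

    def __init__(self):
        self._data = {}
        self._queries = []  # list of (predicate, listener) pairs

    def add_continuous_query(self, predicate, listener):
        """Register a query; listener(key, value) is called for each new matching entry."""
        self._queries.append((predicate, listener))

    def put(self, key, value):
        """Store the entry, then push it to every query whose predicate it matches."""
        self._data[key] = value
        for predicate, listener in self._queries:
            if predicate(value):
                listener(key, value)
```

The client never polls: it registers its interest once (for example, trains with a positive delay) and receives matching entries as they arrive.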
  11. 22.

    SLIDING WINDOW
    Combines queries with time constraints
    e.g. traffic information in my area for last hour
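A time-based sliding window can be sketched as a queue of timestamped events from which entries older than the window width are evicted. The class below is a minimal illustration with hypothetical names, and it assumes events arrive in non-decreasing timestamp order.

```python
from collections import deque


class SlidingWindow:
    """Keeps only the events whose timestamp falls within the last width_seconds."""

    def __init__(self, width_seconds):
        self.width = width_seconds
        self._events = deque()  # (timestamp, value), oldest first

    def add(self, timestamp, value):
        """Record an event; timestamps are assumed to be non-decreasing."""
        self._events.append((timestamp, value))

    def values(self, now):
        """Evict events at or before now - width, then return what remains."""
        cutoff = now - self.width
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()
        return [value for _, value in self._events]
```

With a width of 3600 seconds, a query at time `now` sees only the last hour of events, matching the traffic example on the slide.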
  12. 24.

    PROTOCOLS TO SEND DATA TO CLIENTS

    Protocol           | Message frequency | Communication direction            | Message latency | Efficiency | Fault tolerance / Reliability
    Webhooks           | Low               | Uni-directional (server to client) | Average         | Low        | None
    HTTP Long Polling  | Average           | Bi-directional                     | Average         | Average    | None
    Server-sent events | High              | Uni-directional                    | Low             | High       | None by default. Can be implemented.
    WebSockets         | High              | Bi-directional                     | Low             | High       | None by default. Can be implemented.
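As an illustration of one row of the table, the sketch below formats a message in the server-sent events wire format: optional `event:` and `id:` fields, one `data:` line per line of payload, and a blank line to end the event. The helper name `format_sse` and the event name used in the test are hypothetical.

```python
def format_sse(data, event=None, event_id=None):
    """Serialise one server-sent event; a blank line terminates the event."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    # Multi-line payloads are sent as several consecutive data fields.
    for line in data.split("\n"):
        lines.append(f"data: {line}")
    return "\n".join(lines) + "\n\n"
```

Setting an `id` is one way reliability "can be implemented": on reconnection a browser client reports the last id it saw in the `Last-Event-ID` header, so the server can resume from that point.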
  13. 26.

    THE PLATFORM
    Platform-as-a-Service (PaaS)
    Platform for developing and running applications
    Public or private and multi-language
    OpenShift is a Kubernetes distro with extras
  14. 28.

    THE GLUE
    Vert.x is a toolkit for building reactive apps
    On JVM, event-driven and non-blocking
    RxJava integrates with Vert.x
    Great at event transform and coordination
    Works best with many sources of events (modern apps!)
  15. 31.

    THE DATA
    transport.opendata.ch + sbb.ch

    {
      "x":"8290840",
      "y":"47483629",
      "name":"IR 1978",
      "poly":[
        {"x":"8290840","y":"47483629",...},
        {"x":"8290193","y":"47483647"...,"msec":"2000"
        , ...]
    }
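A record like the one on this slide could be parsed as sketched below. Two things are assumptions, not facts from the talk: that `x` and `y` are longitude and latitude in millionths of a degree, and the helper name `parse_position`. The sample contains only the fields visible on the slide; the elided ones are left out.

```python
import json


def parse_position(raw):
    """Parse one train record; assumes x/y are longitude/latitude in microdegrees."""
    record = json.loads(raw)
    return {
        "name": record["name"],
        "lon": int(record["x"]) / 1_000_000,  # assumption: microdegrees
        "lat": int(record["y"]) / 1_000_000,  # assumption: microdegrees
    }


# Only the fields shown on the slide; everything elided there is omitted here.
sample = '{"x":"8290840","y":"47483629","name":"IR 1978"}'
```

Note that the numeric values arrive as JSON strings, so they need an explicit conversion before any arithmetic.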
  16. 32.

    COMPONENT ARCHITECTURE
    [Diagram labels: datagrid with three infinispan pods; datagrid-hotrod service; app pod with vert.x verticles: main (http), station boards, train positions, delayed positions, delayed trains; event bus addresses /eventbus/delayed-trains and /eventbus/delayed-positions]