Principles and Patterns for Streaming Data Analysis

Data is overwhelming us in terms of both size and speed. How do we deal with this huge amount of real-time, streaming data? We need to combine tools, platforms, patterns, and principles to overcome this situation.

Join us in this session, where we’ll identify critical patterns and principles that enable us to achieve greater scale and response speed. We’ll give a live demo showing how an in-memory data grid like Infinispan and a platform like Kubernetes can leverage these patterns and principles to create a state-of-the-art distributed data processing architecture.

Galder Zamarreño

October 20, 2018


Transcript

  1. PRINCIPLES AND PATTERNS FOR
    STREAMING DATA ANALYSIS
    Voxxed Days Ticino
    Galder Zamarreño Arrizabalaga

    @galderz

    20th October 2018


  2. @GALDERZ #INFINISPAN #VDT18
    Since 2006
    ENGINEER
    @galderz
    Community Lead and
    Core Developer
    INFINISPAN
    CO-FOUNDER (2008)
    OTIS
    PAIR PROGRAMMING BUDDY


  3. DATA IS OVERWHELMING US
    EXPONENTIAL DATA GROWTH YEAR ON YEAR
    Smartphones, IoT devices, trillions of internet-connected devices...
    REAL-TIME STREAMING DATA PROCESSING IS CHALLENGING
    Delays can have a big impact

  4. HIGH LEVEL ARCHITECTURE

  5. COLLECTION TIER

  6. COMMON INTERACTION PATTERNS

  7. REQUEST/RESPONSE PATTERN
    [Diagram: over a connection, the client sends a request and the server returns a response]

  8. REQUEST/RESPONSE PATTERN
    [Diagram: over a connection, the client sends a request and the server returns a response]
    NON-BLOCKING ASYNCHRONOUS
    BLOCKING SYNCHRONOUS
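
The two request/response variants named on this slide can be sketched in a few lines. This is only an illustration using the Python standard library, not code from the talk: `handle`, `blocking_call`, `non_blocking_call`, and `send_all` are hypothetical names, and the sleeps stand in for network latency.

```python
import asyncio
import time


def handle(request: str) -> str:
    """Hypothetical server logic: answer every request with a response."""
    return f"response to {request}"


def blocking_call(request: str) -> str:
    # Blocking synchronous: the caller is stuck here until the response arrives.
    time.sleep(0.01)  # stands in for network latency
    return handle(request)


async def non_blocking_call(request: str) -> str:
    # Non-blocking asynchronous: awaiting yields control so other requests progress.
    await asyncio.sleep(0.01)
    return handle(request)


async def send_all(requests: list[str]) -> list[str]:
    # All requests are in flight at once; gather keeps results in request order.
    return await asyncio.gather(*(non_blocking_call(r) for r in requests))


sync_responses = [blocking_call(r) for r in ["a", "b"]]
async_responses = asyncio.run(send_all(["a", "b"]))
```

Both variants return the same responses; the difference is that the blocking calls run one after another while the asynchronous ones overlap their waiting time.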

  9. REQUEST/ACKNOWLEDGMENT PATTERN
    [Diagram: over a connection, the client sends a request and the server returns an ack]

  10. PUBLISH / SUBSCRIBE PATTERN
    [Diagram: producer, broker holding topic A and topic B, and consumer; arrows labelled msg and subscribe]
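
As a rough illustration of the publish/subscribe pattern, the sketch below models the broker as an in-process object that routes each message to the callbacks subscribed to its topic. The `Broker` class and its methods are hypothetical and do not correspond to any real messaging product's API.

```python
from collections import defaultdict
from typing import Callable


class Broker:
    """Hypothetical in-process broker: routes messages by topic name."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        # A consumer registers interest in a topic.
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, msg: str) -> None:
        # The producer only knows the topic, never the consumers behind it.
        for callback in self._subscribers.get(topic, []):
            callback(msg)


broker = Broker()
received_a: list[str] = []
received_b: list[str] = []
broker.subscribe("topic A", received_a.append)
broker.subscribe("topic B", received_b.append)
broker.publish("topic A", "msg 1")
broker.publish("topic B", "msg 2")
```

Each consumer sees only the messages for the topic it subscribed to, and the producer stays unaware of how many consumers exist.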

  11. ONE-WAY PATTERN
    [Diagram: over a connection, the client sends a request to the server; no response is shown]

  12. STREAM PATTERN
    [Diagram: over a connection, the client sends one request and the server returns multiple responses]
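
One way to picture the stream pattern is a generator: a single request yields a sequence of responses over time. This is a minimal sketch under that assumption; `stream_positions` is a hypothetical function, and only the train name is taken from the sample data later in the deck.

```python
from typing import Iterator


def stream_positions(train: str, updates: int) -> Iterator[str]:
    """Hypothetical server side: one request, many responses."""
    for i in range(updates):
        # Each yield models one response pushed back over the open connection.
        yield f"{train} position {i}"


# The client issues a single request and consumes responses as they arrive.
responses = list(stream_positions("IR 1978", 3))
```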

  13. MESSAGE QUEUE TIER
    Decoupling the collection and analysis tiers

  14. WHY DECOUPLE?
    Fast collection tier combined with slow analysis tier
    e.g. analysis tier being more processing-intensive
    or analysis tier consuming messages in batches
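
The decoupling argument can be sketched with a bounded queue between the two tiers: the collection tier writes at its own pace and the analysis tier drains the queue in batches. This is an in-process stand-in for a message queue tier, with hypothetical names (`collect`, `analyse`, `buffer`) and arbitrary sizes.

```python
import queue
import threading

# Bounded buffer standing in for the message queue tier (size is arbitrary).
buffer: queue.Queue = queue.Queue(maxsize=100)
SENTINEL = None  # marks the end of the stream


def collect(count: int) -> None:
    # Collection tier: emits events as fast as it can; put() blocks if the buffer is full.
    for i in range(count):
        buffer.put(f"event-{i}")
    buffer.put(SENTINEL)


def analyse(batch_size: int) -> list[list[str]]:
    # Analysis tier: consumes at its own pace, grouping messages into batches.
    batches: list[list[str]] = []
    batch: list[str] = []
    while True:
        item = buffer.get()
        if item is SENTINEL:
            break
        batch.append(item)
        if len(batch) == batch_size:
            batches.append(batch)
            batch = []
    if batch:
        batches.append(batch)  # final, partially filled batch
    return batches


producer = threading.Thread(target=collect, args=(10,))
producer.start()
batches = analyse(batch_size=4)
producer.join()
```

Neither side calls the other directly, so either tier can be slowed down, scaled, or restarted without changing the other.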

  15. DURABLE MESSAGING
    Disaster recovery
    Offline consumption
    Fault tolerance

  16. DELIVERY SEMANTICS
    At-least-once: messages not lost but might be repeated
    Exactly-once: messages not lost and consumed only once
    At-most-once: messages might get lost

  17. BULLSHIT!

  18. EXACTLY-ONCE: MISLEADING OR A LIE!
    • Guaranteed delivery vs guaranteed processing of a message
    • What if a subscriber consumes the message and then crashes?
    • Guaranteed delivery and processing require application awareness and collaboration
    • The subscriber must process messages IDEMPOTENTLY and know up to which point it has processed them
    • At that point you're capable of doing at-least-once
    • Also requires the consumer to acknowledge processing to the publisher
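
The bullets above can be sketched as an idempotent consumer sitting behind at-least-once delivery: duplicates are detected by message id, and every delivery is acknowledged so the publisher can stop redelivering. All names here are hypothetical, and the processed-id set lives in memory only; a real consumer would need to persist it together with the side effect.

```python
class IdempotentConsumer:
    """Hypothetical consumer that tolerates redelivered messages."""

    def __init__(self) -> None:
        self.processed_ids: set[int] = set()  # how far processing has got
        self.total = 0                        # the side effect being protected

    def handle(self, msg_id: int, amount: int) -> bool:
        # Apply the side effect only the first time a message id is seen.
        if msg_id not in self.processed_ids:
            self.total += amount
            self.processed_ids.add(msg_id)
        # Acknowledge in every case, including duplicates.
        return True


consumer = IdempotentConsumer()
# At-least-once delivery: message 2 arrives twice, e.g. after a lost ack.
deliveries = [(1, 10), (2, 5), (2, 5), (3, 1)]
acks = [consumer.handle(msg_id, amount) for msg_id, amount in deliveries]
```

The effect is "processed once" even though delivery happened more than once, which is the application-level cooperation the slide argues is unavoidable.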

  19. ANALYSIS TIER

  20. IN-FLIGHT ANALYSIS
    Traditional RDBMS: data at rest, queried for answers
    Streaming: data moves through the query
    Data always in motion from the message queue tier

  21. CONTINUOUS QUERY
    Query constantly evaluated
    New data that matches query pushed to client
    Use cases: tracking behaviour, traffic/safety, fraud analytics...
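
A continuous query can be pictured as a standing predicate that every new piece of data is evaluated against, with matches pushed to a listener. The sketch below is a plain-Python illustration, not the Infinispan continuous query API; the class, the `delay_min` field, and all train names except "IR 1978" are made up for the example.

```python
from typing import Callable


class ContinuousQuery:
    """Hypothetical standing query: evaluated against each arriving item."""

    def __init__(self, predicate: Callable[[dict], bool],
                 listener: Callable[[dict], None]) -> None:
        self.predicate = predicate
        self.listener = listener

    def on_data(self, item: dict) -> None:
        # The data moves through the query; matches are pushed to the client.
        if self.predicate(item):
            self.listener(item)


delayed: list[dict] = []
query = ContinuousQuery(lambda train: train["delay_min"] > 0, delayed.append)

incoming = [
    {"name": "IR 1978", "delay_min": 4},
    {"name": "S 3", "delay_min": 0},
    {"name": "IC 5", "delay_min": 12},
]
for train in incoming:
    query.on_data(train)

delayed_names = [train["name"] for train in delayed]
```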

  22. SLIDING WINDOW
    Combines queries with time constraints
    e.g. traffic information in my area for the last hour
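
The "last hour" example can be sketched as a window that evicts events once they fall outside the time span. This is a minimal sketch that assumes timestamps are plain seconds and arrive in non-decreasing order; the class name and the event strings are hypothetical.

```python
from collections import deque


class SlidingWindow:
    """Hypothetical time-based window holding (timestamp, value) pairs."""

    def __init__(self, span_seconds: int) -> None:
        self.span = span_seconds
        self.events: deque = deque()  # oldest event first

    def _evict(self, now: int) -> None:
        # Drop everything that is at least one full span older than `now`.
        while self.events and self.events[0][0] <= now - self.span:
            self.events.popleft()

    def add(self, timestamp: int, value: str) -> None:
        self.events.append((timestamp, value))
        self._evict(timestamp)

    def values(self, now: int) -> list[str]:
        self._evict(now)
        return [value for _, value in self.events]


window = SlidingWindow(span_seconds=3600)  # "the last hour"
window.add(0, "accident A1")
window.add(1800, "roadworks A2")
window.add(4000, "jam A3")
current = window.values(now=4000)
```

At time 4000 the event from time 0 is more than an hour old and has been evicted, so only the two newer events remain in `current`.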

  23. DATA ACCESS TIER

  24. PROTOCOLS TO SEND DATA TO CLIENTS
    Protocol           | Message frequency | Communication direction            | Message latency | Efficiency | Fault tolerance / Reliability
    Webhooks           | Low               | Uni-directional (server to client) | Average         | Low        | None
    HTTP Long Polling  | Average           | Bi-directional                     | Average         | Average    | None
    Server-sent events | High              | Uni-directional                    | Low             | High       | None by default. Can be implemented.
    WebSockets         | High              | Bi-directional                     | Low             | High       | None by default. Can be implemented.

  25. APPLIED ARCHITECTURE

  26. THE PLATFORM
    Platform-as-a-Service (PaaS)
    Platform for developing and running applications
    Public or private and multi-language
    OpenShift is a Kubernetes distro with extras

  27. APPLIED ARCHITECTURE

  28. THE GLUE
    Vert.x is a toolkit for building reactive apps
    On the JVM, event-driven and non-blocking
    RxJava integrates with Vert.x
    Great at event transformation and coordination
    Works best with many sources of events (modern apps!)

  29. APPLIED ARCHITECTURE

  30. INFINISPAN - IN-MEMORY KEY/VALUE STORE

  31. THE DATA
    transport.opendata.ch + sbb.ch
    {
    "x":"8290840"
    ,"y":"47483629"
    ,"name":"IR 1978"
    ,"poly":[
    {"x":"8290840","y":"47483629",...}
    , {"x":"8290193","y":"47483647"...,"msec":"2000"
    , ...]
    }
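
To make the record shape concrete, the sketch below parses a trimmed copy of the sample that contains only the fields visible on the slide. It assumes the `x`/`y` strings are longitude and latitude in millionths of a degree, which the values suggest but the slide does not state; the elided fields and the meaning of `msec` are left alone.

```python
import json

# Trimmed, hypothetical record: only the fields visible on the slide.
SAMPLE = """
{"x": "8290840", "y": "47483629", "name": "IR 1978",
 "poly": [{"x": "8290840", "y": "47483629"},
          {"x": "8290193", "y": "47483647", "msec": "2000"}]}
"""


def to_degrees(value: str) -> float:
    # Assumption: coordinates are integer strings in millionths of a degree.
    return int(value) / 1_000_000


record = json.loads(SAMPLE)
name = record["name"]
# (latitude, longitude) of the train and of each point along its path.
position = (to_degrees(record["y"]), to_degrees(record["x"]))
path = [(to_degrees(p["y"]), to_degrees(p["x"])) for p in record["poly"]]
```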

  32. COMPONENT ARCHITECTURE
    [Diagram]
    datagrid: three infinispan pods behind the datagrid-hotrod service
    app pod: vert.x verticles (main http, station boards, train positions), plus delayed trains and delayed positions
    event bus addresses: /eventbus/delayed-trains, /eventbus/delayed-positions

  33. DEMO TIME!

  34. THANK YOU!
    github.com/infinispan-demos/streaming-data-kubernetes
    infinispan.org
    redhat.com/en/technologies/jboss-middleware/data-grid
    openshift.com | vertx.io
