Big Data In Action With Infinispan

Dealing with real-time, in-memory, streaming data is a unique challenge. With the advent of the smartphone and IoT (trillions of internet-connected devices), data volumes are growing exponentially.

Building data layers that can satisfy these requirements is hard, but Infinispan, an in-memory data grid from Red Hat, offers state-of-the-art distributed data processing capabilities to tackle them. From classic or full-text queries, to Spark/Hadoop integrations, to distributed Java Streams, these wide-ranging data processing capabilities make Infinispan a strong choice for the Big Data era.

In this session, we will identify critical patterns and principles that will help you achieve greater scale and response speed. On top of that, you will witness how Infinispan follows these patterns and principles to tackle a big data situation via a live coding demonstration.

Galder Zamarreño

September 07, 2017

Transcript

  1. Big Data in action
    with Infinispan
    Galder Zamarreño Arrizabalaga
    7th September 2017

  2. Build Infinispan based infrastructure
    to store, search and process
    near real-time data and
    calculate analytics

  3. real-time data
    Real-time data is challenging
    Delays can have big impact

  4. data growth
    Exponential data growth
    (smartphones, IoT, etc.)
    How to analyse it?

  5. in-memory data grids
    IMDG

  6. What is an IMDG?
    • Distributed in-memory data
    • Server "mesh"
    • Peer-to-peer (P2P)
    • No master/slaves
    • No single bottleneck
    • No single point of failure
    • Commodity hardware

  7. Infinispan is an IMDG
    Custom Applications
    Mobile Applications
    Web Apps & Websites
    JBoss Middleware
    Fuse "memory" across machines into a unified data store
    Read-through, write-through, write-behind
    • NoSQL
    • Extreme Performance
    • Linear Scalability
    • Fault Tolerant
    • Event processing
    • Configurable ACID Txn
    Infinispan
    Databases and/or file system
    Analytical Framework

  8. Infinispan Use Cases
    Event Broker: listen to data changes, continuous query
    Data Analytics: map/reduce via Java streams, Spark/Hadoop integration
    Distributed Cache: cache frequent data, transient short-lived storage
    NoSQL Database: key/value store, ACID transactions

  9. Event Broker
    listen to data changes
    continuous query

  10. continuous query
    Continuous Query combines
    complex querying with reactive
    data changes

  11. Demo Domain
    Station
    board
    stop
    station
    train

  12. OpenShift
    Platform-as-a-Service (PaaS)
    Public or private
    Polyglot
    Based on Docker & Kubernetes

  13. Vert.x
    Tool-kit for building reactive
    applications on the JVM
    Event-Driven & Non-Blocking
    Polyglot

  14. Real-Time Demo
    Continuous Query
    Verticle
    Http App
    Verticle
    Data Grid
    Replication
    SockJS Bridge
    Real Time Laptop
    Http
    Websockets
    JavaFX
    Injector
    Verticle

  15. Demo Real-time

  16. Data Analytics
    map/reduce via Java streams
    Spark/Hadoop integration

  17. spark - hadoop
    Powerful analytics APIs
    Combo with Infinispan backend
    Separate process management

  18. distributed java streams
    Extended Java 8 Stream API to
    data stored in

  19. Distributed streams
    map(λ)
    λ
    λ

  20. What is the time of the day
    when there is the biggest
    ratio of delayed trains?
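
    The question above maps naturally onto a group-and-average stream
    pipeline. The following is a minimal sketch using only the standard
    Java 8 Stream API on a local list; it is not the code from the demo.
    The Stop class, its fields, and the sample data are hypothetical, and
    "delayed" is assumed to mean a delay greater than zero minutes. A
    distributed version would presumably run a pipeline of the same shape
    over data held in the grid rather than over a local collection.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

public class DelayRatio {

    // Hypothetical model of one train stop event (not the demo's domain class).
    public static final class Stop {
        final int hourOfDay;     // 0-23, hour the train was scheduled to call
        final int delayMinutes;  // 0 means on time

        public Stop(int hourOfDay, int delayMinutes) {
            this.hourOfDay = hourOfDay;
            this.delayMinutes = delayMinutes;
        }
    }

    // Group stops by hour and compute the fraction of delayed stops in each hour.
    // Averaging a 1.0/0.0 indicator yields delayed / total for the group.
    public static Map<Integer, Double> delayedRatioByHour(List<Stop> stops) {
        return stops.stream().collect(Collectors.groupingBy(
                s -> s.hourOfDay,
                Collectors.averagingDouble(s -> s.delayMinutes > 0 ? 1.0 : 0.0)));
    }

    // Pick the hour with the highest delayed ratio; empty if there is no data.
    public static Optional<Integer> worstHour(List<Stop> stops) {
        return delayedRatioByHour(stops).entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only.
        List<Stop> stops = Arrays.asList(
                new Stop(8, 5), new Stop(8, 0),
                new Stop(12, 0),
                new Stop(17, 3), new Stop(17, 7));

        System.out.println(delayedRatioByHour(stops));
        System.out.println(worstHour(stops));
    }
}
```

    With the sample data, hour 8 has one delayed stop out of two, hour 12
    has none, and both stops at hour 17 are delayed, so hour 17 is
    reported as the worst.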

  21. Analytics Demo
    Data Grid
    Replication
    Delay Calculator
    Server Task
    Delay Calculator
    Server Task
    Delay Calculator
    Server Task
    Analytics
    Verticle
    Injector
    Verticle
    Analytics
    Jupyter
    Laptop
    HTTP

  22. Demo Analytics

  23. Build Infinispan based infrastructure
    to store, search and process
    near real-time data and
    calculate analytics

  24. credits
    Approve by Aha-Soft
    from the Noun Project
    engineer by Wilson Joseph
    from the Noun Project
    transformation by Felipe Perucho
    from the Noun Project
    analytics by Roman Kovbasyuk
    from the Noun Project
    Database sharing by YuguDesign
    from the Noun Project
    Server by designify.me
    from the Noun Project

  25. Thanks!
    • github.com/infinispan-demos/swiss-transport-datagrid
    • Branch: early17
    • infinispan.org
    • openshift.com
    • vertx.io
    @galderz
    #infinispan
