Big Data In Action with Infinispan

Dealing with real-time, in-memory, streaming data is a unique challenge, and with the advent of the smartphone and IoT (trillions of internet-connected devices), we are witnessing exponential growth in data.

Building data layers that satisfy these requirements is challenging, but with the help of Infinispan, an in-memory data grid from Red Hat, you can take advantage of state-of-the-art distributed data processing capabilities to tackle these challenges. From classic or full-text queries, to Spark/Hadoop integrations via distributed Java Streams, these wide-ranging data processing capabilities make Infinispan the perfect choice for the Big Data era.

In this session, we will identify critical patterns and principles that will help you achieve greater scale and response speed. On top of that, you will witness how Infinispan follows these patterns and principles to tackle a big data situation via a live coding demonstration.

Galder Zamarreño

April 27, 2017

Transcript

  1. Big Data in action
    with infinispan
    Galder Zamarreño Arrizabalaga
    27th April 2017

  2. About me
    • @
    • Infinispan developer & co-founder
    • Lead client/server architecture
    • Functional programming
    @galderz
    #infinispan

  3. Build Infinispan based infrastructure
    to store, search and process
    near real-time data and
    calculate analytics

  4. real-time data
    Real-time data is challenging
    Delays can have big impact

  5. data growth
    Exponential data growth
    (smartphone, IoT...etc)
    How to analyse it?

  6. in-memory data grids
    IMDG

  7. What is an IMDG?
    • Distributed in-memory data
    • Server "mesh"
    • Peer-to-peer (P2P)
    • No master/slaves
    • No single bottleneck
    • No single point of failure
    • Commodity hardware

  8. Infinispan is an IMDG
    Custom Applications, Mobile Applications, Web Apps & Websites, JBoss Middleware
    Fuse "memory" across machines into a unified data store
    Read-through, write-through, write-behind
    • NoSQL
    • Extreme Performance
    • Linear Scalability
    • Fault Tolerant
    • Event processing
    • Configurable ACID Txn
    Infinispan
    Databases and/or file system
    Analytical Framework

  9. Infinispan Use Cases
    • Distributed Cache: cache frequent data; transient, short-lived storage

  10. Infinispan Use Cases
    • Distributed Cache: cache frequent data; transient, short-lived storage
    • NoSQL Database: key/value store; ACID transactions

  11. Infinispan Use Cases
    • Distributed Cache: cache frequent data; transient, short-lived storage
    • NoSQL Database: key/value store; ACID transactions
    • Event Broker: listen to data changes; continuous query

  12. Infinispan Use Cases
    • Distributed Cache: cache frequent data; transient, short-lived storage
    • NoSQL Database: key/value store; ACID transactions
    • Event Broker: listen to data changes; continuous query
    • Data Analytics: map/reduce via Java Stream; Spark/Hadoop integration

  13. use examples
    • Web, Ecommerce
        • HTTP session
        • Shopping carts
    • Database/legacy offload
        • Product catalog
        • Caching
    • Telecommunications
        • Cellular billing
        • Call routing, session info
        • SMS content/notification
    • Travel
        • Aggregated flight pricing
        • Flight availability
    • Financial
        • Per-user portfolio data and risk analysis
        • Aggregated ticker stream
    • Defence
        • Sensor network data processing and threat detection

  14. Infinispan Use Cases
    • Distributed Cache: cache frequent data; transient, short-lived storage
    • NoSQL Database: key/value store; ACID transactions
    • Event Broker: listen to data changes; continuous query
    • Data Analytics: map/reduce via Java Stream; Spark/Hadoop integration

  15. Event Broker
    • listen to data changes
    • continuous query

  16. continuous query
    Continuous Query combines
    complex querying with reactive
    data changes
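
The idea on this slide can be sketched in plain Java: a query (here a simple predicate) is registered once, and a listener is told whenever an entry starts or stops matching it as the data changes. This is only an in-memory illustration, not Infinispan's continuous query API; the class `ContinuousQuerySketch`, its `Listener` interface, and the train names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical in-memory model of a continuous query (not the Infinispan API).
final class ContinuousQuerySketch<K, V> {

    // Hypothetical callbacks: fired when an entry starts or stops matching.
    interface Listener<K, V> {
        void joining(K key, V value);
        void leaving(K key);
    }

    private final Map<K, V> data = new HashMap<>();
    private final Set<K> matching = new HashSet<>();
    private final Predicate<V> query;
    private final Listener<K, V> listener;

    ContinuousQuerySketch(Predicate<V> query, Listener<K, V> listener) {
        this.query = query;
        this.listener = listener;
    }

    // Store the value, then notify only if the key's match status changed.
    void put(K key, V value) {
        data.put(key, value);
        boolean matches = query.test(value);
        if (matches && matching.add(key)) {
            listener.joining(key, value);
        } else if (!matches && matching.remove(key)) {
            listener.leaving(key);
        }
    }

    // Removing a matching entry also makes it leave the result set.
    void remove(K key) {
        data.remove(key);
        if (matching.remove(key)) {
            listener.leaving(key);
        }
    }

    // Tracks delayed trains: the value is the delay in minutes.
    static List<String> demo() {
        List<String> events = new ArrayList<>();
        ContinuousQuerySketch<String, Integer> delayed =
            new ContinuousQuerySketch<String, Integer>(
                delay -> delay > 0,
                new Listener<String, Integer>() {
                    @Override
                    public void joining(String train, Integer delay) {
                        events.add("joining " + train + " +" + delay);
                    }

                    @Override
                    public void leaving(String train) {
                        events.add("leaving " + train);
                    }
                });
        delayed.put("IC 708", 0); // on time: no event
        delayed.put("IC 708", 4); // becomes delayed: joining
        delayed.put("S 3", 2);    // delayed on arrival: joining
        delayed.put("IC 708", 0); // back on time: leaving
        delayed.remove("S 3");    // entry removed: leaving
        return events;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The sketch stays silent when a matching entry is updated and still matches; a real implementation may well report such updates too.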

  17. Demo Domain
    Station
    board
    stop
    station
    train

  18. Openshift
    Platform-as-a-Service (PaaS)
    Public or private
    Polyglot
    Based on Docker & Kubernetes

  19. Vert.x
    Tool-kit for building reactive
    applications on the JVM
    Event-Driven & Non-Blocking
    Polyglot

  20. Real-Time Demo
    Continuous Query
    Verticle
    Http App
    Verticle
    Data Grid
    Replication
    Sock JS Bridge
    Real Time Laptop
    Http
    Websockets
    JavaFX
    Injector
    Verticle

  21. Demo Real-time

  22. Data Analytics
    • map/reduce via Java Stream
    • Spark/Hadoop integration

  23. spark - hadoop
    Powerful analytics APIs
    Combo with Infinispan backend
    Separate process management

  24. distributed java streams
    Extended Java 8 Stream API to
    data stored in

  25. java 8 stream
    List<Integer> numbers = Arrays.asList(
        4, 74, 20, 97, 118, 50, 97, 34, 48);
    numbers.stream()
        .filter(i -> i > 70)
        // ^ Returns Stream<Integer>
        .map(n -> new String(Character.toChars(n)))
        // ^ Returns Stream<String>
        .reduce("", String::concat);
        // ^ Returns "Java"

  26. Distributed streams
    map(λ)
    λ
    λ
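
One way to picture this slide: the same lambda is sent to every node, each node applies it to the entries it holds locally, and only the small partial results are combined. The sketch below models nodes as plain in-JVM lists and uses `parallelStream()` as a stand-in for remote execution. It is not Infinispan's distributed stream API; `DistributedStreamSketch` and `runOnEveryNode` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Hypothetical model: each inner list stands in for the data owned by one node.
final class DistributedStreamSketch {

    // Run the same task against every node's local data, then combine the
    // partial results. The combiner must be associative.
    static <T, R> R runOnEveryNode(List<List<T>> nodes,
                                   Function<List<T>, R> task,
                                   R identity,
                                   BinaryOperator<R> combiner) {
        return nodes.parallelStream()
            .map(task)
            .reduce(identity, combiner);
    }

    // Same numbers as the Java 8 stream example, split across three "nodes".
    static String demo() {
        List<List<Integer>> nodes = Arrays.asList(
            Arrays.asList(4, 74, 20),
            Arrays.asList(97, 118, 50),
            Arrays.asList(97, 34, 48));
        return runOnEveryNode(
            nodes,
            // This lambda is what each node would execute on its own entries.
            local -> local.stream()
                .filter(i -> i > 70)
                .map(n -> new String(Character.toChars(n)))
                .reduce("", String::concat),
            "",
            String::concat);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "Java"
    }
}
```

Here the three nodes produce the partial results "J", "av" and "a", which are concatenated in node order.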

  27. What is the time of the day
    when there is the biggest
    ratio of delayed trains?
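
A minimal local sketch of how this question could be answered with the Java 8 Stream API: group stops by hour of day, compute the fraction that were delayed, and pick the hour with the highest fraction. The `Stop` class, its fields, and the sample data are hypothetical, and the code runs over an in-memory list rather than data held in Infinispan.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class DelayRatioSketch {

    // Hypothetical value type: one train stop.
    static final class Stop {
        final int hour;         // scheduled hour of day, 0-23
        final int delayMinutes; // 0 means on time

        Stop(int hour, int delayMinutes) {
            this.hour = hour;
            this.delayMinutes = delayMinutes;
        }
    }

    // Fraction of delayed stops for each hour of day, sorted by hour.
    static Map<Integer, Double> delayRatioByHour(List<Stop> stops) {
        return stops.stream().collect(Collectors.groupingBy(
            s -> s.hour,
            TreeMap::new,
            // Averaging 1.0 (delayed) and 0.0 (on time) gives the ratio.
            Collectors.averagingDouble(s -> s.delayMinutes > 0 ? 1.0 : 0.0)));
    }

    // Hour with the highest delay ratio, or empty when there is no data.
    static Optional<Integer> worstHour(List<Stop> stops) {
        return delayRatioByHour(stops).entrySet().stream()
            .max(Map.Entry.comparingByValue())
            .map(Map.Entry::getKey);
    }

    // Made-up sample data for illustration only.
    static List<Stop> sampleStops() {
        return Arrays.asList(
            new Stop(8, 5), new Stop(8, 0), new Stop(8, 3),
            new Stop(17, 0), new Stop(17, 2),
            new Stop(23, 0));
    }

    public static void main(String[] args) {
        System.out.println(delayRatioByHour(sampleStops()));
        System.out.println(worstHour(sampleStops()));
    }
}
```

With the sample data, two of the three stops at 8 o'clock are delayed, so hour 8 has the highest ratio.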

  28. Analytics Demo
    Data Grid
    Replication
    Delay Calculator
    Server Task
    Delay Calculator
    Server Task
    Delay Calculator
    Server Task
    Analytics
    Verticle
    Injector
    Verticle
    Analytics
    Jupyter
    Laptop
    HTTP

  29. Demo Analytics

  30. Build Infinispan based infrastructure
    to store, search and process
    near real-time data and
    calculate analytics
    • real-time data challenge

  31. Build Infinispan based infrastructure
    to store, search and process
    near real-time data and
    calculate analytics
    • data growth problem
    • real-time data challenge

  32. Build Infinispan based infrastructure
    to store, search and process
    near real-time data and
    calculate analytics
    • real-time data challenge
    • data growth problem
    • continuous query for real-time

  33. Build Infinispan based infrastructure
    to store, search and process
    near real-time data and
    calculate analytics
    • real-time data challenge
    • data growth problem
    • continuous query for real-time
    • analysis with Java streams

  34. credits
    Approve by Aha-Soft from the Noun Project
    engineer by Wilson Joseph from the Noun Project
    transformation by Felipe Perucho from the Noun Project
    analytics by Roman Kovbasyuk from the Noun Project
    Database sharing by YuguDesign from the Noun Project
    Server by designify.me from the Noun Project

  35. Thanks!
    • github.com/galderz/swiss-transport-datagrid
    • Branch: early17
    • infinispan.org
    • openshift.com
    • vertx.io
    @galderz
    #infinispan
