Enabling Rapid Business Insight into Data with Stream Analytics and GoldenGate

Data streaming is rapidly becoming the norm in modern data architectures. Streams can come from stream-enabled applications, or from Oracle GoldenGate, which captures changes made to the database and streams them in real time to targets including Kafka. The availability of these data streams opens up great potential in analytics, enabling business insight to be realised sooner and action to be taken on data whilst it is still current. Oracle Stream Analytics (OSA) brings this insight into "Fast Data" to business users through an intuitive web interface. It enables them to filter and analyse data as it arrives, including through predefined algorithms and spatial analysis.

In this presentation I will give a live demonstration of how to filter, transform and analyse streaming data from sources including Kafka with Oracle Stream Analytics. Using Oracle GoldenGate, we will see how to stream individual database changes made by applications that were never written with streaming in mind. I will also discuss how OSA is deployed, including its use with Spark as the runtime engine, and consider OSA's place in the broader analytics architecture alongside Oracle Data Integrator.

Robin Moffatt

March 10, 2017

Transcript

  1. [email protected] www.rittmanmead.com @rittmanmead 1
    Enabling Rapid Business Insight into
    Fast Data with Oracle Stream Analytics
    and Oracle GoldenGate
    Robin Moffatt, Rittman Mead
    OUGN 2017
    Slides: speakerdeck.com/rmoff/
    Twitter: @rmoff

  2. Robin Moffatt
    • Head of R&D, Rittman Mead

    • Previously OBIEE/DW developer at a large UK retailer

    • Previously SQL Server DBA, Business Objects, DB2, COBOL…

    • Oracle ACE

    • Frequent blogger: http://ritt.md/rmoff and http://rmoff.net

    • Twitter: @rmoff

    • IRC: rmoff / #obihackers / freenode

  3. Rittman Mead
    • Oracle Gold Partner with offices in the UK and USA

    • 70+ staff delivering Oracle BI, DW, Big Data and Advanced Analytics projects

    • Significant web presence with the Rittman Mead Blog (http://www.rittmanmead.com)

    • Hadoop R&D lab for “dogfooding” solutions developed for customers

  4. “Stream” Processing…

  5. “Stream” Processing…
    This uses Environment Agency flood and river level data from the real-time data API (Beta)

  6. Oracle Stream Analytics
    • Stream processing runtime with GUI

    • Previously known as Oracle Stream Explorer
      - Part of Fusion Middleware

    • Custom runtime (not WLS)
      - Future: Spark Streaming

  7. Create Connection

  8. Create Stream

  9. Exploration

  10. Derivations and Calculations

  11. Aggregations

  12. Patterns
    • Pre-built templates to create Explorations

    • Simple transformation and aggregation

    • Built-in algorithms and spatial functionality

  13. Join the Streams (but never cross them…)

  14. References

  15. Join the Streams (but never cross them…)

  16. Under the Covers…

  17. Join the Streams

  18. Spatial Pattern
    • Uses geofencing to overlay stream events and tag them as ‘entering’, ‘leaving’, or ‘staying’ in an area
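
The entering/leaving/staying tagging described on this slide can be illustrated with a small stateful classifier. This is a minimal sketch in plain Ruby, not OSA's implementation: it assumes circular fences on a flat x/y plane, and the `GeoFence` and `GeofenceTagger` names and the sample positions are hypothetical. The key point is that a tag depends on each object's previous position as well as its current one, so the sketch keeps per-object state.

```ruby
# Minimal geofence tagging sketch; not OSA's internal implementation.
# Assumption: fences are circles on a flat x/y plane. Real spatial processing
# would use geodetic distances and arbitrary polygons.
GeoFence = Struct.new(:name, :cx, :cy, :radius) do
  # True when the point lies inside (or on the edge of) the circular fence
  def contains?(x, y)
    (x - cx)**2 + (y - cy)**2 <= radius**2
  end
end

class GeofenceTagger
  def initialize(fence)
    @fence = fence
    @inside = {} # last known inside/outside state per tracked object id
  end

  # Returns 'entering', 'leaving' or 'staying'; nil if the object was
  # outside the fence and is still outside
  def tag(id, x, y)
    now_inside = @fence.contains?(x, y)
    was_inside = @inside.fetch(id, false)
    @inside[id] = now_inside
    if now_inside && !was_inside
      'entering'
    elsif was_inside && !now_inside
      'leaving'
    elsif now_inside
      'staying'
    end
  end
end

# Hypothetical vehicle positions streaming past a depot fence of radius 10
tagger = GeofenceTagger.new(GeoFence.new('depot', 0.0, 0.0, 10.0))
[['van1', 20, 0], ['van1', 5, 5], ['van1', 1, 1], ['van1', 30, 0]].each do |id, x, y|
  puts "#{id}: #{tagger.tag(id, x, y).inspect}"
end
# prints nil, "entering", "staying", "leaving" for van1 in turn
```

Because state is keyed per object, a second vehicle entering the same fence is tagged independently of van1.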

  19. Map

  20. Spatial Pattern

  21. Stream Processing Pipeline
    This uses Environment Agency flood and river level data from the real-time data API (Beta)

  22. Your Database Transactions Are Events!
    [Diagram: Transaction Log → Oracle GoldenGate for Big Data (Oracle GoldenGate connector) → Kafka → Oracle Stream Analytics]
    • Change Data Capture (CDC) streams every change made to the database into Kafka
    • Data resides in Kafka ready for streaming by one or more consumers
    • Each database table is a stream of events in Oracle Stream Analytics

  23. Streaming from OGG to Kafka
    • Oracle GoldenGate for Big Data is required

    • Set up the Extract as before

    • The Replicat uses the Kafka Connect handler, with JSON encoding
      - gg.handler.confluent.type=oracle.goldengate.kafkaconnect.KafkaConnectHandler
      - value.converter=org.apache.kafka.connect.json.JsonConverter

    • See https://www.confluent.io/blog/streaming-data-oracle-using-oracle-goldengate-kafka-connect/

  24. Handling Nested JSON
    • OSA currently only works with “flat” JSON

    • OGG nests the ‘payload’ in a sublevel of the message
      - e.g. OSA can’t currently reference payload.LOGON_ID
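
One way around the nesting problem, as an alternative to the Logstash approach on the next slide, is to flatten the whole message recursively so that a nested field such as payload.LOGON_ID becomes a single top-level key. The sketch below assumes a simplified, hypothetical message with a schema/payload envelope; the `flatten_json` helper and the underscore separator (chosen so that the resulting key contains no dot) are illustrative choices, not part of OGG or OSA.

```ruby
require 'json'

# Recursively flattens nested hashes: {"a" => {"b" => 1}} becomes {"a_b" => 1}.
# Arrays are left as-is; only hash nesting is collapsed.
def flatten_json(hash, prefix = nil, sep = '_')
  hash.each_with_object({}) do |(key, value), flat|
    full_key = prefix ? "#{prefix}#{sep}#{key}" : key.to_s
    if value.is_a?(Hash)
      flat.merge!(flatten_json(value, full_key, sep))
    else
      flat[full_key] = value
    end
  end
end

# Hypothetical, simplified message with the columns nested under 'payload'
message = JSON.parse(<<~JSON)
  {"schema": {"type": "struct"},
   "payload": {"table": "ORCL.SOE.LOGON", "op_type": "I",
               "LOGON_ID": 42, "CUSTOMER_ID": 7}}
JSON

flat = flatten_json(message)
puts flat['payload_LOGON_ID'] # prints 42
```

Unlike promoting only the payload, this keeps every field (including the schema metadata) at the cost of longer, prefixed key names.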

  25. Logstash to the Rescue!
    • Stream processing tool from Elastic

    • Very easy to configure

    • Ref:
      - http://stackoverflow.com/a/40131532/350613
      - https://github.com/logstash-plugins/logstash-filter-mutate/issues/90
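    input {
      # Logstash 2.x-era Kafka input options (zk_connect, topic_id)
      kafka {
        zk_connect => 'localhost:2181'
        topic_id   => 'ORCL.SOE.ORDERS'
      }
    }
    filter {
      # Keep only 'payload', copy its fields to the top level of the event,
      # then remove 'payload'. Written against the Logstash 2.x event API;
      # event['field'] access was replaced by event.get in Logstash 5.
      ruby {
        code => "
          event.to_hash.delete_if {|k, v| k != 'payload'}
          event.to_hash.update(event['payload'].to_hash)
          event.to_hash.delete_if {|k, v| k == 'payload'}
        "
      }
    }
    output {
      # Write the flattened events to a new topic for OSA to consume
      kafka {
        topic_id          => "ORCL.SOE.ORDERS_flat"
        bootstrap_servers => "localhost:9092"
      }
    }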
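
The three steps in the ruby filter above (discard everything except payload, copy the payload's fields to the top level, then delete payload) can be tried outside Logstash on a single message. This plain-Ruby sketch mirrors that logic on a JSON string; it does not use Logstash's event API, and the `promote_payload` name and sample message are hypothetical.

```ruby
require 'json'

# Plain-Ruby mirror of the Logstash ruby filter, applied to one JSON
# message string (illustration only; not the Logstash event API)
def promote_payload(json_message)
  event = JSON.parse(json_message)
  event.delete_if { |k, _| k != 'payload' } # keep only 'payload'
  event.update(event['payload'].to_h)       # copy its fields to the top level
  event.delete_if { |k, _| k == 'payload' } # drop the nested original
  JSON.generate(event)
end

# Hypothetical nested message with the columns under 'payload'
nested = '{"schema":{"type":"struct"},"payload":{"ORDER_ID":1001,"ORDER_TOTAL":250.5}}'
puts promote_payload(nested)
# prints {"ORDER_ID":1001,"ORDER_TOTAL":250.5}
```

A message with no payload key comes out as an empty object, because everything else is discarded in the first step.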

  26. Flattened JSON

  27. View All Data Changes As They Happen

  28. Filter and Chart Stream Values

  29. Stream Aggregates

  30. Join Streams
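
A join between two streams has to bound how long each side's events are kept, otherwise the join state grows without limit. The sketch below shows a simple symmetric windowed join in plain Ruby; it is an illustration under assumptions rather than how OSA or Spark Streaming execute joins. The ORDERS and LOGON events, the CUSTOMER_ID join key, the `ts` field in seconds, the 60-second window and the `WindowedJoin` class are all hypothetical, and events are assumed to arrive in timestamp order.

```ruby
# Windowed stream-stream join sketch: each side buffers recent events by key,
# and each new event is matched against the other side's buffer.
class WindowedJoin
  def initialize(window_seconds, key)
    @window = window_seconds
    @key = key
    @buffers = {
      left: Hash.new { |h, k| h[k] = [] },
      right: Hash.new { |h, k| h[k] = [] }
    }
  end

  # side is :left or :right; returns [left_event, right_event] pairs
  def push(side, event)
    ts = event.fetch('ts')
    @buffers.each_key { |s| evict(s, ts) }
    other = side == :left ? :right : :left
    pairs = @buffers[other][event.fetch(@key)].map do |match|
      side == :left ? [event, match] : [match, event]
    end
    @buffers[side][event.fetch(@key)] << event
    pairs
  end

  private

  # Drops buffered events older than the window relative to now_ts
  def evict(side, now_ts)
    @buffers[side].each_value do |events|
      events.reject! { |e| now_ts - e['ts'] > @window }
    end
  end
end

# Hypothetical ORDERS (left) and LOGON (right) events joined on CUSTOMER_ID
join = WindowedJoin.new(60, 'CUSTOMER_ID')
join.push(:right, { 'CUSTOMER_ID' => 7, 'LOGON_ID' => 42, 'ts' => 100 })
p join.push(:left, { 'CUSTOMER_ID' => 7, 'ORDER_ID' => 1001, 'ts' => 130 }) # one pair
p join.push(:left, { 'CUSTOMER_ID' => 7, 'ORDER_ID' => 1002, 'ts' => 200 }) # [] (logon expired)
```

Returning pairs rather than merged hashes avoids one side's `ts` silently overwriting the other's.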

  31. Output to Kafka
    $ kafka-console-consumer \
        --zookeeper localhost:2181 \
        --topic order_aggregates
    {
      "ts": 1487253152764000000,
      "count_30sec": 16,
      "avg_order_total_30sec": 8138.75,
      "max_order_total_30sec": 15102
    }
    {
      "ts": 1487253162686000000,
      "count_30sec": 17,
      "avg_order_total_30sec": 5898.5884,
      "max_order_total_30sec": 15102
    }
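
The field names in this output suggest aggregates over the orders seen in the preceding 30 seconds, and ts appears to be an epoch timestamp in nanoseconds (1487253152764000000 ns is 2017-02-16 13:52:32.764 UTC). The sketch below computes count, average and maximum over a sliding 30-second window in plain Ruby. It is an assumption-based illustration, not OSA's implementation: the `SlidingWindowAggregate` class and the order totals are hypothetical, and events are assumed to arrive in timestamp order.

```ruby
# Sliding-window aggregates over order totals; timestamps are nanoseconds.
class SlidingWindowAggregate
  def initialize(window_ns = 30 * 1_000_000_000) # default: 30 seconds
    @window_ns = window_ns
    @events = [] # [ts_ns, order_total] pairs, oldest first
  end

  # Adds one order, evicts orders older than the window and returns
  # aggregates in the same shape as the Kafka output
  def add(ts_ns, order_total)
    @events << [ts_ns, order_total]
    @events.shift while ts_ns - @events.first[0] >= @window_ns
    totals = @events.map(&:last)
    {
      'ts' => ts_ns,
      'count_30sec' => totals.size,
      'avg_order_total_30sec' => totals.sum.fdiv(totals.size),
      'max_order_total_30sec' => totals.max
    }
  end
end

# Converts a nanosecond epoch timestamp to a UTC Time without float rounding
def ts_to_time(ts_ns)
  Time.at(Rational(ts_ns, 1_000_000_000)).utc
end

agg = SlidingWindowAggregate.new
t0 = 1_487_253_152_764_000_000 # first ts value from the output above
p agg.add(t0, 100.0)
p agg.add(t0 + 20 * 10**9, 300.0) # count 2, avg 200.0, max 300.0
p agg.add(t0 + 40 * 10**9, 50.0)  # first order has aged out: count 2, avg 175.0
puts ts_to_time(t0).strftime('%Y-%m-%d %H:%M:%S.%L UTC') # 2017-02-16 13:52:32.764 UTC
```

The current event is always inside its own window, so the eviction loop never empties the buffer.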

  32. Stream Processing with Oracle Stream Analytics
    [Diagram: Transaction Log, Oracle GoldenGate for Big Data, Oracle GoldenGate connector, Kafka, Oracle Stream Analytics, HDFS connector, HDFS/Hive, Impala]

  33. Tool vs Hand-Coding
    • Several powerful open-source frameworks
      - e.g. Spark Streaming, Apache Flink, Apache Kafka Streams, Google Dataflow/Apache Beam

    • Hand-coding generally requires specialist coders and an understanding of infrastructure/ops

    • The arguments parallel those of ETL tools vs hand-coded PL/SQL

  34. Tool vs Hand-Coding
    • Faster to prototype and deploy

    • Accessible by non-coders

    • Built-in visualisation capabilities

    • Easier to support

    • Less flexibility

    • Slower to adopt new capabilities of streaming frameworks

    • License costs

  35. Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
    Oracle Stream Analytics – Driving New Innovation
    • Fault Tolerant, Highly Available, Extreme Streaming – “with Big Data brings massive flows of Data Streams”
    • New Enabling Runtime Infrastructure: Spark Streaming with Kafka Messaging Layer

  36. Oracle Stream Analytics – Driving New Innovation
    • Streaming predictive probability scoring – “I don’t know what I don’t know”
    • New PMML Pattern offering that enables OSA integration with ORE (and SparkML)
    [Diagram: Predictive Analytics, Data Warehouse, Streaming Probability Scoring, Oracle Stream Analytics]
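
Streaming probability scoring means applying a trained model to every event as it arrives. The slide refers to PMML models from ORE or SparkML; the sketch below swaps in a hand-written logistic regression scorer purely to show the per-event idea. The `LogisticScorer` class, its coefficients, the feature names and the 0.8 alert threshold are hypothetical and do not come from any real model or from OSA's PMML support.

```ruby
# Per-event logistic regression scorer (a hypothetical stand-in for a PMML model)
class LogisticScorer
  def initialize(intercept, weights)
    @intercept = intercept
    @weights = weights # feature name => coefficient; missing features count as 0
  end

  # Probability in (0, 1): the logistic function of the linear score
  def probability(event)
    z = @weights.sum { |name, w| w * event.fetch(name, 0.0) } + @intercept
    1.0 / (1.0 + Math.exp(-z))
  end
end

# Hypothetical model scoring how likely an order is to need review
scorer = LogisticScorer.new(-4.0, { 'ORDER_TOTAL' => 0.0005, 'ITEMS' => 0.2 })
stream = [{ 'ORDER_ID' => 1, 'ORDER_TOTAL' => 500.0, 'ITEMS' => 1 },
          { 'ORDER_ID' => 2, 'ORDER_TOTAL' => 15_000.0, 'ITEMS' => 3 }]
stream.each do |order|
  prob = scorer.probability(order)
  puts format('order %d: p=%.3f%s', order['ORDER_ID'], prob, prob > 0.8 ? ' ALERT' : '')
end
```

Scoring is stateless per event, so it parallelises trivially; the expensive part (training) happens offline, for example in ORE or SparkML.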

  37. EOF
    email: [email protected]
    web: http://ritt.md/rmoff, http://rmoff.net
    twitter: @rmoff
    irc: rmoff @ #obihackers
    #EOF
    speakerdeck.com/rmoff/

  38. Reading and References
    • Simple developer download/install process for OSA

    • A Docker image exists, enabling you to get up and running with OSA very quickly, on Mac or Windows too!

    • https://www.rittmanmead.com/blog/tag/osa/

    • https://www.rittmanmead.com/blog/tag/spark-streaming/
