Pro Yearly is on sale from $80 to $50! »

Enabling Rapid Business Insight into Data with Stream Analytics and GoldenGate

Enabling Rapid Business Insight into Data with Stream Analytics and GoldenGate

Data streaming is rapidly becoming the norm in modern data architectures. This can be from stream-enabled applications, or through the capabilities of Oracle GoldenGate to stream changes made to the database in realtime to targets including Kafka. This availability of data streams offers great potential and advances in the analytics world, enabling business insight to be realised sooner, and actions taken on the data whilst it is still current. Oracle Stream Analytics (OSA) brings this insight into "Fast Data" to business users through an intuitive web interface. It enables them to filter and analyse data as it arrives, including with predefined algorithms and spatial technology.

In this presentation I will present a live demonstration of how to filter, transform, and analyse streaming data with Oracle Stream Analytics from sources including Kafka. Using Oracle GoldenGate we will see how to stream individual changes from the database made by applications that were not even written with streaming capabilities. How OSA is deployed will be discussed, including its use with Spark as the runtime engine. We will also consider OSA's place in the broader analytics architecture alongside Oracle Data Integrator.

2bded62396ea66c84bd10e91c718dea9?s=128

Robin Moffatt

March 10, 2017
Tweet

Transcript

  1. info@rittmanmead.com www.rittmanmead.com @rittmanmead 1 Enabling Rapid Business Insight into Fast

    Data with Oracle Stream Analytics and Oracle GoldenGate Robin Moffatt, Rittman Mead OUGN 2017 Slides: speakerdeck.com/rmoff/ Twitter: @rmoff
  2. info@rittmanmead.com www.rittmanmead.com @rittmanmead Robin Moffatt 2 • Head of R&D,

    Rittman Mead • Previously OBIEE/DW developer at large UK retailer • Previously SQL Server DBA, Business Objects, 
 DB2, COBOL…. • Oracle ACE • Frequent blogger : http://ritt.md/rmoff and http://rmoff.net • Twitter: @rmoff • IRC: rmoff / #obihackers / freenode
  3. info@rittmanmead.com www.rittmanmead.com @rittmanmead Rittman Mead 3 • Oracle Gold Partner

    with offices in the UK and USA • 70+ staff delivering Oracle BI, DW, Big Data and Advanced Analytics projects • Significant web presence with the Rittman Mead Blog 
 (http://www.rittmanmead.com) • Hadoop R&D lab for “dogfooding” solutions developed for customers
  4. info@rittmanmead.com www.rittmanmead.com @rittmanmead 4 “Stream” Processing…

  5. info@rittmanmead.com www.rittmanmead.com @rittmanmead 5 “Stream” Processing… This uses Environment Agency

    flood and river level data from the real-time data API (Beta)
  6. info@rittmanmead.com www.rittmanmead.com @rittmanmead Oracle Stream Analytics 6 • Stream processing

    runtime with GUI • Previously known as Oracle Stream Explorer - Part of Fusion Middleware • Custom runtime (not WLS) - Future : Spark Streaming
  7. info@rittmanmead.com www.rittmanmead.com @rittmanmead Create Connection 7

  8. info@rittmanmead.com www.rittmanmead.com @rittmanmead Create Stream 8

  9. info@rittmanmead.com www.rittmanmead.com @rittmanmead Exploration 9

  10. info@rittmanmead.com www.rittmanmead.com @rittmanmead Derivations and Calculations 10

  11. info@rittmanmead.com www.rittmanmead.com @rittmanmead Aggregations 11

  12. info@rittmanmead.com www.rittmanmead.com @rittmanmead Patterns 12 • Pre-built templates to create

    Explorations • Simple transformation and aggregation • Built-in algorithms and spatial functionality
  13. info@rittmanmead.com www.rittmanmead.com @rittmanmead Join the Streams (but never cross them…)

    13
  14. info@rittmanmead.com www.rittmanmead.com @rittmanmead References 14

  15. info@rittmanmead.com www.rittmanmead.com @rittmanmead Join the Streams (but never cross them…)

    15
  16. info@rittmanmead.com www.rittmanmead.com @rittmanmead Under the Covers… 16

  17. info@rittmanmead.com www.rittmanmead.com @rittmanmead Join the Streams 17

  18. info@rittmanmead.com www.rittmanmead.com @rittmanmead Spatial Pattern 18 • Uses geofencing to

    overlay stream events and tag them as ‘entering’, ‘leaving’, or ‘staying’ in an area
  19. info@rittmanmead.com www.rittmanmead.com @rittmanmead Map 19

  20. info@rittmanmead.com www.rittmanmead.com @rittmanmead Spatial Pattern 20

  21. info@rittmanmead.com www.rittmanmead.com @rittmanmead Stream Processing Pipeline 21 This uses Environment

    Agency flood and river level data from the real-time data API (Beta)
  22. info@rittmanmead.com www.rittmanmead.com @rittmanmead Your Database Transactions Are Events! Transaction Log

    Kafka Oracle GoldenGate Connector Oracle GoldenGate for Big Data Change Data Capture (CDC) streams every change made to the database into Kafka Data resides in Kafka ready for streaming by one or more consumers Oracle Stream Analytics Each database table is a stream of events in Oracle Stream Analytics
  23. info@rittmanmead.com www.rittmanmead.com @rittmanmead Streaming from OGG to Kafka 23 •

    Oracle GoldenGate for Big Data is required • Set up extract as before • Replicat is defined as KafkaConnect, with JSON encoding - gg.handler.confluent.type=
 oracle.goldengate.kafkaconnect.KafkaConnectHandler - value.converter=
 org.apache.kafka.connect.json.JsonConverter • See https://www.confluent.io/blog/streaming-data-oracle-using-oracle-goldengate-kafka-connect/
  24. info@rittmanmead.com www.rittmanmead.com @rittmanmead Handling Nested JSON 24 • OSA currently

    only works with “flat” JSON • OGG nests the ‘payload’ in a sublevel of the message - e.g. OSA can’t currently reference payload.LOGON_ID
  25. info@rittmanmead.com www.rittmanmead.com @rittmanmead Logstash to the Rescue! 25 • Stream

    processing tool from Elastic • Very easy to configure • Ref: - http://stackoverflow.com/a/ 40131532/350613 - https://github.com/logstash-plugins/ logstash-filter-mutate/issues/90 input { kafka { zk_connect => 'localhost:2181' topic_id => 'ORCL.SOE.ORDERS' } } filter { ruby { code => " event.to_hash.delete_if {|k, v| k != 'payload'} event.to_hash.update(event['payload'].to_hash) event.to_hash.delete_if {|k, v| k == 'payload'} " } } output { kafka { topic_id => "ORCL.SOE.ORDERS_flat" bootstrap_servers => "localhost:9092" }}
  26. info@rittmanmead.com www.rittmanmead.com @rittmanmead Flattened JSON 26

  27. info@rittmanmead.com www.rittmanmead.com @rittmanmead View All Data Changes As They Happen

    27
  28. info@rittmanmead.com www.rittmanmead.com @rittmanmead Filter and Chart Stream Values 28

  29. info@rittmanmead.com www.rittmanmead.com @rittmanmead Stream Aggregates 29

  30. info@rittmanmead.com www.rittmanmead.com @rittmanmead Join Streams 30

  31. info@rittmanmead.com www.rittmanmead.com @rittmanmead Output to Kafka 31 $ kafka-console-consumer \

    --zookeeper localhost:2181 \ -—topic order_aggregates { "ts": 1487253152764000000, "count_30sec": 16, "avg_order_total_30sec": 8138.75, "max_order_total_30sec": 15102 } { "ts": 1487253162686000000, "count_30sec": 17, "avg_order_total_30sec": 5898.5884, "max_order_total_30sec": 15102 }
  32. info@rittmanmead.com www.rittmanmead.com @rittmanmead Stream Processing with Oracle Stream Analytics Transaction

    Log Kafka Oracle GoldenGate Connector Oracle GoldenGate for Big Data Oracle Stream Analytics HDFS connector HDFS/Hive Impala
  33. info@rittmanmead.com www.rittmanmead.com @rittmanmead Tool vs Hand-Coding 33 • Several powerful

    open-source frameworks - e.g. Spark Streaming, Apache Flink, Apache Kafka Streams, Google DataFlow/Apache Beam • Generally requires specialist coders and infrastructure/ops understanding • Parallel arguments with
 ETL tools vs PL/SQL
  34. info@rittmanmead.com www.rittmanmead.com @rittmanmead Tool vs Hand-Coding 34 • Faster to

    prototype and deploy • Accessible by non-coders • Built-in visualisation capabilities • Easier to support • Less flexibility • Slower to adopt new capabilities of streaming frameworks • License costs
  35. Copyright © 2016, Oracle and/or its affiliates. All rights reserved.

    | Fault Tolerant, Highly Available, Extreme Streaming – “with Big Data brings massive flows of Data Streams” New Enabling Runtime Infrastructure : Spark Streaming with Kafka Messaging Layer Oracle Stream Analytics – Driving New Innovation }
  36. Copyright © 2016, Oracle and/or its affiliates. All rights reserved.

    | Streaming predictive probability scoring – “I don’t know what I don’t know” New PMML Pattern offering that enables OSA integration with ORE (and SparkML) Oracle Stream Analytics – Driving New Innovation Predictive Analytics Data Warehouse Streaming Probability Scoring Oracle Stream Analytics
  37. info@rittmanmead.com www.rittmanmead.com @rittmanmead EOF 37 email
 robin@rittmanmead.com web
 http://ritt.md/rmoff http://rmoff.net

    twitter
 @rmoff irc
 rmoff @ #obihackers #EOF speakerdeck.com/rmoff/
  38. info@rittmanmead.com www.rittmanmead.com @rittmanmead Reading and References 38 • Simple developer

    download/install process for OSA • Docker image exists, enabling you to very quickly get up and running with OSA, on Mac or Windows too! • https://www.rittmanmead.com/blog/tag/osa/ • https://www.rittmanmead.com/blog/tag/spark-streaming/