Kafka's Role in Implementing Oracle's Big Data Reference Architecture

by rmoff

Published March 1, 2017 in Technology

As presented at YoDB #7. Any successful Big Data platform needs to be built around a solid architecture with the flexibility to adapt to changing data sources and usage patterns. The great value of enterprise data lies in being able to access all of it on demand, in raw or enriched form, including for uses that may not even have been conceived yet. Apache Kafka, when implemented as part of Oracle's Information Management and Big Data Conceptual Architecture, can play a crucial role in enabling this.

Apache Kafka streams data from a source system once and then makes it available to as many end consumers as require it - in batch or realtime, with each consumer reading at its own rate. It also persists the data in guaranteed order, meaning that it can be replayed on demand. This enables applications and microservices to extract source data as required, making Kafka powerful in supporting both the Data Factory and the Discovery Lab concepts of the architecture.
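The semantics described above - an append-only log written once, with each consumer tracking its own offset and able to rewind for replay - can be modelled in a few lines. This is an illustrative sketch of the concept, not the Kafka client API; all class and method names here are invented for the example.

```python
class TopicLog:
    """Append-only, ordered log of records (think: one Kafka topic partition)."""
    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the newly written record

    def read(self, offset, max_records=10):
        return self._records[offset:offset + max_records]


class Consumer:
    """Each consumer owns its offset, independent of every other consumer."""
    def __init__(self, log, offset=0):
        self._log = log
        self.offset = offset

    def poll(self, max_records=10):
        batch = self._log.read(self.offset, max_records)
        self.offset += len(batch)  # advance only this consumer's position
        return batch

    def seek(self, offset):
        self.offset = offset  # replay: rewind to any earlier offset


log = TopicLog()
for event in ["order:1", "order:2", "order:3"]:
    log.append(event)          # data is written to the log exactly once

realtime = Consumer(log)       # a realtime consumer...
batch = Consumer(log)          # ...and a batch consumer, reading later

print(realtime.poll())              # ['order:1', 'order:2', 'order:3']
print(batch.poll(max_records=1))    # ['order:1'] - reading at its own pace
realtime.seek(0)                    # replay from the beginning on demand
print(realtime.poll())              # ['order:1', 'order:2', 'order:3']
```

The key point the sketch illustrates is that the consumers share one copy of the data but not a read position, which is what lets batch and realtime users coexist on the same feed.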

In this presentation we'll introduce the basics of Kafka and explain how it fits into the overall Big Data Architecture. From there we'll look at how it can be used with Oracle GoldenGate and other sources to form the initial touch point for streaming data into the data reservoir on the Big Data Appliance. We will then look at how Kafka's flexibility as a pipeline allows the architecture to be extended to multiple discovery and sandbox environments, with ad hoc and repeatable feeds into tools such as Elasticsearch and Flume.
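As an illustration of the kind of repeatable feed mentioned above, a Kafka Connect sink configuration along these lines can stream a topic into Elasticsearch. This assumes the Confluent Elasticsearch sink connector is installed; the connector name, topic, and URL are placeholder values for the example.

```properties
# Hypothetical Kafka Connect sink: stream the web_logs topic into Elasticsearch
name=elastic-sink-web-logs
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
tasks.max=1
topics=web_logs
connection.url=http://localhost:9200
key.ignore=true
```

Because the data remains in Kafka, the same topic could simultaneously feed a sandbox environment or a batch load into the reservoir without re-extracting anything from the source system.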