Slide 1

Slide 1 text

1 Apache Kafka™'s Role in Implementing Oracle's Big Data Reference Architecture SUN6259 Oracle OpenWorld, 1 Oct 2017 Robin Moffatt, Partner Technology Evangelist, EMEA t:@rmoff e:[email protected]

Slide 2

Slide 2 text

2 What is Apache Kafka™?

Slide 3

Slide 3 text

3 “Apache Kafka™ is a distributed streaming platform”

Slide 4

Slide 4 text

4 Sounds Fancy

Slide 5

Slide 5 text

5 But

Slide 6

Slide 6 text

6 What is Kafka?

Slide 7

Slide 7 text

7 Kafka is a Distributed Streaming Platform • Publish and subscribe to streams of data, similar to a message queue or enterprise messaging system • Store streams of data in a fault tolerant way • Process streams of data in real time, as they occur

Slide 8

Slide 8 text

8 Powered by Apache Kafka™

Slide 9

Slide 9 text

9 $ whoami • Partner Technology Evangelist @ Confluent • Working in data & analytics since 2001 • Oracle ACE Director • Blogging: http://rmoff.net & https://www.confluent.io/blog/author/robin/ • Twitter: @rmoff • Geek stuff • Beer & Fried Breakfasts

Slide 10

Slide 10 text

10 The Reference Architecture Or “Information Management and Big Data - A Reference Architecture” for short…

Slide 11

Slide 11 text

11 Information Management and Big Data Reference Architecture • Tool-agnostic logical architecture for Information Management, taking into account Big Data • Written by Oracle, with input from Mark Rittman and Stewart Bryson • Three years old, but still a good starting point for implementation design http://www.oracle.com/technetwork/database/bigdata-appliance/overview/bigdatarefarchitecture-2297765.pdf

Slide 12

Slide 12 text

12 Conceptual Architecture http://www.oracle.com/technetwork/database/bigdata-appliance/overview/bigdatarefarchitecture-2297765.pdf

Slide 13

Slide 13 text

13 Implementation Patterns http://www.oracle.com/technetwork/database/bigdata-appliance/overview/bigdatarefarchitecture-2297765.pdf

Slide 14

Slide 14 text

14 How Do We Build For the Future? http://www.oracle.com/technetwork/database/bigdata-appliance/overview/bigdatarefarchitecture-2297765.pdf

Slide 15

Slide 15 text

15 Kafka in Detail

Slide 16

Slide 16 text

16 What is Kafka? • Messages are stored in Topics • Roughly analogous to a database table • Topics can be partitioned across multiple Kafka nodes for performance, and replicated for redundancy • Not just about streaming: huge uses for data integration too
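
A minimal in-memory sketch of how a keyed message lands in one partition of a topic. This is not a Kafka client: the `Topic` class, the `orders` topic name and the CRC32 hash are assumptions for illustration only (Kafka's default partitioner uses a different hash). The point is that the same key always maps to the same partition, and each partition is an append-only log with its own offsets.

```python
import zlib


class Topic:
    """Toy stand-in for a partitioned topic (hypothetical, not a real Kafka API)."""

    def __init__(self, name, num_partitions):
        self.name = name
        # Each partition is an append-only list; the list index is the offset.
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Assumption: CRC32 modulo the partition count, chosen only because it is deterministic.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def produce(self, key, value):
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)


topic = Topic("orders", num_partitions=3)
placements = [
    topic.produce(key, value)
    for key, value in [("cust-1", "order A"), ("cust-2", "order B"), ("cust-1", "order C")]
]
```

Both `cust-1` messages end up in the same partition, in the order they were produced, which is what lets partitions be spread across nodes without losing per-key ordering.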

Slide 17

Slide 17 text

17 What is Kafka? • Kafka makes its data available to any consumer • Security permitting • Consumers: • Are independent of each other • Can be grouped and parallelised for performance and resilience • Can re-read messages as required • Can read messages as a stream or in batches
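
The consumer properties listed here can be sketched with a toy model in which each consumer keeps its own offset into a shared log. The `Consumer` class and its `poll`/`seek` methods are hypothetical names for this illustration, not the Kafka client API.

```python
class Consumer:
    """Toy consumer that tracks its own position in a shared log (illustrative only)."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_records=None):
        # Read from the current offset up to the end of the log, or at most max_records.
        end = len(self.log)
        if max_records is not None:
            end = min(end, self.offset + max_records)
        records = self.log[self.offset:end]
        self.offset = end
        return records

    def seek(self, offset):
        # Move this consumer's position; other consumers are unaffected.
        self.offset = offset


log = ["e0", "e1", "e2", "e3"]  # stands in for one partition of a topic
a = Consumer(log)
b = Consumer(log)

first_a = a.poll(max_records=2)  # A reads two records
all_b = b.poll()                 # B reads everything, regardless of where A is
a.seek(0)                        # A rewinds to re-read from the start
replay_a = a.poll()
```

Because reading does not remove anything from the log, A's rewind has no effect on B.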

Slide 18

Slide 18 text

18 Running Kafka • Apache Kafka is open source • Includes Kafka Core, stream processing and data integration capabilities • Can be deployed standalone or as part of Confluent Platform • Also available in most Hadoop distributions, but often as older versions without the latest functionality

Slide 19

Slide 19 text

19 Confluent Platform: Enterprise Streaming based on Apache Kafka™ [Diagram] • Data flowing in: Database Changes, Log Events, IoT Data, Web Events, … • Systems and uses: CRM, Data Warehouse, Database, Hadoop, Data Integration, Monitoring, Analytics, Custom Apps, Transformations, Real-time Applications, … • Editions: Apache Open Source, Confluent Open Source, Confluent Enterprise • Apache Kafka™: Core | Connect API | Streams API • Data Compatibility: Schema Registry • Monitoring & Administration: Confluent Control Center | Security • Operations: Replicator | Auto Data Balancing • Development and Connectivity: Clients | Connectors | REST Proxy | KSQL | CLI

Slide 20

Slide 20 text

20 What are the Problems that Kafka Solves?

Slide 21

Slide 21 text

21 What are the Problems That Kafka Solves? Consumer A Producer Kafka

Slide 22

Slide 22 text

22 Multiple Independent Consumers of the Same Data Consumer A Producer Kafka Consumer B

Slide 23

Slide 23 text

23 Multiple Parallel Consumers of the Same Data Consumer A Producer Kafka Consumer B Consumer A
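
Running several instances of Consumer A in parallel relies on the topic's partitions being shared out among them. A rough sketch of that idea, assuming a plain round-robin split (a simplification of Kafka's real assignment strategies) and hypothetical member names `A-1` and `A-2`:

```python
def assign(partitions, members):
    """Spread partitions over the members of one consumer group, round-robin."""
    assignment = {member: [] for member in members}
    for i, partition in enumerate(partitions):
        assignment[members[i % len(members)]].append(partition)
    return assignment


partitions = [0, 1, 2, 3]
two_members = assign(partitions, ["A-1", "A-2"])  # work is split between two instances
one_member = assign(partitions, ["A-1"])          # A-2 has gone away; A-1 picks up everything
```

The same function covers both the performance case (more members, fewer partitions each) and the resilience case (a member leaves and the rest take over its partitions).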

Slide 24

Slide 24 text

24 Multiple Sources of the Same Type of Data Consumer A Producer Kafka Consumer B Consumer A Producer Kafka

Slide 25

Slide 25 text

25 Scaling Throughput and Resilience Consumer A Producer Consumer B Consumer A Producer Kafka Kafka

Slide 26

Slide 26 text

26 System Availability and Event Buffering Consumer A Producer

Slide 27

Slide 27 text

27 System Availability and Event Buffering Consumer A Producer Kafka

Slide 28

Slide 28 text

28 Varying Latency Requirements / Batch vs Stream Consumer A Producer 24hr batch extract

Slide 29

Slide 29 text

29 Varying Latency Requirements / Batch vs Stream Consumer A Producer 24hr batch extract Consumer B Needs near-realtime / streamed data

Slide 30

Slide 30 text

30 Varying Latency Requirements / Batch vs Stream Consumer A Producer 24hr batch extract Consumer B Now unnecessarily coupled together Still only gets 24hr batch dump of data

Slide 31

Slide 31 text

31 Varying Latency Requirements / Batch vs Stream Consumer A Producer 24hr batch extract Consumer B Hits source system twice Event stream

Slide 32

Slide 32 text

32 Varying Latency Requirements / Batch vs Stream Consumer A Producer 24hr batch extract Consumer B Requires reimplementation of Consumer A Event stream

Slide 33

Slide 33 text

33 Varying Latency Requirements / Batch vs Stream Consumer A Producer Kafka Consumer B Event stream Batch pull Event Stream
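
A small sketch of why one log can serve both latency profiles: the producer writes each event once, a streaming reader picks events up as they arrive, and a batch reader pulls everything in one go later. `LogReader` and the event names are hypothetical; this is an in-memory illustration, not Kafka client code.

```python
class LogReader:
    """Toy reader with its own position in a shared append-only log."""

    def __init__(self, log):
        self.log = log
        self.position = 0

    def read_new(self):
        # Return everything written since this reader last read.
        records = self.log[self.position:]
        self.position = len(self.log)
        return records


log = []
streaming = LogReader(log)  # Consumer B: reads after every event
batch = LogReader(log)      # Consumer A: reads once per batch cycle

streamed = []
for event in ["order-1", "order-2", "order-3"]:
    log.append(event)                      # the producer writes once
    streamed.extend(streaming.read_new())  # the low-latency consumer sees it straight away

batch_pull = batch.read_new()  # the batch consumer pulls the same events in one go
```

Neither consumer needs to know the other exists, and the source system is only read once.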

Slide 34

Slide 34 text

34 Technology & Code Changes Consumer A (v1) Producer Kafka

Slide 35

Slide 35 text

35 Technology & Code Changes Consumer A (v1) Producer Kafka Consumer A (v2)

Slide 36

Slide 36 text

36 Technology & Code Changes Producer Kafka Consumer A (v2)

Slide 37

Slide 37 text

37 How do I get my data into Kafka?

Slide 38

Slide 38 text

38 Kafka Connect: Stream data in and out of Kafka Amazon S3

Slide 39

Slide 39 text

39 But I need to join… aggregate… filter…

Slide 40

Slide 40 text

40 KSQL: a Streaming SQL Engine for Apache Kafka™ from Confluent • Enables stream processing with zero coding required • The simplest way to process streams of data in real-time • Powered by Kafka: scalable, distributed, battle-tested • All you need is Kafka: no complex deployments of bespoke systems for stream processing

Slide 41

Slide 41 text

41 KSQL: the Simplest Way to Do Stream Processing
CREATE TABLE possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 SECONDS)
  GROUP BY card_number
  HAVING count(*) > 3;
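
To make the windowing logic concrete, here is a plain-Python sketch of the same computation: count authorization attempts per card in fixed 5-second windows and keep only the counts above 3. It only illustrates the semantics and says nothing about how KSQL executes the query; the function name, the `(timestamp, card_number)` input shape and the sample data are assumptions.

```python
from collections import Counter

WINDOW_SECONDS = 5
THRESHOLD = 3


def possible_fraud(attempts):
    """attempts: iterable of (epoch_seconds, card_number) pairs.

    Returns {(window_start, card_number): count} for counts above THRESHOLD.
    """
    counts = Counter()
    for ts, card in attempts:
        # Tumbling windows are fixed-size and non-overlapping: [0, 5), [5, 10), ...
        window_start = ts - (ts % WINDOW_SECONDS)
        counts[(window_start, card)] += 1
    # Equivalent of HAVING count(*) > 3
    return {key: n for key, n in counts.items() if n > THRESHOLD}


attempts = [
    (0, "1111"), (1, "1111"), (2, "1111"), (4, "1111"),  # four attempts in window [0, 5)
    (3, "2222"), (6, "1111"), (7, "2222"),
]
flagged = possible_fraud(attempts)
```

Only card `1111` in the first window is flagged; its later attempt at second 6 falls into the next window and starts a fresh count.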

Slide 42

Slide 42 text

42 Streaming ETL, powered by Apache Kafka and Confluent Platform KSQL

Slide 43

Slide 43 text

43 Reference Architecture - Implementation Patterns http://www.oracle.com/technetwork/database/bigdata-appliance/overview/bigdatarefarchitecture-2297765.pdf

Slide 44

Slide 44 text

44 Building for the Future • Enable flexibility & agility for: • Performance / scaling / resilience • Reducing latency / moving to stream processing • Increasing number of data sources • Connecting other (as yet unknown) consuming applications • Taking advantage of improved technologies (functionality, resilience, cost, scaling, performance)

Slide 45

Slide 45 text

45 Tightly-coupled = Inflexible

Slide 46

Slide 46 text

46 Apache Kafka is the solid foundation upon which you build any successful data platform

Slide 47

Slide 47 text

47 Conceptual Architecture http://www.oracle.com/technetwork/database/bigdata-appliance/overview/bigdatarefarchitecture-2297765.pdf

Slide 48

Slide 48 text

48 Conceptual Architecture - with Kafka as the Backbone Search Replica Graph DB NoSQL

Slide 49

Slide 49 text

49 It's not just about Information Management and Analytics Best Tool -> Best Job

Slide 50

Slide 50 text

50

Slide 51

Slide 51 text

51

Slide 52

Slide 52 text

52 Putting it into Practice

Slide 53

Slide 53 text

53 Kafka for building Streaming Data Pipelines • Source data is an online ordering system running on Oracle • Requirement: • Realtime view of key customers logging onto the application • Realtime view of aggregated order counts and values • Long-term storage of data • Populate DW performance layer

Slide 54

Slide 54 text

54 Streaming ETL with Apache Kafka and Confluent Platform Oracle Oracle GoldenGate for BigData Kafka Connect handler Elasticsearch Kafka Connect KSQL Schema Registry Oracle Hadoop

Slide 55

Slide 55 text

55 Oracle -> Oracle GoldenGate for BigData (Kafka Connect handler) -> LOGON-JSON and CUSTOMERS-JSON topics -> LOGON stream and CUSTOMERS table -> LOGON_ENRICHED stream -> LOGON_ENRICHED topic -> Kafka Connect -> Elasticsearch
CREATE STREAM LOGON (LOGON_ID INT, …)
  WITH (kafka_topic='LOGON-JSON', value_format='JSON');
CREATE TABLE CUSTOMERS (CUSTOMER_ID INT…)
  WITH (kafka_topic='CUSTOMERS-JSON', value_format='JSON');
CREATE STREAM LOGON_ENRICHED AS
  SELECT L.LOGON_ID, C.CUSTOMER_ID…
  FROM LOGON L LEFT OUTER JOIN CUSTOMERS C
  ON L.CUSTOMER_ID = C.CUSTOMER_ID;
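
The stream-to-table join on this slide can be pictured as a lookup against the latest known customer row at the moment each logon arrives. The sketch below shows that idea in plain Python and is not a description of KSQL's internals. The `NAME` column, the event tuples and the function name are hypothetical, since the slide elides the full column lists.

```python
def enrich_logons(events):
    """events: ordered list of ("customer", row) or ("logon", row) tuples.

    Returns one enriched record per logon, keeping unmatched logons
    (LEFT OUTER JOIN semantics).
    """
    customers = {}  # the "table": latest row per CUSTOMER_ID
    enriched = []
    for kind, row in events:
        if kind == "customer":
            customers[row["CUSTOMER_ID"]] = row  # upsert into the table
        else:
            match = customers.get(row["CUSTOMER_ID"])  # lookup at the time of the logon
            enriched.append({
                "LOGON_ID": row["LOGON_ID"],
                "CUSTOMER_ID": row["CUSTOMER_ID"],
                # NAME is a hypothetical column used only for this example.
                "CUSTOMER_NAME": match["NAME"] if match else None,
            })
    return enriched


events = [
    ("customer", {"CUSTOMER_ID": 1, "NAME": "Ada"}),
    ("logon", {"LOGON_ID": 100, "CUSTOMER_ID": 1}),
    ("logon", {"LOGON_ID": 101, "CUSTOMER_ID": 2}),  # no customer row yet
]
result = enrich_logons(events)
```

The second logon is still emitted, with an empty customer name, because the join is a left outer join driven by the stream side.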

Slide 56

Slide 56 text

56 Driving Realtime Analytics with Apache Kafka Events in the source system (Oracle) are streamed in realtime through Kafka, enriched via KSQL, and streamed out through Kafka Connect into Elasticsearch.

Slide 57

Slide 57 text

57 Oracle -> Oracle GoldenGate for BigData (Kafka Connect handler) -> ORDERS topic -> ORDERS stream -> order_mode_by_hour table -> order_mode_by_hour topic -> Kafka Connect -> Elasticsearch
create stream orders (ORDER_DATE STRING …)
  WITH (kafka_topic='ORDERS', value_format='JSON');
create table order_mode_by_hour as
  select order_mode, count(*) as order_count
  from orders
  window tumbling (size 1 hour)
  group by order_mode;

Slide 58

Slide 58 text

58 Streaming ETL with Apache Kafka and Confluent Platform Amazon S3 Kafka Connect Kafka Connect KSQL Schema Registry Kafka Streams

Slide 59

Slide 59 text

59

Slide 60

Slide 60 text

60

Slide 61

Slide 61 text

No content

Slide 62

Slide 62 text

62 Apache Kafka's Role in Implementing Oracle's Big Data Reference Architecture SUN6259 Oracle OpenWorld, 1 Oct 2017 Robin Moffatt, Partner Technology Evangelist, EMEA t: @rmoff e: [email protected] https://www.confluent.io/download/ https://speakerdeck.com/rmoff/