Slide 1

Slide 1 text

Revolutionise the Data Lake using Kafka
▸ Sami Alashabi, Associate Manager Data & AI - Accenture; Solutions Architect - Essent
▸ Flavius Fernandes, Technical Capability Lead - Essent

Slide 2

Slide 2 text

Agenda
▸ Introduction
▸ Problem / Result
▸ Architecture
▸ Main Concepts
▸ Key Learnings
▸ Q&A

Slide 3

Slide 3 text

Introduction
"How can the integration of real-time data streaming help drive data-driven business decisions?"

Slide 4

Slide 4 text

Problem
▸ Need for (near) real-time analytics
▸ Tedious operations & monitoring
▸ Outdated, unscalable data architecture
▸ Limitations with on-premise CDC solution

Slide 5

Slide 5 text

Result
▸ Ability to react faster to customer feedback
▸ Reduced costs for monitoring & operations
▸ Modern scalable cloud architecture
▸ Centralized Event Bus

Slide 6

Slide 6 text

Architecture (diagram). Labelled components:
▸ Streaming: Confluent Cloud, ksqlDB, Schema Registry, Source Connectors, S3 Sink, Kafka Connect, Amazon ECS, AWS Fargate, Transit Gateway
▸ Sources and applications: AWS Lambda (event source), Amazon Aurora, Amazon DynamoDB, Data stores, Apps, Real-Time Apps, Microservices
▸ SAP: SAP LT Replication Server, ODP Framework, ODQ, SAP Extractor, Read Module, Structure Mapping & Transformation, Write Module, DB Trigger, Logging Table, Application Table
▸ Lake and analytics: Landing S3, Clean S3, Delta Lake, Databricks, AWS Glue, Amazon Athena, Snowflake, Analytics

Slide 7

Slide 7 text

Architecture (diagram), showing the same labelled components as Slide 6.

Slide 8

Slide 8 text

Main Concepts

Slide 9

Slide 9 text

Apache Kafka
Apache Kafka is an open-source distributed streaming system used for stream processing, real-time data pipelines, and data integration at scale.
▸ Brokers
▸ Producers
▸ Consumers
▸ Connectors
▸ Topic
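How producers, consumers and topics relate can be illustrated without a running cluster. The sketch below is a toy in-memory model, not the Kafka client API: `ToyTopic`, `produce`, `consume`, the topic name and the CRC32-based partitioning rule are all assumptions made for illustration (real clients use their own partitioners and talk to brokers over the network).

```python
import zlib
from collections import defaultdict


class ToyTopic:
    """Hypothetical in-memory stand-in for a Kafka topic (not a real client API)."""

    def __init__(self, name, num_partitions=3):
        self.name = name
        # Each partition is an append-only list of (key, value) records.
        self.partitions = [[] for _ in range(num_partitions)]
        # Next offset to read, tracked per (consumer group, partition).
        self.offsets = defaultdict(int)

    def produce(self, key, value):
        """Append a record; the same key always lands in the same partition."""
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append((key, value))
        return partition, len(self.partitions[partition]) - 1

    def consume(self, group, partition, max_records=10):
        """Return unread records for a group and advance that group's offset."""
        start = self.offsets[(group, partition)]
        records = self.partitions[partition][start:start + max_records]
        self.offsets[(group, partition)] = start + len(records)
        return records


# Hypothetical topic and keys, chosen only for the example.
topic = ToyTopic("meter-readings")
p1, o1 = topic.produce("customer-42", {"kwh": 1.2})
p2, o2 = topic.produce("customer-42", {"kwh": 1.5})
```

Because offsets are tracked per consumer group, two groups (for example an analytics job and a real-time app) can each read the full stream independently, which is the property a centralized event bus relies on.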

Slide 10

Slide 10 text

Infrastructure as Code
IaC allows you to build, change, and manage your infrastructure in a safe, consistent, and repeatable way by defining resource configurations that you can version, reuse, and share.
▸ Portability
▸ Collaboration
▸ Declarative
▸ Reusability
▸ Consistency
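The "declarative" property can be sketched as a comparison between a desired state and the current state, which yields a plan of actions. This is a simplified illustration, not the behaviour of any particular IaC tool; the function names `plan` and `apply_plan` and the resource names are hypothetical.

```python
def plan(current, desired):
    """Compute the actions needed to move `current` towards `desired`.

    Both arguments map a resource name to its configuration dict.
    """
    actions = []
    for name in sorted(desired):
        if name not in current:
            actions.append(("create", name))
        elif current[name] != desired[name]:
            actions.append(("update", name))
    for name in sorted(current):
        if name not in desired:
            actions.append(("delete", name))
    return actions


def apply_plan(current, desired, actions):
    """Return the new state after carrying out the planned actions."""
    new_state = dict(current)
    for action, name in actions:
        if action == "delete":
            del new_state[name]
        else:
            new_state[name] = dict(desired[name])
    return new_state


# Hypothetical resources: the desired state would live in version control.
desired = {
    "topic.orders": {"partitions": 6, "retention_ms": 604800000},
    "topic.payments": {"partitions": 3, "retention_ms": 86400000},
}
current = {
    "topic.orders": {"partitions": 3, "retention_ms": 604800000},
    "topic.legacy": {"partitions": 1, "retention_ms": 86400000},
}

first_plan = plan(current, desired)
after = apply_plan(current, desired, first_plan)
second_plan = plan(after, desired)  # empty: applying twice changes nothing
```

The empty second plan is what makes the approach repeatable and consistent: the same definition can be applied to any environment and converges to the same result.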

Slide 11

Slide 11 text

Event Sourcing
▸ A state-based system modifies the state of the application in place using Create, Read, Update & Delete (CRUD) operations.
▸ An event-based system models the chronological state changes made by applications as an immutable sequence, or "log", of events.
(Timeline figure: marks at 13:00, 13:15 and 14:00; interval 13:01 - 13:59.)
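The difference between the two models can be sketched in a few lines. The example below is an illustration under assumptions: the `Event` type, the event kinds and the balance domain are hypothetical, and the log is assumed to be ordered by timestamp. The current state is never stored; it is derived by replaying the log, which also allows reconstructing the state at any earlier point in time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: events are immutable once recorded
class Event:
    timestamp: str  # zero-padded "HH:MM", so string comparison is chronological
    kind: str       # "deposited" or "withdrawn" (hypothetical event kinds)
    amount: int


def replay(events, until=None):
    """Fold the event log into a balance, optionally stopping at `until`."""
    balance = 0
    for event in events:
        if until is not None and event.timestamp > until:
            break
        if event.kind == "deposited":
            balance += event.amount
        elif event.kind == "withdrawn":
            balance -= event.amount
    return balance


# Append-only log, assumed to be sorted by timestamp.
log = (
    Event("13:00", "deposited", 100),
    Event("13:15", "withdrawn", 30),
    Event("14:00", "deposited", 50),
)

latest = replay(log)
at_13_15 = replay(log, until="13:15")
```

A state-based system would hold only the latest value and overwrite it on each update, so the intermediate state at 13:15 would no longer be recoverable.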

Slide 12

Slide 12 text

Privacy by Design
Pseudonymisation
▹ replacing any information which could be used to identify an individual with a pseudonym
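One common way to derive pseudonyms is keyed hashing. The slides do not say which technique was used, so the sketch below is an assumption: it uses HMAC-SHA256 from the Python standard library, and the function names, field names and key are hypothetical.

```python
import hashlib
import hmac


def pseudonymise(value, secret_key):
    """Return a stable pseudonym: HMAC-SHA256 of the value under a secret key."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def pseudonymise_record(record, pii_fields, secret_key):
    """Copy a record, replacing each listed field with its pseudonym."""
    return {
        field: pseudonymise(str(value), secret_key) if field in pii_fields else value
        for field, value in record.items()
    }


# Assumption: a real key would come from a secret store, never from source code.
SECRET_KEY = b"demo-key-do-not-use-in-production"

record = {"customer_id": "C-1001", "email": "jane@example.com", "kwh": 12.5}
masked = pseudonymise_record(record, {"customer_id", "email"}, SECRET_KEY)
```

The same input always maps to the same pseudonym, so records can still be joined on the masked field, while re-identification requires access to the key.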

Slide 13

Slide 13 text

Delta Lake
Delta Lake is an open-source storage layer designed to run on top of an existing data lake and improve its reliability, security, and performance.
Features:
▸ ACID Transactions
▸ Time Travel
▸ Schema Evolution/Enforcement
Engine: Databricks, AWS Glue
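The listed features can be pictured with a toy model. The sketch below is not the Delta Lake API and does not reflect its storage format; `ToyVersionedTable` and its methods are hypothetical. It only illustrates the ideas: an append-only commit log gives each write a version number (time travel), a commit is all-or-nothing (atomicity), and rows that do not match the schema are rejected (schema enforcement).

```python
class ToyVersionedTable:
    """Hypothetical toy model of a versioned table; not the Delta Lake API."""

    def __init__(self, schema):
        self.schema = dict(schema)  # column name -> expected Python type
        self.commits = []           # append-only log; list index = version

    def commit(self, rows):
        """Append a batch atomically: validate every row before writing any."""
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"schema mismatch: {sorted(row)}")
            for column, expected in self.schema.items():
                if not isinstance(row[column], expected):
                    raise TypeError(f"{column} must be {expected.__name__}")
        self.commits.append([dict(row) for row in rows])
        return len(self.commits) - 1

    def read(self, version=None):
        """Return all rows as of `version` (the latest when omitted)."""
        if version is None:
            version = len(self.commits) - 1
        return [row for batch in self.commits[: version + 1] for row in batch]


# Hypothetical table and data, chosen only for the example.
table = ToyVersionedTable({"meter_id": str, "kwh": float})
v0 = table.commit([{"meter_id": "m1", "kwh": 1.0}])
v1 = table.commit([{"meter_id": "m2", "kwh": 2.5}, {"meter_id": "m3", "kwh": 0.7}])
```

Reading with `version=v0` returns the table as it was after the first commit, and a batch containing one invalid row leaves the table unchanged.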

Slide 14

Slide 14 text

Key Learnings
▸ Understand Kafka Fundamentals: Learn the core concepts before diving into implementation.
▸ Align Use Cases: Identify specific use cases within your organization that can benefit from data streaming.
▸ Plan Infrastructure and Scaling: Design a Kafka cluster architecture that suits your performance, availability, and scalability requirements.
▸ Community and Resources: Leverage the community and available resources, and build an internal community.
▸ Continuous Improvement: Treat Kafka adoption as an ongoing journey.

Slide 15

Slide 15 text

THANKS!
Any questions?
▸ Flavius Fernandes, Technical Capability Lead - Essent
▸ Sami Alashabi, Associate Manager Data & AI - Accenture; Solutions Architect - Essent