Scaling Apps with Kafka

Pooja Mistry
September 15, 2021

Transcript

  1. Or how I learned to build a complete
    streaming app with four simple SQL
    statements in ksqlDB.
    Data In Motion

  2. Copyright 2021, Confluent, Inc. All rights reserved. This document may not be reproduced in any manner without the express written permission of Confluent, Inc. 2
    “You could not step twice into
    the same river; for other waters
    are ever flowing on to you.
    Unless that water is data and
    the river is Kafka, then, sure.”
    Heraclitus - Probably

  3. View Slide

  4. The Rise of Data in Motion
    Data as a continuous stream of events
    80% of Fortune 100 companies using Apache Kafka

  5. Transforming our customers’ apps and data architecture
    Industry | Without Event Streaming | With Event Streaming
    Auto / Transport | Batch-driven scheduling | Real-time ETA
    Banking | Nightly credit-card fraud checks | Real-time credit card fraud prevention
    Retail | Batch inventory updates | Real-time inventory management
    Healthcare | Batch claims processing | Real-time claims processing
    Media | Batch data pipelines - production supply chain | Real-time data pipeline
    Manufacturing | Scheduled equipment maintenance | Automated, predictive maintenance
    Defense (U.S. Defense Agencies) | Reactive cyber-security forensics | Automated SIEM and anomaly detection

  6. Confluent Transforms Data Usage Throughout Enterprises
    Retail: Drive consumer analytics & streamline operations
    ● Inventory Management
    ● Personalized Promotions
    ● Product Development & Introduction
    ● Sentiment Analysis
    ● Streaming Enterprise Messaging
    ● Systems of Scale for High Traffic Periods
    Healthcare: Provide patients better choices & doctors better insight
    ● Connected Health Records
    ● Data Confidentiality & Accessibility
    ● Dynamic Staff Allocation Optimization
    ● Integrated Treatment
    ● Proactive Patient Care
    ● Real-Time Monitoring
    Capital Markets: Combat fraud & remain competitive
    ● Capital Management
    ● Early-On Fraud Detection
    ● Market Risk Recognition & Investigation
    ● Preventive Regulatory Scanning
    ● Real-Time What-If Analysis
    ● Trade Flow Monitoring
    Automotive: Amplify vehicle intelligence & safety
    ● Advanced Navigation
    ● Environmental Factor Processing
    ● Fleet Management
    ● Predictive Maintenance
    ● Threat Detection & Real-Time Response
    ● Traffic Distribution Optimization
    Common In All Industries: Infrastructure Use Cases
    ● Data Pipelines
    ● Messaging
    ● Microservice/Event Sourcing
    ● Stream Processing
    ● Data Integration
    ● Streaming ETL

  7. Confluent Customers by Industry
    FINANCIAL SERVICES | INSURANCE | TECH | HEALTHCARE
    COMMUNICATIONS & MEDIA | AUTOMOTIVE/TRANSPORTATION | CONSUMER/RETAIL | TRAVEL

  8. Kafka is powerful … but hard
    ● Install
    ● Configure
    ● Make secure
    ● Build apps
    ● Debug
    ● Find data
    ● Get data in/out
    ● Monitor pipelines
    ● Upgrade
    ● Monitor apps
    ● Alert errors

  9. Enterprise Data Architecture is a Giant Mess
    LINE OF BUSINESS 01 | LINE OF BUSINESS 02 | PUBLIC CLOUD

  10. Service Oriented Architecture

  11. Service Oriented Architecture
    ?

  12. Most stream processing architectures are complex
    [Diagram: DB, CONNECTOR, APP, and STREAM PROCESSING components]

  13. Manage
    Make changes to Kafka objects and services and see real-time statuses.
    ● Create/edit topics
    ● Change cluster settings
    ● Manage connectors
    ● Manage ksqlDB
    Monitor
    See metrics data for Kafka and connected services over a period of time.
    ● Broker throughput
    ● Topic throughput
    ● Under-replicated partitions
    ● Disk usage over time
    Deploy
    Manage Kafka and connected services at scale.
    ● Upgrade a cluster
    ● Restart a cluster
    ● Add a new broker

  14. Confluent Products
    DEVELOPER: Unrestricted Developer Productivity
    ● Event Streaming Database: ksqlDB
    ● Rich Pre-built Ecosystem: Connectors | Hub | Schema Registry
    ● Multi-language Development: Non-Java Clients | REST Proxy
    OPERATOR: Efficient Operations at Scale
    ● Performance & Elasticity: Auto Data Balancer | Tiered Storage
    ● Flexible DevOps Automation: Operator | Ansible
    ● GUI-driven Mgmt & Monitoring: Control Center
    ARCHITECT: Production-stage Prerequisites
    ● Global Resilience: Multi-region Clusters | Replicator
    ● Data Compatibility: Schema Registry | Schema Validation
    ● Enterprise-grade Security: RBAC | Secrets | Audit Logs
    Freedom of Choice: Fully Managed Cloud Service | Self-managed Software
    Committer-driven Expertise: Training | Partners | Enterprise Support | Professional Services
    Open Source | Community licensed
    Apache Kafka

  15. Complete Technology Ecosystem
    Data Diode

  16. Confluent Delivers A Complete Event Streaming Platform
    Database Changes | Log Events | IoT Data | Web Events | Other Events
    Confluent Platform
    ● Apache Kafka®: Core | Connect API | Streams API
    ● Development & Connectivity: Connectors | Non-Java Clients | REST Proxy | Schema Registry | ksqlDB
    ● Management & Monitoring: Control Center | Proactive Support
    ● Performance & Scalability: Tiered Storage | Self-Balancing Clusters | k8s Operator
    ● Security & Resiliency: RBAC | Audit Logs | Schema Validation | Multi-Region Clusters | Replicator | Cluster Linking
    DATA INTEGRATION: Hadoop | Database | Data Warehouse | CRM | Other
    REAL-TIME APPLICATIONS: Customer 360 | Fraud Detection | Inventory Management | Analytics & ML | Other
    Datacenter | Public Cloud | Confluent Cloud
    Customer self-managed | Confluent fully-managed
    OPEN SOURCE FEATURES | COMMUNITY FEATURES | COMMERCIAL FEATURES

  17. Most stream processing architectures are complex
    [Diagram: DB, CONNECTOR, APP, and STREAM PROCESSING components]

  18. Most stream processing architectures are complex
    [Diagram: DB, CONNECTOR, APP, and STREAM PROCESSING components, with callouts 1 to 4]

  19. Our unfair advantage
    [Diagram labels: Confluent, Processing, Runtime, Schema, Kafka Streams, Confluent Schema Registry, Query, Event Capture, Replication, Event Storage, Kafka Core, Cluster Linking, Kafka Connect, State Stores]

  20. Data in motion with Confluent
    ● Kafka producer/consumer
    ● Kafka Streams
    ● ksqlDB

  21. Stream processing approach comparison
    Kafka producer/consumer
    // stateStore, MAX_RETRIES, and RetryUtils are placeholders from the slide
    ConsumerRecords<String, Integer> records = consumer.poll(Duration.ofMillis(100));
    Map<String, Integer> counts = new HashMap<>();
    for (ConsumerRecord<String, Integer> record : records) {
        // Sum the values seen for each key in this batch
        counts.merge(record.key(), record.value(), Integer::sum);
    }
    for (Map.Entry<String, Integer> entry : counts.entrySet()) {
        int attempts = 0;
        while (attempts++ < MAX_RETRIES) {
            try {
                // Read the stored count, add this batch, write it back
                int stateCount = stateStore.getValue(entry.getKey());
                stateStore.setValue(entry.getKey(), entry.getValue() + stateCount);
                break;
            } catch (StateStoreException e) {
                RetryUtils.backoff(attempts);
            }
        }
    }
    Kafka Streams
    builder
        .stream("input-stream", Consumed.with(Serdes.String(), Serdes.String()))
        .groupBy((key, value) -> value)
        .count()
        .toStream()
        .to("counts", Produced.with(Serdes.String(), Serdes.Long()));
    ksqlDB
    SELECT x, count(*) FROM stream GROUP BY x EMIT CHANGES;
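
    As a rough sketch of what the ksqlDB statement above computes, the plain-Java class below keeps a count per key and emits the updated row for every incoming event, which is approximately what EMIT CHANGES does. It uses no Kafka library; the class name RunningCount and its methods are made up for illustration and are not part of Kafka, Kafka Streams, or ksqlDB.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not part of Kafka, Kafka Streams, or ksqlDB.
final class RunningCount {
    // Current count per key, i.e. the state behind GROUP BY x / count(*).
    private final Map<String, Long> counts = new HashMap<>();

    // Processes one event and returns the changed row as "key=count".
    String onEvent(String key) {
        long updated = counts.merge(key, 1L, Long::sum);
        return key + "=" + updated;
    }

    // Feeds a list of events and collects every emitted change in order.
    static List<String> changelog(List<String> events) {
        RunningCount rc = new RunningCount();
        List<String> out = new ArrayList<>();
        for (String e : events) {
            out.add(rc.onEvent(e));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(changelog(List.of("a", "b", "a"))); // prints [a=1, b=1, a=2]
    }
}
```

    Each input event produces one output row, so a consumer of the changelog sees the count for "a" move from 1 to 2 rather than only a final total.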

  22. Stream processing technology organization
    Layers, top to bottom: ksqlDB, Kafka Streams, Kafka producer/consumer
    Each layer encapsulates and uses the layer beneath it

  23. An architecture with fewer moving parts
    [Diagram labels: DB, APP, PULL, PUSH; ksqlDB: CONNECTORS, STREAM PROCESSING, STATE STORES]
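
    The PULL and PUSH labels refer to the two ways of querying ksqlDB: a pull query returns the current value and finishes, while a push query keeps delivering changes as they happen. A minimal plain-Java sketch of that distinction follows. The class ViewWithPushAndPull and its methods are hypothetical and are not the ksqlDB client API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical illustration only; not the ksqlDB client API.
final class ViewWithPushAndPull {
    private final Map<String, Long> state = new HashMap<>();
    private final List<BiConsumer<String, Long>> subscribers = new ArrayList<>();

    // Push: register a callback that receives every future change.
    void subscribe(BiConsumer<String, Long> listener) {
        subscribers.add(listener);
    }

    // Pull: return the current value for a key (null if absent) and finish.
    Long get(String key) {
        return state.get(key);
    }

    // Called by the stream processor whenever a row changes.
    void update(String key, long value) {
        state.put(key, value);
        for (BiConsumer<String, Long> s : subscribers) {
            s.accept(key, value);
        }
    }
}
```

    A subscriber only sees updates made after it registers, whereas get always reflects the latest stored value.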

  24. An architecture with fewer moving parts
    [Diagram labels: DB, APP, PULL, PUSH; ksqlDB: CONNECTORS, STREAM PROCESSING, STATE STORES; with callouts 1 and 2]

  25. Build a complete streaming app with 4 SQL statements
    Capture data
    CREATE SOURCE CONNECTOR jdbcConnector WITH (
      'connector.class' = '...JdbcSourceConnector',
      'connection.url' = '...',
      …);
    Perform continuous transformations
    CREATE STREAM purchases AS
      SELECT viewtime, userid, pageid,
        TIMESTAMPTOSTRING(viewtime, 'yyyy-MM-dd HH:mm:ss.SSS')
      FROM pageviews;
    Create materialized views
    CREATE TABLE orders_by_country AS
      SELECT country, COUNT(*) AS order_count, SUM(order_total) AS order_total
      FROM purchases
      LEFT JOIN user_profiles ON purchases.customer_id = user_profiles.customer_id
      WINDOW TUMBLING (SIZE 5 MINUTES)
      GROUP BY country
      EMIT CHANGES;
    Serve lookups against materialized views
    SELECT * FROM orders_by_country WHERE country='usa';
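
    To make the windowed aggregation and the lookup more concrete, here is a small plain-Java sketch of a 5-minute tumbling-window view keyed by country. It is an approximation under stated assumptions: it skips the join with user_profiles and assumes each purchase already carries its country, and the class OrdersByCountry and its members are invented for illustration. ksqlDB maintains this kind of state itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; ksqlDB maintains this kind of state for you.
final class OrdersByCountry {
    static final long WINDOW_MS = 5 * 60 * 1000L; // 5-minute tumbling window

    // One aggregated row of the materialized view.
    static final class Row {
        long orderCount;
        double orderTotal;
    }

    // Keyed by "country@windowStart" so each window keeps separate totals.
    private final Map<String, Row> view = new HashMap<>();

    // Start of the window that contains the given timestamp.
    static long windowStart(long timestampMs) {
        return timestampMs - Math.floorMod(timestampMs, WINDOW_MS);
    }

    // Continuous aggregation: COUNT(*) and SUM(order_total) per country and window.
    void onPurchase(long timestampMs, String country, double orderTotal) {
        String key = country + "@" + windowStart(timestampMs);
        Row row = view.computeIfAbsent(key, k -> new Row());
        row.orderCount++;
        row.orderTotal += orderTotal;
    }

    // Point-in-time lookup, analogous to the final SELECT on the slide.
    // Returns null when no purchase fell into that country and window.
    Row lookup(String country, long timestampMs) {
        return view.get(country + "@" + windowStart(timestampMs));
    }
}
```

    Tumbling windows do not overlap, so a purchase at 299,999 ms lands in the window starting at 0 while one at 300,000 ms starts a new window.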

  26. DEMO TIME

  27. View Slide