The Top 5 Mistakes Deploying Apache Flink

Learn about the 5 most common mistakes when deploying Apache Flink, and how to avoid them, from Flink co-creator and PMC member Robert Metzger.

Robert Metzger

June 25, 2022

Transcript

  1. The Top 5 Mistakes Deploying Apache Flink
    Webinar
    Robert Metzger, Decodable (@rmetzger_)
    Eric Sammer, Decodable (@esammer)

  2. Today’s Webinar
    The Top 5 Mistakes Deploying Apache Flink
    Common Stream Processing Patterns using SQL
    Q&A

  3. Common Flink
    Mistakes
    Robert Metzger
    Staff Engineer @ decodable, Committer and PMC Chair @ Flink

  4. #1 Mistake: Serialization is expensive
    - Mistake: People use Java Maps, Sets, etc. to store state or do network transfers
    - Serialization happens when
      - transferring data over the network (between TaskManagers or from/to Sources/Sinks)
      - accessing state in RocksDB (even in-memory)
      - sending data between non-chained tasks locally
    - Serialization costs a lot of CPU cycles

  5. #1 Mistake: Serialization is expensive
    Example:
    package co.decodable.talks.flink.performance;
    private static class Location {
      int lon;
      int lat;
    }
    DataStream<Map<String, Location>> s1 = ...
    Sample record: "start" → (lon:11, lat:22), "end" → (lon:88, lat:99)
    Serialized layout, ~120 bytes in total:
    - Map size (2): 4 bytes
    - 1st entry key ("start"): 5 bytes
    - 1st entry value type (co.decodable.talks.flink.performance.Location): 46 bytes
    - 1st entry value fields (11, 22): 8 bytes
    - 2nd entry key ("end"): 3 bytes
    - 2nd entry value type (co.decodable.talks.flink.performance.Location): 46 bytes
    - 2nd entry value fields (88, 99): 8 bytes

  6. #1 Mistake: Serialization is expensive
    Example:
    public record OptimizedLocation (int startLon, int startLat, int endLon, int endLat) {}
    DataStream<OptimizedLocation> s2 = ...
    Serialized layout: the four fields 11, 22, 88, 99 take 16 bytes
    → 7.5x reduction in data
    Fewer object allocations = fewer CPU cycles
    Disclaimer: The actual binary representation used by Kryo might differ, this is for demonstration purposes only
    Further reading:
    “Flink Serialization Tuning Vol. 1: Choosing your Serializer — if you can”
    https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html
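
The byte counts in this example can be reproduced with a rough sketch in plain Java. It does not use Kryo or any Flink serializer; it only mimics the layout described on the slides with a `ByteBuffer`. The class name `SerializedSizeSketch`, the `int[]` stand-in for `Location`, and the one-byte terminator after the type name are assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class SerializedSizeSketch {
    // Fully qualified class name that a generic serializer writes for every value.
    static final String LOCATION_TYPE = "co.decodable.talks.flink.performance.Location";

    // Map-based layout: entry count, then key, type name and two ints per entry.
    static int mapLayoutBytes(Map<String, int[]> route) {
        ByteBuffer buf = ByteBuffer.allocate(1024);
        buf.putInt(route.size());                                      // map size: 4 bytes
        for (Map.Entry<String, int[]> entry : route.entrySet()) {
            buf.put(entry.getKey().getBytes(StandardCharsets.UTF_8));  // key characters
            buf.put(LOCATION_TYPE.getBytes(StandardCharsets.UTF_8));   // type name: 45 bytes
            buf.put((byte) 0);                                         // assumed 1-byte terminator
            buf.putInt(entry.getValue()[0]);                           // lon: 4 bytes
            buf.putInt(entry.getValue()[1]);                           // lat: 4 bytes
        }
        return buf.position();
    }

    // Flat layout: four ints, with no keys and no type names.
    static int flatLayoutBytes(int startLon, int startLat, int endLon, int endLat) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putInt(startLon).putInt(startLat).putInt(endLon).putInt(endLat);
        return buf.position();
    }

    public static void main(String[] args) {
        Map<String, int[]> route = new LinkedHashMap<>();
        route.put("start", new int[] {11, 22});
        route.put("end", new int[] {88, 99});
        int mapBytes = mapLayoutBytes(route);
        int flatBytes = flatLayoutBytes(11, 22, 88, 99);
        System.out.println("map layout:  " + mapBytes + " bytes");
        System.out.println("flat layout: " + flatBytes + " bytes");
        System.out.println("ratio:       " + (double) mapBytes / flatBytes);
    }
}
```

Under these assumptions the map layout comes to 120 bytes and the flat layout to 16 bytes, which matches the 7.5x reduction quoted above. Real serializers encode type information differently, so treat the numbers as an order-of-magnitude illustration only.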

  7. #2 Mistake: Flink doesn’t always need to be
    distributed
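    - Flink’s MiniCluster allows you to spin up a full-fledged Flink cluster
    with everything known from distributed clusters (RocksDB,
    checkpointing, the web UI, SQL, …)
    import org.apache.flink.runtime.minicluster.MiniCluster;
    import org.apache.flink.runtime.minicluster.MiniClusterConfiguration;
    import org.apache.flink.streaming.api.environment.RemoteStreamEnvironment;
    // Configure a cluster with one TaskManager and one slot
    var clusterConfig = new MiniClusterConfiguration.Builder()
        .setNumTaskManagers(1)
        .setNumSlotsPerTaskManager(1)
        .build();
    var cluster = new MiniCluster(clusterConfig);
    cluster.start();
    // Point a remote environment at the MiniCluster's REST endpoint
    var clusterAddress = cluster.getRestAddress().get();
    var env = new RemoteStreamEnvironment(clusterAddress.getHost(),
        clusterAddress.getPort());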

  8. #2 Mistake: Flink doesn’t always need to be
    distributed
    - Use cases
      - Local debugging and performance profiling: step through the code as it
        executes, sample the most frequently used code paths
      - Testing: make sure your Flink jobs work in end-to-end tests (together with
        Kafka’s MiniCluster, minio as an S3 replacement). Check out
        https://www.testcontainers.org/
      - Processing small streams efficiently

  9. #3 Advice: Deploy one job per cluster, use standalone mode
    … unless you have a good reason to do something else.
    - Flink’s deployment options might seem confusing. Here’s a simple framework to think about it:
    - Flink has 3 execution modes
      - Session mode
      - Per-job mode
      - Application Mode (preferred)
    - Flink has 2 deployment models
      - Integrated (active): Native K8s, YARN, (Mesos)
        - Flink requests resources from the resource manager as needed
      - Standalone (passive): well suited for K8s, bare metal, local deployment, DIY
        - Resources are provided to Flink from the outside world

  10. #3 Execution Modes
    - Session Mode: multiple jobs (Job1, Job2, Job3) share a JobManager
    - Application Mode: one job per JobManager, planned on the JobManager (recommended as default)
    - Per-Job Mode: one job per JobManager, planned outside the JobManager

  11. #3 Deployment Options
    Passive Deployment
    Flink resources managed externally
    (“Standalone mode”)
    → “a bunch of JVMs”
    Deployed on bare metal, Docker, Kubernetes
    Pros / Cons:
    + Reactive Mode (“autoscaling”)
    + DIY scenarios
    + Fast deployments
    - Restart
    Active Deployment
    Flink actively manages resources
    → Flink talks to a resource manager
    Implementations: Native Kubernetes,
    YARN
    Pros / cons:
    + Automatically restarts failed resources
    + Allocates only required resources
    - Requires a lot of K8s permissions

  12. #4 Mistake: Inappropriate Cluster sizing
    - Mistake: Under- or over-provisioning of clusters for a given workload
    - Understand the amount of data you have incoming and outgoing
      - How much network bandwidth do you have? How much throughput
        does your Kafka have?
    - Understand the amount of state you’ll need in Flink
      - Which state backend do you use?
      - How much memory and disk space is available (per instance and
        across the cluster)?
      - How fast is your connection to your state backup (e.g. S3)? This will
        give you a baseline for the checkpointing times

  13. Solution: Proper cluster sizing
    - Do a back of the napkin calculation of your use-case in your environment
    - … assuming normal operation (“baseline”). Include a buffer for spiky loads
    (failure recovery, …)

  14. Example: Proper cluster sizing
    ● Pipeline: Kafka Source → keyBy userId → Sliding Window (5m size, 1m slide, state in RocksDB) → Kafka Sink
    ● Data:
    ○ Message size: 2 KB
    ○ Throughput: 1,000,000 msg/sec
    ○ Distinct keys: 500,000,000
    (aggregation in window: 4 longs per key)
    ○ Checkpoint every minute
    ● Hardware:
    ○ 5 machines, each running a TaskManager

  15. Example: A machine’s perspective
    TaskManager n: Kafka Source → keyBy → window → Kafka Sink
    - Kafka in: 400 MB/s
      2 KB * 1,000,000 = 2 GB/s
      2 GB/s / 5 machines = 400 MB/s
    - Shuffle out: 320 MB/s
      400 MB/s / 5 receivers = 80 MB/s
      1 receiver is local, 4 remote: 4 * 80 = 320 MB/s out
    - Shuffle in: 320 MB/s
    - Local (stays on the machine): 80 MB/s
    - Kafka out: 67 MB/s
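
The per-machine network figures can be checked with a few lines of Java. This is a sketch of the arithmetic only; the class and method names are made up for illustration. The slide does not show how the 67 MB/s Kafka output is derived, so the code assumes one 40-byte result per key on that machine, emitted once per 1-minute slide.

```java
final class NetworkSizingSketch {
    // Inputs from the example (decimal units: 1 KB = 1,000 bytes).
    static final long MESSAGE_BYTES = 2_000;
    static final long MESSAGES_PER_SEC = 1_000_000;
    static final long DISTINCT_KEYS = 500_000_000;
    static final int MACHINES = 5;
    // Assumption: each key emits one 40-byte result per window slide (60 s).
    static final long RESULT_BYTES = 40;
    static final long SLIDE_SECONDS = 60;

    static long toMBps(double bytesPerSec) {
        return Math.round(bytesPerSec / 1_000_000.0);
    }

    // Kafka input is spread evenly across the machines.
    static long kafkaInMBps() {
        return toMBps((double) MESSAGE_BYTES * MESSAGES_PER_SEC / MACHINES);
    }

    // keyBy sends an equal share of the input to every receiver.
    static long shufflePerReceiverMBps() {
        return kafkaInMBps() / MACHINES;
    }

    // One receiver is local, so only the remaining ones cross the network.
    static long shuffleRemoteMBps() {
        return shufflePerReceiverMBps() * (MACHINES - 1);
    }

    // Window results written back to Kafka, under the assumption stated above.
    static long kafkaOutMBps() {
        double keysPerMachine = (double) DISTINCT_KEYS / MACHINES;
        return toMBps(keysPerMachine * RESULT_BYTES / SLIDE_SECONDS);
    }

    public static void main(String[] args) {
        System.out.println("Kafka in:       " + kafkaInMBps() + " MB/s");
        System.out.println("Shuffle remote: " + shuffleRemoteMBps() + " MB/s");
        System.out.println("Kafka out:      " + kafkaOutMBps() + " MB/s");
    }
}
```

Changing `MACHINES` shows how the remote shuffle share grows with cluster size: with more machines, a larger fraction of each machine's input has to leave over the network.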

  16. Excursion: State & Checkpointing
    How much state are we checkpointing?
    per machine: 40 bytes * 5 windows * 100,000,000 keys = 20 GB
    We checkpoint every minute, so: 20 GB / 60 seconds = 333 MB/s
    How is the Window operator accessing state on disk?
    For each key-value access, we need to retrieve 40 bytes from disk, update the
    aggregates and put 40 bytes back
    per machine: 40 bytes * 5 windows * 200,000 msg/sec = 40 MB/s
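
The state and checkpoint numbers follow the same pattern and can be sketched as below. The names are hypothetical. The factor of 5 comes from the window definition: with a 5-minute size and a 1-minute slide, every event belongs to 5 overlapping windows.

```java
final class StateSizingSketch {
    // Inputs from the example, per machine.
    static final long RECORD_BYTES = 40;
    static final long WINDOW_SECONDS = 300;   // 5m window size
    static final long SLIDE_SECONDS = 60;     // 1m slide
    static final long KEYS_PER_MACHINE = 100_000_000;
    static final long MESSAGES_PER_SEC_PER_MACHINE = 200_000;
    static final long CHECKPOINT_INTERVAL_SECONDS = 60;

    // Number of sliding windows that contain any given event.
    static long overlappingWindows() {
        return WINDOW_SECONDS / SLIDE_SECONDS;
    }

    // Total window state held on one machine, in bytes.
    static long stateBytesPerMachine() {
        return RECORD_BYTES * overlappingWindows() * KEYS_PER_MACHINE;
    }

    // Average upload rate if the full state is written once per checkpoint interval.
    static long checkpointMBps() {
        double bytesPerSec = (double) stateBytesPerMachine() / CHECKPOINT_INTERVAL_SECONDS;
        return Math.round(bytesPerSec / 1_000_000.0);
    }

    // Bytes of state touched per second by incoming messages.
    static long stateAccessMBps() {
        return RECORD_BYTES * overlappingWindows() * MESSAGES_PER_SEC_PER_MACHINE / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("State per machine: " + stateBytesPerMachine() / 1_000_000_000 + " GB");
        System.out.println("Checkpoint upload: " + checkpointMBps() + " MB/s");
        System.out.println("State access:      " + stateAccessMBps() + " MB/s");
    }
}
```

The checkpoint rate here assumes the whole state is uploaded on every checkpoint, as in the calculation above. The state access figure covers one direction; since each update reads 40 bytes and writes 40 bytes back, the disk sees roughly that rate for reads and again for writes.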

  17. Example: A machine’s perspective
    TaskManager n: Kafka Source → keyBy → window → Kafka Sink
    - In: Kafka 400 MB/s + Shuffle 320 MB/s
    - Out: Shuffle 320 MB/s + Kafka 67 MB/s + Checkpoints 333 MB/s
    - Local (stays on the machine): 80 MB/s
    Total In: 720 MB/s, Total Out: 720 MB/s
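
One way to use these totals is to compare them against the network link of a machine while reserving a buffer for spiky loads. The sketch below is an illustration only: the 1,250 MB/s link (roughly 10 Gbit/s), the 30% headroom and all names are assumptions, not figures from the talk.

```java
final class LinkBudgetSketch {
    // Sum of inbound traffic per machine, in MB/s.
    static long totalIn(long kafkaIn, long shuffleIn) {
        return kafkaIn + shuffleIn;
    }

    // Sum of outbound traffic per machine, in MB/s.
    static long totalOut(long shuffleOut, long kafkaOut, long checkpoints) {
        return shuffleOut + kafkaOut + checkpoints;
    }

    // True if the required rate fits into the link after reserving a share
    // (headroom, between 0 and 1) for bursts such as failure recovery.
    static boolean fitsWithHeadroom(long requiredMBps, long linkMBps, double headroom) {
        return requiredMBps <= linkMBps * (1.0 - headroom);
    }

    public static void main(String[] args) {
        long in = totalIn(400, 320);
        long out = totalOut(320, 67, 333);
        long assumedLinkMBps = 1_250;   // hypothetical link, about 10 Gbit/s
        double assumedHeadroom = 0.3;   // hypothetical 30% buffer
        System.out.println("Total in:  " + in + " MB/s");
        System.out.println("Total out: " + out + " MB/s");
        System.out.println("Fits: " + (fitsWithHeadroom(in, assumedLinkMBps, assumedHeadroom)
                && fitsWithHeadroom(out, assumedLinkMBps, assumedHeadroom)));
    }
}
```

With these assumed values the baseline fits, while the same workload on a 1 Gbit/s link (about 125 MB/s) would not. The check treats inbound and outbound traffic separately, which suits full-duplex links.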

  18. Cluster sizing: Conclusion
    - This was just a “back of the napkin” approximation! Real world results will
      differ!
    - Ignored network factors
      - Protocol overheads (Ethernet, IP, TCP, …)
      - RPC (Flink’s own RPC, Kafka, checkpoint store)
      - Checkpointing causes network bursts
      - A window emission causes bursts
      - Other systems using the network
    - CPU, memory, disk access speed have not been considered

  19. #5 Advice: Ask for Help!
    - Most problems have already been solved online
    - Official, old-school way: the Flink user mailing list
      - Indexed by Google, searchable through https://lists.apache.org/
    - Stack Overflow: the apache-flink tag has 6300 questions!
    - Apache Flink Slack instance
    - Global meetup communities, Flink Forward (w/ training)

  20. Any Flink deployment
    & ops related
    questions?

  21. Get Started with Decodable
    ● Visit http://decodable.co
    ● Start Free http://app.decodable.co
    ● Read the docs http://docs.decodable.co
    ● Watch demos on our YouTube Channel
    ● Join our community Slack channel
    ● Join us for future Demo Days and
    Webinars!

  22. Thank you.

  23. decodable.co 2022
    Build real-time data apps &
    services. Fast.
