Exactly-Once in Apache Flink

Buzzvil
January 05, 2022

By Raf

Transcript

  1. Exactly-Once in Apache Flink

    ToC
    - How can Flink exactly-once be achieved? (w/ message queue)
    - How does Flink guarantee exactly-once?

    Apache Flink Overview
    https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/learn-flink/overview/
    - An expansion pack for the expansion pack (Spark) of MapReduce
    - Stateful stream processing

    MapReduce (2004)
    https://static.googleusercontent.com/media/research.google.com/ko//archive/mapreduce-osdi04.pdf

    How MapReduce came about:
    Over the past five years, the authors and many others at Google have implemented hundreds of special-purpose computations that process large amounts of raw data, such as crawled documents, web request logs, etc., to compute various kinds of derived data, such as inverted indices, various
  2. representations of the graph structure

    of web documents, summaries of the number of pages crawled per host, the set of most frequent queries in a given day, etc. Most such computations are conceptually straightforward. However, the input data is usually large and the computations have to be distributed across hundreds or thousands of machines in order to finish in a reasonable amount of time. The issues of how to parallelize the computation, distribute the data, and handle failures conspire to obscure the original simple computation with large amounts of complex code to deal with these issues.

    As a reaction to this complexity, we designed a new abstraction that allows us to express the simple computations we were trying to perform but hides the messy details of parallelization, fault-tolerance, data distribution and load balancing in a library. We realized that most of our computations involved applying a map operation to each logical "record" in our input in order to compute a set of intermediate key/value pairs, and then applying a reduce operation to all the values that shared the same key, in order to combine the derived data appropriately.

    Architecture
    https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/concepts/flink-architecture/
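The map and reduce steps quoted above can be sketched in a few lines. This is only a single-process illustration under assumed names (`map_fn`, `reduce_fn`, and `map_reduce` are hypothetical helpers, and word counting is a stand-in workload); it leaves out the parallelization, data distribution, and fault tolerance that the real library hides.

```python
from collections import defaultdict


def map_fn(record):
    # Map: turn one logical record into intermediate (key, value) pairs.
    return [(word, 1) for word in record.split()]


def reduce_fn(key, values):
    # Reduce: combine all values that share the same key.
    return key, sum(values)


def map_reduce(records, map_fn, reduce_fn):
    # Group intermediate pairs by key, then reduce each group.
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in groups.items())


counts = map_reduce(["a b a", "b c"], map_fn, reduce_fn)
print(counts)
```

The user only writes `map_fn` and `reduce_fn`; everything inside `map_reduce` is the part a framework would take over and distribute.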
  3. Realtime Exactly-Once Ad Event Processing

    at Uber
    https://eng.uber.com/real-time-exactly-once-ad-event-processing/
    - Advertisers must be shown accurate ad performance.
    - Data loss makes ad performance look lower than it is.
    - Events must not be counted twice: double counting inflates ad performance.
    - Attribution must also be 100% accurate.
  4. Flink supports exactly-once, and

    once the consumer service reads only read_committed messages, end-to-end exactly-once is complete.
    - Adding attribution requires data enrichment, so in some cases the job also reads from an external DB.

    Exactly-Once
    https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/

    Exactly-Once considerations
    - Broker (Kafka) failure: Kafka supports N-way replication and the replication protocol supports exactly-once, so data stays durable even with N-1 failures.
    - Failed put: when a network failure keeps the broker from receiving a message, or the publisher from receiving the ack, the retry can cause problems (for example, a duplicate write when only the ack was lost).
    - Client (producer/consumer) failure: when a producer fails, its messages are inevitably lost, but the client must be able to restart from a save point.

    Exactly-Once in Kafka
    - Idempotence: exactly-once in order semantics per partition. The producer send operation was made idempotent; a sequence number attached to each message guarantees idempotency.
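The sequence-number idea behind the idempotent send can be illustrated with a toy model. This is not the Kafka client or broker API: `Partition`, `append`, and the producer id `"p1"` are hypothetical names, and the real protocol also handles details such as sequence gaps and producer epochs that are skipped here.

```python
class Partition:
    """Toy partition log that deduplicates by (producer id, sequence number)."""

    def __init__(self):
        self.log = []
        self.last_seq = {}  # producer id -> highest sequence number appended

    def append(self, producer_id, seq, payload):
        if seq <= self.last_seq.get(producer_id, -1):
            # Already written: acknowledge again without appending.
            return "duplicate"
        self.log.append(payload)
        self.last_seq[producer_id] = seq
        return "ok"


partition = Partition()
partition.append("p1", 0, "click-1")
# The ack was lost, so the producer retries with the SAME sequence number.
retry = partition.append("p1", 0, "click-1")
partition.append("p1", 1, "click-2")
print(partition.log, retry)
```

Because the retry carries the same sequence number as the first attempt, the failed-put case above no longer produces a second copy of the message in the log.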
  5. Transactions: Atomic writes across multiple

    partitions
    - Atomicity is supported through transactions.

    Exactly-once stream processing - consumer ?????????? Flink
    - Stream processing with the Kafka Streams API supports exactly-once (Java).
    - However, there is no Streams API for Go or Python.

    Exactly-Once Processing in Flink
    - A checkpoint barrier is injected into the data source channel.
    - Each time a task receives a barrier, it saves its current snapshot to the state backend.
    - The sink task pre-commits to Kafka.
  6. Once every pre-commit has completed, the JobManager

    notifies all tasks that the pre-commit is complete.
    - The sink task calls commit.
    - The downstream consumer can then read the committed messages.

    Chandy-Lamport's global snapshot algorithm
    http://composition.al/blog/2019/04/26/an-example-run-of-the-chandy-lamport-snapshot-algorithm/
    - An algorithm for taking an asynchronous global snapshot of a distributed system.
    - The program does not have to pause while the snapshot is taken (asynchronous).
    - There is no master node, so there is no SPOF.

    1. Pi: a process that runs independently
    2. dot: an event occurring
    3. arrow: an event passed between processes
    4. Cji: the channel that carries events from Pj to Pi
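Returning to the pre-commit and commit steps of the Flink sink: the handshake can be sketched as a small two-phase commit. Everything here is an assumption made for illustration (`TransactionalTopic`, `Sink`, `complete_checkpoint`, the transaction naming, and the `healthy` flag are hypothetical and are not Flink or Kafka APIs); it only shows why a read_committed consumer sees either all of a checkpoint's output or none of it.

```python
class TransactionalTopic:
    """Toy topic: records are visible to readers only after commit."""

    def __init__(self):
        self.committed = []
        self.pending = {}  # transaction id -> buffered records

    def write(self, txn, record):
        self.pending.setdefault(txn, []).append(record)

    def commit(self, txn):
        self.committed.extend(self.pending.pop(txn, []))

    def abort(self, txn):
        self.pending.pop(txn, None)

    def read_committed(self):
        return list(self.committed)


class Sink:
    def __init__(self, name, topic):
        self.name = name
        self.topic = topic
        self.txn = None
        self.healthy = True  # hypothetical switch to simulate a failed pre-commit

    def begin(self, checkpoint_id):
        self.txn = f"{self.name}-{checkpoint_id}"

    def invoke(self, record):
        self.topic.write(self.txn, record)

    def pre_commit(self):
        return self.healthy  # phase 1 vote

    def commit(self):
        self.topic.commit(self.txn)

    def abort(self):
        self.topic.abort(self.txn)


def complete_checkpoint(sinks):
    """Coordinator role: commit only if every sink pre-committed."""
    votes = [sink.pre_commit() for sink in sinks]  # phase 1
    if all(votes):
        for sink in sinks:
            sink.commit()  # phase 2
        return True
    for sink in sinks:
        sink.abort()
    return False


topic = TransactionalTopic()
sinks = [Sink("sink-0", topic), Sink("sink-1", topic)]

for sink in sinks:
    sink.begin(checkpoint_id=1)
sinks[0].invoke("impression-1")
sinks[1].invoke("click-1")
visible_before = topic.read_committed()  # nothing is visible before commit
first_ok = complete_checkpoint(sinks)
visible_after = topic.read_committed()

for sink in sinks:
    sink.begin(checkpoint_id=2)
sinks[0].invoke("impression-2")
sinks[1].healthy = False  # one pre-commit fails, so the whole round aborts
second_ok = complete_checkpoint(sinks)
visible_final = topic.read_committed()
```

In the second round the healthy sink's record is discarded along with the failed one, which is the atomicity that the transaction support described earlier is meant to provide.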
  7. Right after event B finishes on P1,

    a snapshot is requested.
    - P1 sends a barrier (marker message) to the other processes (orange dotted line). The barrier is a special event and is not one of the events captured in the snapshot.
    - P1 starts recording its incoming channels C21 and C31.
  8. P3 receives the barrier from P1

    and captures its snapshot.
    - P3 does much the same as P1 did, but because the barrier arrived from P1, it does not need to record channel C13 and saves it as empty.
    - P1 receives the barrier from P3.
    - P1 stops recording channel C31 and saves its state.
  9. P2 receives the barrier from P3

    - P2 receives the barrier from P1.
    - P1 receives the barrier from P2.
    - Channel C21 has an event [H->D] that arrived while it was being recorded, so this state is saved.
  10. P3 receives the barrier from P2: final state

    - The snapshot is guaranteed to include every event that happened before the point at which it was taken.

    Causal consistency
    eventual consistency << causal consistency << sequential consistency
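The run walked through on the preceding slides can be replayed with a small simulation. This is a sketch under assumptions: instead of the lettered events of the original figure, each process holds a token count and the one in-flight message is a transfer of 3 tokens from P2 to P1 (standing in for the [H->D] event on C21); `Network`, `Process`, and their methods are hypothetical names. The delivery order follows the slides.

```python
from collections import deque


class Process:
    def __init__(self, name, tokens, peers):
        self.name, self.tokens, self.peers = name, tokens, peers
        self.snapshot = None      # recorded local state
        self.recording = {}       # source -> messages seen while recording
        self.channel_state = {}   # source -> finalized channel recording


class Network:
    def __init__(self, balances):
        names = list(balances)
        self.procs = {n: Process(n, t, [p for p in names if p != n])
                      for n, t in balances.items()}
        # One FIFO channel per ordered pair of processes.
        self.chan = {(a, b): deque() for a in names for b in names if a != b}

    def send(self, src, dst, amount):
        self.procs[src].tokens -= amount
        self.chan[(src, dst)].append(("token", amount))

    def _record_and_broadcast(self, p):
        # Save local state, start recording every incoming channel,
        # and send a marker (barrier) on every outgoing channel.
        p.snapshot = p.tokens
        p.recording = {q: [] for q in p.peers}
        for q in p.peers:
            self.chan[(p.name, q)].append(("marker", None))

    def start_snapshot(self, name):
        self._record_and_broadcast(self.procs[name])

    def deliver(self, src, dst):
        kind, amount = self.chan[(src, dst)].popleft()
        p = self.procs[dst]
        if kind == "token":
            p.tokens += amount
            if src in p.recording:
                p.recording[src].append(amount)
        else:
            if p.snapshot is None:
                self._record_and_broadcast(p)
            # A marker closes the recording of the channel it arrived on.
            p.channel_state[src] = p.recording.pop(src)


net = Network({"P1": 10, "P2": 10, "P3": 10})
net.send("P2", "P1", 3)      # in flight on C21 when the snapshot starts
net.start_snapshot("P1")
net.deliver("P1", "P3")      # P3 snapshots, C13 saved as empty
net.deliver("P3", "P1")      # P1 closes C31
net.deliver("P3", "P2")      # P2 snapshots
net.deliver("P1", "P2")      # P2 closes C12
net.deliver("P2", "P1")      # the token arrives and is recorded on C21
net.deliver("P2", "P1")      # marker from P2 closes C21
net.deliver("P2", "P3")      # P3 closes C23: final state

states = {n: p.snapshot for n, p in net.procs.items()}
in_flight = {(src, dst): msgs
             for dst, p in net.procs.items()
             for src, msgs in p.channel_state.items()}
total = sum(states.values()) + sum(sum(m) for m in in_flight.values())
print(states, in_flight[("P2", "P1")], total)
```

The recorded process states alone add up to 27 tokens; the missing 3 are exactly the message recorded on C21, which is why the channel recording is needed for the snapshot to be consistent.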
  11. Flink's checkpointing algorithm

    Lightweight Asynchronous Snapshots for Distributed Dataflows
    https://arxiv.org/abs/1506.08603

    How Flink's dataflow differs from the constraints of the Chandy-Lamport algorithm
    - A dataflow is a directed graph, whereas Chandy-Lamport lets every process communicate freely with every other.

    Constraints that change
    - Barriers must be injected at the source tasks; in Chandy-Lamport a snapshot can start from any process.
    - When the dataflow is a directed acyclic graph, channels are one-way, so channel recording is no longer needed.
    - The case where the dataflow contains cycles is also covered in the paper linked above.
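One way to see why an acyclic dataflow can skip channel recording is barrier alignment: a task stops reading from an input once that input's barrier has arrived, and snapshots only its own state when the barriers from all inputs are in. The following is a toy sketch of that idea, not Flink code; `SumOperator`, `BARRIER`, and the input names are hypothetical.

```python
from collections import deque

BARRIER = object()  # stand-in for a checkpoint barrier


class SumOperator:
    """Toy two-input task that sums integers and aligns barriers."""

    def __init__(self, inputs):
        self.inputs = {name: deque() for name in inputs}
        self.blocked = set()   # inputs whose barrier has already arrived
        self.total = 0
        self.checkpoints = []  # snapshots of `total`, one per barrier round

    def push(self, name, item):
        self.inputs[name].append(item)

    def step(self):
        """Process one item from an unblocked input; return False when idle."""
        for name, queue in self.inputs.items():
            if name in self.blocked or not queue:
                continue
            item = queue.popleft()
            if item is BARRIER:
                self.blocked.add(name)
                if self.blocked == set(self.inputs):
                    # All barriers arrived: snapshot operator state only.
                    self.checkpoints.append(self.total)
                    self.blocked.clear()  # a real task would forward the barrier here
            else:
                self.total += item
            return True
        return False


op = SumOperator(["a", "b"])
for item in [1, 2, BARRIER, 100]:
    op.push("a", item)
for item in [10, BARRIER, 20]:
    op.push("b", item)
while op.step():
    pass
print(op.checkpoints, op.total)
```

The record `100` sits behind the barrier on input `a`, so it is held back until input `b` delivers its barrier. The snapshot therefore contains only pre-barrier records (1 + 2 + 10), and nothing in flight has to be stored with it.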
  12. Summary

    Flink: the next generation after MapReduce and Spark (native event stream processing)
    - MapReduce: a system that separates the infrastructure from large-scale data processing and offers it as a library, which makes scale-out easy.
    - Spark: improves on MapReduce's reliance on disk I/O by processing in memory.
    - Flink: a real-time event processing framework that addresses the shortcoming that Spark's stream processing is really micro-batching, by operating on individual events.

    Exactly-Once guarantee
    - Exactly-once is essential for showing users accurate performance numbers.
    - Nothing goes wrong under normal conditions, and idempotent operations cover network failures; whether the guarantee holds is decided by how the system behaves when a failure occurs.

    Flink's Exactly-Once
    - Works through a two-phase commit.
    - Uses a snapshot algorithm based on Chandy-Lamport.