What's new in Flink 1.16

These are the slides for this recorded talk: https://youtu.be/p0ToGLzhWjU
Full release announcement: https://flink.apache.org/news/2022/10/28/1.16-announcement.html
A short presentation about the new features in Flink 1.16.

Robert Metzger

October 28, 2022

Transcript

  1. Flink 1.16 in 12 Minutes
    Robert Metzger, Staff Engineer @ decodable
    Apache Flink Committer and PMC Chair
  2. SQL Gateway (1.16 Feature Highlights)
    • In Flink 1.5: SQL Client, a single-user command line interface
    • New in Flink 1.16: multi-tenant and pluggable service
    • Endpoints for REST and HiveServer2 protocol
    Learn more: https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/table/sql-gateway/overview/
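To make the REST endpoint a little more concrete, here is a minimal sketch that only builds the HTTP requests a client might send. It assumes a gateway at `localhost:8083` and the `/v1/sessions` paths described in the SQL Gateway documentation linked above; the helper names are hypothetical and nothing is sent over the network.

```python
import json
import urllib.request

# Assumption: address of a running SQL Gateway REST endpoint (not given in the slides).
GATEWAY = "http://localhost:8083"
JSON_HEADERS = {"Content-Type": "application/json"}


def open_session_request(base=GATEWAY):
    """Build (but do not send) the request that opens a new gateway session."""
    return urllib.request.Request(
        f"{base}/v1/sessions", data=b"{}", headers=JSON_HEADERS, method="POST"
    )


def statement_request(session_handle, sql, base=GATEWAY):
    """Build (but do not send) the request that submits one SQL statement."""
    body = json.dumps({"statement": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/v1/sessions/{session_handle}/statements",
        data=body,
        headers=JSON_HEADERS,
        method="POST",
    )
```

Passing either request to `urllib.request.urlopen` would actually contact the gateway; the session handle comes from the response to the first request.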
  3. Example: Connect to DBeaver with JDBC (1.16 Feature Highlights)
    Learn more: https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/table/hive-compatibility/hiveserver2/#dbeaver
  4. Hive Compatibility (1.16 Feature Highlights)
    • New: HiveServer2 protocol support in SQL Gateway
    • Improved in 1.16: Hive SQL syntax
      ◦ Flink now supports 94% of all Hive 2.3 test queries
    • Hive Metastore
    Learn more: https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/table/hive-compatibility/hive-dialect/overview/
  5. Changelog State Backend (Faster Stream Processing)
    • Released as MVP in 1.15, production ready in 1.16
      ◦ State migration, local recovery, file caching, observability
    • Why?
      ◦ Lower latency for transactional sinks
      ◦ Predictable checkpoint intervals
    • Results: 664 ms vs 6 s per checkpoint (for 90% of the checkpoints in a 21-hour experiment)
    • How? Write state updates both to RocksDB and to an internal changelog on S3
    Learn more: https://flink.apache.org/2022/05/30/changelog-state-backend.html
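The "write to both places" idea from the Changelog State Backend slide can be illustrated with a toy model. This is a sketch under simplifying assumptions, not Flink's implementation: a dict stands in for RocksDB, a list stands in for the durable changelog, and the class and method names are hypothetical.

```python
class ChangelogBackedState:
    """Toy model: each update goes to a local table and an append-only changelog."""

    def __init__(self):
        self.table = {}             # stands in for the local RocksDB instance
        self.changelog = []         # stands in for the durable changelog (e.g. on S3)
        self.checkpointed_upto = 0  # changelog offset covered by the last checkpoint

    def put(self, key, value):
        self.table[key] = value
        self.changelog.append((key, value))

    def checkpoint(self):
        """Return only the changelog entries written since the last checkpoint."""
        pending = self.changelog[self.checkpointed_upto:]
        self.checkpointed_upto = len(self.changelog)
        return pending

    @staticmethod
    def restore(changelog):
        """Rebuild the table by replaying a changelog from the beginning."""
        state = ChangelogBackedState()
        for key, value in changelog:
            state.put(key, value)
        return state
```

Because a checkpoint only has to cover the small tail of the changelog rather than a full snapshot of the table, its duration depends on recent write volume instead of total state size, which is the intuition behind the shorter and more predictable checkpoints.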
  6. Checkpointing under Back Pressure (Faster Stream Processing)
    • Overdraft Buffer is an improvement in the network stack
      ◦ Flink 1.16 adds an additional network buffer pool to unblock tasks suffering from backpressure
    • Unaligned Checkpoints improvements
      ◦ Unaligned Checkpoints (since 1.11) improve robustness under backpressure
      ◦ Unaligned checkpoints are enabled after a certain timeout; however, this timeout is local to each task. Starting from 1.16, if a task detects slow checkpoints, it will enable unaligned checkpoints for upstream operators as well.
    Further reading:
    - https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/deployment/memory/network_mem_tuning/#overdraft-buffers
    - https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/concepts/stateful-stream-processing/#unaligned-checkpointing
    - https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/ops/state/checkpointing_under_backpressure/
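The overdraft idea can be sketched as a buffer pool with a small reserve. This is a simplified model with hypothetical names, assuming that a task checks availability before starting a record and may need several output buffers while emitting it; it is not Flink's network stack.

```python
class BufferPool:
    """Toy buffer pool with a regular budget and a small overdraft reserve."""

    def __init__(self, regular, overdraft):
        self.regular_free = regular
        self.overdraft_free = overdraft

    def is_available(self):
        """A task only starts a new record while regular buffers are free."""
        return self.regular_free > 0

    def request(self):
        """Hand out a buffer for a record that is already being processed.

        Falls back to the overdraft reserve once the regular budget is used up,
        so the task can finish the current record instead of blocking.
        """
        if self.regular_free > 0:
            self.regular_free -= 1
            return "regular"
        if self.overdraft_free > 0:
            self.overdraft_free -= 1
            return "overdraft"
        return None  # in this model the task would block here
```

In this model a task that finishes its current record on an overdraft buffer reaches a point where it can handle a checkpoint barrier, rather than staying blocked in the middle of a record.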
  7. RocksDB Improvements (Faster Stream Processing)
    • 1.16 uses the new deleteRange call to avoid expensive scan-and-delete operations
    • Benefits for upscaling and restoring
      ◦ Benchmark (122 GB checkpoint): regular delete 18 minutes vs deleteRange 2 minutes
    • Better observability for RocksDB: internal RocksDB log + metrics (such as total block cache hit/miss)
    Learn more: https://github.com/apache/flink/pull/19033#issuecomment-1072267658
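The difference between scan-and-delete and a range delete can be shown on a toy key-value store. The sketch below is an illustration with hypothetical names, assuming that a range delete is recorded as a single tombstone covering `[start, end)`; it does not call RocksDB.

```python
class ToyStore:
    """Toy key-value store that counts write operations."""

    def __init__(self, keys):
        self.data = {key: True for key in keys}
        self.range_tombstones = []  # list of (start, end) half-open ranges
        self.write_ops = 0

    def delete(self, key):
        """Point delete: one write operation per key."""
        self.data.pop(key, None)
        self.write_ops += 1

    def delete_range(self, start, end):
        """Range delete: one write operation, regardless of how many keys match."""
        self.range_tombstones.append((start, end))
        self.write_ops += 1

    def live_keys(self):
        """Keys that are neither deleted nor covered by a range tombstone."""
        return sorted(
            key for key in self.data
            if not any(start <= key < end for start, end in self.range_tombstones)
        )


def scan_and_delete(store, start, end):
    """The older approach: iterate over all keys and delete the matching ones."""
    for key in list(store.data):
        if start <= key < end:
            store.delete(key)
```

Both approaches leave the same live keys, but the number of writes for scan-and-delete grows with the number of keys in the range, which matters when a rescaled task has to drop the key ranges it no longer owns.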
  8. Enhanced Lookup Joins and Async I/O (Faster Stream Processing)
    • Lookup joins: call a database / API per record to look up data
    • New in 1.16
      ◦ New backend for caching of lookup call results
      ◦ Support for asynchronous lookup calls (ALLOW_UNORDERED) and retries
    • Existing Async I/O operator also benefits from retry support
    • Related feature since 1.15: Async Sink, a framework for implementing sinks. More flexible since 1.16 (configurable rate limiting strategy)
    Learn more: https://cwiki.apache.org/confluence/display/FLINK/FLIP-221%3A+Abstraction+for+lookup+source+cache+and+metric
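The combination of caching and retries for lookup calls can be sketched in a few lines. This is a generic illustration, not Flink's lookup cache API: the function name is hypothetical, the cache is an unbounded dict, and only `ConnectionError` is treated as retryable.

```python
def make_cached_lookup(lookup, max_retries=3):
    """Wrap a per-record lookup function with a result cache and simple retries."""
    cache = {}

    def cached_lookup(key):
        if key in cache:
            return cache[key]  # cache hit: no call to the external system
        last_error = None
        # One initial attempt plus up to max_retries retries.
        for _ in range(1 + max_retries):
            try:
                value = lookup(key)
            except ConnectionError as err:
                last_error = err
                continue
            cache[key] = value
            return value
        raise last_error

    return cached_lookup
```

A production cache would also need size limits and expiry so that stale dimension data is eventually refreshed; those concerns are left out here.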
  9. Batch Processing Improvements
    • Adaptive Hash Join: gracefully degrades to a Sort-Merge-Join
    • Speculative Execution: the scheduler will create additional execution attempts for slow TaskManagers. The first task to finish wins.
    • Hybrid Shuffle Mode:
      ◦ Pipelined shuffle = data is streamed downstream, all operators need to be online
      ◦ Blocking shuffle = intermediate results are persisted on disk, causing sequential execution but better recovery properties. 7% performance improvement in 1.16
      ◦ Hybrid shuffle = supports both modes, gives the scheduler more freedom
    • Dynamic Partition Pruning: before, Flink was only able to prune partitions at optimization time. In 1.16, Flink can prune at runtime. 30% performance improvement
    Learn more:
    https://cwiki.apache.org/confluence/display/FLINK/FLIP-235%3A+Hybrid+Shuffle+Mode
    https://cwiki.apache.org/confluence/display/FLINK/FLIP-248%3A+Introduce+dynamic+partition+pruning
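Runtime partition pruning from the batch slide can be illustrated with a small join. The sketch assumes a fact table partitioned by the join key and a small dimension table whose filter result is only known when the job runs; the function and parameter names are hypothetical.

```python
def prune_and_join(fact_partitions, dim_rows, dim_filter):
    """Join a partitioned fact table with a filtered dimension table.

    fact_partitions: dict mapping partition key -> list of fact values
    dim_rows:        list of (partition key, attribute) tuples
    dim_filter:      predicate over a dimension row, evaluated at runtime
    """
    # Step 1: run the dimension side first and collect the surviving join keys.
    surviving = [row for row in dim_rows if dim_filter(row)]
    wanted_keys = {key for key, _ in surviving}

    # Step 2: scan only the fact partitions whose key survived the filter.
    scanned = [key for key in fact_partitions if key in wanted_keys]

    # Step 3: join the pruned fact data with the surviving dimension rows.
    joined = [
        (key, fact, attr)
        for key, attr in surviving
        for fact in fact_partitions.get(key, [])
    ]
    return scanned, joined
```

Pruning at optimization time cannot do this, because the set of surviving keys depends on the dimension table's contents rather than on constants in the query.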
  10. Make sure to check out the official release announcement and changelog → Links in the video description
  11. Thanks to the 230+ contributors to Flink 1.16
    1996fanrui, Ada Wang, Ada Wong, Ahmed Hamdy, Aitozi, Alexander Fedulov, Alexander Preuß, Alexander Trushev, Andriy Redko, Anton Kalashnikov, Arvid Heise, Ben Augarten, Benchao Li, BiGsuw, Biao Geng, Bobby Richard, Brayno, CPS794, Cheng Pan, Chengkai Yang, Chesnay Schepler, Danny Cranmer, David N Perkins, Dawid Wysakowicz, Dian Fu, DingGeGe, EchoLee5, Etienne Chauchot, Fabian Paul, Ferenc Csaky, Francesco Guardiani, Gabor Somogyi, Gen Luo, Gyula Fora, Haizhou Zhao, Hangxiang Yu, Hao Wang, Hong Liang Teoh, Hong Teoh, Hongbo Miao, HuangXingBo, Ingo Bürk, Jacky Lau, Jane Chan, Jark Wu, Jay Li, Jia Liu, Jie Wang, Jin, Jing Ge, Jing Zhang, Jingsong Lee, Jinhu Wu, Joe Moser, Joey Pereira, Jun He, JunRuiLee, Juntao Hu, JustDoDT, Kai Chen, Krzysztof Chmielewski, Krzysztof Dziolak, Kyle Dong, LeoZhang, Levani Kokhreidze, Lihe Ma, Lijie Wang, Liu Jiangang, Luning Wang, Marios Trivyzas, Martijn Visser, MartijnVisser, Mason Chen, Matthias Pohl, Metehan Yıldırım, Michael, Mingde Peng, Mingliang Liu, Mulavar, Márton Balassi, Nie yingping, Niklas Semmler, Paul Lam, Paul Lin, Paul Zhang, PengYuan, Piotr Nowojski, Qingsheng Ren, Qishang Zhong, Ran Tao, Robert Metzger, Roc Marshal, Roman Boyko, Roman Khachatryan, Ron, Ron Cohen, Ruanshubin, Rudi Kershaw, Rufus Refactor, Ryan Skraba, Sebastian Mattheis, Sergey, Sergey Nuyanzin, Shengkai, Shubham Bansal, SmirAlex, Smirnov Alexander, SteNicholas, Steven van Rossum, Suhan Mao, Tan Yuxin, Tartarus0zm, TennyZhuang, Terry Wang, Thesharing, Thomas Weise, Timo Walther, Tom, Tony Wei, Weijie Guo, Wencong Liu, WencongLiu, Xintong Song, Xuyang, Yangze Guo, Yi Tang, Yu Chen, Yuan Huang, Yubin Li, Yufan Sheng, Yufei Zhang, Yun Gao, Yun Tang, Yuxin Tan, Zakelly, Zhanghao Chen, Zhu Zhu, Zichen Liu, Zili Sun, acquachen, bgeng777, billyrrr, bzhao, caoyu, chenlei677, chenzihao, chenzihao5, coderap, cphe, davidliu, dependabot[bot], dkkb, dusukang, empcl, eyys, fanrui, fengjiankun, fengli, fredia, gabor.g.somogyi, godfreyhe, gongzhongqiang, harker2015, hongli, huangxingbo, huweihua, jayce, jaydonzhou, jiabao.sun, kevin.cyj, kurt, lidefu, lijiewang.wlj, liliwei, lincoln lee, lincoln.lil, littleeleventhwolf, liufangqi, liujia10, liujiangang, liujingmao, liuyongvs, liuzhuang2017, longwang, lovewin99, luoyuxia, mans2singh, maosuhan, mayue.fight, mayuehappy, nieyingping, pengmide, pengmingde, polaris6, pvary, qinjunjerry, realdengziqi, root, shammon, shihong90, shuiqiangchen, slinkydeveloper, snailHumming, snuyanzin, suxinglee, sxnan, tison, trushev, tsreaper, unknown, wangfeifan, wangyang0918, wangzhiwu, wenbingshen, xiangqiao123, xuyang, yangjf2019, yangjunhan, yangsanity, yangxin, ylchou, yuchengxin, yunfengzhou-hub, yuxia Luo, yuzelin, zhangchaoming, zhangjingcun, zhangmang, zhangzhengqi3, zhaoweinan, zhengyunhong.zyh, zhenyu xing, zhouli, zhuanshenbsj1, zhuzhu.zz, zoucao, zp, 周磊, 饶紫轩, 鲍健昕 愚鲤, 帝国阿三
    We closed 900+ tickets and delivered 19 FLIPs
  12. What is Apache Flink?
    • Data processing engine for low latency, high throughput stream processing
    • Open source at the Apache Software Foundation, one of the biggest projects there
    • Wide industry adoption, various hosted services available
    https://flink.apache.org/poweredby.html