
What's new in Flink 1.16


These are the slides for this recorded talk: https://youtu.be/p0ToGLzhWjU
Full release announcement: https://flink.apache.org/news/2022/10/28/1.16-announcement.html
A short presentation about the new features in Flink 1.16.

Robert Metzger

October 28, 2022


Transcript

  1. Flink 1.16 in 12 Minutes
    Robert Metzger, Staff Engineer @ decodable
    Apache Flink Committer and PMC Chair


  2. SQL Gateway
    ● In Flink 1.5: SQL Client, a single-user command-line interface
    ● New in Flink 1.16: multi-tenant and pluggable service
    ● Endpoints for REST and HiveServer2 protocol
    Learn more https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/table/sql-gateway/overview/
    1.16 Feature Highlights
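
A minimal sketch of how a client might talk to the gateway's REST endpoint, using only the Python standard library. The paths follow the v1 REST API described in the linked SQL Gateway documentation; the base URL (localhost, port 8083) and the helper names are assumptions for illustration. The helpers only build requests, and `send` is the one function that needs a running gateway.

```python
import json
from urllib.request import Request, urlopen

# Assumption: a SQL Gateway is reachable locally on port 8083.
BASE_URL = "http://localhost:8083"


def open_session_request(base_url=BASE_URL):
    """Build POST /v1/sessions; the JSON response carries a 'sessionHandle'."""
    return Request(f"{base_url}/v1/sessions", method="POST")


def execute_statement_request(session_handle, statement, base_url=BASE_URL):
    """Build POST /v1/sessions/<session>/statements; the response carries an 'operationHandle'."""
    body = json.dumps({"statement": statement}).encode("utf-8")
    return Request(
        f"{base_url}/v1/sessions/{session_handle}/statements",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def fetch_result_request(session_handle, operation_handle, token=0, base_url=BASE_URL):
    """Build GET .../operations/<operation>/result/<token> to page through results."""
    return Request(
        f"{base_url}/v1/sessions/{session_handle}"
        f"/operations/{operation_handle}/result/{token}",
        method="GET",
    )


def send(request):
    """Send a request and decode the JSON response (requires a running gateway)."""
    with urlopen(request) as response:
        return json.load(response)
```

Against a running gateway, the flow would be `send(open_session_request())`, then `send(execute_statement_request(handle, "SELECT 1"))`, then `send(fetch_result_request(handle, operation))`.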


  3. Example: Connect to DBeaver with JDBC
    1.16 Feature Highlights
    Learn more
    https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/table/hive-compatibility/hiveserver2/#dbeaver


  4. Hive Compatibility
    1.16 Feature Highlights
    ● New: HiveServer2 protocol support in SQL Gateway
    ● Improved in 1.16: Hive SQL syntax
    ○ Flink now supports 94% of the Hive 2.3 test queries
    ● Hive Metastore
    Learn more
    https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/table/hive-compatibility/hive-dialect/overview/


  5. Changelog State Backend
    Faster Stream Processing
    ● Released as MVP in 1.15, production ready in 1.16
    ○ State migration, local recovery, file caching, observability
    ● Why?
    ○ Lower latency for transactional sinks
    ○ Predictable checkpoint intervals
    ● Results: 664ms vs 6s for a checkpoint (90% of the checkpoints in a 21-hour experiment)
    ● How? Write state updates both to RocksDB and an internal changelog on S3
    Learn more https://flink.apache.org/2022/05/30/changelog-state-backend.html
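
The dual-write idea can be illustrated with a toy model. This is a conceptual sketch only, not Flink's implementation: a plain dict stands in for RocksDB, a Python list stands in for the durable changelog, and all class and method names are hypothetical. It shows why a checkpoint becomes cheap: it only has to record a reference to the last materialized snapshot plus a changelog offset range.

```python
class ChangelogBackedState:
    """Toy model: every update goes to a state table and to an append-only changelog."""

    def __init__(self):
        self.table = {}               # stands in for the RocksDB state table
        self.changelog = []           # stands in for the durable changelog
        self.snapshot = {}            # last materialized copy of the table
        self.snapshot_offset = 0      # changelog position covered by that copy

    def put(self, key, value):
        # Dual write: update the table and append the change to the log.
        self.table[key] = value
        self.changelog.append((key, value))

    def materialize(self):
        # Periodic, independent of checkpoints: persist a full copy of the table.
        self.snapshot = dict(self.table)
        self.snapshot_offset = len(self.changelog)

    def checkpoint(self):
        # Cheap: a snapshot reference plus the changelog range written since then.
        return {"snapshot": self.snapshot,
                "from": self.snapshot_offset,
                "to": len(self.changelog)}

    def restore(self, checkpoint):
        # Rebuild state: start from the snapshot, then replay the changelog tail.
        state = dict(checkpoint["snapshot"])
        for key, value in self.changelog[checkpoint["from"]:checkpoint["to"]]:
            state[key] = value
        return state
```

In this model the size of a checkpoint depends on the changes since the last materialization rather than on the total state size, which is the intuition behind the shorter and more predictable checkpoint durations.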


  6. Checkpointing under Back Pressure
    Faster Stream Processing
    ● Overdraft Buffer is an improvement in the network stack
    ○ Flink 1.16 adds an additional network buffer pool to unblock tasks suffering from
    backpressure
    ● Unaligned Checkpoints improvements
    ○ Unaligned Checkpoints (since 1.11) improve robustness under backpressure
    ○ Unaligned checkpoints are enabled after a certain timeout. However, this timeout is
    local to each task. Starting from 1.16, if a task detects slow checkpoints, it will enable
    unaligned checkpoints for upstream operators as well.
    Further reading:
    - https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/deployment/memory/network_mem_tuning/#overdraft-buffers
    - https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/concepts/stateful-stream-processing/#unaligned-checkpointing
    - https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/ops/state/checkpointing_under_backpressure/
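
The overdraft idea can be sketched with a toy buffer pool. This is an illustrative model, not Flink's network stack: the class, its methods, and the buffer counts are hypothetical. The point it shows is that a task which has already started a record may borrow a few extra buffers to finish it, instead of blocking in the middle of the record.

```python
class OutputBufferPool:
    """Toy model of an output buffer pool with a small overdraft allowance."""

    def __init__(self, regular_buffers, max_overdraft_buffers):
        self.regular_buffers = regular_buffers
        self.max_overdraft_buffers = max_overdraft_buffers
        self.in_use = 0

    def is_available(self):
        # A task only starts a new record while a regular buffer is free.
        return self.in_use < self.regular_buffers

    def request(self):
        # A record already in progress may dip into the overdraft buffers.
        if self.in_use < self.regular_buffers + self.max_overdraft_buffers:
            self.in_use += 1
            return True
        return False

    def recycle(self, count=1):
        # Downstream consumed data, so buffers return to the pool.
        self.in_use = max(0, self.in_use - count)


def emit_record(pool, buffers_needed):
    """Return True if one record needing several buffers could be fully emitted."""
    if not pool.is_available():
        return False  # back-pressured: do not start a new record
    acquired = 0
    for _ in range(buffers_needed):
        if not pool.request():
            break  # in a real runtime the task would block here, mid-record
        acquired += 1
    return acquired == buffers_needed
```

With two regular buffers and no overdraft, a record needing three buffers gets stuck halfway. With one overdraft buffer the same record completes, after which the pool reports itself unavailable until buffers are recycled.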


  7. RocksDB Improvements
    Faster Stream Processing
    ● 1.16 uses new deleteRange call to avoid expensive scan-and-delete operations
    ● Benefits for upscaling and restoring
    ○ Benchmark (122GB checkpoint): regular delete 18 minutes vs deleteRange 2 minutes
    ● Better observability for RocksDB: internal RocksDB log + metrics (such as total block
    cache hit/miss)
    Learn more https://github.com/apache/flink/pull/19033#issuecomment-1072267658
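
The difference between the two deletion strategies can be sketched with a toy sorted key-value store. This is not RocksDB's API or implementation: the class and its write counter are hypothetical, and the range tombstone is modeled as a simple `(start, end)` tuple. It illustrates why removing a large key range (for example, state that a subtask no longer owns after rescaling, which is an assumption about the use case) costs one write instead of one write per key.

```python
import bisect


class SortedStore:
    """Toy sorted key-value store that counts delete operations."""

    def __init__(self, data):
        self.values = dict(data)
        self.keys = sorted(self.values)
        self.range_tombstones = []   # list of (start, end), end exclusive
        self.write_ops = 0

    def scan_and_delete(self, start, end):
        # Iterate over every key in [start, end) and delete it individually.
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, end)
        for key in self.keys[lo:hi]:
            del self.values[key]
            self.write_ops += 1
        del self.keys[lo:hi]

    def delete_range(self, start, end):
        # Record a single range tombstone; covered keys are hidden from reads.
        self.range_tombstones.append((start, end))
        self.write_ops += 1

    def get(self, key):
        if any(start <= key < end for start, end in self.range_tombstones):
            return None
        return self.values.get(key)
```

Both strategies leave reads with the same view of the data, but the scan-and-delete path issues one delete per key while the range delete issues a single write.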


  8. Enhanced Lookup Joins and Async I/O
    Faster Stream Processing
    ● Lookup joins: Call a database / API per record to look up data
    ● New in 1.16
    ○ New backend for caching of lookup call results
    ○ Support for asynchronous lookup calls (ALLOW_UNORDERED) and retries
    ● Existing Async I/O operator also benefits from retry support
    ● Related feature: Async Sink (since 1.15), a framework for implementing sinks. More flexible
    since 1.16 (configurable rate limiting strategy)
    Learn more
    https://cwiki.apache.org/confluence/display/FLINK/FLIP-221%3A+Abstraction+for+lookup+source+cache+and+metric
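
A toy sketch of what caching and retrying a per-record lookup means. This is not Flink's lookup cache or its retry API: the class, its parameters, and the retry-on-miss policy are assumptions for illustration, and the asynchronous part is left out to keep the example short.

```python
import time


class CachingLookup:
    """Toy per-record lookup with a result cache and retry on lookup miss."""

    def __init__(self, fetch, max_attempts=3, delay_seconds=0.0):
        self.fetch = fetch                  # hypothetical call to a database or API
        self.max_attempts = max_attempts
        self.delay_seconds = delay_seconds
        self.cache = {}

    def lookup(self, key):
        # Serve repeated keys from the cache instead of calling the external system.
        if key in self.cache:
            return self.cache[key]
        for attempt in range(1, self.max_attempts + 1):
            value = self.fetch(key)
            if value is not None:
                self.cache[key] = value
                return value
            # Lookup miss: the dimension data may simply arrive late, so retry.
            if attempt < self.max_attempts:
                time.sleep(self.delay_seconds)
        return None
```

The retry helps when the looked-up row is written slightly later than the event that references it. The cache keeps hot keys from hitting the external system once per record.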


  9. Batch Processing Improvements
    ● Adaptive Hash Join: Gracefully degrades to a Sort-Merge-Join
    ● Speculative Execution: Scheduler will create additional execution attempts for slow
    TaskManagers. First task to finish wins.
    ● Hybrid Shuffle Mode:
    ○ Pipelined shuffle = data is streamed downstream, all operators need to be online
    ○ Blocking shuffle = intermediate results are persisted on disk, causing sequential
    execution but better recovery properties. 7% performance improvement in 1.16
    ○ Hybrid shuffle = supports both modes, gives scheduler more freedom
    ● Dynamic Partition Pruning: Before, Flink was only able to prune partitions at optimization
    time. In 1.16, Flink can prune at runtime. 30% performance improvement
    Learn more https://cwiki.apache.org/confluence/display/FLINK/FLIP-235%3A+Hybrid+Shuffle+Mode
    https://cwiki.apache.org/confluence/display/FLINK/FLIP-248%3A+Introduce+dynamic+partition+pruning
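
Dynamic partition pruning can be illustrated with a toy join. This is a conceptual sketch, not Flink's planner or runtime: the function, the table layout (a fact table partitioned by `day`, a small dimension table with an `is_holiday` flag), and the column names are all hypothetical. The filter sits on the dimension table, so the set of relevant fact partitions is only known once the dimension rows have been read at execution time.

```python
def join_with_runtime_pruning(fact_partitions, dim_rows, dim_predicate):
    """Join a partitioned fact table with a filtered dimension table.

    fact_partitions maps a partition key (day) to the list of amounts stored
    in that partition. Returns the joined rows and the partitions actually read.
    """
    # Step 1 (runtime): evaluate the filter on the small dimension table.
    wanted_days = {row["day"] for row in dim_rows if dim_predicate(row)}

    # Step 2: read only those fact partitions whose key survived the filter.
    partitions_read = sorted(day for day in fact_partitions if day in wanted_days)

    # Step 3: produce the join result from the partitions that were read.
    joined = [
        {"day": day, "amount": amount}
        for day in partitions_read
        for amount in fact_partitions[day]
    ]
    return joined, partitions_read
```

With three daily partitions and a filter that keeps a single day, only one partition is scanned. Pruning at optimization time could not achieve this, because the matching days are not visible in the query text.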


  10. Make sure to check out the official release
    announcement and changelog
    → Links in the video description


  11. Thanks to the 230+ contributors to Flink 1.16
    1996fanrui, Ada Wang, Ada Wong, Ahmed Hamdy, Aitozi, Alexander Fedulov, Alexander Preuß, Alexander Trushev, Andriy Redko, Anton Kalashnikov, Arvid Heise, Ben
    Augarten, Benchao Li, BiGsuw, Biao Geng, Bobby Richard, Brayno, CPS794, Cheng Pan, Chengkai Yang, Chesnay Schepler, Danny Cranmer, David N Perkins, Dawid
    Wysakowicz, Dian Fu, DingGeGe, EchoLee5, Etienne Chauchot, Fabian Paul, Ferenc Csaky, Francesco Guardiani, Gabor Somogyi, Gen Luo, Gyula Fora, Haizhou Zhao,
    Hangxiang Yu, Hao Wang, Hong Liang Teoh, Hong Teoh, Hongbo Miao, HuangXingBo, Ingo Bürk, Jacky Lau, Jane Chan, Jark Wu, Jay Li, Jia Liu, Jie Wang, Jin, Jing Ge,
    Jing Zhang, Jingsong Lee, Jinhu Wu, Joe Moser, Joey Pereira, Jun He, JunRuiLee, Juntao Hu, JustDoDT, Kai Chen, Krzysztof Chmielewski, Krzysztof Dziolak, Kyle Dong,
    LeoZhang, Levani Kokhreidze, Lihe Ma, Lijie Wang, Liu Jiangang, Luning Wang, Marios Trivyzas, Martijn Visser, MartijnVisser, Mason Chen, Matthias Pohl, Metehan
    Yıldırım, Michael, Mingde Peng, Mingliang Liu, Mulavar, Márton Balassi, Nie yingping, Niklas Semmler, Paul Lam, Paul Lin, Paul Zhang, PengYuan, Piotr Nowojski,
    Qingsheng Ren, Qishang Zhong, Ran Tao, Robert Metzger, Roc Marshal, Roman Boyko, Roman Khachatryan, Ron, Ron Cohen, Ruanshubin, Rudi Kershaw, Rufus
    Refactor, Ryan Skraba, Sebastian Mattheis, Sergey, Sergey Nuyanzin, Shengkai, Shubham Bansal, SmirAlex, Smirnov Alexander, SteNicholas, Steven van Rossum, Suhan
    Mao, Tan Yuxin, Tartarus0zm, TennyZhuang, Terry Wang, Thesharing, Thomas Weise, Timo Walther, Tom, Tony Wei, Weijie Guo, Wencong Liu, WencongLiu, Xintong
    Song, Xuyang, Yangze Guo, Yi Tang, Yu Chen, Yuan Huang, Yubin Li, Yufan Sheng, Yufei Zhang, Yun Gao, Yun Tang, Yuxin Tan, Zakelly, Zhanghao Chen, Zhu Zhu, Zichen
    Liu, Zili Sun, acquachen, bgeng777, billyrrr, bzhao, caoyu, chenlei677, chenzihao, chenzihao5, coderap, cphe, davidliu, dependabot[bot], dkkb, dusukang, empcl, eyys,
    fanrui, fengjiankun, fengli, fredia, gabor.g.somogyi, godfreyhe, gongzhongqiang, harker2015, hongli, huangxingbo, huweihua, jayce, jaydonzhou, jiabao.sun, kevin.cyj,
    kurt, lidefu, lijiewang.wlj, liliwei, lincoln lee, lincoln.lil, littleeleventhwolf, liufangqi, liujia10, liujiangang, liujingmao, liuyongvs, liuzhuang2017, longwang, lovewin99,
    luoyuxia, mans2singh, maosuhan, mayue.fight, mayuehappy, nieyingping, pengmide, pengmingde, polaris6, pvary, qinjunjerry, realdengziqi, root, shammon, shihong90,
    shuiqiangchen, slinkydeveloper, snailHumming, snuyanzin, suxinglee, sxnan, tison, trushev, tsreaper, unknown, wangfeifan, wangyang0918, wangzhiwu, wenbingshen,
    xiangqiao123, xuyang, yangjf2019, yangjunhan, yangsanity, yangxin, ylchou, yuchengxin, yunfengzhou-hub, yuxia Luo, yuzelin, zhangchaoming, zhangjingcun,
    zhangmang, zhangzhengqi3, zhaoweinan, zhengyunhong.zyh, zhenyu xing, zhouli, zhuanshenbsj1, zhuzhu.zz, zoucao, zp, 周磊, 饶紫轩, 鲍健昕 愚鲤, 帝国阿三
    We closed 900+ tickets and delivered 19 FLIPs


  12. 2022
    Build real-time data apps &
    services. Fast.
    decodable.co


  13. What is Apache Flink?
    ● Data processing engine for low latency, high throughput stream processing
    ● Open source at the Apache Software Foundation, one of the biggest projects there
    ● Wide industry adoption, various hosted services available
    https://flink.apache.org/poweredby.html
