NTIRE 2021 Learning the Super-Resolution Space Challenge

Slide 1

Slide 1 text

NTIRE 2021 Learning the Super-Resolution Space Challenge
Sansan, Inc., Technology Division, DSOC R&D Automation Group
内田奏
7th All-Japan Computer Vision Study Group (Part 2), 2021/07/31

Slide 2

Slide 2 text

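Data Strategy and Operation Center
Agenda
1. What is super-resolution?
2. Overview and history of NTIRE
3. NTIRE 2021 Learning the Super-Resolution Space Challenge
   1. Problem setting
   2. Related work
4. Results
5. Impressions
Note: figures and tables are taken from the cited papers and presentation materials.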

Slide 3

Slide 3 text

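What is super-resolution?
A technique that raises the resolution of an input signal, i.e. resolution enhancement.
• Besides images, it also appears in audio, radio-wave, and sensing applications
• The term sometimes refers specifically to restoring high-frequency components
Figure: low-resolution image (LR) and super-resolved image (SR)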

Slide 4

Slide 4 text

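Problem setting
Super-resolution is a kind of image restoration problem.
• A low-resolution (LR) image is assumed to be generated by degrading a high-resolution (HR) image
• The goal is to find ℱ, the inverse of the degradation 𝒟
Diagram: high-resolution image I_HR → degradation 𝒟 → low-resolution image I_LR → restoration ℱ → super-resolved image I_SR, i.e. I_LR = 𝒟(I_HR) and I_SR = ℱ(I_LR)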

Slide 5

Slide 5 text

What is NTIRE?
A workshop on image restoration and enhancement, held in conjunction with CVPR.
• Led by the ETH Zurich Computer Vision Lab
• Competitions on related tasks are held alongside it

Slide 6

Slide 6 text

This talk zooms in on the super-resolution track!

Slide 7

Slide 7 text

History of the NTIRE Challenge (1): 2017 and 2018
• Participants competed on PSNR/SSIM using the DIV2K dataset [Agustsson+ CVPRW2017]
• The main theme was exploring network architectures and making them deeper
  > e.g. EDSR [Lim+ CVPRW2017], DBPN [Haris+ CVPR2018]
• The Perception-Distortion Tradeoff [Blau+ CVPR2018] was proposed, and perceptual quality gained importance
Figures: EDSR architecture; DBPN architecture; Perception-Distortion Tradeoff

Slide 8

Slide 8 text

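History of the NTIRE Challenge (2): 2019
• The push toward real-world applications grew stronger
  > See also: 【Intern CV Report】CVPR2019における超解像 (Super-resolution at CVPR 2019), Sansan Builders Blog
• A competition with an unknown scale factor, using the RealSR dataset [Cai+ CVPR2019]
• Entries refined their training, for example by using U-shaped networks to solve the problem at multiple scales
  > e.g. U-Net + MixUp [Feng+ CVPRW2019], which later led to CutBlur [Yoo+ CVPR2020]
Figure: example of a U-shaped network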

Slide 9

Slide 9 text

History of the NTIRE Challenge (3): 2020
• Real-World Super-Resolution: a problem setting in which no ground truth is available
  > "I want to enlarge a photo taken with an iPhone!" → no corresponding high-resolution image exists
  > A set of noisy LR images and a set of clean HR images were provided
  > A method using kernel estimation and noise injection won [Ji+ CVPRW2020]

Slide 10

Slide 10 text

Results of the Real-World Super-Resolution Challenge
Note: for details, see 【Zoom or Die】第1回 NTIRE2020 Challenge 結果速報 (Part 1: NTIRE 2020 Challenge quick results report), Sansan Builders Blog
Figure: challenge results table (👑 marks the winner)

Slide 11

Slide 11 text

Now for the main topic

Slide 12

Slide 12 text

GIF animation from https://github.com/andreas128/NTIRE21_Learning_SR_Space

Slide 13

Slide 13 text

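Learning the Super-Resolution Space Challenge
A competition on learning the space of SR images that are plausible outputs for a given LR image.
• Aims at learning formulations that better account for the ill-posed nature of the problem
• Evaluates the SR space with multiple metrics to establish how they relate and to set baselines
• Is also expected to enable exploration of a controllable SR space and correction of results
Figure: constructing the inverse of a many-to-one downscaling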
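
The many-to-one nature of downscaling can be illustrated with a toy sketch. It assumes 2x2 average pooling as a stand-in degradation (not the challenge's actual downscaling kernel); `downscale_2x` and the pixel values are hypothetical and chosen only for illustration.

```python
def downscale_2x(img):
    """Average each non-overlapping 2x2 block (a simple stand-in degradation)."""
    h, w = len(img), len(img[0])
    return [
        [
            (img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]

# Two different hypothetical "HR" patches: a flat patch and a checkerboard texture.
hr_flat = [[0.5, 0.5], [0.5, 0.5]]
hr_checker = [[1.0, 0.0], [0.0, 1.0]]

# Both patches collapse to the same single LR pixel.
lr_flat = downscale_2x(hr_flat)
lr_checker = downscale_2x(hr_checker)
print(lr_flat, lr_checker)
```

Since both patches map to the same LR pixel, the LR image alone cannot tell them apart. A model of the SR space would ideally assign plausible probability to both rather than return a single average.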

Slide 14

Slide 14 text

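Regulations
Submission
• Submit 10 SR images for each LR image (x4, x8)
Rules
• It must be possible to draw an arbitrary number of samples from the model
  > Models with an upper bound on the number of samples, e.g. models with multiple final layers, are prohibited
• The entry must be a single model
• No self-ensembling or test-time augmentation
• All samples must be generated with the same hyperparameters
• Any pre-training is allowed, with the exception of the DIV2K data split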

Slide 15

Slide 15 text

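Evaluation
Photo-realism
• A user study is used to compute the Mean Opinion Rank (MOR)
  > Each participant's submission is ranked, and the ranks are averaged
The spanning of the SR Space
• The aim is to evaluate whether the samples are semantically diverse (≠ pixel-level variation)
  > Maximizing diversity is not necessarily the right goal
• Diversity is evaluated using LPIPS [Zhang+ CVPR2018] against the ground truth (formula shown on the slide)
Low Resolution Consistency
• The SR image is bicubically downscaled and evaluated by PSNR against the LR image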
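
The two automatic metrics can be sketched as follows. This is a minimal illustration under stated assumptions, not the official evaluation code. `box_downscale` is a block-averaging stand-in for bicubic downscaling. `diversity_score` assumes the score has the form (global best − local best) / global best × 100, where "global best" is the lowest image-averaged distance of any single sample and "local best" picks the best sample per pixel. The distance maps would come from a spatial LPIPS computation; here they are hypothetical numbers, and all function names are illustrative.

```python
import math

def psnr(a, b, peak=1.0):
    """PSNR in dB between two equally sized images given as nested lists."""
    diffs = [(pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    mse = sum(diffs) / len(diffs)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

def box_downscale(img, factor):
    """Stand-in for bicubic downscaling: average each factor x factor block."""
    h, w = len(img), len(img[0])
    return [
        [
            sum(img[y + dy][x + dx] for dy in range(factor) for dx in range(factor))
            / factor ** 2
            for x in range(0, w, factor)
        ]
        for y in range(0, h, factor)
    ]

def lr_consistency(sr, lr, factor):
    """PSNR between the downscaled SR image and the given LR image."""
    return psnr(box_downscale(sr, factor), lr)

def diversity_score(distance_maps):
    """distance_maps[m][i][j]: distance of sample m to the ground truth at pixel (i, j)."""
    flat = [[d for row in m for d in row] for m in distance_maps]
    n = len(flat[0])
    global_best = min(sum(f) / n for f in flat)         # best single sample on average
    local_best = sum(min(px) for px in zip(*flat)) / n  # best sample chosen per pixel
    return (global_best - local_best) / global_best * 100.0

# Hypothetical per-pixel distance maps for two samples of a 1x2 image.
maps = [[[0.2, 0.4]], [[0.4, 0.2]]]
score = diversity_score(maps)
consistency = lr_consistency([[1.0, 0.0], [0.0, 1.0]], [[0.4]], factor=2)
print(score, consistency)
```

Under this assumed definition, identical samples score 0, and the score grows when different samples are closest to the ground truth in different regions.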

Slide 16

Slide 16 text

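Related work: SRFlow [Lugmayr+ ECCV2020]
A flow-based super-resolution method.
• Learns latent variables with an invertible network architecture
  > A single network can both encode and decode
  > The log-likelihood can be optimized directly, i.e. no reparameterization trick or similar is needed
• Learns the SR space with a flow conditioned on the LR image
Figures: comparison with other methods; comparison of VAE and flow [Weng 2018]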
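
The points above (one invertible map for encoding and decoding, and a directly computable log-likelihood) can be illustrated with a scalar toy flow. This is a sketch, not SRFlow itself: the single affine map, the parameters `w` and `log_scale`, and the scalar "images" are all hypothetical, whereas the real model stacks many learned conditional layers.

```python
import math

def encode(y, x, w=0.8, log_scale=-0.5):
    """Map an HR value y to a latent z, conditioned on the LR value x."""
    return (y - w * x) * math.exp(-log_scale)

def decode(z, x, w=0.8, log_scale=-0.5):
    """Exact inverse of encode: map a latent z back to an HR value."""
    return z * math.exp(log_scale) + w * x

def nll(y, x, w=0.8, log_scale=-0.5):
    """Exact negative log-likelihood via the change-of-variables formula."""
    z = encode(y, x, w, log_scale)
    log_pz = -0.5 * (z * z + math.log(2.0 * math.pi))  # standard normal prior on z
    log_det = -log_scale                               # log |dz/dy|
    return -(log_pz + log_det)

# The same function pair encodes and decodes without loss.
x_lr, y_hr = 1.0, 1.3
z = encode(y_hr, x_lr)
y_back = decode(z, x_lr)

# Different latents give different outputs for the same LR input.
samples = [decode(z_i, x_lr) for z_i in (-1.0, 0.0, 1.0)]
print(y_back, nll(y_hr, x_lr), samples)
```

Training would minimize `nll` over (LR, HR) pairs. Sampling draws z from the prior and calls `decode`, which is what allows an arbitrary number of SR samples from one model.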

Slide 17

Slide 17 text

SRFlow architecture

Slide 18

Slide 18 text

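Related work: IRN [Xiao+ ECCV2020]
Models upscaling ⇆ downscaling with an invertible network.
• Uses the Haar transformation to split the variables
  > High- and low-frequency components are explicitly separated before coupling
• Training uses a reconstruction loss, a perceptual loss, and the JS divergence
  > SRFlow, by contrast, is trained with the NLL loss only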
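
The frequency split performed by a Haar transformation can be sketched on a single 2x2 block. This is a minimal illustration that assumes an orthonormal normalization (the scaling used in IRN may differ); the function names are hypothetical.

```python
def haar_forward(block):
    """Split a 2x2 block [[a, b], [c, d]] into one low- and three high-frequency values."""
    (a, b), (c, d) = block
    low = (a + b + c + d) / 2.0    # local average (up to scale)
    horiz = (a - b + c - d) / 2.0  # left-right difference
    vert = (a + b - c - d) / 2.0   # top-bottom difference
    diag = (a - b - c + d) / 2.0   # diagonal difference
    return low, (horiz, vert, diag)

def haar_inverse(low, high):
    """Exactly invert haar_forward."""
    horiz, vert, diag = high
    a = (low + horiz + vert + diag) / 2.0
    b = (low - horiz + vert - diag) / 2.0
    c = (low + horiz - vert - diag) / 2.0
    d = (low - horiz - vert + diag) / 2.0
    return [[a, b], [c, d]]

block = [[1.0, 3.0], [5.0, 7.0]]
low, high = haar_forward(block)
restored = haar_inverse(low, high)
print(low, high, restored)
```

The low-frequency value plays the role of the downscaled pixel, while the three high-frequency values hold exactly the information that downscaling discards. That is why the split is a natural starting point for an invertible rescaling network.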

Slide 19

Slide 19 text

Results

Slide 20

Slide 20 text

Results (lower is better)

Slide 21

Slide 21 text

Results (lower is better)
Flow-based methods are the strong contenders

Slide 22

Slide 22 text

Qualitative comparison
Figure: side-by-side outputs labeled Deterministic, GAN-based, and Flow-based (👑 marks the winner)

Slide 23

Slide 23 text

Comparison between samples
Diversity in texture can be observed across samples

Slide 24

Slide 24 text

Diversity as a function of the number of samples
Figure: plots for x4 and x8
Increasing the number of samples improves the diversity score (it appears to converge to some value)

Slide 25

Slide 25 text

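Winner solution [Kim+ CVPRW2021]
Inserts a Noise Condition Layer into SRFlow's Conditional Flow Step.
• During training, noise is added to the input image and a resized noise map is fed to the layer
• Contributes to improved diversity
Figures: result of training with noise added to the input (→ addressed by the Noise Condition Layer); network architecture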

Slide 26

Slide 26 text

njtech&seu (x4: 2nd, x8: 6th)
Introduces a Transformer into the Low Resolution Encoder.
• Possibly inspired by the Image Processing Transformer (IPT) [Chen+ CVPR2021]?

Slide 27

Slide 27 text

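Impressions
Flow-based methods are strong
• The field has been almost entirely application-driven, but this may be an opening for going deeper into theory
• SRFlow is so strong that the competitors seem to be dancing in the palm of its hand
How far does "Super-Resolution / Up-sampling" extend?
• The name gives the impression that the depicted object is being enlarged "faithfully"
  > At high scale factors, dataset bias has a strong influence, e.g. PULSE [Menon+ CVPR2020]
  > "Conditional Image Generation under LR-constraints" might be a less misleading name
• Studying the breadth of the SR space is important
  > e.g. constraining the semantic changes that occur within the learned space
  > The acceptable scale factor and diversity need to be discussed for each task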

Slide 28

Slide 28 text

References
[Weng 2018] L. Weng, “Flow-based deep generative models,” Oct. 13, 2018. https://lilianweng.github.io/lil-log/2018/10/13/flow-based-deep-generative-models.html (accessed Jul. 31, 2021).
[Zhang+ CVPR2018] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 586–595.
[Ji+ CVPRW2020] X. Ji, Y. Cao, Y. Tai, C. Wang, J. Li, and F. Huang, “Real-world super-resolution via kernel estimation and noise injection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 466–467.
[Yoo+ CVPR2020] J. Yoo, N. Ahn, and K.-A. Sohn, “Rethinking data augmentation for image super-resolution: A comprehensive analysis and a new strategy,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 8375–8384.
[Feng+ CVPRW2019] R. Feng, J. Gu, Y. Qiao, and C. Dong, “Suppressing model overfitting for image super-resolution networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019, pp. 0–0.
[Blau+ CVPR2018] Y. Blau and T. Michaeli, “The perception-distortion tradeoff,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 6228–6237.
[Haris+ CVPR2018] M. Haris, G. Shakhnarovich, and N. Ukita, “Deep back-projection networks for super-resolution,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 1664–1673.

Slide 29

Slide 29 text

References
[Lim+ CVPRW2017] B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee, “Enhanced deep residual networks for single image super-resolution,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 136–144.
[Agustsson+ CVPRW2017] E. Agustsson and R. Timofte, “NTIRE 2017 challenge on single image super-resolution: Dataset and study,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 126–135.
[Lugmayr+ CVPRW2021] A. Lugmayr, M. Danelljan, and R. Timofte, “NTIRE 2021 learning the super-resolution space challenge,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2021, pp. 596–612.
[Kim+ CVPRW2021] Y. Kim and D. Son, “Noise Conditional Flow Model for Learning the Super-Resolution Space,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2021, pp. 424–432.
[Lugmayr+ ECCV2020] A. Lugmayr, M. Danelljan, L. Van Gool, and R. Timofte, “SRFlow: Learning the Super-Resolution Space with Normalizing Flow,” in Computer Vision – ECCV 2020, 2020, pp. 715–732.
[Chen+ CVPR2021] H. Chen et al., “Pre-trained image processing transformer,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 12299–12310.
[Xiao+ ECCV2020] M. Xiao et al., “Invertible Image Rescaling,” in Computer Vision – ECCV 2020, 2020, pp. 126–144.
[Menon+ CVPR2020] S. Menon, A. Damian, S. Hu, N. Ravi, and C. Rudin, “PULSE: Self-supervised photo upsampling via latent space exploration of generative models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 2437–2445.
