Slide 1

Slide 1 text

Advanced Topics of Prior-based Image Restoration Tensors and Neural Networks Tatsuya Yokota Tutorial in APSIPA-ASC 2021

Slide 2

Slide 2 text

2 Organization of This Tutorial: Basics & Typical Methods, Tensors, Neural Networks. This tutorial consists of three parts, with Q&A and breaks between them. Feel free to interrupt me at any time to ask questions!!

Slide 3

Slide 3 text

3 Introduction: What is the image restoration problem? (Basics & Typical Methods)

Slide 4

Slide 4 text

4 Quiz What’s in the blank? ?

Slide 5

Slide 5 text

5 Quiz What’s in the blank? ➔ How is it?

Slide 6

Slide 6 text

6 Quiz What’s in the blank? ➔ How is it?

Slide 7

Slide 7 text

7 Quiz What’s in the blank? ➔ How is it?

Slide 8

Slide 8 text

8 Quiz What’s in the blank? ➔ How is it?

Slide 9

Slide 9 text

9 Without a prior, every candidate can be a solution! Quiz candidates: Signal A, Signal B, Signal C, Signal D, ... Prior information plays the role of selecting a solution from the candidates. Example 1: a smoothness prior may select Signal A. Example 2: a periodicity prior may select Signal D. PRIOR IS ALL YOU NEED!

Slide 10

Slide 10 text

10 Image restoration problems Typical image restoration problems Image denoising Image deblurring Image inpainting Image super-resolution

Slide 11

Slide 11 text

11 Vector representation of an image. In many cases we represent an image as a vector, by arranging all pixel values into a single array. For example, a gray-scale image of size 100 × 100 ➔ a 10000-dimensional vector.

Slide 12

Slide 12 text

12 Matrix / Tensor representation of an image. We sometimes represent an image as a matrix or tensor, i.e., directly treat it as a multi-dimensional array. Matrix representation: a 100 × 100 gray-scale image is a 100 × 100 array of intensity values (0 to 255). Tensor representation: hyper-spectral tensor, RGB image, MRI.

Slide 13

Slide 13 text

13 Image denoising. Observation model of a noisy image: noisy image (known) = clean image (unknown) + noise (unknown); the clean image is estimated.

Slide 14

Slide 14 text

14 Image inpainting. Observation model of an incomplete image: incomplete image (known) = mask image (known) ⊙ clean image (unknown), where ⊙ denotes the Hadamard (entry-wise) product; the clean image is estimated.

Slide 15

Slide 15 text

15 Image deblurring. Observation model of a blurred image: blurred image (known) = blur kernel (known) convolved with the clean image (unknown); the clean image is estimated.

Slide 16

Slide 16 text

16 Image super-resolution. Observation model of a low-resolution image: low-resolution image (known) = down-sampling (known) of the high-resolution image (unknown); the high-resolution image is estimated.

Slide 17

Slide 17 text

17 Unified formulation of the observation model. Noise, missing entries, blur, and down-sampling can all be written in the general form: observed image (known) = known linear operator applied to the unknown clean image + unknown noise.

Slide 18

Slide 18 text

18 Non-uniqueness of the solution without a prior. The goal of image restoration is to estimate the unknown clean image from this observation model. Without a prior the solution is often not unique: the number of equations is smaller than the number of parameters, or the observation operator is not injective.

Slide 19

Slide 19 text

19 Motivation of priors. The equation has no unique solution; priors make the solution space smaller, or even a single point. In practice, we consider an optimization problem that combines a penalty function for the noise prior (the data-fit term) with a penalty function for the image prior.

Slide 20

Slide 20 text

20 Motivation of prior (cont’d) For example, Unique!!

Slide 21

Slide 21 text

21 Interpretation based on MAP estimation Maximum A Posteriori (MAP) estimation

Slide 22

Slide 22 text

22 A class of image reconstruction methods. Many image reconstruction methods can be produced from the above framework by combining a noise prior (Gaussian, Laplacian, or Poisson noise) with an image prior (smoothness / total variation, non-negativity, graph regularization). When both penalty functions are convex, the problem can be solved by convex optimizers such as the alternating direction method of multipliers (ADMM) or the primal-dual hybrid gradient (PDHG) method.

Slide 23

Slide 23 text

23 Variants of image priors. In this tutorial, I want to discuss image priors. An image prior can be characterized as the combination of two elements: a representation (vector, matrix, tensor, graph, function; in Fourier space, in wavelet coefficient space, in other linear transform domains) × a property (sparsity, group sparsity, low-rank, orthogonality, smoothness, non-negativity, self-similarity, etc.).

Slide 24

Slide 24 text

24 Sparseness Prior (Basics & Typical Methods)

Slide 25

Slide 25 text

25 Sparsity. Sparsity plays a very important role in data science: noise reduction in signal/image processing, variable selection in machine learning. Meaning of sparse: "there are few non-zero entries in a vector". Example: (1, 0, 0, 0, 3, 0, 0, 2) is sparse; (1, 2, 5, 4, 3, 2, 1, 2) is dense.

Slide 26

Slide 26 text

26 Measures of sparsity. The simplest penalty measuring sparsity is the number of non-zero entries, called the "L0-norm"; its convex relaxation is the "L1-norm". Both the L0-norm and the L1-norm penalties prefer sparse vectors such as (1, 0, 0, 0, 3, 0, 0, 2) over dense ones such as (1, 2, 5, 4, 3, 2, 1, 2).

Slide 27

Slide 27 text

27 Optimization. Now we consider making a given vector sparse. The following two sparsity-inducing operators are obtained: L0-norm minimization yields hard thresholding, and L1-norm minimization yields soft thresholding.
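As a concrete illustration (not taken from the slides), here is a minimal NumPy sketch of the two operators; the threshold value t is an example parameter.

```python
import numpy as np

def hard_threshold(x, t):
    """L0-type operator: keep entries whose magnitude exceeds t, zero out the rest."""
    return np.where(np.abs(x) > t, x, 0.0)

def soft_threshold(x, t):
    """L1-type operator (proximal operator of t*||.||_1): shrink entries toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

x = np.array([1.0, -0.2, 0.0, 3.0, -2.0, 0.1])
print(hard_threshold(x, 0.5))  # [ 1.   0.   0.   3.  -2.   0. ]
print(soft_threshold(x, 0.5))  # [ 0.5 -0.   0.   2.5 -1.5  0. ]
```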

Slide 28

Slide 28 text

28 Sparsity in images Is the image data sparse? Intensity of image is not generally sparse… Histogram of intensity

Slide 29

Slide 29 text

29 Sparsity of images under linear transforms. The image itself is not sparse, but it is known that images become sparse under some linear transformations: block-wise DCT, wavelet transform, and differential (difference) operators.

Slide 30

Slide 30 text

30 Optimization. An image restoration problem with a sparse penalty under a linear transform can be solved, for example, by ADMM (the alternating direction method of multipliers): form an equivalent problem, build the augmented Lagrangian, and apply alternating updates, as sketched below.
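The following is a minimal NumPy sketch of this ADMM iteration for the simplest case (denoising, i.e., the observation operator is the identity) with a generic sparsifying transform D; the parameter values and loop count are illustrative choices, not taken from the tutorial.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_sparse_denoise(y, D, lam=1.0, rho=1.0, n_iter=200):
    """ADMM for  min_x 0.5*||x - y||^2 + lam*||D x||_1  (split as z = D x)."""
    n = len(y)
    x, z = y.copy(), D @ y
    u = np.zeros_like(z)
    Q = np.eye(n) + rho * (D.T @ D)                        # system matrix of the x-update
    for _ in range(n_iter):
        x = np.linalg.solve(Q, y + rho * D.T @ (z - u))    # quadratic subproblem
        z = soft_threshold(D @ x + u, lam / rho)           # prox of the L1 term
        u = u + D @ x - z                                  # scaled dual update
    return x
```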

Slide 31

Slide 31 text

31 Optimization. An image restoration problem with a sparse penalty under a linear transform can be solved, for example, by ADMM (the alternating direction method of multipliers): equivalent problem, augmented Lagrangian, alternating updates.

Slide 32

Slide 32 text

32 Demo. Various values of the regularization parameter in sparse regularization under linear transforms: differential, block-wise DCT, wavelet (Haar).

Slide 33

Slide 33 text

33 Demo Noisy(PSNR=22.10) Missing (50%) Block-wise DCT Wavelet Transform Differential PSNR=27.36 PSNR=25.54 PSNR=25.81 PSNR=24.35 PSNR=23.95 PSNR=26.50

Slide 34

Slide 34 text

34 Demo Blur + noise Low resolution (1/4) Block-wise DCT Wavelet Transform Differential PSNR=22.88 PSNR=23.04 PSNR=23.23 PSNR=23.61 PSNR=22.77 PSNR=23.87

Slide 35

Slide 35 text

35 Smoothness Prior (Basics & Typical Methods)

Slide 36

Slide 36 text

36 Smoothness This is more intuitive nature for images Smooth Non-smooth

Slide 37

Slide 37 text

37 Smoothness for one-dimensional signals. A penalty function for smoothness must prefer smooth signals and penalize non-smooth ones; a natural choice is a sum over the differences between neighboring samples, giving QV (quadratic variation) and TV (total variation) for a one-dimensional signal. Example: (1, 1, 1, 1, 1, 1, 1, 1) is smooth, (4, 3, 5, 2, 0, 5, 4, 1) is not.
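A small NumPy check of these two penalties on the example signals from this slide (the smooth signal gives zero for both, the non-smooth one gives large values):

```python
import numpy as np

def qv(x):   # quadratic variation: sum of squared differences
    return np.sum(np.diff(x) ** 2)

def tv(x):   # total variation: sum of absolute differences
    return np.sum(np.abs(np.diff(x)))

smooth = np.array([1, 1, 1, 1, 1, 1, 1, 1])
rough  = np.array([4, 3, 5, 2, 0, 5, 4, 1])
print(qv(smooth), tv(smooth))   # 0 0
print(qv(rough), tv(rough))     # 53 17
```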

Slide 38

Slide 38 text

38 Image as multi-dimensional function (or scalar field) Smoothness is evaluated by a sum of length of gradient field (vector field) Smoothness for multi-dimensional signals

Slide 39

Slide 39 text

39 Block sparseness under a linear transform. Smoothing is the minimization of the gradient strength; it can be seen as block sparseness of the image under a linear transform. Normal sparsity means "there are few non-zero entries in a vector"; block sparsity means "there are few non-zero vectors (columns) in a matrix".

Slide 40

Slide 40 text

40 QV regularization for images Gradient vectors are obtained by linear transformation of an image QV regularization problem is a minimization of quadratic function

Slide 41

Slide 41 text

41 TV regularization for images. Furthermore, we consider the convex conjugate of the l2-norm. The problem can be solved, for example, by the primal-dual hybrid gradient (PDHG) method: form an equivalent min-max problem and apply alternating updates, as sketched below.
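A minimal NumPy sketch of a PDHG iteration for the 1-D TV-denoising case, min_x 0.5*||x - y||^2 + lam*||D x||_1 with D the forward-difference operator; the step sizes tau and sigma are illustrative values chosen to satisfy tau*sigma*||D||^2 <= 1.

```python
import numpy as np

def pdhg_tv_denoise_1d(y, lam=1.0, tau=0.25, sigma=0.25, n_iter=300):
    n = len(y)
    D = np.diff(np.eye(n), axis=0)       # (n-1) x n forward-difference matrix
    x = y.copy()
    x_bar = x.copy()
    p = np.zeros(n - 1)
    for _ in range(n_iter):
        p = np.clip(p + sigma * (D @ x_bar), -lam, lam)    # dual step: project onto [-lam, lam]
        x_old = x
        x = (x - tau * (D.T @ p) + tau * y) / (1.0 + tau)  # primal step: prox of 0.5*||x - y||^2
        x_bar = 2 * x - x_old                              # extrapolation
    return x
```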

Slide 42

Slide 42 text

42 TV regularization for images. Furthermore, we consider the convex conjugate of the l2-norm. The problem can be solved, for example, by the primal-dual hybrid gradient (PDHG) method: equivalent min-max problem, alternating updates.

Slide 43

Slide 43 text

43 Demo Various values of QV TV

Slide 44

Slide 44 text

44 Demo Noisy(PSNR=22.10) Missing (50%) TV QV PSNR=26.29 PSNR=27.28 PSNR=26.73 PSNR=25.74

Slide 45

Slide 45 text

45 Demo Blur + noise Low resolution (1/4) TV QV PSNR=25.97 PSNR=24.30 PSNR=23.11 PSNR=23.14

Slide 46

Slide 46 text

46 Non-local Similarity Prior (Basics & Typical Methods)

Slide 47

Slide 47 text

47 Non-local similarity / self-similarity. Non-local similarity is a property of images: "there are many similar patches (blocks) in a single image". Many images are composed of planes, lines, and textures; this holds for images of both natural and artificial objects.

Slide 48

Slide 48 text

48 Typical methods. The non-local means (NLM) filter and BM3D are typical image denoising methods that use the non-local similarity prior. Roughly, NLM/BM3D applies the following steps to all patches: select a reference patch from the image; extract similar patches from the image by template matching; denoise the reference patch by a weighted average of the similar patches; return the denoised patch to its original position in the image. ※ This is not a strict explanation, but it is convenient for an intuitive understanding.

Slide 49

Slide 49 text

49 Mathematical formulation of the non-local filter. Actually, the non-local filter is very simple: it is just a linear transform. The weight matrix is the important part: its (i, j) entry represents the weight of the connection between the i-th and j-th pixels, and for a weighted mean each row must sum to one. How the weights are calculated determines the nature of the filter; for example, they can be computed from the similarity between the i-th and j-th patches, with a normalization constant. A sketch follows.
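A minimal NumPy sketch of this weight construction for a 1-D signal; the Gaussian kernel width h and the patch radius are illustrative parameters, not values from the slides.

```python
import numpy as np

def nonlocal_weights(x, patch_radius=3, h=0.1):
    """w_ij = exp(-||patch_i - patch_j||^2 / h^2), normalized so each row sums to one."""
    n = len(x)
    pad = np.pad(x, patch_radius, mode='edge')
    patches = np.stack([pad[i:i + 2 * patch_radius + 1] for i in range(n)])
    d2 = ((patches[:, None, :] - patches[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / h ** 2)
    return W / W.sum(axis=1, keepdims=True)

# Applying the non-local filter is then a single matrix-vector product: x_filtered = W @ x
```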

Slide 50

Slide 50 text

50 Local filter vs. non-local filter. As a simple example, we consider a one-dimensional time series and prepare two filters: a local filter (moving average) and a non-local filter constructed from the noisy signal. With the local filter the high-frequency components disappear, whereas the non-local filter reduces the noise successfully.

Slide 51

Slide 51 text

51 Graph perspective of the non-local filter. The weight matrix represents a graph structure over the pixels: node = pixel, edge = similarity between pixels. In a sense, "non-local" is a re-definition of locality based on this graph (local vs. non-local connections, isotropic vs. anisotropic diffusion).

Slide 52

Slide 52 text

52 Demo Iteration of non-local filter Iterations=100 Iterations=1000 Iterations=10000

Slide 53

Slide 53 text

53 Demo. Comparison with TV regularization in denoising; NLM cannot be directly applied to other tasks because template matching is required. Noisy (PSNR=22.10), TV (PSNR=27.28), NLM (PSNR=27.01).

Slide 54

Slide 54 text

54 Summary of Part I. The image restoration problem was formulated, and classical prior-based methods were introduced: sparseness (hard/soft thresholding, sparse regularization under linear transforms), smoothness (quadratic variation, total variation), and non-local similarity (the non-local filter).

Slide 55

Slide 55 text

55 What and Why Tensors? (Tensors)

Slide 56

Slide 56 text

56 Variants of image priors. An image prior can be characterized as the combination of two elements: a representation (vector, matrix, tensor, graph, function; in Fourier space, in wavelet coefficient space, in other linear transform domains) × a property (sparsity, group sparsity, low-rank, orthogonality, independence, non-negativity, self-similarity, etc.). From now on, we focus on matrix and tensor representations and their low-rank approximation.

Slide 57

Slide 57 text

57 Matrices / Tensors for Image Restoration Overview concept of this part There are various low-rank models for tensors Matrix factorization Nuclear norm regularization CP decomposition, Tucker decomposition TT decomposition, TR decomposition etc = Tensor representation Low-rank approximation

Slide 58

Slide 58 text

58 Recommended Book “Tensors for Data Processing: Theory, Methods, and Applications” Edited by Prof. Yipeng Liu Recommended Chapter

Slide 59

Slide 59 text

59 Variables having indices Generalization of scalars, vectors, and matrices Ex: Number of indices : “Order” Scalar : 0th order tensor Vector : 1st order tensor Matrices : 2nd order tensor … Tensors

Slide 60

Slide 60 text

60 Multi-way array 0th order tensors : 1st order tensors : 2nd order tensors : 3rd order tensors : Tensor data Vector → array of scalars Matrix → array of vectors 3rd order Tensor → array of matrices

Slide 61

Slide 61 text

61 Recursive representations of an Nth-order tensor. Let us imagine a 3rd-order tensor as a "cube". A 4th-order tensor is then a block 3rd-order tensor whose elements are vectors, a block matrix whose elements are matrices, or a block vector whose elements are 3rd-order tensors. A 5th-order tensor is a block 4th-order tensor whose elements are vectors, a block 3rd-order tensor whose elements are matrices, a block matrix whose elements are 3rd-order tensors, or a block vector whose elements are 4th-order tensors. In general, an Nth-order tensor is a block vector whose elements are (N-1)th-order tensors.

Slide 62

Slide 62 text

62 Fields of application: psychometrics (1960s-), linguistics (1970s-), chemometrics (1980s-), food industry (1990s-), computer science / data engineering (1990s-): multi-media signal processing (audio, speech, image, video), bio-medical signal processing (EEG, MEG, functional MRI, diffusion MRI), bio-informatics (gene, protein), remote sensing (hyperspectral images), web/data mining (SNS analysis, recommender systems), non-linear time series analysis, wireless communications, deep learning, etc.

Slide 63

Slide 63 text

63 Matrix representation and rank (Tensors)

Slide 64

Slide 64 text

64 Matrix / Tensor representation of an image. We consider an image as a matrix (e.g., a 100 × 100 array of intensity values), and the observation model is written in matrix form.

Slide 65

Slide 65 text

65 Matrix rank. A rank-1 matrix is a matrix obtained as the outer product of any two non-zero vectors; all of its columns have the same direction, and all of its rows have the same direction. The matrix rank is the minimum number of rank-1 matrices required to represent a given matrix.

Slide 66

Slide 66 text

66 An example of matrix rank. Two matrices built from the same entries can have different ranks: for example, [3 2 1; 6 4 2; 9 6 3] is the outer product of (1, 2, 3) and (3, 2, 1), so its rank is 1, whereas rearranging its entries yields a matrix whose rank is 2 (a sum of two outer products).

Slide 67

Slide 67 text

67 SVD: Singular Value Decomposition. SVD is a fundamental method for matrix analysis: a matrix is factorized into left singular vectors, a diagonal matrix of singular values, and right singular vectors. The matrix rank can be obtained as the number of non-zero singular values.
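A quick NumPy check of this statement, using the rank-1 example matrix from the previous slide (the outer product of (1, 2, 3) and (3, 2, 1)):

```python
import numpy as np

A = np.outer([1, 2, 3], [3, 2, 1])       # [[3,2,1],[6,4,2],[9,6,3]], a rank-1 matrix
s = np.linalg.svd(A, compute_uv=False)   # singular values only
print(s)                                 # one non-zero singular value, the rest ~ 0
print(int(np.sum(s > 1e-10)))            # numerical rank = 1
```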

Slide 68

Slide 68 text

68 Low-rank prior for matrices (Tensors)

Slide 69

Slide 69 text

69 Low-rank nature of images. Singular values of an image, and low-rank approximations using the truncated SVD (rank 1, 5, 10, 30, 50).
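A minimal sketch of the truncated-SVD approximation used in this demo; `img` stands for any 2-D gray-scale image array (no specific image is assumed):

```python
import numpy as np

def truncated_svd(X, r):
    """Best rank-r approximation: keep only the r largest singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

img = np.random.rand(100, 100)                                   # placeholder image
approx = {r: truncated_svd(img, r) for r in (1, 5, 10, 30, 50)}  # ranks as in the slide
```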

Slide 70

Slide 70 text

70 Low-rank nature of images (cont'd). A low rank implies similarity between the fibers (columns/rows) of the matrix, but spatial continuity is ignored: the matrix rank is invariant to switching columns or rows.

Slide 71

Slide 71 text

71 Formulation. We consider a low-rank optimization problem. Since the rank itself is hard to optimize, its convex relaxation, the sum of the singular values, is often used; it is referred to as the trace norm, nuclear norm, or Schatten-1 norm.

Slide 72

Slide 72 text

72 Singular value thresholding. We define the proximal operator of the nuclear norm; this operation is known as singular value thresholding, and it can be regarded as soft thresholding of the singular values: take the SVD of Y and soft-threshold the singular values.
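A minimal NumPy sketch of singular value thresholding (the proximal operator of the nuclear norm):

```python
import numpy as np

def svt(Y, lam):
    """Soft-threshold the singular values of Y by lam."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt
```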

Slide 73

Slide 73 text

73 Low-rank image restoration Nuclear norm regularization Solution algorithm: ADMM equivalent problem augmented Lagrangian alternating update
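As an illustration of this ADMM scheme, here is a minimal NumPy sketch for the inpainting case (the observation operator is a binary mask M); lam, rho, and the iteration count are illustrative values, not taken from the slides.

```python
import numpy as np

def svt(Y, lam):                                   # singular value thresholding (see above)
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt

def admm_nuclear_inpainting(Y, M, lam=10.0, rho=1.0, n_iter=200):
    """ADMM for  min_X 0.5*||M*(X - Y)||_F^2 + lam*||X||_*  with the split X = Z."""
    X = M * Y
    Z = X.copy()
    U = np.zeros_like(Y)
    for _ in range(n_iter):
        X = (M * Y + rho * (Z - U)) / (M + rho)    # data-fit subproblem (closed form, entry-wise)
        Z = svt(X + U, lam / rho)                  # nuclear-norm subproblem
        U = U + X - Z                              # scaled dual update
    return Z
```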

Slide 74

Slide 74 text

74 Low-rank image restoration Nuclear norm regularization Solution algorithm: ADMM equivalent problem augmented Lagrangian alternating update

Slide 75

Slide 75 text

75 Demo. Various values of the regularization parameter: 0.01, 0.25, 1.0, 10, 25.

Slide 76

Slide 76 text

76 Demo Noisy(PSNR=22.10) PSNR=24.10 Missing (50%) Blur + noise Low resolution (1/4) PSNR=23.04 PSNR=24.12 PSNR=22.02

Slide 77

Slide 77 text

77 Low-rank prior for tensors (Tensors)

Slide 78

Slide 78 text

78 Tensor rank vs. matrix rank. Unlike the matrix rank, tensor ranks have many variants; each tensor rank is based on a corresponding decomposition model: CP decomposition (CPD) → CP rank, Tucker decomposition (TKD) → Tucker rank, tensor train decomposition (TTD) → TT rank, tensor ring decomposition (TRD) → TR rank.

Slide 79

Slide 79 text

79 Tensor decomposition is a higher order generalization of matrix decomposition General definition may be “a tensor represented by multiple tensors” A specific decomposition is given by a parametric model f and its parameter tensors. f( , , ) Tensor decompositions = Matrix decomposition = core tensor factor matrix

Slide 80

Slide 80 text

80 Tensor-based image restoration. There are many studies of low-rank based tensor data recovery: low-n-rank tensor recovery [Gandy+, 2011 (in Inverse Problems)], low-rank tensor completion [Liu+, 2012 (in IEEE-TPAMI)], Bayesian CP decomposition [Zhao+, 2015 (in IEEE-TPAMI)], t-SVD [Zhang+, 2017 (in IEEE-TSP)], low-rank and smooth tensor recovery [Yokota+, 2017 (in CVPR)] [Yokota+, 2019 (in IEEE-Access)], tensor ring completion [Wang+, 2017 (in ICCV)]. Note that most of them focus on image inpainting (tensor completion); therefore, we introduce image inpainting methods in this part.

Slide 81

Slide 81 text

81 Image inpainting (revisited). Estimation of missing values; it can also be regarded as regression. The incomplete tensor is a table of coordinates (x, y, z) with values v, some of which are unknown (e.g., (1,1,1)→1.2, (2,1,1)→2.1, (3,1,1)→?); learning the tensor with a parametric model fills in the missing values and yields the complete tensor.

Slide 82

Slide 82 text

82 Two approaches for tensor completion. (1) Rank minimization using penalty functions (nuclear norms): when the penalty is convex, it can be solved by a convex optimizer such as ADMM or PDS. (2) Obtaining an explicit low-rank tensor decomposition, parameterized by a set of core tensors and factor matrices, possibly with additional penalties such as orthogonality, sparseness, or smoothness. In both cases a projection operator restricts the data-fit to the set of observed entries of the incomplete tensor.

Slide 83

Slide 83 text

83 Low-rank prior for tensors: approach using penalty functions (Tensors)

Slide 84

Slide 84 text

84 Tensor ranks. Two typical tensor ranks: the CP rank R is the minimum number of rank-1 tensors needed to construct the target tensor; the multilinear tensor rank (Tucker rank) (R1, R2, R3) collects the ranks of the matrices obtained by unfolding the target tensor along each mode (the 1st-, 2nd-, and 3rd-mode ranks).
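A minimal NumPy sketch of the multilinear (Tucker) rank via mode unfoldings:

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move the given mode to the front and flatten the remaining modes."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def multilinear_rank(T, tol=1e-10):
    """Tucker rank (R1, ..., RN): the matrix rank of each mode unfolding."""
    return tuple(int(np.linalg.matrix_rank(unfold(T, m), tol=tol)) for m in range(T.ndim))
```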

Slide 85

Slide 85 text

85 Tensor nuclear norms. Two definitions of tensor nuclear norms: one based on the CP decomposition (CPD), which is not useful in practice since obtaining the CPD is NP-hard, and one based on the Tucker decomposition (TKD), defined through the n-th mode unfoldings, which is a convex function and quite useful.

Slide 86

Slide 86 text

86 Formulation [Gandy+, 2011 (in Inverse Problems)] [Liu+, 2012 (in IEEE-TPAMI)] It can be solved by ADMM LRTC: Low rank tensor completion equivalent problem augmented Lagrangian alternating update 1st mode 2nd mode 3rd mode target tensor Projection for X Singular value thresholding for Y Update for Z Indicator function

Slide 87

Slide 87 text

87 tSVD: tensor SVD Tensor SVD [Zhang+, 2017 (in IEEE-TSP)] It is a convolutional tensor decomposition model based on t-product It can be obtained by sequential steps = t-product DFT along the 3rd mode SVD of each frontal slices Inverse DFT

Slide 88

Slide 88 text

88 tSVD: tensor SVD (cont’d) TNN: Tensor nuclear norm [Zhang+, 2017 (in IEEE-TSP)] A tensor nuclear norm based on tSVD Tubal nuclear norm is same as a matrix nuclear norm of its block circulant one. = Block circulant matrix Block diagonalization DFT blcirc

Slide 89

Slide 89 text

89 tSVD: tensor SVD (cont'd). Singular value thresholding for the tSVD can be obtained by the following steps: DFT along the 3rd mode → SVD of each frontal slice → soft thresholding of the singular values → inverse DFT.
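A minimal NumPy sketch of these steps for a 3rd-order tensor; note that normalization conventions of the FFT and of the threshold differ between papers, so this is only an illustrative version.

```python
import numpy as np

def tsvd_svt(Y, lam):
    """FFT along the 3rd mode, soft-threshold the singular values of each frontal slice, inverse FFT."""
    Yf = np.fft.fft(Y, axis=2)
    Xf = np.empty_like(Yf)
    for k in range(Y.shape[2]):
        U, s, Vh = np.linalg.svd(Yf[:, :, k], full_matrices=False)
        Xf[:, :, k] = (U * np.maximum(s - lam, 0.0)) @ Vh
    return np.real(np.fft.ifft(Xf, axis=2))
```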

Slide 90

Slide 90 text

90 tSVD: tensor SVD (cont'd). The formulation can be solved by ADMM: equivalent problem, augmented Lagrangian, alternating updates (the missing entries are handled through the complement of the observed-entry set).

Slide 91

Slide 91 text

91 Comparison between TKD-based and tSVD-based tensor nuclear norms: the TKD-based norm applies the SVD to the mode unfoldings, while the tSVD-based norm applies the SVD to the frontal slices in the Fourier domain.

Slide 92

Slide 92 text

92 Demo. Various values of the threshold in singular value thresholding (0.1, 1.0, 10, 25, 100), for TKD and tSVD.

Slide 93

Slide 93 text

93 Demo Color image inpainting (tensor completion) tSVD based nuclear norm TKD based nuclear norm matrix nuclear norm (for each color frame) PSNR = 23.20 PSNR = 25.74 PSNR = 26.66

Slide 94

Slide 94 text

94 Low-rank prior for tensors: approach using explicit decompositions (Tensors)

Slide 95

Slide 95 text

95 TD models Canonical Polyadic (CPD) Tucker (TKD) Tensor Train (TT) Tensor Ring (TR) Tensor decomposition (TD) approach Unified notation of a tensor decomposition (TD) Reconstruction tensor is parameterized by

Slide 96

Slide 96 text

96 Alternating least squares (ALS) for TD. With the unified notation of a tensor decomposition, the cost function is in general non-convex with respect to all parameters jointly, but it is convex with respect to each factor individually. ALS can therefore be applied to a wide range of TD models (CPD, TKD, TT, and TR); a sketch for the simplest case follows.
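As a sketch of the ALS idea in its simplest instance (a rank-R matrix factorization, i.e., the 2nd-order case of a TD), each factor update below is a closed-form least-squares solution with the other factor held fixed; the rank and iteration count are illustrative.

```python
import numpy as np

def als_matrix_factorization(X, R, n_iter=50, seed=0):
    """ALS for X ≈ A @ B.T: alternately solve the two convex least-squares subproblems."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((X.shape[0], R))
    B = rng.standard_normal((X.shape[1], R))
    for _ in range(n_iter):
        A = X @ B @ np.linalg.pinv(B.T @ B)     # update A with B fixed
        B = X.T @ A @ np.linalg.pinv(A.T @ A)   # update B with A fixed
    return A, B
```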

Slide 97

Slide 97 text

97 TD for tensor completion. For the cost function of TD-based tensor completion, the majorization-minimization (MM) algorithm can be applied: an auxiliary function is employed that upper-bounds the cost and touches it at the current estimate, and each update minimizes this auxiliary function (ALS can be applied!!). A sketch follows.
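A minimal sketch of this MM strategy; `fit_td_full` stands for any TD solver for fully observed tensors (e.g., ALS) and is a hypothetical placeholder, as are the argument names.

```python
def mm_td_completion(Y, mask, fit_td_full, rank, n_outer=30):
    """MM for TD-based completion: fill missing entries with the current
    reconstruction (the auxiliary tensor), then fit the TD model to the filled tensor."""
    X = mask * Y                          # initialize missing entries with zeros
    for _ in range(n_outer):
        Z = mask * Y + (1 - mask) * X     # auxiliary tensor: observed data + current estimate
        X = fit_td_full(Z, rank)          # ordinary TD (e.g., ALS) on the complete tensor Z
    return X
```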

Slide 98

Slide 98 text

98 Rank increment algorithm. Rank determination (the size of the core tensor) is a critical problem in TD; a rank increment strategy works well: start from R=1, optimize, set R←R+1, optimize, and repeat until convergence, as sketched below.
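A minimal sketch of this strategy; `fit_td` (a TD/completion solver taking a rank argument), its signature, and the stopping tolerance are hypothetical placeholders.

```python
def rank_increment_completion(Y, mask, fit_td, max_rank=50, tol=1e-4):
    R, X, prev_err = 1, None, float('inf')
    while R <= max_rank:
        X = fit_td(Y, mask, rank=R, init=X)         # optimize the TD model at the current rank
        err = float(((mask * (X - Y)) ** 2).sum())  # fit measured on the observed entries only
        if prev_err - err < tol * prev_err:         # stop when the fit no longer improves
            break
        prev_err, R = err, R + 1
    return X
```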

Slide 99

Slide 99 text

99 Demo Color image inpainting (tensor completion) CPD with rank increment TKD with rank increment PSNR = 23.09 PSNR = 26.02 Incomplete

Slide 100

Slide 100 text

100 Low-rank + smoothness (Tensors)

Slide 101

Slide 101 text

101 Multiple priors. Various priors (low-rank, smoothness, sparseness, non-negativity, etc.) can be combined at the same time, in both the penalty-function-based approach and the TD-based approach. A typical combination is low-rank and smoothness.

Slide 102

Slide 102 text

102 LRTV: Low-rank and TV minimization LRTV prior [Yokota+, 2017 (in CVPR)] [Yokota+, 2019 (in IEEE-Access)] Smoothness (TV) Low-rank (TKD based nuclear norm) Range constraint (0~255) Noise inequality Primal-Dual Splitting (PDS) Algorithm

Slide 103

Slide 103 text

103 SPC: smooth CPD based tensor completion. Smooth CPD [Yokota+, 2016 (in IEEE-TSP)]: the image tensor (height × width × rgb) is modeled by a CPD whose components are smooth, i.e., a smoothness (QV) penalty is placed on the factor matrices, with squared errors as the data term; optimization uses the MM algorithm, the ALS algorithm, and rank increment.

Slide 104

Slide 104 text

104 Demo Combinations of low-rank and smoothness priors Smooth CPD (PSNR=28.29) Normal CPD (PSNR=26.02) LRTV (PSNR=26.86) Only LR (PSNR=25.74) Only TV (PSNR=26.21) tSVD (PSNR=26.66) Penalty Function Tensor Decomp.

Slide 105

Slide 105 text

105 Low-rank prior under tensorization (Tensors)

Slide 106

Slide 106 text

106 Limitation of the direct representation. Revisiting signal restoration using tensor decomposition: when whole slices are missing in a tensor, low-rank tensor decomposition does not work. Randomly missing pixels can be recovered, but missing slices cannot: slice missing deflates the matrix/tensor rank, so the deflated components cannot be recovered.

Slide 107

Slide 107 text

107 Tensorization for TD. We consider the concept of "tensorize and decompose". The question here is how to tensorize the signals; in general, some specific knowledge of the signals is necessary to design the tensors. Here, I introduce direct folding, ket augmentation, and multi-way Hankelization.

Slide 108

Slide 108 text

108 Direct folding of images Image data can be naturally represented by matrix, but folding have some benefits for compression (256,256) Low rank approximation #param = 2048 #param = 2048 16th order tensor (2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2) Low rank approximation Folding

Slide 109

Slide 109 text

109 Ket augmentation (KA) [Bengua+, 2017 (in IEEE-TIP)]. A (256, 256) image can be reorganized by folding into a 16th-order (2, ..., 2) tensor or by KA into an 8th-order (4, 4, 4, 4, 4, 4, 4, 4) tensor, and low-rank approximation is then applied in the tensorized space (#param = 2048 in both examples).

Slide 110

Slide 110 text

110 Approach using the Hankel representation. Here, we introduce low-rank tensor decomposition not in the direct tensor space but in an embedded (Hankel) tensor space: apply delay embedding / Hankelization to the incomplete (height × width × rgb) tensor, perform low-rank tensor completion on the resulting higher-order tensor, and apply the inverse operation to obtain the output.

Slide 111

Slide 111 text

111 What is delay-embedding (DE)? Here, I introduce a technique from dynamical systems analysis. For a time series, we consider a state space built from delayed copies of the signal; this construction is called "delay-embedding (DE)".

Slide 112

Slide 112 text

112 Delay-embedding (DE) is Hankelization. DE is a transformation from a vector to a Hankel matrix (i.e., Hankelization), where a Hankel matrix is a matrix whose anti-diagonal elements are the same. A sketch follows.
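A minimal NumPy sketch of delay embedding (Hankelization) of a 1-D signal with window length tau:

```python
import numpy as np

def delay_embedding(x, tau):
    """Stack the sliding windows x[t:t+tau] as columns; anti-diagonals are constant (Hankel)."""
    L = len(x) - tau + 1
    return np.stack([x[t:t + tau] for t in range(L)], axis=1)   # shape (tau, L)

print(delay_embedding(np.arange(1, 7), tau=3))
# [[1 2 3 4]
#  [2 3 4 5]
#  [3 4 5 6]]
```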

Slide 113

Slide 113 text

113 DE can be regarded as a linear operation. DE = duplication & unfolding

Slide 114

Slide 114 text

114 Inverse of delay-embedding (DE). Can we define the inverse of DE? There is no exact inverse, since DE is essentially a duplication; we therefore consider only the pseudo-inverse of DE, which folds back and averages the duplicated entries, as sketched below.
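A minimal NumPy sketch of this pseudo-inverse: average the duplicated entries, i.e., average each anti-diagonal of the Hankel matrix.

```python
import numpy as np

def inverse_delay_embedding(H):
    tau, L = H.shape
    x = np.zeros(tau + L - 1)
    count = np.zeros(tau + L - 1)
    for i in range(tau):
        for t in range(L):
            x[i + t] += H[i, t]       # accumulate every copy of sample i + t
            count[i + t] += 1
    return x / count                  # average the duplicates
```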

Slide 115

Slide 115 text

115 SVD of Hankel matrix Seeing singular values of Hankel matrix with various value of Hankel matrix has a low-rank structure 3d-plot of DE(x,τ=3) and its principal patterns Order of singular values

Slide 116

Slide 116 text

116 SVD of Hankel matrix (cont’d) Seeing left- and right- singular vectors of Hankel matrix Left singular vectors are very similar to discrete cosine transform (DCT) basis

Slide 117

Slide 117 text

117 Time-series recovery by using truncated SVD missing&noisy D.E. missing&noisy Hankel matrix low rank approximation Inverse D.E. recovered Completion

Slide 118

Slide 118 text

118 Block Hankel structure. Considering delay-embedding of a matrix in "both" directions: each column is Hankelized by DE, which gives a time series of Hankel matrices, and a further DE gives a "block Hankel matrix". A block Hankel matrix is a block matrix that has Hankel structure and whose elements are themselves Hankel matrices.

Slide 119

Slide 119 text

119 MDT: a generalization of DE for tensors. DE = linear duplication + folding; multiway delay-embedding = multilinear duplication + folding, which we call the "multiway delay-embedding transform (MDT)". For example, MDT maps a 3rd-order tensor of size (I1, I2, I3) to a 6th-order tensor, expanding each mode of size I into a pair of modes of sizes τ and I - τ + 1.

Slide 120

Slide 120 text

120 DE vs. inverse DE, and MDT vs. inverse MDT. The inverse MDT maps the 6th-order tensor back to a 3rd-order tensor of size (I1, I2, I3) using the pseudo-inverses of the duplication operators.

Slide 121

Slide 121 text

121 MDT of Image and its high order SVD Seeing the components of Hankel tensor [Yokota+, 2018 (in APSIPA)] Very similar to DCT Bases

Slide 122

Slide 122 text

122 Application to tensor completion. Low-rank Tucker decomposition in the embedded space [Yokota+, 2018 (in CVPR)]: ① MDT of the given incomplete tensor of original size (Nth order ➔ 2Nth order); ② Tucker decomposition of the incomplete block Hankel tensor; ③ inverse MDT of the complete block Hankel tensor (2Nth order ➔ Nth order).

Slide 123

Slide 123 text

123 Tucker decomposition of complete tensor was explained before Here, I explain the method for Tucker decomposition of incomplete tensor Optimization problem can be given by It can be solved by auxiliary function approach and ALS algorithm Update Tucker decomposition of an incomplete tensor : Incomplete tensor : binary tensor of same size to → 0 represents missing entry → 1 represents observed entry ALS to minimize alternative optimization To convergence ↑ it is same as Tucker for complete tensor

Slide 124

Slide 124 text

124 Experimental results. A (256, 256, 3) image is mapped by MDT to a (32, 225, 32, 225, 3) tensor, approximated with multilinear rank (R1, R2, R3, R4, 3), and mapped back to (256, 256, 3); here (32, 32) → (R1, R3) and (225, 225) → (R2, R4). Slice missing was recovered with an appropriate rank for the decomposition ➔ but the best rank is unknown.

Slide 125

Slide 125 text

125 Adaptive rank increment algorithm. To be free from the rank estimation problem in tensor completion, we propose to use a rank increment algorithm.

Slide 126

Slide 126 text

126 Experimental results (32, 225, 32, 225, 3). Initialization is very important!!

Slide 127

Slide 127 text

127 Demo Low-rank decomposition under tensorization CPD with Rank inc. 99% missing TKD with Rank inc. Raw tensor (256,256,3) Tensorization (8,8,4,8,8,4,3) KA (16,16,16,16,3) MDT (32,225,32,225,3)

Slide 128

Slide 128 text

128 Demo. Comparison of various methods (LR, TV, CPD, TKD, tSVD, LRTV, SPC, MDT-TKD) at 50%, 70%, 90%, and 99% missing rates. PSNR values, with each group of four running from 50% to 99% missing: 26.67 21.71 17.10 9.96 | 25.86 21.98 16.92 7.88 | 26.21 23.17 19.32 13.97 | 26.89 23.69 19.53 13.97 | 26.18 22.05 17.05 6.06 | 23.50 20.27 16.44 10.60 | 28.32 25.50 21.46 15.91 | 23.50 23.44 20.28 16.59.

Slide 129

Slide 129 text

129 Summary of Part II. Prior-based image restoration under matrix and tensor representations. Penalty-function-based approach: the matrix nuclear norm works well as a low-rank prior; several nuclear norms exist for tensors (TKD-based, tSVD-based); these problems can be solved by ADMM or PDS. TD-based approach: there are many TD models, TD is solved by ALS, tensor completion can be solved by MM, and rank increment works well for adaptive rank selection. Low-rank priors under tensorization are a hot topic now.

Slide 130

Slide 130 text

130 Basics of neural networks and recent advances (Neural Networks)

Slide 131

Slide 131 text

131 Neural Networks and Machine Learning. A neural network is a mathematical model of a function that outputs a signal obtained by repeatedly applying linear operations and nonlinear activations to an input signal; various functions can be expressed by adjusting the weight matrices of the linear operations as parameters. Universal approximation ability: a two-layer NN with an infinite number of nodes can approximate any function. Deep structure: good for approximating complicated functions, and it helps supervised and unsupervised learning using large amounts of data.

Slide 132

Slide 132 text

132 Supervised learning. Assumption: there is a large number of pairs of input x and output y (training data). Learning process: Step 1, design a mathematical model; Step 2, fit the mathematical model to the training data; the trained model is then obtained. (〇: training samples)

Slide 133

Slide 133 text

133 Supervised learning: example Image Recognition Targeted Function When an image x is inputted, then class y is outputted: y=f(x) “DOG” “CAT” input output

Slide 134

Slide 134 text

134 Supervised learning: example Age estimation Targeted System When a facial image x is inputted, then estimated age y is outputted: y=f(x) 7 years old 45 years old input output

Slide 135

Slide 135 text

135 Supervised learning: The impact of deep learning Large Scale Visual Recognition Challenge 2012 A huge database of images: ImageNet Train about 1.2 million image data Compete for 1000 class classification tasks Deep learning Univ. Toronto

Slide 136

Slide 136 text

136 Convolutional Neural Networks (CNNs) CNN at the time of 2012 consists of 8 layers CNN at the time of 2015 consists of 152 layers!! ※below figure is 34 layers

Slide 137

Slide 137 text

137 Unsupervised learning Variational Auto-Encoder (VAE) [Kingma+, 2013] Generate arbitrary images which included in a specific domain Generated facial images by learned model Generated hand-written Digits by learned model

Slide 138

Slide 138 text

138 Unsupervised learning (cont’d) Variational Auto-Encoder (VAE) [Kingma+, 2013] Images are mapped to the appropriate coordinates in the latent space. Due to the low dimensionality of the latent space, similar images are closely located. Continuous changes of images on the latent space can be seen ・・・ ・・・ Latent space Encoder Decoder

Slide 139

Slide 139 text

139 Unsupervised learning (cont’d) Generative Adversarial Nets (GAN) [Goodfellow+, 2014] It performs adversarial training with a generator and a discriminator. Generator: Optimize to generate fake images so that it can deceive the discriminator Discriminator: Optimize to accurately classify fake images and real images Latent space Normal distribution Discriminator Generator (Decoder) Fake Real Real or Fake It is like a game of AI vs AI

Slide 140

Slide 140 text

140 Unsupervised learning (cont’d) StyleGAN [Karras+, 2019] [demo] [github] Very large GAN with various improvements Learned 70,000 images with a size of 1024 x 1024 StyleGAN

Slide 141

Slide 141 text

141 Data-driven priors (Neural Networks)

Slide 142

Slide 142 text

142 Machine learning (ML) framework. Learn a function that takes a corrupted image as input and outputs a clean image. Corrupted images are created in a pseudo manner (synthetically) from a clean image database, and the parametric function is optimized to fit the training data.

Slide 143

Slide 143 text

143 ML based Super-Resolution SRCNN [Dong+, ECCV2014], [Dong+, TPAMI2015], SRResNet [Ledig+, CVPR2017] Create many low-resolution and high-resolution image pairs and learn the input / output relationship with a CNN High resolution image Low resolution image

Slide 144

Slide 144 text

144 ML based Super-Resolution SRGAN [Ledig+, CVPR2017] Extension of SRCNN, SRResNet with adversarial regularization SR image (fake) LR image {True, Fake} HR image (real) Content loss (VGG) Adversarial regularizer - G deceives D - D keeps accuracy Content loss for SRResNet

Slide 145

Slide 145 text

145 ML-based super-resolution. PULSE [Menon+, CVPR2020] explicitly uses a data-driven prior for image restoration. Training phase: learn a generative model from a set of clean images. Image restoration based on the data-driven prior: search the latent space of the generative model (constrained to a sphere surface, within an error bound on the observation) for a consistent image.

Slide 146

Slide 146 text

146 ML based Super-Resolution PULSE [Menon+, CVPR2020] PULSE with StyleGAN StyleGAN [Karras+, CVPR2019] PULSE [Menon+, CVPR2020]

Slide 147

Slide 147 text

147 Discussions about the data-driven prior. Almost all datasets have biases: skin color, hair color, hairstyle, glasses, gender, age, clothes, accessories, etc. StyleGAN can generate high-quality HR images, but these generated images are still confined to a specific domain (not universal). (Original, down-scaled, and upscaled-by-PULSE examples; ※ demo code is available at [github].)

Slide 148

Slide 148 text

148 Deep Image Prior (Neural Networks)

Slide 149

Slide 149 text

149 Deep Image Prior [Ulyanov+, CVPR2018]. No training data! The CNN structure alone becomes the image prior: a randomly initialized CNN takes a fixed noise input, and its output is fitted to the corrupted image by minimizing the (masked) error.

Slide 150

Slide 150 text

150 Deep Image Prior: optimization behavior [Ulyanov+, CVPR2018]. Prepare a CNN with randomly initialized weights and update the weights by minimizing the least-squares error; a minimal sketch follows.
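A minimal PyTorch sketch of this procedure for denoising; the tiny three-layer CNN below is only an illustrative stand-in for the encoder-decoder network used in the paper, and the learning rate and iteration count are example values.

```python
import torch
import torch.nn as nn

def deep_image_prior_denoise(y, n_iter=1200, lr=1e-3):
    """y: corrupted image tensor of shape (1, C, H, W). Returns the CNN output g_theta(z)."""
    C = y.shape[1]
    net = nn.Sequential(
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, C, 3, padding=1),
    )
    z = torch.randn(1, 32, y.shape[2], y.shape[3])   # fixed random input (the "noise" z)
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(n_iter):                          # early stopping: keep n_iter moderate
        opt.zero_grad()
        loss = ((net(z) - y) ** 2).mean()            # least-squares fit to the corrupted image
        loss.backward()
        opt.step()
    return net(z).detach()
```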

Slide 151

Slide 151 text

151 Deep Image Prior: Noise Reduction Early stopping is necessary for noise reduction Noisy image 0 iterations 100 iterations 1200 iterations 1800 iterations

Slide 152

Slide 152 text

152 Deep Image Prior: PET reconstruction. In PET reconstruction, we minimize the KL divergence after the Radon transform: a fixed noise input is fed to the network, whose output (the reconstructed PET image) is mapped to a reconstructed sinogram and evaluated against the observed sinogram by the KL divergence. Iter=10 SN=1.8, Iter=100 SN=12.2, Iter=600 SN=16.2, Iter=2600 SN=18.8, Iter=5000 SN=17.4, Iter=10000 SN=10.2. T. Yokota+, "Dynamic PET image reconstruction using NMF incorporated with Deep Image Prior," ICCV, 2019.

Slide 153

Slide 153 text

153 Why can noise be reduced by CNNs? The explanation in [Ulyanov+, 2018] demonstrated noise impedance: four targets were reconstructed (a natural image, a natural image with noise, a natural image with shuffled pixels, and uniform noise), and the convergence speeds differed: the natural image was fitted faster, the noise more slowly. DIP exploits this difference in fitting time. However, this is just a report of the phenomenon. Question: why CNNs? Why convolution?

Slide 154

Slide 154 text

154 Patch manifold model (Neural Networks)

Slide 155

Slide 155 text

155 Discussions. Deep Image Prior: the fully convolutional network (architecture) itself carries some image prior, but this prior is difficult to explain in words. MDT × low-rank model: convolution = DE + linear transform, so it has some relationship with CNNs.

Slide 156

Slide 156 text

156 Discussions (cont’d) Multiway delay-embedding transform (MDT) is a patch extractor

Slide 157

Slide 157 text

157 Discussions (cont’d) Multiway delay-embedding transform (MDT) is a patch extractor It produces Multilevel Block Hankel Matrix (256, 256) (32, 225, 32, 225) (32*32, 225*225) (32, 32, 225, 225) MDT mode permutation & reshaping … 32 225 225 32 Two level block Hankel matrix

Slide 158

Slide 158 text

158 Discussions (cont'd). The multiway delay-embedding transform (MDT) is a patch extractor producing a multilevel block Hankel matrix, and a linear transform of that matrix is equivalent to convolution: Conv = MDT + Linear, CNN = (Conv + Act) × L, MLP = (Linear + Act) × L. Since an MLP is explainable as manifold learning, here we consider MDT + MLP.

Slide 159

Slide 159 text

159 Manifold Modeling in Embedded Space (MMES). A new CNN-like structure using MDT [Yokota+, 2020 (in IEEE-TNNLS)]: a simple pipeline (MDT ➔ MLP ➔ inverse MDT) in which each column vector (image patch) is independently transformed by the MLP. The MLP is trained as an auto-encoder (AE), which can be interpreted as manifold learning; in practice, a denoising auto-encoder (DAE) is employed.

Slide 160

Slide 160 text

160 Manifold Modeling in Embedded Space (MMES) MMES can be interpreted as optimizing the following problem Prior in MMES is the low-dimensionality of the patch manifold MDT Prior: Low dimensional patch manifold … … vectorize

Slide 161

Slide 161 text

161 Manifold Modeling in Embedded Space (MMES) MMES can be interpreted as optimizing the following problem Prior in MMES is the low-dimensionality of the patch manifold Replacing the manifold with MLP(AE), it can be transformed as MDT Prior: Low dimensional patch manifold Error term Regularization term

Slide 162

Slide 162 text

162 Conceptual illustration Compressed in low-dimensional space Patches +noise +noise +noise … … Reconstructed patches Patch aggregation Patch extraction ・Compression works well when many similar patches ➔ Low-dimensionality can be image prior ➔ It can be regarded as non-local similarity ・Denoising-Auto-Encoder(DAE) learns low- dimensional manifold from patch distribution Data Fidelity Corrupted image Interpretation of DAE [Vincent+, 2010]

Slide 163

Slide 163 text

163 Implementation of MMES: Inpainting It can be implemented using Deep Learning toolbox (TensorFlow etc) Cost function is minimized by using Adam optimizer Various task can be solved by same framework MDT … … Inv MDT AE loss loss mask

Slide 164

Slide 164 text

164 It can be implemented using Deep Learning toolbox (TensorFlow etc) Cost function is minimized by using Adam optimizer Various task can be solved by same framework Implementation of MMES: Deblurring MDT … … Inv MDT AE loss loss blur

Slide 165

Slide 165 text

165 Implementation of MMES: super-resolution. It can be implemented using a deep learning toolbox (TensorFlow, etc.); the cost function is minimized using the Adam optimizer, and various tasks can be solved within the same framework (here, a down-sampling operator is inserted before the loss).

Slide 166

Slide 166 text

166 Preliminary experiments (signal recovery) Recovery of a noisy and incomplete one-dimensional signal Original Observed Low-rank manifold

Slide 167

Slide 167 text

167 Color image inpainting from 99% missing 99% missing result Optimization behavior of MMES

Slide 168

Slide 168 text

168 Two-dimensional space on the patch manifold. Learning a two-dimensional latent space in the AE from an incomplete image.

Slide 169

Slide 169 text

169 Comparison of various image priors DIP and MMES work well Low-rank priors Low-rank &smoothness priors Deep image prior Sparse modeling Low-rank MDT Patch manifold 99%

Slide 170

Slide 170 text

170 Comparison (99% missing) Original LRTV convex Smooth CPD MDT low rank Deep Image Prior MDT manifold

Slide 171

Slide 171 text

171 Comparison (99% missing) Original LRTV convex Smooth CPD MDT low rank Deep Image Prior MDT manifold

Slide 172

Slide 172 text

172 Comparison (99% missing) Original LRTV convex Smooth CPD MDT low rank Deep Image Prior MDT manifold

Slide 173

Slide 173 text

173 Comparison: Super resolution Bicubic Deep Image Prior MDT manifold ×4 ×8

Slide 174

Slide 174 text

174 Comparison: Deblurring Blurred DIP MDT Manifold

Slide 175

Slide 175 text

175 Summary of Part III. Two approaches use NNs: learning a data-driven prior from a large amount of data (SRCNN, SRGAN, PULSE), or letting the network structure (CNN) itself provide the image prior (Deep Image Prior, patch manifold model). ML-based image restoration is attracting much attention. Comparison (data-driven / DIP / patch manifold): accuracy ◎ / ○ / ○; computation fast / slow / slow; training data necessary / not necessary / not necessary; interpretability depends on the data / unclear / clear.

Slide 176

Slide 176 text

176 Closing the tutorial. I would like to express my appreciation to the organizers of APSIPA ASC 2021 and to all participants!! In this tutorial, I tried to summarize prior-based image restoration methods from multiple perspectives: classical methods (sparseness, smoothness, and non-local similarity priors), tensor methods, and neural network methods. In my opinion, two important points in the study of image restoration are a technical viewpoint on the behavior of mathematical models and optimization algorithms, and a realistic viewpoint for digging into the essential nature of images. I hope this tutorial helps your study in the future. (Basics, Tensors, Neural Nets)