Slide 1

PyCon Taiwan 2019
Getting Started with Sparse Modeling with spm-image
Takashi Someda, CTO, Hacarus Inc.
September 21st, 2019

Slide 2

About Me
Takashi Someda, @tksmd
Director/CTO at HACARUS Inc.
Master's degree in Information Science from Kyoto University
Sun Microsystems K.K. → founded own startup in Kyoto → established the Kyoto branch of Nulab Inc. → current position

Slide 3

Today's Takeaways
• Basic concept of Sparse Modeling
• Image and time series data analysis
• Guide to Python examples using spm-image

Slide 4

Introduction to Sparse Modeling

Slide 5

The Blackbox Problem
A blackbox AI cannot answer questions like:
• Why do I get this result?
• When does it succeed or fail?
• How can I correct the result?
This makes AI difficult to use, even if it shows fascinating performance.

Slide 6

Approaches to explainable AI
• Post-hoc explanation of a given AI model
• Individual prediction explanations
• Global prediction explanations
• Build an interpretable model
• Logistic regression, decision trees, and so on
For more details, refer to Explainable AI in Industry (KDD 2019 Tutorial).

Slide 7

Lacking or missing data
• Tiny dataset
• Data augmentation
• Transfer learning (in deep learning)
• Missing values
• Imputation with mean values, regression, etc.
• Use a model that tolerates missing values

Slide 8

History of Sparse Modeling

Year  Paper                                              Author
1996  Regression Shrinkage and Selection via the Lasso   R. Tibshirani
1996  Sparse Coding                                      B. Olshausen
2006  Compressed Sensing                                 D.L. Donoho
2018  Multi-Layer Convolutional Sparse Modeling          M. Elad

Slide 9

Problem Settings
• Output y can be expressed as a linear combination of x with observation noise ε, where x is m-dimensional and the sample size of y is n:
  $y = w_1 x_1 + \cdots + w_m x_m + \varepsilon$
Basic approach to the problem
• Least squares method
• Minimize the squared error between y and Xw over the estimated w:
  $\min_w \frac{1}{2} \lVert y - Xw \rVert_2^2$
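
A minimal sketch of the least squares estimate above using plain NumPy; the data shapes and noise level are illustrative assumptions, not values from the talk:

import numpy as np

rng = np.random.RandomState(0)
n, m = 100, 5                           # n samples, m features
X = rng.randn(n, m)
w_true = rng.randn(m)
y = X @ w_true + 0.1 * rng.randn(n)     # linear combination plus observation noise

# ordinary least squares: argmin_w (1/2) * ||y - Xw||^2
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)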

Slide 10

Introduce Regularization
What if data is not sufficient? → Assume that not all input features are needed to express y
• An additional constraint is introduced as a regularization term
• The objective function changes to the following form:
  $\min_w \frac{1}{2} \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_1$
⇒ The regularization parameter λ controls the strength of the regularization

Slide 11

L0 and L1 norm optimization
• L0 norm optimization
• Use the minimum number of x to satisfy the equation
• Find w that minimizes the number of non-zero elements
• Combinatorial optimization: NP-hard and not feasible
• L1 norm optimization (relax the constraint)
• Find w that minimizes the sum of its absolute values
• The global solution can (still) be reached
• Solved within practical time
→ Least Absolute Shrinkage and Selection Operator (Lasso)

Slide 12

spm-image
• Python library for Sparse Modeling (OSS)
• https://github.com/hacarus/spm-image/
• scikit-learn compliant interface
• Supported algorithms and planned work
• Generalized Lasso (4 variants)
• K-SVD and more…
• Total Variation, more Lasso variants (planned)
• More examples (planned)

Slide 13

Code: scikit-learn and spm-image

# scikit-learn
from sklearn.linear_model import Lasso
model = Lasso(alpha=0.1)
model.fit(X_train, y_train)
model.score(X_test, y_test)

# spm-image
from spmimage.linear_model import LassoADMM as Lasso
model = Lasso(alpha=0.1)
model.fit(X_train, y_train)
model.score(X_test, y_test)

Slide 14

Example: Numerical experiment
• X has 1,000-dimensional input features
• Only 20 features out of 1,000 are relevant to the output y
• Only 100 samples are available
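
A minimal sketch of this experiment, assuming LassoADMM exposes the scikit-learn `coef_` attribute; the noise level and alpha below are illustrative assumptions:

import numpy as np
from spmimage.linear_model import LassoADMM

rng = np.random.RandomState(0)
n_samples, n_features, n_relevant = 100, 1000, 20

# only the first 20 features actually influence y
X = rng.randn(n_samples, n_features)
w_true = np.zeros(n_features)
w_true[:n_relevant] = rng.randn(n_relevant)
y = X @ w_true + 0.1 * rng.randn(n_samples)

model = LassoADMM(alpha=0.1)
model.fit(X, y)

# the L1 penalty should drive most of the 980 irrelevant coefficients to (near) zero
print(np.sum(np.abs(model.coef_) > 1e-3))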

Slide 15

Image Analysis

Slide 16

Compressed Sensing
• Problem Settings
• y: observed data, A: observation matrix; estimate x under a sparse constraint
• Objective function (Lasso):
  $\min_x \frac{1}{2} \lVert y - Ax \rVert_2^2 + \lambda \lVert x \rVert_1$
Example: MRI, where the observation A is a Fourier transform into k-space
(k-space image source: http://mriquestions.com/what-is-k-space.html)
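
A toy sketch of compressed sensing recovery using the scikit-learn Lasso shown earlier; the random Gaussian observation matrix and alpha are assumptions for illustration (a real MRI example would use a subsampled Fourier transform):

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
n_obs, n_dim, k = 50, 200, 5                    # fewer observations than unknowns

A = rng.randn(n_obs, n_dim) / np.sqrt(n_obs)    # observation matrix
x_true = np.zeros(n_dim)
x_true[rng.choice(n_dim, k, replace=False)] = rng.randn(k)  # k-sparse signal
y = A @ x_true

model = Lasso(alpha=0.01, fit_intercept=False, max_iter=10000)
model.fit(A, y)
x_est = model.coef_   # should be close to x_true despite n_obs < n_dim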

Slide 17

Total Variation
  $\min_x \frac{1}{2} \lVert y - x \rVert_2^2 + \lambda \lVert \nabla x \rVert_1$
Total Variation (TV) makes an image smooth while keeping its edges.
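
TV is listed as planned work for spm-image above, so this sketch uses scikit-image's TV denoiser instead; the weight value is an arbitrary assumption:

import numpy as np
from skimage import data
from skimage.restoration import denoise_tv_chambolle

img = data.camera() / 255.0
noisy = img + 0.1 * np.random.RandomState(0).randn(*img.shape)

# TV regularization smooths flat regions while preserving edges
denoised = denoise_tv_chambolle(noisy, weight=0.1)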

Slide 18

Dictionary Learning
1. Extract patches from images
2. Learn a dictionary that can express every patch
3. Every patch should be representable as a sparse combination of dictionary basis vectors:
  $Y \approx AX$  (Y: image patches, A: dictionary, X: sparse coefficients)

Slide 19

Example: Image Reconstruction
• Dictionary (8x8 patches, 64 basis vectors)
• Sparse coding (green is 0)
• Pipeline: sparse encode → reconstruction

Slide 20

Code: Image Reconstruction

import numpy as np
# import paths assumed from the spm-image package layout
from spmimage.decomposition import KSVD
from spmimage.feature_extraction.image import (
    extract_simple_patches_2d, reconstruct_from_simple_patches_2d)

# extract patches
patches = extract_simple_patches_2d(img, patch_size)

# normalize patches
patches = patches.reshape(patches.shape[0], -1).astype(np.float64)
intercept = np.mean(patches, axis=0)
patches -= intercept
patches /= np.std(patches, axis=0)

# dictionary learning
model = KSVD(n_components=n_basis, alpha=1, n_iter=n_iter, n_jobs=1)
model.fit(patches)

# sparse encode (the original slide uses `code` without defining it;
# obtaining it via transform() is an assumption)
code = model.transform(patches)

# reconstruction
reconstructed_patches = np.dot(code, model.components_)
reconstructed_patches = reconstructed_patches.reshape(len(patches), *patch_size)
reconstructed = reconstruct_from_simple_patches_2d(reconstructed_patches, img.shape)

Slide 21

Example: Inpainting and Super Resolution
(figure: inpainting and super resolution results)

Slide 22

Code: Inpainting

import numpy as np
from spmimage.decomposition import KSVD

X = extract_and_flatten(img, patch_size)

# mark missing pixels (zeros) with a sentinel value
X = np.where(X == 0, -9999, X)

# normalize values except missing values
for idx in range(X.shape[0]):
    target_idx = np.where(X[idx, :] != -9999)
    target = X[idx, target_idx]
    t_mean = np.mean(target)
    t_std = np.std(target)
    X[idx, target_idx] = (target - t_mean) / t_std

# fit with missing values
model = KSVD(n_components=n_components, transform_n_nonzero_coefs=k0,
             max_iter=10, missing_value=-9999, method='approximate')
model.fit(X)

Slide 23

Anomaly Detection
1. Learn a dictionary from a dataset of only "good" images (feature extraction)
2. Sparse code a test image with the learned dictionary and reconstruct it
3. Compute the reconstruction error (PSNR/SSIM, …)
4. Feed the error into a classifier to get the result
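
A minimal sketch of steps 3–4, assuming a reconstruction produced by the K-SVD pipeline shown earlier; the PSNR threshold is a hypothetical value, not one from the talk:

import numpy as np
from skimage.metrics import peak_signal_noise_ratio

def anomaly_score(test_img: np.ndarray, reconstructed: np.ndarray) -> float:
    # a dictionary trained only on "good" images reconstructs good images well,
    # so a low PSNR indicates a likely anomaly
    return peak_signal_noise_ratio(test_img, reconstructed, data_range=1.0)

def is_anomaly(test_img: np.ndarray, reconstructed: np.ndarray,
               threshold: float = 25.0) -> bool:
    # trivial threshold classifier on the reconstruction error
    return anomaly_score(test_img, reconstructed) < threshold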

Slide 24

Example: Defect detection
Proposed methods (from the paper below) vs. a Dictionary Learning based approach
https://arxiv.org/abs/1807.02894

                Paper (SVM)   Paper (CNN)   Dictionary Learning
Dataset         800 images    800 images    60 images
Training time   30 mins       5 hours       19 secs
Inference time  8 mins        20 secs       10 secs
Accuracy        85%           86%           90%

(Dataset: monocrystalline modules)

Slide 25

Time series analysis and more

Slide 26

Generalized Lasso
• Problem Settings (again)
• y: observed data, A: observation matrix; estimate x under a sparse constraint
• Objective functions:
  Lasso: $\min_x \frac{1}{2} \lVert y - Ax \rVert_2^2 + \lambda \lVert x \rVert_1$
  Generalized Lasso: $\min_x \frac{1}{2} \lVert y - Ax \rVert_2^2 + \lambda \lVert Dx \rVert_1$  (D: penalty/transform matrix)

Slide 27

Constraint Design
• Fused Lasso: constraint on the 1st order differences; 2 neighboring values tend to be the same
• Trend Filtering: constraint on the 2nd order differences; 3 neighboring values tend to lie on a straight line

Slide 28

Example: Fused Lasso and Trend Filtering
(plots: observed y and estimated x for Fused Lasso and Trend Filtering)
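
A minimal usage sketch for the FusedLassoADMM class shown on the next slide, assuming it is importable from spmimage.linear_model and follows the scikit-learn interface shown earlier; the signal and alpha are illustrative assumptions:

import numpy as np
from spmimage.linear_model import FusedLassoADMM  # import path assumed

rng = np.random.RandomState(0)
n = 100

# piecewise-constant signal plus Gaussian noise
x_true = np.concatenate([np.zeros(40), 2 * np.ones(30), -np.ones(30)])
y = x_true + 0.3 * rng.randn(n)

# A = identity turns the problem into pure denoising
model = FusedLassoADMM(alpha=1.0)
model.fit(np.eye(n), y)
x_est = model.coef_   # estimated piecewise-constant trend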

Slide 29

Code: Fused Lasso

import numpy as np


class FusedLassoADMM(GeneralizedLasso):  # GeneralizedLasso: spm-image's base class
    def __init__(self, alpha=1.0, sparse_coef=1.0, trend_coef=1.0, rho=1.0,
                 fit_intercept=True, normalize=False, copy_X=True,
                 max_iter=1000, tol=1e-4):
        super().__init__(alpha=alpha, rho=rho, fit_intercept=fit_intercept,
                         normalize=normalize, copy_X=copy_X,
                         max_iter=max_iter, tol=tol)
        self.sparse_coef = sparse_coef
        self.trend_coef = trend_coef

    def generate_transform_matrix(self, n_features: int) -> np.ndarray:
        # 1st order difference matrix: penalizes |x_i - x_{i-1}|
        fused = np.eye(n_features) - np.eye(n_features, k=-1)
        fused[0, 0] = 0
        return self.merge_matrix(n_features, fused)

    def merge_matrix(self, n_features: int, trend_matrix: np.ndarray) -> np.ndarray:
        # combine the sparsity penalty (identity) with the trend penalty
        return self.sparse_coef * np.eye(n_features) + self.trend_coef * trend_matrix

Slide 30

Code: Trend Filtering

class TrendFilteringADMM(FusedLassoADMM):
    def generate_transform_matrix(self, n_features: int) -> np.ndarray:
        # 2nd order difference matrix: penalizes |x_{i-1} - 2*x_i + x_{i+1}|
        trend = 2 * np.eye(n_features) - np.eye(n_features, k=-1) - np.eye(n_features, k=1)
        trend[0, 0] = 1
        trend[-1, -1] = 1
        return self.merge_matrix(n_features, trend)

Slide 31

Example: Numerical experiment
• The true trend is a sine wave (blue)
• Gaussian noise is added to the observed data (red)
• The estimated trend is fitted by Trend Filtering using Generalized Lasso (green)
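
A sketch reproducing this setup with the TrendFilteringADMM class above; the noise level and alpha are illustrative assumptions:

import numpy as np
from spmimage.linear_model import TrendFilteringADMM  # import path assumed

rng = np.random.RandomState(0)
n = 200
t = np.linspace(0, 4 * np.pi, n)

x_true = np.sin(t)                       # true trend: sine wave
y = x_true + 0.2 * rng.randn(n)          # observed: sine wave + Gaussian noise

# identity observation matrix: estimate a smooth trend from y itself
model = TrendFilteringADMM(alpha=1.0)
model.fit(np.eye(n), y)
x_est = model.coef_                      # estimated trend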

Slide 32

Example: Bone image analysis
Pipeline: extract bone → calculate curvature → apply Fused Lasso / Trend Filtering

Slide 33

Summary

Slide 34

Sparse Modeling in a nutshell
• Small dataset, explainable, lightweight
• Applicable to image and time series data
• Has been developed for over 20 years and is still evolving

Slide 35

Sparse Modeling vs. Other SOTA ML Methods

              Sparse Modeling                       Other SOTA ML methods
Rule making   From data and prior information       (Basically) from data only
              (i.e. sparsity)
Data & speed  Can start with a small dataset;       Requires a big dataset and a lot
              training and inference are fast       of time for training
Scope         Focuses on specific use cases         Can support a very wide range
                                                    of problems

Slide 36

Thank you!! Please come visit our booth on the 2nd floor!