Degree in Information Science at Kyoto University → Sun Microsystems K.K. → Founded own startup in Kyoto → Established Kyoto Branch of Nulab Inc. → Current
Jan 2014 · EMPLOYEES: 49 · LOCATIONS: Kyoto (HQ), Tokyo, Manila · CUSTOMER SUPPORT IN: Japanese, English, German, Tagalog, Chinese and Swedish · ACADEMIC PARTNERS
questions like: • Why do I get this result? • When does it succeed/fail? • How can I correct the result? Without answers to these, it is difficult to use AI even if it shows fascinating performance
model • Individual prediction explanations • Global prediction explanations • Build an interpretable model (logistic regression, decision trees, and so on). For more details, refer to Explainable AI in Industry (KDD 2019 Tutorial)
and Selection via Lasso (R. Tibshirani, 1996) · Sparse Coding (B. Olshausen, 1996) · Compressed Sensing (D.L. Donoho, 2006) · Multi-Layer Convolutional Sparse Modeling (M. Elad, 2018)
y = w₁x₁ + w₂x₂ + ⋯ ⇒ find w to satisfy the condition above between y and X
Linear Regression in sales forecast: predict Revenue per Shop ID from area (m²), Distance from station (km), Regional Population, Competitors, Products
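As a concrete sketch of the sales-forecast example, the fit below uses scikit-learn's `LinearRegression` on made-up shop data (all feature values and revenues are invented purely for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical shop data (invented numbers):
# columns = [area (m^2), distance from station (km), regional population (x1000)]
X = np.array([
    [120.0, 0.5, 30.0],
    [ 80.0, 1.2, 12.0],
    [200.0, 0.3, 45.0],
    [ 60.0, 2.0,  8.0],
    [150.0, 0.8, 25.0],
])
y = np.array([55.0, 20.0, 90.0, 10.0, 60.0])  # revenue (fictitious units)

model = LinearRegression().fit(X, y)
print(model.coef_)                            # one weight w_j per explanatory variable
print(model.predict([[100.0, 1.0, 20.0]]))    # revenue estimate for a new shop
```

The learned coefficients play the role of w in the formula above, one weight per explanatory variable.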
Set as many weight values to 0 as possible: assume most of the variables (area (m²), Distance from station (km), Regional Population, Competitors, Shop ID, Products) are irrelevant to Revenue
y is a linear combination of x with observation noise ε, where x is m-dimensional and the sample size of y is n:
y = w₁x₁ + ⋯ + w_m x_m + ε
Basic approach to the problem
• Least squares method
• Minimize the squared error between y and the linear combination of x with the estimated w: min_w (1/2)‖y − Xw‖²
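The least squares step can be sketched in a few lines of NumPy; the synthetic X, true weights, and noise level below are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 5
X = rng.normal(size=(n, m))
w_true = np.array([1.5, -2.0, 0.0, 0.7, 0.0])
y = X @ w_true + 0.01 * rng.normal(size=n)   # y = Xw + eps

# Least squares: minimize (1/2) * ||y - Xw||^2
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_hat)   # close to w_true when the noise is small
```

Note that plain least squares gives small but non-zero values even for the irrelevant variables; the regularization introduced next is what drives them exactly to zero.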
will be introduced as a regularization term
• The objective function can be changed to the following form: min_w (1/2)‖y − Xw‖² + λ‖w‖₁
⇒ The regularization parameter λ controls the strength of the regularization
Introduce Regularization
to satisfy the equation
L0 and L1 norm optimization
• L0 norm optimization: find w to minimize the number of non-zero elements; this is combinatorial optimization, NP-hard and not feasible
• L1 norm optimization: find w to minimize the sum of its absolute values (relaxed constraint); the global solution can (still) be reached, and it is solved within practical time
Least Absolute Shrinkage and Selection Operator
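A minimal Lasso sketch with scikit-learn, on synthetic data where only 3 of 20 variables are assumed relevant (scikit-learn's `alpha` corresponds to the λ above, up to a 1/n scaling of the loss):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, m = 100, 20
X = rng.normal(size=(n, m))
w_true = np.zeros(m)
w_true[:3] = [2.0, -1.5, 1.0]          # only 3 of 20 variables are relevant
y = X @ w_true + 0.1 * rng.normal(size=n)

lasso = Lasso(alpha=0.1).fit(X, y)     # alpha plays the role of lambda
print(np.flatnonzero(lasso.coef_))     # indices of surviving (non-zero) weights
```

The L1 penalty sets most weights exactly to zero, which is precisely the variable-selection behavior the slide describes.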
2. Update w_j = S(ρ_j, λ) / z_j, where ρ_j = x_jᵀ(y − Σ_{k≠j} w_k x_k), z_j = ‖x_j‖², and S is the soft-thresholding function
3. Repeat step 2 until the convergence condition is satisfied
Algorithm for Lasso: Coordinate Descent
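The coordinate descent steps can be sketched as a short NumPy implementation; the synthetic data and the fixed iteration count stand in for a proper convergence check:

```python
import numpy as np

def soft_threshold(a, t):
    """S(a, t): shrink a toward zero by t, clipping at zero."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for min_w (1/2)||y - Xw||^2 + lam*||w||_1."""
    n, m = X.shape
    w = np.zeros(m)
    z = (X ** 2).sum(axis=0)                   # z_j = ||x_j||^2
    for _ in range(n_iter):
        for j in range(m):
            r_j = y - X @ w + X[:, j] * w[j]   # residual excluding feature j
            rho_j = X[:, j] @ r_j              # rho_j = x_j^T (y - sum_{k!=j} w_k x_k)
            w[j] = soft_threshold(rho_j, lam) / z[j]
        # (a convergence check on the change in w would go here)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
w_true = np.array([3.0, 0.0, -2.0, 0.0, 0.0, 1.0, 0.0, 0.0])
y = X @ w_true + 0.05 * rng.normal(size=50)
print(lasso_cd(X, y, lam=2.0))   # zeros recovered on the irrelevant coordinates
```

Each coordinate update is a one-dimensional Lasso problem with a closed-form solution, which is why the whole algorithm needs no step size.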
every patch
3. Every patch should be represented as a sparse combination of dictionary basis atoms
Y: image, A: dictionary, X: coefficients (Y ≈ AX)
Dictionary Learning
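A dictionary-learning sketch using scikit-learn's `MiniBatchDictionaryLearning` on random stand-in "patches"; in real usage Y would hold flattened patches extracted from an image, and the sizes here are arbitrary assumptions:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

# Toy stand-in for image patches: 200 samples of 8x8 patches, flattened to 64 dims
rng = np.random.default_rng(0)
Y = rng.normal(size=(200, 64))

# Learn a dictionary A and sparse codes X so that Y is approximated by X @ A
learner = MiniBatchDictionaryLearning(n_components=32, alpha=1.0, random_state=0)
X_codes = learner.fit_transform(Y)     # sparse coefficients, one row per patch
A = learner.components_                # dictionary atoms (basis patches)
print(X_codes.shape, A.shape)
```

Note scikit-learn's convention writes the factorization as code @ components, i.e. the roles of A and X in the slide's Y ≈ AX map to `components_` and the transformed codes respectively.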
below) and SPECTRO by HACARUS (https://arxiv.org/abs/1807.02894)

                 Paper (SVM)   Paper (CNN)   SPECTRO
Dataset          800           800           60
Training time    30 mins      5 hours       19 secs
Inference time   8 mins       20 secs       10 secs
Accuracy         85%          86%           90%

Monocrystalline modules