
OpenTalks.AI - Ivan Oseledets, Open Problems in Machine Learning

OpenTalks.AI
February 14, 2019

Transcript

  1. Motivation
     • Comparing various models is a very challenging task
     • Mode collapse detection
     LSGAN, DCGAN, WGAN, WGAN-GP, SAGAN, CycleGAN, InfoGAN, AC-GAN, MAD-GAN, CaloGAN, BEGAN, Sobolev GAN and many others! GANs are extremely popular distribution learners. Need a scalable metric for arbitrary datasets!
  2. Reminder
     • Game-theoretic approach
     • Generator learns to mimic the target distribution by generating samples
     • Discriminator learns to distinguish real and fake data
     The following discussion can be applied to any distribution learner.
  3. How to assess the performance of a GAN?
     Evaluating the performance of a GAN is difficult - no log-likelihood. Which one is better?
  4. Some existing methods
     • Human annotators
     • Inception score
     • Frechet Inception Distance (FID, sketched below)
     • Difficult to use beyond the ImageNet dataset
     • Do not necessarily correlate with human judgement
     "A Note on the Inception Score", Shane Barratt and Rishi Sharma; "Improved Techniques for Training GANs", Salimans et al.
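
A brief illustration, not from the slides: FID fits a Gaussian to the Inception activations of real and generated images and takes the Frechet distance between them, ||mu_r - mu_g||^2 + Tr(C_r + C_g - 2 (C_r C_g)^{1/2}). A minimal numpy sketch, assuming the activations have already been extracted (the feature-extraction step is omitted):

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(act_real, act_gen):
    """Frechet distance between Gaussians fitted to two sets of activations.

    act_real, act_gen: (n_samples, n_features) arrays, e.g. Inception pool
    activations of real and generated images (extraction not shown here).
    """
    mu_r, mu_g = act_real.mean(axis=0), act_gen.mean(axis=0)
    cov_r = np.cov(act_real, rowvar=False)
    cov_g = np.cov(act_gen, rowvar=False)
    covmean = sqrtm(cov_r @ cov_g)          # matrix square root of the product
    if np.iscomplexobj(covmean):            # strip tiny imaginary parts from numerics
        covmean = covmean.real
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))
```
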
  5. Manifold hypothesis
     Real-world high-dimensional data (such as images) is supported on low-dimensional manifolds embedded in a high-dimensional space. It is reasonable to expect that a good generative model samples images from the same (or a very close) manifold. Are they similar?
  6. Reminder
     • Game-theoretic approach
     • Generator learns to mimic the target distribution by generating samples
     • Discriminator learns to distinguish real and fake data
  7. Our idea
     We use topological properties of the generated manifold and of the data manifold to measure their similarity. I.e., the manifolds may not coincide precisely, but we want them to have the same shape. A mathematical tool that can be used to describe shape is homology. Also, it can be computed rather easily.
  8. Brief introduction to homology
     • A well-known concept in topology.
     • Informally, the k-th homology group describes k-dimensional holes in a space.
     • Strictly speaking, for each k it is a group, but we will use the notions of Betti numbers (the ranks of these groups) and homology groups interchangeably.
     • In the case of point clouds we will obtain distributions rather than exact numbers.
  9. Problem
     We only have access to samples from those manifolds. The problem of computing the topological properties of the underlying manifolds is ill-posed. Topological Data Analysis (TDA) has the answer!
  10. Brief introduction to TDA
     To approximate the underlying manifold we will start adding simplices of various dimensions based on the proximity data.
  11. Persistence barcode
     For each intermediate value of the proximity threshold we can compute homology and conveniently plot it on the persistence barcode. These barcodes summarize the topological properties of the point cloud. (A small computational sketch follows below.)
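
To make the pipeline concrete, here is a minimal sketch (not from the slides) that computes persistence barcodes for a point cloud with the ripser package; the noisy circle is just a toy cloud with one expected 1-dimensional hole, and any Vietoris-Rips TDA library would do.

```python
import numpy as np
from ripser import ripser  # Vietoris-Rips persistent homology

# Toy point cloud: a noisy circle, so we expect one prominent 1-dimensional hole.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
cloud = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.standard_normal((200, 2))

# Persistence intervals (birth, death) for H0 and H1: the bars of the barcode.
diagrams = ripser(cloud, maxdim=1)["dgms"]
h1 = diagrams[1]
lifetimes = h1[:, 1] - h1[:, 0]
print("H1 bars:", len(h1), "longest bar:", h1[lifetimes.argmax()])
```
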
  12. What to compare?
     In principle we can compare these barcodes directly, but this is nontrivial. Instead, we convert the corresponding pieces of the barcode into histograms, which can be compared easily.
  13. Technical details II
     • Obtain barcodes as before
     • Convert them to histograms and average over many different choices (one possible way is sketched below)
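
One simple way to implement the barcode-to-histogram step, offered as an illustrative sketch rather than the talk's exact statistic: repeatedly subsample each point cloud, histogram the lifetimes of the H1 bars, average the histograms over subsamples, and compare the two averaged histograms.

```python
import numpy as np
from ripser import ripser

def mean_lifetime_histogram(cloud, n_trials=20, sample_size=64,
                            bins=np.linspace(0.0, 2.0, 21), seed=0):
    """Average histogram of H1 bar lifetimes over random subsamples of a cloud.

    Illustrative summary only; the statistic, bins and sample size are
    assumptions, not the exact procedure from the talk.
    """
    rng = np.random.default_rng(seed)
    hists = []
    for _ in range(n_trials):
        idx = rng.choice(len(cloud), size=sample_size, replace=False)
        h1 = ripser(cloud[idx], maxdim=1)["dgms"][1]
        lifetimes = h1[:, 1] - h1[:, 0] if len(h1) else np.zeros(1)
        hist, _ = np.histogram(lifetimes, bins=bins)
        hists.append(hist / max(hist.sum(), 1))  # normalize to a probability vector
    return np.mean(hists, axis=0)

def barcode_distance(cloud_real, cloud_gen):
    """Squared L2 distance between the averaged lifetime histograms of two clouds."""
    h_r = mean_lifetime_histogram(cloud_real)
    h_g = mean_lifetime_histogram(cloud_gen)
    return float(np.sum((h_r - h_g) ** 2))
```
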
  14. Numerical results: GANs in particle physics
     We can also apply this algorithm to datasets of a different nature than images.
  15. Adversarial perturbations
     • Adversarial perturbations easily fool many state-of-the-art networks
     • Adding a perturbation of a small norm can force misclassification
     This work is supported by the Ministry of Education and Science of the Russian Federation (grant 14.756.31.0001).
  16. Universal adversarial perturbations
     • Moosavi et al. (2017) proposed universal perturbations: adding a single noise image allows one to fool the network in many (~70%) cases
     • They were also shown to generalize across networks really well
     Universal Adversarial Perturbations, Moosavi et al., CVPR 2017
  17. Problem
     • The algorithm uses several thousand images to obtain a high fooling ratio
     • That is still small compared to the size of the whole dataset (hundreds of thousands of images), but can we do better?
  18. Idea: Let us attack the layers
     • Goodfellow: linearize the loss (FGSM; a sketch follows below)
     • Our idea: linearize a hidden layer
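
For reference, a minimal PyTorch sketch of FGSM, the loss-linearization attack referenced above; `model`, the [0, 1] data range and the step size are placeholders, not details from the talk.

```python
import torch
import torch.nn.functional as F

def fgsm(model, images, labels, eps=8.0 / 255.0):
    """Fast Gradient Sign Method: one step along the sign of the loss gradient."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    # Linearize the loss around the input and take a max-norm-bounded step.
    adv = images + eps * images.grad.sign()
    return adv.clamp(0.0, 1.0).detach()
```
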
  19. Linear algebra to the rescue
     • We need to compute generalized singular vectors
     • Since the Jacobians are extremely large (e.g. 150000 x 1000000), we cannot store them explicitly and have to use iterative methods
     • Namely, we use Boyd's generalized power method for this
  20. Technical details
     • We can use Pearlmutter's trick to compute matrix-by-vector products using automatic differentiation (sketched below)
     • We can attack several images at once by stacking their Jacobians into a big matrix
     • Our method needs only 30-40 images to reach high fooling ratios (60%) on the entire dataset
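
A minimal PyTorch sketch of the computational core: Jacobian-vector and vector-Jacobian products of a hidden layer obtained by automatic differentiation (the Jacobian is never formed), driving an ordinary power iteration for the dominant singular vector. Boyd's generalized (p, q) power method and the multi-image stacking are not reproduced here; `layer_fn` and the iteration count are placeholders.

```python
import torch
from torch.autograd.functional import jvp, vjp

def dominant_singular_vector(layer_fn, x, n_iter=20):
    """Power iteration for the top singular vector of the Jacobian of layer_fn at x.

    Only Jacobian-vector (J v) and vector-Jacobian (J^T u) products are used,
    so the huge Jacobian is never stored explicitly. This is the plain 2-norm
    power method, a simplification of the generalized (p, q) version.
    """
    v = torch.randn_like(x)
    v = v / v.norm()
    for _ in range(n_iter):
        _, u = jvp(layer_fn, x, v)   # u = J v
        _, v = vjp(layer_fn, x, u)   # v = J^T u = (J^T J) v_old
        v = v / v.norm()
    return v  # candidate perturbation direction (up to scaling)
```
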
  21. Results
     • An interpretable, simple algorithm
     • Relatively fast - only a few minutes to construct a perturbation
     • We attack low-level features
  22. Expressive power of CNNs
     (Figure: shallow net vs. deep net)
     [Nadav Cohen et al., "On the Expressive Power of Deep Learning: A Tensor Analysis", 2015]
  23. Multiplicative RNNs
     Yuhuai Wu et al. proposed multiplicative RNNs [Y. Wu, S. Zhang, Y. Zhang, Y. Bengio, R. Salakhutdinov, "On Multiplicative Integration with Recurrent Neural Networks", 2016]
  24. Tensor Train RNNs
     • Simplify: get rid of the non-linearity
     • Generalise: combine W and U into a bilinear form T, which is defined by a 3d tensor G
  25. Tensor Train RNNs
     • Simplify: get rid of the non-linearity
     • Generalise: combine W and U into a bilinear form T, which is defined by a 3d tensor G, where G is in the TT-format
     • We get a recurrent network with shared factors G (a sketch of one recurrent step follows below)
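
To make the bilinear construction concrete, here is a small numpy sketch of one recurrent step h_new = T(x, h), where the 3d tensor T is kept in TT-format with cores G1, G2, G3 and never materialized. The core names, shapes and the absence of a nonlinearity follow the "simplified" version above; the sizes are illustrative assumptions.

```python
import numpy as np

def tt_bilinear_step(x, h, G1, G2, G3):
    """One linear recurrent step h_new = T(x, h) with T stored in TT-format.

    T[i, j, k] = sum_{a, b} G1[i, a] * G2[a, j, b] * G3[b, k], so the full
    n_x * n_h * n_h tensor is never built; only the small cores are contracted.
    Core shapes: G1 (n_x, r1), G2 (r1, n_h, r2), G3 (r2, n_h).
    """
    a = x @ G1                               # contract input with the first core: (r1,)
    b = np.einsum("a,ajb,j->b", a, G2, h)    # contract with the hidden state: (r2,)
    return b @ G3                            # next hidden state: (n_h,)

# Tiny usage example with illustrative sizes (the same cores are reused at every step).
rng = np.random.default_rng(0)
n_x, n_h, r1, r2 = 8, 6, 3, 3
G1 = rng.standard_normal((n_x, r1))
G2 = rng.standard_normal((r1, n_h, r2))
G3 = rng.standard_normal((r2, n_h))
x_t, h_t = rng.standard_normal(n_x), rng.standard_normal(n_h)
h_next = tt_bilinear_step(x_t, h_t, G1, G2, G3)
```
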
  26. Tensor perspective
     • Some form of RNNs: TT-decomposition
     • Shallow net (e.g. one conv + global product pooling): canonical decomposition
  27. Tensor perspective
     • Shallow net (e.g. one conv + global product pooling): canonical decomposition
  28. Tensor perspective
     • Some form of RNNs: TT-decomposition
     • Shallow net (e.g. one conv + global product pooling): canonical decomposition
  29. Expressive power theorem
     Theorem: a random d-dimensional TT-tensor has exponentially large CP-rank with probability 1.
     Interpretation: an RNN (of the form discussed earlier) with random weights can be exactly mimicked only by a shallow net of exponentially larger width.
     [Valentin Khrulkov, Alexander Novikov, Ivan Oseledets, 2018]
  30. Kernel approximation
     - Kernel trick: replace scalar products with kernel evaluations k(x, y)
     - Provides good quality for many problems
     - Scales badly with the number of samples
  31. Kernel approximation
     - Revert the trick: approximate the kernel by an explicit feature map
     - Use linear methods in the mapped space
     - How to generate the mapping?
  32. Random Fourier Features
     [Rahimi and Recht, 2008] introduced Random Fourier Features (RFF) for any shift-invariant kernel. Just Monte Carlo for the integral! (A sketch for the RBF kernel follows below.)
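
A minimal numpy sketch of Random Fourier Features for the Gaussian (RBF) kernel: by Bochner's theorem, k(x - y) = E_w[cos(w^T (x - y))] with w drawn from the kernel's spectral density, and the map below is its Monte-Carlo approximation; the kernel choice, sizes and parameters are illustrative.

```python
import numpy as np

def rff_map(X, n_features=500, gamma=1.0, seed=0):
    """Random Fourier Features for the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2).

    The spectral density of this kernel is N(0, 2 * gamma * I), so we sample
    frequencies W from it plus uniform phases b, and map x -> sqrt(2/D) * cos(W^T x + b).
    Then z(x) . z(y) approximates k(x, y) (Rahimi & Recht, 2008).
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# Sanity check: feature inner products approximate the exact kernel values.
rng = np.random.default_rng(1)
X = rng.standard_normal((5, 3))
Z = rff_map(X, n_features=20000, gamma=0.5)
approx = Z @ Z.T
exact = np.exp(-0.5 * np.sum((X[:, None] - X[None, :]) ** 2, axis=-1))
print(np.max(np.abs(approx - exact)))  # should be small (Monte-Carlo error)
```
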
  33. Problems not discussed
     - Optimal experiment design
     - Importance sampling for stochastic optimization methods