Interpretable Machine Learning 6.3 - Prototypes and Criticisms

himkt
July 06, 2019


Transcript

  1. Interpretable Machine Learning
    6.3 Prototypes and Criticisms
    Makoto Hiramatsu <@himkt>
    * Figures are taken from the book

  2. Prototypes and Criticisms
    • A prototype is a data instance that is representative of all the data
    • A criticism is a data instance that is not well represented by the set of prototypes

  3. Prototypes and Criticisms

  4. Prototypes and Criticisms
    • Prototypes can be selected manually
    • There are many approaches to finding prototypes
    • How do we find criticisms? MMD-critic [Kim+, 2016]
    • It combines prototypes and criticisms in a single framework

  5. MMD-critic
    • Maximum Mean Discrepancy (MMD):
      the discrepancy between two distributions
    1. Select the number of prototypes and criticisms
    2. Find prototypes (greedily)
      • Selected so that the distribution of the prototypes is close to the data distribution
    3. Find criticisms (greedily)
      • Selected so that the distribution of the criticisms differs from the data distribution

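The greedy prototype search in step 2 can be sketched as follows (a minimal illustration, not the paper's implementation; the Gaussian RBF kernel choice, the `gamma` value, and the function names are assumptions of this sketch). At each step we add the candidate point that most reduces the squared MMD between the prototype set and the data:

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian RBF kernel matrix between the rows of a and b."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def greedy_prototypes(X, n_prototypes, gamma=1.0):
    """Greedily select prototypes whose distribution is close to the data's.

    At each step, pick the candidate that minimizes the squared MMD
    between the current prototype set and the full data set.
    """
    n = len(X)
    K = rbf_kernel(X, X, gamma)          # precompute all pairwise kernel values
    selected, candidates = [], set(range(n))
    for _ in range(n_prototypes):
        best, best_mmd2 = None, np.inf
        for c in candidates:
            idx = selected + [c]
            m = len(idx)
            mmd2 = (K[np.ix_(idx, idx)].sum() / m**2   # prototype-prototype term
                    - 2 * K[idx].sum() / (m * n))      # prototype-data term
            # (the data-data term is constant, so it is dropped here)
            if mmd2 < best_mmd2:
                best, best_mmd2 = c, mmd2
        selected.append(best)
        candidates.remove(best)
    return selected
```

On two well-separated clusters, the first two selected prototypes land in different clusters, since covering both clusters maximizes the prototype-data term.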
  6. MMD-critic: Ingredients
    • Kernel function: estimates the data density
    • Witness function: tells us how different two distributions are at a particular data point
    • Search strategy: greedy search

    witness(x) = \frac{1}{n} \sum_{i=1}^{n} k(x, x_i) - \frac{1}{m} \sum_{j=1}^{m} k(x, z_j)

    Gaussian RBF kernel: k(x, x') = \exp(-\gamma \lVert x - x' \rVert^2), \gamma > 0

  7. MMD-critic: Ingredients
    • Kernel function: estimates the data density
    • Witness function: tells us how different two distributions are at a particular data point
    • Search strategy: greedy search

    witness(x) = \frac{1}{n} \sum_{i=1}^{n} k(x, x_i) - \frac{1}{m} \sum_{j=1}^{m} k(x, z_j)

    0 \le k(x, x') \le 1 for the Gaussian RBF kernel k(x, x') = \exp(-\gamma \lVert x - x' \rVert^2):
    0 when the points are infinitely far apart, 1 when they are equal

    Question: what would happen to the formula if we used all n data points as prototypes?

  8. Witness function
    • Negative value at point x:
      the prototype distribution overestimates the data distribution
    • Positive value at point x:
      the prototype distribution underestimates the data distribution
    • We look for extreme values of the witness function in both negative and positive directions

  9. MMD² (squared)

    MMD^2 = \frac{1}{m^2} \sum_{i,j=1}^{m} k(z_i, z_j) - \frac{2}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} k(z_i, x_j) + \frac{1}{n^2} \sum_{i,j=1}^{n} k(x_i, x_j)

    • First term: proximity between prototypes
    • Second term: proximity between data points and prototypes
    • Third term: proximity between data points

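The three terms map directly onto code (a minimal sketch; the Gaussian RBF kernel and the `gamma` parameter are assumptions):

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian RBF kernel matrix between the rows of a and b."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def mmd2(Z, X, gamma=1.0):
    """Squared MMD between prototypes Z (m points) and data X (n points)."""
    m, n = len(Z), len(X)
    return (rbf_kernel(Z, Z, gamma).sum() / m**2           # proximity between prototypes
            - 2 * rbf_kernel(Z, X, gamma).sum() / (m * n)  # prototypes vs. data points
            + rbf_kernel(X, X, gamma).sum() / n**2)        # proximity between data points
```

When every data point is a prototype (Z = X), the three terms are identical up to their coefficients and cancel, so mmd2(X, X) is exactly 0.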
  10. When MMD² is minimized
    • Answer: when all data points are prototypes
    • With Z = X, the three terms are identical, so they cancel (1 − 2 + 1 = 0) and MMD² = 0

  11. MMD² behavior

  12. Advantages
    • We can focus on typical/edge cases
    • Remember Google's image classifier problem
    • Participants performed better when the sets showed prototypes and criticisms instead of random images of a class
    • This suggests that prototypes and criticisms are informative examples
    • MMD-critic works with any type of data and any type of machine learning model

  13. Examples: dog breed classification
    • Left prototypes: dog faces
    • Left criticisms: images without dog faces, or in different colors
    • Right prototypes: outdoor images of dogs
    • Right criticisms: dogs in costumes

  14. Examples: MNIST
    • Prototypes: various ways of writing the digits
    • Criticisms: unusually thick or thin digits, or unrecognizable ones
    • (Note: the search does not use a fixed number per class)

  15. Disadvantages
    • Hard to choose the number of prototypes and criticisms
    • The elbow method might be useful
    • Hard to choose the kernel and its scaling parameter
    • Disregards the fact that some features might not be relevant