Slide 1

Slide 1 text

AAAI 2021 Ando

Slide 2

Slide 2 text

Intro

- PU learning:
  • Learning from positive and unlabeled examples.
- GAN:
  • A kind of data augmentation method.
  • Learns a generator to generate data (e.g., images) to fool a discriminator, which tries to determine whether the generated data belong to a (positive) training class.

Reference: https://www.imagazine.co.jp/gan%EF%BC%9A%E6%95%B5%E5%AF%BE%E7%9A%84%E7%94%9F%E6%88%90%E3%83%8D%E3%83%83%E3%83%88%E3%83%AF%E3%83%BC%E3%82%AF%E3%81%A8%E3%81%AF%E4%BD%95%E3%81%8B%E3%80%80%EF%BD%9E%E3%80%8C%E6%95%99%E5%B8%AB/

Slide 3

Slide 3 text

Intro

- Problems in applying GAN to PU learning
  • Directly applying GAN is problematic because GAN focuses only on the positive data.
  • The resulting PU learning method will have high precision but low recall.
- We propose a new objective function based on KL-divergence (PAN).
- Advantages of PAN
  • A major advantage of PAN is that it does not need the class prior probability as input, which many state-of-the-art systems require.
  • When the class prior probability estimate is off, the existing methods can perform quite poorly.
  • PAN can be applied to any data as it has no generator.

Slide 4

Slide 4 text

Background: GAN

- GAN is an adversarial learner that learns a generator to generate data instances (e.g., images) similar to those in the real/training data.
  • It has two networks, a generator G() and a discriminator D().
  • G() tries to generate new data instances that approximate the real data to fool the discriminator.
  • D() tries to discriminate the generated data from the real data (the standard GAN objective is sketched below for reference).
- Discriminator: treats the real data as positive and the data produced by the generator as negative.
- Generator: trained so that the discriminator classifies its generated data as positive.
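For reference, this adversarial setup corresponds to the standard GAN minimax objective (a standard formulation, not something specific to this paper):

```latex
% Standard GAN objective (Goodfellow et al., 2014): D is trained to score real
% data high and generated data low; G is trained to fool D.
\min_{G}\max_{D}\;
\mathbb{E}_{x\sim p_{\mathrm{data}}}\bigl[\log D(x)\bigr]
+ \mathbb{E}_{z\sim p_{z}}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```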

Slide 5

Slide 5 text

Direct Adaptation of GAN for PU Learning

- We now propose a direct adaptation of GAN for PU learning, called a-GAN.
  • The goal of C() is to identify likely positives in the unlabeled set U to give to the discriminator, for it to decide whether these are real positive data (a schematic form of this setup is sketched below).
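A schematic way to write this a-GAN setup, under my reading of the slide (the paper's exact formulation may differ): C(x) acts as a soft selector of likely positives in the unlabeled set U, and D is trained to separate the labeled positives P from C's selections.

```latex
% Schematic a-GAN objective (assumption from the slide, not the paper's exact equation):
% D scores labeled positives as real and C-selected unlabeled examples as fake,
% while C tries to put weight on the unlabeled examples that D accepts as real.
\min_{C}\max_{D}\;
\mathbb{E}_{x\in P}\bigl[\log D(x)\bigr]
+ \mathbb{E}_{x\in U}\Bigl[C(x)\,\log\bigl(1 - D(x)\bigr)\Bigr]
```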

Slide 6

Slide 6 text

Proposed PAN

- We propose to use the adversarial learning idea on the probability distributions of D() and C() on each example.
  • The adversarial learning is performed through a distance metric.
  • D() tries to enlarge the distance to C() while C() tries to shrink it; this is applied to each example in U (no sampling is used). A minimal sketch of this per-example distance follows below.
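A minimal sketch of this per-example adversarial distance, assuming D() and C() each output the probability that an example is positive and that the distance is the KL divergence between the two induced Bernoulli distributions (my reading of the slide, not the authors' code):

```python
# Minimal sketch (assumptions: D and C output P(positive | x); distance = Bernoulli KL).
import torch

def bernoulli_kl(p, q, eps=1e-7):
    """Element-wise KL( Bern(p) || Bern(q) )."""
    p = p.clamp(eps, 1 - eps)
    q = q.clamp(eps, 1 - eps)
    return p * (p / q).log() + (1 - p) * ((1 - p) / (1 - q)).log()

def per_example_distance(D, C, x_unlabeled):
    """Distance between D's and C's predictive distributions on every example in U.

    In the adversarial game sketched here, D's update tries to enlarge this value
    while C's update tries to shrink it (detaching the other network's output in
    each step); it is computed over all of U, with no sampling.
    """
    d = D(x_unlabeled).squeeze(-1)  # D's probability that each x is positive
    c = C(x_unlabeled).squeeze(-1)  # C's probability that each x is positive
    return bernoulli_kl(d, c).mean()
```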

Slide 7

Slide 7 text

Proposed PAN

- Term I: makes the discriminator's label distribution close to the label distribution of the real data.
- Term II:
  • Discriminator: pushes its label distribution away from the classifier's (while the classifier pulls its distribution toward the discriminator's).
- Term III:
  • Discriminator: pulls its distribution toward the classifier's opposite distribution.
- Term II can cause unbalanced training between positive and negative examples and lead to high precision and low recall. To this end, we propose the term marked III, which can eliminate the concern (a schematic assembly of the three terms is sketched below).
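One way to assemble the three terms schematically, under my reading of the bullets above; the signs and the weight λ are assumptions, and the paper's exact equation may differ. Here D(x) and C(x) are the predicted label distributions, \bar{C}(x) is the classifier's opposite distribution (positive and negative probabilities swapped), and y_x is the true label distribution of a labeled positive example.

```latex
% Schematic three-term objective (assumption-based reconstruction, not the paper's equation).
% D maximizes the whole expression; C minimizes the adversarial term II.
\underbrace{-\,\mathbb{E}_{x\in P}\,\mathrm{KL}\bigl(y_x \,\|\, D(x)\bigr)}_{\text{I: match the real label distribution}}
\;+\;\underbrace{\lambda\,\mathbb{E}_{x\in U}\,\mathrm{KL}\bigl(D(x)\,\|\,C(x)\bigr)}_{\text{II: D moves away from C; C moves toward D}}
\;-\;\underbrace{\mathbb{E}_{x\in U}\,\mathrm{KL}\bigl(D(x)\,\|\,\bar{C}(x)\bigr)}_{\text{III: D moves toward C's opposite distribution}}
```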

Slide 8

Slide 8 text

Asymmetry of KL(D||C) for Positive and Negative Data

- If we don't have term III, the gradient of D() is:
- Terms (a) and (b) are asymmetric for positive and negative data because positive data also exist in the unlabeled set; this causes positive examples to be over-optimized toward the negative class.
- The same asymmetry also occurs in the gradient for C() (a generic form of this KL gradient is given below).
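For intuition only, here is the generic gradient of the Bernoulli KL with respect to D's output; the two log-ratio terms come from the positive-class and negative-class probabilities, respectively. This is a generic derivation under my assumptions, not necessarily the exact terms (a) and (b) on the slide. With d = D(x) and c = C(x):

```latex
% Bernoulli KL and its derivative w.r.t. d (generic illustration):
\mathrm{KL}(d \,\|\, c) = d \log\frac{d}{c} + (1-d)\log\frac{1-d}{1-c},
\qquad
\frac{\partial\,\mathrm{KL}(d \,\|\, c)}{\partial d}
= \log\frac{d}{c} - \log\frac{1-d}{1-c}
```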

Slide 9

Slide 9 text

Asymmetry of KL(D||C) for Positive and Negative Data

- After adding the term marked III, the gradient becomes symmetric for positive and negative data.

Slide 10

Slide 10 text

Experiments

- Datasets
- Baseline categories: PU, robust PU, generation-based PU, SVM-based PU, more robust PU.
- Note that both UPU and NNPU need the input of the class prior probability, which is often not available in practice. In our experiments, we give them the correct class priors.

Slide 11

Slide 11 text

Result

- We can observe that PAN outperforms all baselines on all datasets.

Slide 12

Slide 12 text

Prior probability for NNPU

- For each dataset, we randomly select 1 or 2 classes from the original data to form the positive set, and the rest form the negative set.
- We conclude that although the class prior can be estimated, if the estimate is off, the results can be quite poor.