Contrastive Self-Supervised Learning

Department of Physics, Tunghai University. Contrastive Self-Supervised Learning. https://yenlung.me/2022SSL

About the speaker: Yen-Lung Tsai (蔡炎龍), UC Irvine. Topics: Python, deep learning.

Python & AI ...

Two related MOOC courses: eWant (https://www.ewant.org) and the NCCU MOOC platform (https://moocs.nccu.edu.tw), both on Python.

I have published a Python book: 少年Py的大冒險: 成為Python數據分析達人的第一門課 (The Adventures of Young Py: A First Course to Becoming a Python Data Analysis Expert).

A book coming out this year: 少年Py的大冒險 II: 成為 Python AI 達人的第一堂課 (The Adventures of Young Py II: A First Lesson to Becoming a Python AI Expert).

The most detailed livestream recordings to date: https://bit.ly/2021_FG_DeepLearning

AI course currently being offered: https://www.youtube.com/c/iyenlung > 1102

01. AI Is About Building Function-Learning Machines

Every problem has to be expressed as a function: f takes an input x and produces an output y!

In other words, we want to approximate a function y = f(x)...

In other words, we want to approximate a function... here, the unknown u(x, y) in an equation of the form f(x, y, u(x, y)) = 0.

Neural Network: the neuron is the basic computational unit.

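Deep learning means building one "hidden layer" after another. A layer ℱ_1 maps the inputs x = (x_1, x_2, …, x_n) to hidden outputs h = (h_1, h_2, …, h_k). There are three classic layer types. Fully connected layers (Dense) give DNNs. Convolutional layers (Conv) give CNNs. Recurrent layers (LSTM, GRU) give RNNs.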

Deep learning means building one "hidden layer" after another: input layer x → hidden layers (DNN, CNN, RNN) → output layer ŷ.

How a neuron works

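How a neuron works: it takes a weighted sum of its inputs and adds a bias. It then passes the result through an activation function φ. With three inputs: $h = \varphi\left(\sum_{i=1}^{3} w_i x_i + b\right)$.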
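
The neuron formula above can be sketched in plain Python. The slides do not fix a particular activation function, so ReLU and the sigmoid here are just two common choices. The inputs, weights, and bias are made-up numbers.

```python
import math

def relu(z: float) -> float:
    """ReLU, one common choice for the activation function phi."""
    return max(0.0, z)

def sigmoid(z: float) -> float:
    """Logistic sigmoid, another common choice for phi."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(xs, ws, b, phi=relu):
    """Single neuron: h = phi(sum_i w_i * x_i + b)."""
    if len(xs) != len(ws):
        raise ValueError("need exactly one weight per input")
    z = sum(w * x for w, x in zip(ws, xs)) + b  # weighted sum plus bias
    return phi(z)

# Three inputs, as on the slide; the numbers are made up.
h = neuron([1.0, 2.0, 3.0], [0.5, 1.0, -0.25], b=0.5)
print(h)  # 2.25
```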

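Universal Approximation Theorem: a neural network with a single hidden layer and enough neurons can approximate any continuous function (on a compact domain) as closely as we like!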

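Building a "function-learning machine": once the network architecture is fixed, the function is determined by the weights and biases {w_i, b_j}. We collect these into a single parameter vector θ.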

Training (learning): adjust θ so that f_θ comes as close as possible to the function we want!

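Objective function (loss function): for the i-th training example (x_i, y_i), measure how far the network's output is from the answer: $\ell_i(\theta) = \lVert y_i - f_\theta(x_i) \rVert^2$. Averaging over all N examples, with an extra factor 1/2:
$$L(\theta) = \frac{1}{2N} \sum_{i=1}^{N} \lVert y_i - f_\theta(x_i) \rVert^2$$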
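
Below is a minimal sketch of the loss L(θ) above. It takes the network's predictions directly. The list `y_hats` stands in for f_θ(x_i); this is an assumption made so the example needs no network. The data are made up.

```python
def squared_error(y, y_hat):
    """ell_i(theta) = ||y_i - f_theta(x_i)||^2 for a single example."""
    if len(y) != len(y_hat):
        raise ValueError("target and prediction must have the same length")
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))

def mse_loss(ys, y_hats):
    """L(theta) = 1/(2N) * sum_i ||y_i - f_theta(x_i)||^2."""
    n = len(ys)
    if n == 0 or n != len(y_hats):
        raise ValueError("need the same positive number of targets and predictions")
    return sum(squared_error(y, yh) for y, yh in zip(ys, y_hats)) / (2 * n)

# y_hats stands in for the network outputs f_theta(x_i); values are made up.
ys = [[1.0, 2.0], [0.0, 0.0]]
y_hats = [[1.0, 0.0], [1.0, 1.0]]
print(mse_loss(ys, y_hats))  # 1.5
```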

Loss function for classification: label the classes 1, 2, 3 with the one-hot encodings [1, 0, 0], [0, 1, 0], [0, 0, 1].

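Loss function: the network p_θ ends in a softmax, so its outputs ŷ_1, ŷ_2, ŷ_3 are nonnegative and sum to 1. For a training pair (x_i, y_i), we can therefore read the output for the correct class as a probability, $P(y_i \mid x_i, \theta)$.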

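Softmax: keep the order, make them sum to 1. Given three numbers a, b, c, we want α, β, γ ≥ 0 with α + β + γ = 1 and the same ordering as a, b, c. If a, b, c are all positive, we can simply normalize: S = a + b + c, α = a/S, β = b/S, γ = c/S.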

Softmax: keep the order, make them sum to 1. If a, b, c may be zero or negative, we first exponentiate them, which makes them positive and preserves their order: a′ = e^a, b′ = e^b, c′ = e^c. Then S = a′ + b′ + c′, α = a′/S, β = b′/S, γ = c′/S.

Softmax: keep the order, make them sum to 1. In general, given k numbers z_1, z_2, …, z_k, softmax turns them into z̄_1, z̄_2, …, z̄_k with $\sum_{i=1}^{k} \bar z_i = 1$, where
$$\bar z_j = \frac{\exp(z_j)}{\sum_{i=1}^{k} \exp(z_i)}$$
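
The general softmax formula translates directly into code. There is one deviation from the slides, and it is a standard numerical trick rather than part of the original. The maximum is subtracted before exponentiating. This cancels in the ratio, so the result is the same, but it avoids overflow for large inputs.

```python
import math

def softmax(zs):
    """bar z_j = exp(z_j) / sum_i exp(z_i).

    Subtracting max(zs) before exponentiating cancels out in the ratio, so the
    result is unchanged, but it keeps exp() from overflowing on large inputs.
    """
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, -1.0])  # negative inputs are fine
print(probs)
```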

Next, we prepare the training data (labeling): roughly 1000 labeled examples for each class (× 1000, × 1000, × 1000)!

Loss function: in Shannon's information theory, an event x with probability P(x) carries information −log P(x). The less likely the event, the more information (surprise) it carries.

Loss function: so we want the correct answer to be as unsurprising as possible. That is, we minimize $\ell_i(\theta) = -\log P(y_i \mid x_i, \theta)$, the cross entropy.
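
Here is a small sketch of the cross-entropy loss for one example. It assumes the network's softmax output is already available as a list of probabilities. The helper names are illustrative. The second function shows the equivalent form with a one-hot target.

```python
import math

def cross_entropy(probs, label):
    """ell_i(theta) = -log P(y_i | x_i, theta).

    probs: the network's softmax output, one probability per class.
    label: index of the correct class y_i (0-based here, while the slides
           number classes from 1).
    """
    p = probs[label]
    return math.inf if p <= 0.0 else -math.log(p)

def cross_entropy_onehot(probs, onehot):
    """Same loss written with a one-hot target: -sum_k t_k * log p_k."""
    return -sum(t * math.log(p) for t, p in zip(onehot, probs) if t > 0)

print(cross_entropy([0.25, 0.25, 0.5], 2))  # -log 0.5, about 0.693
```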

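[Supervised learning] We prepare the training data ourselves as input/label pairs (x_1, y_1), (x_2, y_2), …, (x_n, y_n), such as an image paired with the name of what it shows. We train on (x_1, y_1), …, (x_k, y_k) and hold out (x_{k+1}, y_{k+1}), …, (x_n, y_n) for testing, to guard against overfitting!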

The big success of supervised neural networks!

But... it needs large amounts of high-quality labeled data, and labeling can take up about 80% of the work.

But sometimes the training data is not easy to prepare! There is too little labeled data! We don't know what the correct answer is! The training data is hard to collect!

More importantly, small children are better learners than AI!

Self-Supervised Learning: "We believe that self-supervised learning is one of the most promising ways to build such background knowledge and approximate a form of common sense in AI systems." (Yann LeCun (楊立昆) and Ishan Misra, 2021. Self-supervised Learning: The Dark Matter of Intelligence. https://ai.facebook.com/blog/self-supervised-learning-the-dark-matter-of-intelligence/)

[Example] Training data is hard to prepare!

[Example] Another case where training data is hard to prepare!

[Example] We don't know the correct answer: what should f output for the dog?

[Unsupervised learning] Basic idea 1: learn f_θ without labels, which is self-supervised learning!

[Unsupervised learning] Basic idea 2: train f_θ by optimizing an objective J(θ) that needs no labels. This is self-supervised contrastive learning.

[Unsupervised learning] Basic idea 3: learn an embedding through a pretext task.

02. NLP: Finding Representation Vectors for Words

Feature Engineering: to learn f: x → y, we first pick good features of x.

Feature Engineering: for example, use PCA on x for dimension reduction.

Feature Engineering: in deep learning, feature engineering is largely done by the function f itself...

Feature Engineering (continued)

Representation Learning: learning a good representation of the data.

Representation vector: f_θ takes an input and outputs a vector that represents it, such as [94, 87].

Word Embedding: in natural language processing, the most basic question is how to "input" language. How do we feed a piece of text into f_θ?

Word Embedding: usually we give each character (or word) a representative "feature vector", for example f_θ(龍) = [94, 87]. Such a function is called a word embedding.

Word Embedding: one small problem remains... the input 龍 to f_θ must also be turned into numbers before a computer can take it.

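We number the characters! The most common way is to sort characters by how often they appear: the more frequent a character, the smaller its number. For example: 的 = 1, 一 = 2, 了 = 3, 是 = 4, 我 = 5.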

Then one-hot encoding! 的 (1) → [1, 0, 0, 0, 0, …], 一 (2) → [0, 1, 0, 0, 0, …], 了 (3) → [0, 0, 1, 0, 0, …], 是 (4) → [0, 0, 0, 1, 0, …], 我 (5) → [0, 0, 0, 0, 1, …].
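
The numbering-by-frequency scheme and the one-hot encoding can be sketched as follows. The toy "corpus" is a single sentence, so the ranking differs from the slide; a large corpus is what puts 的, 一, 了, 是, 我 at the top. The tie-breaking rule is an assumption.

```python
from collections import Counter

def build_vocab(text):
    """Number characters by frequency: most frequent -> 1, next -> 2, ...

    Ties keep the order of first appearance (Counter.most_common preserves
    insertion order for equal counts); the slides don't say how ties are broken.
    """
    counts = Counter(text)
    return {ch: i for i, (ch, _) in enumerate(counts.most_common(), start=1)}

def one_hot(index, size):
    """Length-`size` vector with a 1 at the 1-based position `index`."""
    vec = [0] * size
    vec[index - 1] = 1
    return vec

vocab = build_vocab("我天天都會喝一杯咖啡")
print(vocab["天"], one_hot(vocab["天"], len(vocab)))  # 1 [1, 0, 0, 0, 0, 0, 0, 0, 0]
```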

Word2Vec: let's use the famous Word2Vec to see how word embedding is done. Similar words end up close together! Official Google page: https://code.google.com/archive/p/word2vec/

Word2Vec: T. Mikolov, K. Chen, G. Corrado, J. Dean. Efficient Estimation of Word Representations in Vector Space. Proceedings of Workshop at ICLR, 2013. Once trained, it has many cool properties. For example, Paris − France + Italy ≈ Rome, and king − man + woman ≈ queen (巴黎, 法國, 義大利, 羅馬; 國王, 男人, 女人, 皇后).

What function is being learned here? f(龍) = [94, 87]. Of course, we know a word embedding should learn a feature vector for each character. But we have no way to prepare training data for that!

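It's still all about the function! Basically, you design a task that you believe the computer can only do if it "understands what words mean". CBOW model: use the surrounding words w_{t−2}, w_{t−1}, w_{t+1}, w_{t+2} to predict the middle word w_t.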

It's still all about the function! Or, even cooler, train a function like this. Skip-Gram model: use the middle word w_t to predict the surrounding words w_{t−2}, w_{t−1}, w_{t+1}, w_{t+2}.
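
The "task" in CBOW and Skip-Gram is just a way to manufacture training pairs from raw text with no labeling. This sketch only generates those pairs, for a window of two words on each side, matching w_{t−2} … w_{t+2} above. The network that learns from them is not shown, and the function names are hypothetical.

```python
def cbow_pairs(tokens, window=2):
    """CBOW training pairs: (surrounding words, middle word) for each position t."""
    pairs = []
    for t, center in enumerate(tokens):
        context = tokens[max(0, t - window):t] + tokens[t + 1:t + 1 + window]
        pairs.append((context, center))
    return pairs

def skipgram_pairs(tokens, window=2):
    """Skip-Gram training pairs: (middle word, one surrounding word)."""
    return [(center, c) for context, center in cbow_pairs(tokens, window) for c in context]

tokens = list("我天天喝咖啡")
print(cbow_pairs(tokens)[2])       # (['我', '天', '喝', '咖'], '天')
print(skipgram_pairs(tokens)[:2])  # [('我', '天'), ('我', '天')]
```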

It's still all about the function! For the embedding, decide how many dimensions to compress each word into, say 128. Then put 128 neurons in the hidden layer in the middle of the network!

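Memorization or understanding? In word2vec, the weights from the one-hot input layer (V = vocabulary size) to the hidden layer (N neurons) form a V × N matrix
$$W = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1N} \\ w_{21} & w_{22} & \cdots & w_{2N} \\ \vdots & \vdots & & \vdots \\ w_{i1} & w_{i2} & \cdots & w_{iN} \\ \vdots & \vdots & & \vdots \\ w_{V1} & w_{V2} & \cdots & w_{VN} \end{bmatrix}$$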

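Memorization or understanding? Suppose x is the one-hot encoding of the i-th word (a 1 in position i, 0 everywhere else). Then
$$h = W^{\mathsf T} x = \begin{bmatrix} w_{i1} & w_{i2} & \cdots & w_{iN} \end{bmatrix}^{\mathsf T}$$
So the hidden vector h, which is the word's embedding, is simply the i-th row of W!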
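
Here is a quick numerical check that W^T x picks out a row of W when x is one-hot. The 4 × 3 matrix is made up. This is also why, in practice, the multiplication can be replaced by simply looking up row i.

```python
def transpose_times(W, x):
    """Compute h = W^T x, where W is a V x N matrix stored as V rows of length N."""
    V, N = len(W), len(W[0])
    return [sum(W[i][j] * x[i] for i in range(V)) for j in range(N)]

# Toy vocabulary of V = 4 words, N = 3 hidden neurons; weights are made up.
W = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9],
     [10, 11, 12]]
x = [0, 0, 1, 0]          # one-hot encoding of the 3rd word
h = transpose_times(W, x)
print(h)                  # [7, 8, 9], the 3rd row of W
```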

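Traditional word embeddings still have a drawback: a given character (or word) always has the same fixed feature vector. But compare 這個人的個性有點天天。 (here 天天 is slang describing someone's personality) with 我天天都會喝一杯咖啡。 ("I drink a cup of coffee every day"). The same character or word can mean different things in different places.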

Semantic word embedding: encode each word according to the meaning it has in context! Can this really be done?

ELMo opened the "Sesame Street era" of natural language processing! M.E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, L. Zettlemoyer. Deep contextualized word representations. NAACL 2018. arXiv preprint arXiv:1802.05365v2. (AI2)

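It's really just the hidden states of an RNN. Feed <BOS> 我 天 天 … 喝 咖 into a language model that predicts the next character at each step (我 天 天 … 咖 啡). The hidden states h_1, h_2, …, h_{n−1}, h_n along the way are the embeddings we want. The hidden states of a chatbot are very good embeddings!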

Nobody says we can only have one layer! Stack two LSTM layers (LSTM1, LSTM2) over the input <BOS> … 天 喝 咖. Each layer produces its own hidden states h_1, h_2, …, h_{n−1}, h_n.

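So we get a more "customized" embedding. For each token, take the token embedding and the hidden states h_i from the two LSTM layers, and combine them as a weighted sum with weights w_1, w_2, w_3. We only learn w_1, w_2, w_3 when we actually use the model for a task, and that gives the "real" embedding. The parts that needed lots of training data don't have to be touched at all!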
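
The weighted combination of layers can be sketched for a single token as follows. It follows the slide's plain weighted sum with made-up vectors. The ELMo paper also normalizes the weights with a softmax and multiplies by a task-specific scale γ; both are omitted here.

```python
def combine_layers(layers, weights):
    """Weighted sum w_1 * layer_1 + w_2 * layer_2 + ... for one token.

    layers:  per-layer vectors for the token, e.g. [token embedding,
             LSTM1 hidden state h_i, LSTM2 hidden state h_i]
    weights: task-specific scalars w_1, w_2, ...; these are the only values
             learned for a new task, the pretrained layers stay fixed.
    """
    if len(layers) != len(weights):
        raise ValueError("need one weight per layer")
    dim = len(layers[0])
    return [sum(w * layer[d] for w, layer in zip(weights, layers)) for d in range(dim)]

# Made-up 2-dimensional vectors for one token.
token_emb, h1, h2 = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]
print(combine_layers([token_emb, h1, h2], [0.5, 0.25, 0.25]))  # [2.5, 3.5]
```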

BERT, which ushered in a new era of natural language processing. J. Devlin, M.W. Chang, K. Lee, K. Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805v2. (Google)

Transformer: Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998-6008). It uses self-attention to avoid the drawbacks of RNNs!

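Transformer: BERT's architecture is essentially the encoder of the transformer. One of its training methods looks like this. Give BERT the sentence 我天天都會喝一杯__。 ("I drink a cup of __ every day.") and have it fill in the blank with 咖啡 (coffee). It's a cloze task!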

03. Contrastive Learning

Suppose we want to do face recognition, so that the company's door access can simply use face recognition from now on!

Seems easy enough: just give each colleague a number: 1, 2, 3, 4.

Then just learn this function: f_θ takes a photo as input and outputs a number such as 3.

Problem 1: image recognition needs roughly 1000 images per class for training! We can't ask every colleague to come in for 1000 photos...

Problem 2: I just joined. Do we have to retrain?

This is not quite how humans work... people don't seem to need thousands of photos to recognize someone...

Is training with small data possible? Can we teach a computer "how to learn"? Once it has learned that, small data is enough for training.

If we could find a function like this... f maps her photo to ŷ_1, ŷ_2, …, ŷ_n, so she has a representation vector ŷ = [ŷ_1, ŷ_2, …, ŷ_n].

Now everyone has a "representation vector". Suppose x_1, x_2, x_3, x_4 are photos of four colleagues, with vectors f(x_1), f(x_2), f(x_3), f(x_4). For a new photo x, compute f(x) and see which of them it is closest to!

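And so all these problems are solved! For example, when a new person joins, we just run the trained network on her photo to get her representation vector [ŷ_1, ŷ_2, …, ŷ_n]. No retraining is needed.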

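There is another immediate benefit: we can specify what counts as "similar enough". Define a threshold τ. If d(f(x), f(x_i)) < τ, we decide the photo shows person i. This also lets us tell when someone is not a member of the company: no stored vector is within τ.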
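
The identification rule above can be sketched as follows: take the nearest stored vector, and accept it only if its distance is below τ. The names, vectors, Euclidean distance, and τ = 1.0 are all hypothetical. They stand in for the outputs f(x_i) of a trained network.

```python
import math

def euclidean(u, v):
    """d(u, v) = ||u - v||."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def identify(query, gallery, tau):
    """Return the name whose stored vector f(x_i) is closest to f(x) = query,
    or None if even the closest one is not within distance tau."""
    best_name, best_dist = None, math.inf
    for name, vec in gallery.items():
        d = euclidean(query, vec)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist < tau else None

# Hypothetical representation vectors f(x_i) of colleagues (made-up numbers).
gallery = {"colleague_1": [0.0, 0.0], "colleague_2": [3.0, 4.0]}
gallery["new_hire"] = [-2.0, 1.0]  # a new person: just add her vector, no retraining
print(identify([2.9, 4.1], gallery, tau=1.0))    # colleague_2
print(identify([10.0, 10.0], gallery, tau=1.0))  # None: not a company member
```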

But the training data is hard to prepare... how would I know which vector best represents her?

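Automatic feature extractor: each hidden layer of a neural network can be viewed as doing "automatic feature extraction". So the output of a hidden layer can be viewed as a representation vector of the original data!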

Inspiration from word embeddings for text: CNN → ŷ_1, ŷ_2, …, ŷ_n → Dense → Output (Softmax). Train "ordinary" face recognition, then just cut off the last layer!

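We can also train the network directly to decide whether two photos show the same person. Both photos go through the CNN, their vectors ŷ are compared, and the network outputs 1 (same) or 0 (different). Afterwards, just cut off the last layer!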

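Even better: use triplet loss. Three photos go through the same CNN. The anchor and a photo of the same person should be as close as possible. The anchor and a photo of a different person should be as far apart as possible. This still requires labeling.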

Even better: use triplet loss. F. Schroff, D. Kalenichenko, J. Philbin (Google). FaceNet: A Unified Embedding for Face Recognition and Clustering. arXiv preprint arXiv:1503.03832. The anchor is compared with a positive sample (same person) and a negative sample (different person).
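
The triplet loss used in FaceNet can be written as max(‖f(a) − f(p)‖² − ‖f(a) − f(n)‖² + α, 0), for an anchor a, a positive p, and a negative n. This sketch assumes the embeddings have already been computed by the CNN. The margin value 0.2 is illustrative.

```python
def squared_distance(u, v):
    """||u - v||^2."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(||f(a) - f(p)||^2 - ||f(a) - f(n)||^2 + margin, 0).

    anchor, positive, negative are embeddings already produced by the CNN.
    The loss is 0 once the negative is at least `margin` farther (in squared
    distance) from the anchor than the positive is.
    """
    return max(squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin, 0.0)

print(triplet_loss([0.0, 0.0], [0.0, 0.0], [1.0, 0.0]))  # 0.0: already well separated
```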

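Generalizing this gives contrastive learning. An encoder f_{θ_q} maps the target sample to q, and an encoder g_{θ_k} maps the other samples to keys k. q should be similar to the keys of positive samples and dissimilar to the keys of negative samples. The negative samples are what keep the representations from collapsing, that is, from mapping every input to the same vector.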

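Contrastive Learning: the similarity sim(q, k) can be built from a distance function, though the inner product is used even more often. Examples: $\mathrm{sim}(q, k) = -\tfrac{1}{2}\lVert q - k \rVert^2$, or $\mathrm{sim}(q, k) = \langle q, k \rangle / \tau$ with a temperature τ.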
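
The two similarity choices can be compared directly. This sketch assumes vectors are plain Python lists. It also checks a useful fact: for unit vectors, −½‖q − k‖² = ⟨q, k⟩ − 1, so the two similarities differ only by a constant.

```python
def neg_half_sq_dist(q, k):
    """sim(q, k) = -1/2 * ||q - k||^2: the closer q and k, the larger sim."""
    return -0.5 * sum((a - b) ** 2 for a, b in zip(q, k))

def dot_sim(q, k, tau=1.0):
    """sim(q, k) = <q, k> / tau; dividing by a small tau magnifies differences."""
    return sum(a * b for a, b in zip(q, k)) / tau

q, k = [0.6, 0.8], [1.0, 0.0]  # both unit vectors
print(neg_half_sq_dist(q, k), dot_sim(q, k))  # about -0.4 and 0.6
```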

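Contrastive Loss:
$$\mathcal{L}(\theta) = -\log \frac{e^{\mathrm{sim}(q, k^+)}}{\sum_{k} e^{\mathrm{sim}(q, k)}}$$
Here k⁺ is the positive sample. The sum in the denominator runs over k⁺ and all the negative samples.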
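
The contrastive loss above is commonly known as InfoNCE. It is a cross entropy in which the positive key plays the role of the correct class. This minimal sketch assumes a dot-product similarity and made-up vectors. It uses the log-sum-exp trick for numerical stability, which does not change the value.

```python
import math

def dot(q, k):
    """<q, k>."""
    return sum(a * b for a, b in zip(q, k))

def contrastive_loss(q, k_pos, k_negs, sim=dot):
    """L = -log( exp(sim(q, k+)) / sum_k exp(sim(q, k)) ).

    The sum runs over the positive key k+ and every negative key, so this is
    cross entropy with the positive key as the "correct class".
    """
    logits = [sim(q, k_pos)] + [sim(q, k) for k in k_negs]
    m = max(logits)  # log-sum-exp trick for numerical stability
    log_denominator = m + math.log(sum(math.exp(s - m) for s in logits))
    return log_denominator - logits[0]

q = [1.0, 0.0]
print(contrastive_loss(q, [1.0, 0.0], [[-1.0, 0.0]]))  # negative far away: small loss
print(contrastive_loss(q, [1.0, 0.0], [[1.0, 0.0]]))   # negative as close as positive: log 2
```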

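Augmentation: is it possible to do no labeling at all? Even labeling a small fraction of the data (say 1%) takes effort. Instead, we can apply augmentation to a sample to generate different versions of it, and treat those versions as positive pairs for one another.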

Self-Supervised Learning: contrastive learning learns representations with little or no labeling. The learned representation can then be used for other tasks. Yann LeCun calls this kind of learning self-supervised learning.

Non-Contrastive Learning: can we do without negative samples and still avoid collapse? Setup: x goes through f_{θ_q} to give q, and x⁺ goes through g_{θ_k} to give k, with an extra network P_φ.

04. Time-Series Data (joint work with Yen Jan)

Time-series data should of course get representation vectors too, for example a stock's data over the past 20 days.

It may then be easier to learn... f: up or down?

Buy or sell? f: buy / sell / -

Or even predict what happens next: f

Difficulty: there is much less literature on contrastive learning for time series. One problem is that reasonable augmentations are hard to design!

Siamese Network: the same network f_θ (shared weights) maps x to z and x′ to z′, and a contrastive loss is computed on z and z′.

Labeled data: a stock's data over the past 20 days, labeled up (漲) or down (跌).

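Ultra-strict criterion: let x_t be the price on day t. Day t counts as "up" only if $x_t < x_{t+1} < x_{t+2} < x_{t+3} < x_{t+4} < x_{t+5}$, that is, only if the price rises every single day for the next five days!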
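
The ultra-strict labeling rule can be sketched as follows. Rising five days in a row is rare, which is why the resulting dataset is so imbalanced. The price list is made up. Skipping the last days, which have no five-day future, is an assumption.

```python
def strict_up_labels(prices, horizon=5):
    """Label day t as 1 ("up") only if x_t < x_{t+1} < ... < x_{t+horizon},
    otherwise 0. Days without `horizon` future values get no label."""
    labels = []
    for t in range(len(prices) - horizon):
        window = prices[t:t + horizon + 1]
        labels.append(int(all(a < b for a, b in zip(window, window[1:]))))
    return labels

print(strict_up_labels([10, 11, 12, 13, 14, 15, 14, 15]))  # [1, 0, 0]
```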

As you can imagine, this makes the dataset very imbalanced! [Figure: counts of up (漲) vs. down (跌) samples, axis from 0 to 90,000]

Handling the imbalanced data: versions v1, v2, v3.

Results (* metric: precision)
          Original                  P-adic enhanced
          V1      V2      V3        V1      V2      V3
LSTM      -       71.6%   71.7%     -       72.1%   69.9%
SiamCL    65.6%   71.5%   71.3%     73.7%   73.8%   73.3%

Q & A: Any questions?
