Contrastive Learning 35
Self-Supervised Learning
"We believe that self-supervised learning is one of the most promising ways to build such background knowledge and approximate a form of common sense in AI systems."
—Yann LeCun (楊立昆) / Ishan Misra, 2021
Self-supervised Learning: The Dark Matter of Intelligence
https://ai.facebook.com/blog/self-supervised-learning-the-dark-matter-of-intelligence/
Contrastive Learning 55
Word2Vec
Let's look at the famous Word2Vec to see how word embedding is done.
Similar words end up close together!
Google project page: https://code.google.com/archive/p/word2vec/
Contrastive Learning 56
Word2Vec
T. Mikolov, K. Chen, G. Corrado, J. Dean. Efficient Estimation of Word Representations in Vector Space. Proceedings of Workshop at ICLR, 2013.
Once trained, the embeddings support some impressive tricks:
Paris − France + Italy ≈ Rome
King − Man + Woman ≈ Queen
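As a quick illustration of these analogies, here is a minimal sketch using gensim with a pre-trained word2vec model; the gensim package and the GoogleNews vector file are assumptions for illustration, not part of the slides:

```python
# Word-analogy arithmetic with pre-trained word2vec vectors.
# Assumes gensim is installed and the GoogleNews binary has been downloaded.
from gensim.models import KeyedVectors

wv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# king - man + woman should land near "queen"
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# Paris - France + Italy should land near "Rome"
print(wv.most_similar(positive=["Paris", "Italy"], negative=["France"], topn=3))
```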
Contrastive Learning 57
What function is being learned here?
f(龍) = [94, 87]
Of course we know that word embedding means learning a feature vector for each word, but there is no way to prepare labeled training data for it!
Contrastive Learning 58
The key is still the function!
Basically, you design a task that you believe the computer can only do if it "understands" the meanings of the words!
CBOW model: use the surrounding words to predict the middle word,
f(w_{t-2}, w_{t-1}, w_{t+1}, w_{t+2}) = w_t
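A minimal sketch of the CBOW idea in PyTorch (PyTorch, the vocabulary size, and the random data below are assumptions for illustration): average the embeddings of the surrounding words and predict the middle word.

```python
# CBOW sketch: average the context-word embeddings, predict the center word.
import torch
import torch.nn as nn

class CBOW(nn.Module):
    def __init__(self, vocab_size, embed_dim=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # the word vectors we want to learn
        self.out = nn.Linear(embed_dim, vocab_size)       # scores for the center word

    def forward(self, context):                 # context: (batch, 2*window) word ids
        h = self.embed(context).mean(dim=1)     # average of w_{t-2}, w_{t-1}, w_{t+1}, w_{t+2}
        return self.out(h)                      # logits over the whole vocabulary

model = CBOW(vocab_size=5000)
context = torch.randint(0, 5000, (8, 4))        # ids of the four surrounding words
center = torch.randint(0, 5000, (8,))           # id of the middle word w_t
loss = nn.CrossEntropyLoss()(model(context), center)
loss.backward()                                 # after training, model.embed holds the embeddings
```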
Contrastive Learning 59
The key is still the function!
Or, even fancier, train a function like this!
Skip-Gram model: use the middle word to predict the surrounding words,
f(w_t) = (w_{t-2}, w_{t-1}, w_{t+1}, w_{t+2})
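Note how the training data comes for free: every (center word, neighbor word) pair in raw text is a training example, with no human labeling. A small sketch in plain Python, with an arbitrary window size:

```python
# Generate skip-gram training pairs (center word -> nearby word) from raw text.
def skipgram_pairs(tokens, window=2):
    pairs = []
    for t, center in enumerate(tokens):
        for offset in range(-window, window + 1):
            if offset == 0:
                continue                      # skip the center word itself
            if 0 <= t + offset < len(tokens):
                pairs.append((center, tokens[t + offset]))
    return pairs

print(skipgram_pairs("the quick brown fox jumps".split()))
# e.g. ('brown', 'the'), ('brown', 'quick'), ('brown', 'fox'), ('brown', 'jumps'), ...
```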
Contrastive Learning 62
Memorization or understanding?

With one-hot encoding, $x = (0, 0, \ldots, 1, \ldots, 0)^\top$ (a single 1 in position $i$ for word $i$), and

$$
W^\top x = h, \qquad
W =
\begin{bmatrix}
w_{11} & w_{12} & \cdots & w_{1N} \\
w_{21} & w_{22} & \cdots & w_{2N} \\
\vdots & \vdots &        & \vdots \\
w_{i1} & w_{i2} & \cdots & w_{iN} \\
\vdots & \vdots &        & \vdots \\
w_{V1} & w_{V2} & \cdots & w_{VN}
\end{bmatrix}.
$$

With a one-hot $x$, $W^\top x$ simply picks out row $i$ of $W$: the word2vec vector $h$ is looked up, not computed.
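A quick numerical check of this point (numpy, with made-up sizes):

```python
# With a one-hot x, W^T x is exactly row i of W: the embedding is a table lookup.
import numpy as np

V, N = 6, 3                   # vocabulary size, embedding dimension
W = np.random.randn(V, N)     # the V x N weight matrix from the slide
x = np.zeros(V)
x[2] = 1.0                    # one-hot vector for word i = 2

h = W.T @ x                   # W^T x = h
assert np.allclose(h, W[2])   # h equals the i-th row of W
print(h)
```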
Contrastive Learning 63
Traditional word embeddings still have a drawback
A traditional word embedding gives each fixed word one fixed feature vector. But...
這個人的個性有點天天。 (here 天天 is slang describing someone's personality)
我天天都會喝一杯咖啡。 (here 天天 means "every day": "I drink a cup of coffee every day.")
The same character or word can have different meanings in different contexts.
Contrastive Learning 64
Semantic word embeddings!
f: some meaning → encoding
Encode by meaning! Can that really be done?
Contrastive Learning 65
ELMo kicked off the "Sesame Street era" of NLP!
ELMo
M.E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, L.
Zettlemoyer. Deep contextualized word representations. NAACL 2018. arXiv
preprint arXiv:1802.05365v2.
AI2
Contrastive Learning 69
BERT, ushering in a new era of NLP
BERT
J. Devlin, M.W. Chang, K. Lee, K. Toutanova. BERT: Pre-
training of Deep Bidirectional Transformers for Language
Understanding. arXiv preprint arXiv:1810.04805v2.
Google
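To make "encode by meaning" concrete, here is a sketch that pulls contextual vectors for the same character 天 from the two example sentences above; it assumes the Hugging Face transformers package and the public bert-base-chinese checkpoint, neither of which appears in the slides:

```python
# The same character gets different vectors in different sentences.
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModel.from_pretrained("bert-base-chinese")

def char_vector(sentence, char):
    """Contextual vector of the first occurrence of `char` in `sentence`."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]      # (seq_len, 768)
    pos = enc["input_ids"][0].tolist().index(tok.convert_tokens_to_ids(char))
    return hidden[pos]

v1 = char_vector("這個人的個性有點天天。", "天")
v2 = char_vector("我天天都會喝一杯咖啡。", "天")
print(torch.cosine_similarity(v1, v2, dim=0))  # similarity of the two senses of 天
```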
Contrastive Learning 70
Transformer
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.
N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in
Neural Information Processing Systems (pp. 5998-6008).
Uses self-attention to avoid the drawbacks of RNNs!
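A bare-bones sketch of the self-attention operation at the heart of the Transformer (numpy; no learned Q/K/V projections, multi-head, or masking, so this shows only the shape of the idea):

```python
# Scaled dot-product self-attention: every position mixes in information
# from every other position, with no recurrence.
import numpy as np

def self_attention(X):
    """X: (seq_len, d) token vectors; here Q = K = V = X for simplicity."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                    # pairwise similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
    return weights @ X                               # context-aware vectors

X = np.random.randn(5, 8)        # 5 tokens, 8-dimensional embeddings
print(self_attention(X).shape)   # (5, 8): each token now "sees" the whole sequence
```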
Contrastive Learning 88
Even better: use a Triplet Loss
[Figure: three weight-sharing CNNs each map an image to an embedding (ŷ1, ŷ2, …, ŷn). The anchor and positive embeddings should end up as close as possible; the anchor and negative embeddings as far as possible. This setup still requires labeling.]
Contrastive Learning 89
Even better: use a Triplet Loss
F. Schroff, D. Kalenichenko, J. Philbin (Google). FaceNet: A
Unified Embedding for Face Recognition and Clustering. arXiv
preprint arXiv:1503.03832.
[Figure: two weight-sharing CNNs embed the Positive Sample and the Negative Sample (ŷ1, ŷ2, …, ŷn) for comparison with the anchor.]
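A minimal sketch of the triplet loss idea, roughly L = max(d(anchor, positive) − d(anchor, negative) + margin, 0). It uses PyTorch's built-in TripletMarginLoss; the tiny linear stand-in for the CNN and the random images are assumptions for illustration:

```python
# Triplet loss: pull anchor/positive embeddings together, push anchor/negative apart.
import torch
import torch.nn as nn

embed = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))  # stand-in for the CNN

anchor = embed(torch.randn(8, 3, 32, 32))     # images of person A
positive = embed(torch.randn(8, 3, 32, 32))   # other images of person A
negative = embed(torch.randn(8, 3, 32, 32))   # images of someone else

loss = nn.TripletMarginLoss(margin=1.0)(anchor, positive, negative)
loss.backward()   # training drives d(anchor, positive) down and d(anchor, negative) up
```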
Contrastive Learning 95
Self-Supervised Learning
"We believe that self-supervised learning is one of the most promising ways to build such background knowledge and approximate a form of common sense in AI systems."
—Yann LeCun (楊立昆) / Ishan Misra, 2021
Self-supervised Learning: The Dark Matter of Intelligence
https://ai.facebook.com/blog/self-supervised-learning-the-dark-matter-of-intelligence/