
BERT for Text Classification with Keras/TensorFlow 2

Galuh Sahid
October 24, 2020


In this workshop, we'll learn how to use BERT, a technique for natural language processing (NLP) pre-training, to perform our own tasks. We will use BERT in TensorFlow 2 for a text classification task.


Transcript

  1. BERT for Text
    Classification with
    Keras/TensorFlow 2
    Galuh Sahid
    Data Scientist, Gojek / ML GDE


  2. What will we
    do today?


  3. This movie is awesome! Positive


  4. Positive
    Negative
    This movie is thrilling!
    Such a disappointing ending.


  5. This movie is
    thrilling!
    Positive
    Model


  6. 1. Train everything from scratch
    2. Use a pre-trained model
    Ways to do training


  7. A deep learning model is trained on a
    large dataset, then used to perform
    similar tasks on another dataset (e.g.
    text classification)
    Transfer learning
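As a minimal sketch of this idea in Keras/TensorFlow 2: load a pre-trained BERT encoder from TF Hub and put a small classification head on top. The Hub handles, dropout rate, and head below are assumptions for illustration, not the workshop's exact setup.

```python
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # noqa: F401 -- registers the ops used by the preprocessing model

# Assumed TF Hub handles; any matching BERT preprocess/encoder pair works.
preprocess = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
encoder = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4",
    trainable=True)  # fine-tune the pre-trained weights on our task

text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name="text")
pooled = encoder(preprocess(text_input))["pooled_output"]  # [batch, 768] sentence vector
x = tf.keras.layers.Dropout(0.1)(pooled)
output = tf.keras.layers.Dense(1, activation="sigmoid", name="sentiment")(x)
model = tf.keras.Model(text_input, output)
model.summary()
```

Only the small head is trained from scratch; the encoder starts from the weights BERT learned on its large text corpus.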


  8. What is BERT?


  9. BERT: Bidirectional Encoder
    Representations from
    Transformers


  10. “...we train a general-purpose
    ‘language understanding’ model
    on a large text corpus (like
    Wikipedia), and then use that
    model for downstream NLP tasks
    that we care about (like
    question answering)”
    https://github.com/google-research/bert


  11. “BERT outperforms previous
    methods because it is the first
    unsupervised, deeply
    bidirectional system for
    pre-training NLP.”
    https://github.com/google-research/bert


  12. BERT was trained using only a plain
    text corpus
    Unsupervised


  13. ● Pre-trained representations can
    either be context-free or contextual
    Bidirectional
    Context-free: "bank" has the same representation
    in "bank deposit" and "river bank"

  14. ● Contextual representations can further
    be unidirectional or bidirectional
    Bidirectional
    Unidirectional: "bank" in "I made a bank deposit"
    is represented using only "I made a"
    Bidirectional: "bank" is represented using
    both "I made a" and "deposit"
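One way to see "contextual" concretely: the vector BERT produces for "bank" changes with the surrounding words. A rough sketch, assuming the TF Hub handles below and that "bank" is a single WordPiece token in the uncased vocabulary:

```python
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # noqa: F401

# Assumed TF Hub handles, as in the transfer-learning sketch above.
preprocess = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
encoder = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4")

sentences = tf.constant(["I made a bank deposit", "I sat by the river bank"])
inputs = preprocess(sentences)
seq_out = encoder(inputs)["sequence_output"]  # [2, seq_len, 768]: one vector per token

# Find where "bank" sits in each sentence by looking up its WordPiece id.
bank_id = preprocess(tf.constant(["bank"]))["input_word_ids"][0, 1]  # token right after [CLS]
positions = [int(tf.where(ids == bank_id)[0, 0]) for ids in inputs["input_word_ids"]]

vec_a, vec_b = seq_out[0, positions[0]], seq_out[1, positions[1]]
cosine = tf.reduce_sum(vec_a * vec_b) / (tf.norm(vec_a) * tf.norm(vec_b))
print("cosine similarity of the two 'bank' vectors:", float(cosine))  # noticeably below 1.0
```

A context-free embedding (e.g. word2vec or GloVe) would give "bank" exactly the same vector in both sentences.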


  15. ● Bidirectional representations start from
    the very bottom of a deep neural network
    Deeply bidirectional


  16. BERT Training
    Strategies


  17. Positive
    Negative
    This movie is thrilling!
    Such a disappointing ending.


  18. ● Masked language model
    ● Next sentence prediction
    Training strategies


  19. Input: the man went to the [MASK1] .
    he bought a [MASK2] of milk.
    Labels: [MASK1] = store; [MASK2] =
    gallon
    Masked language model
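A toy sketch of how masked-LM examples can be built from a tokenized sentence. BERT masks roughly 15% of the tokens (and sometimes swaps in a random token instead of [MASK]); the helper below only illustrates the idea and is not the actual BERT data pipeline.

```python
import random

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]"):
    """Randomly hide ~mask_rate of the tokens; keep the originals as labels."""
    masked, labels = list(tokens), {}
    for i, token in enumerate(tokens):
        if random.random() < mask_rate:
            labels[i] = token        # the model must predict the original token here
            masked[i] = mask_token
    return masked, labels

tokens = "the man went to the store . he bought a gallon of milk .".split()
masked, labels = mask_tokens(tokens)
print("Input: ", " ".join(masked))
print("Labels:", labels)  # e.g. {5: 'store', 10: 'gallon'}
```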


  20. Sentence A: the man went to the store .
    Sentence B: he bought a gallon of milk .
    Label: IsNextSentence
    Next sentence prediction


  21. Sentence A: the man went to the store .
    Sentence B: penguins are flightless .
    Label: NotNextSentence
    Next sentence prediction
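Pairs like the ones above can be generated from plain text: about half the time sentence B really follows sentence A, otherwise B is a random sentence from the corpus. A toy sketch of that idea:

```python
import random

def make_nsp_pair(document, corpus):
    """Return sentence A with either its real successor or a random sentence."""
    i = random.randrange(len(document) - 1)
    sentence_a = document[i]
    if random.random() < 0.5:
        return sentence_a, document[i + 1], "IsNextSentence"
    return sentence_a, random.choice(corpus), "NotNextSentence"

document = ["the man went to the store .", "he bought a gallon of milk ."]
other_sentences = ["penguins are flightless .", "the sky is blue ."]
print(make_nsp_pair(document, other_sentences))
```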


  22. ● https://github.com/google-research/bert
    References


  23. Hands-on
    Practice


  24. bit.ly/wtm-bert-colab
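The notebook at the link walks through the hands-on part. As a rough sketch of the kind of fine-tuning loop it involves (the Hub handles, hyperparameters, and two-example dataset are assumptions for illustration only):

```python
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # noqa: F401

def build_classifier():
    # Assumed TF Hub handles, as in the transfer-learning sketch earlier.
    preprocess = hub.KerasLayer(
        "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
    encoder = hub.KerasLayer(
        "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4",
        trainable=True)
    text_in = tf.keras.layers.Input(shape=(), dtype=tf.string)
    pooled = encoder(preprocess(text_in))["pooled_output"]
    out = tf.keras.layers.Dense(1, activation="sigmoid")(
        tf.keras.layers.Dropout(0.1)(pooled))
    return tf.keras.Model(text_in, out)

# Tiny stand-in dataset; replace with a real sentiment dataset.
texts = ["This movie is thrilling!", "Such a disappointing ending."]
labels = [1, 0]  # 1 = positive, 0 = negative
ds = tf.data.Dataset.from_tensor_slices((texts, labels)).batch(2)

model = build_classifier()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
              loss="binary_crossentropy", metrics=["accuracy"])
model.fit(ds, epochs=3)
print(model.predict(tf.constant(["This movie is awesome!"])))
```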
