Let’s start ML on GCP with AutoML

Slide 1

Slide 1 text

Let’s start ML on GCP with AutoML @sakajunquality 19.03.23 #gcpug #機械学習名古屋

Slide 2

Slide 2 text

Who am I? - @sakajunquality - Jun Sakata - GDE, Cloud - SRE @ Ubie, inc. - Usually… - #GKE #Kubernetes #containers #DevOps etc. - Not an ML person...

Slide 3

Slide 3 text

Agenda - AutoML - Kubernetes Docs Translation with AutoML Translate - Points for ML on GCP

Slide 4

Slide 4 text

AutoML

Slide 5

Slide 5 text

AutoML - State-of-the-art performance - Get up and running fast - Generate high-quality training data (from the official website...)

Slide 6

Slide 6 text

AutoML https://cloud.google.com/automl/

Slide 7

Slide 7 text

AutoML - Prepare the data - Train - Evaluate - Use as API

Slide 8

Slide 8 text

AutoML - Vision - Natural Language - Translation

Slide 9

Slide 9 text

AutoML - Vision - Natural Language - Translation

Slide 10

Slide 10 text

Kubernetes Docs Translation JA with AutoML Translation

Slide 11

Slide 11 text

About AutoML Translation - Create a “domain-specific” translation model - Over 100 languages

Slide 12

Slide 12 text

#kubernetes-docs-ja - A community translation project of Kubernetes Docs into Japanese - https://kubernetes.io/ja/docs/home/ - Slack - http://slack.k8s.io/ - #kubernetes-docs-ja channel

Slide 13

Slide 13 text

#kubernetes-docs-ja https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/

Slide 14

Slide 14 text

#kubernetes-docs-ja https://kubernetes.io/ja/docs/concepts/overview/what-is-kubernetes/

Slide 15

Slide 15 text

Motivation - Kubernetes is under active development, with regular releases - The documents are also updated frequently

Slide 16

Slide 16 text

#kubernetes-docs-ja

Slide 17

Slide 17 text

Prepare the dataset - Use the already translated Japanese and the original English - Translated sentence pairs (en/ja) → K8s-specific translation model

Slide 18

Slide 18 text

Prepare the dataset

Slide 19

Slide 19 text

Prepare the dataset - Use the already translated Japanese and original English - Some amendments - e.g. - Make 1:1 pairs of sentences - Use the same terms
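
The two amendments above (1:1 sentence pairs and consistent terms) could be sketched roughly as follows. This is only an illustration: the glossary entries and the function names `normalize_terms` and `make_pairs` are hypothetical, not the project's actual tooling.

```python
# Hypothetical glossary: variant spellings mapped to the preferred term.
GLOSSARY = {
    "ポッド": "Pod",
    "クラスター": "クラスタ",
}


def normalize_terms(sentence, glossary=GLOSSARY):
    """Replace each known variant with the preferred term."""
    for variant, preferred in glossary.items():
        sentence = sentence.replace(variant, preferred)
    return sentence


def make_pairs(en_sentences, ja_sentences):
    """Pair sentences 1:1 and fail loudly when the counts differ."""
    if len(en_sentences) != len(ja_sentences):
        raise ValueError(
            f"{len(en_sentences)} English vs {len(ja_sentences)} Japanese sentences"
        )
    return [
        (en.strip(), normalize_terms(ja.strip()))
        for en, ja in zip(en_sentences, ja_sentences)
    ]
```

Raising on a count mismatch is a deliberate choice here: a silently misaligned pair would teach the model a wrong translation.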

Slide 20

Slide 20 text

Prepare the dataset

Slide 21

Slide 21 text

Prepare the dataset Not enough sentences...

Slide 22

Slide 22 text

Official Document: Preparing Training Data https://cloud.google.com/translate/automl/docs/prepare?hl=en

Slide 23

Slide 23 text

Prepare the dataset - At least 100 sentences each for - Train - Validation - Test
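
A minimal sketch of checking that requirement while splitting the pairs yourself. The 100-sentence minimum comes from the slide; the 10% holdout share, the fixed seed, and the name `split_pairs` are assumptions made for the example.

```python
import random

MIN_PER_SPLIT = 100  # minimum number of sentence pairs per split, per the slide


def split_pairs(pairs, holdout_fraction=0.1, seed=0):
    """Shuffle the pairs and split them into train, validation, and test."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)

    # Validation and test each get the larger of the minimum and the fraction.
    n_holdout = max(MIN_PER_SPLIT, int(len(shuffled) * holdout_fraction))
    validation = shuffled[:n_holdout]
    test = shuffled[n_holdout:2 * n_holdout]
    train = shuffled[2 * n_holdout:]

    for name, part in (("train", train), ("validation", validation), ("test", test)):
        if len(part) < MIN_PER_SPLIT:
            raise ValueError(
                f"{name} has {len(part)} pairs; at least {MIN_PER_SPLIT} are needed"
            )
    return train, validation, test
```

With 2,000 pairs this yields 1,600 for training and 200 each for validation and test; with only 250 pairs it raises, because just 50 would remain for training.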

Slide 24

Slide 24 text

Prepare the dataset - Not enough sentences yet in the project - Use some sentences from GKE docs - https://cloud.google.com/kubernetes-engine/docs/ - Topic and terms are quite similar - With some amendments in terms

Slide 25

Slide 25 text

Prepare the dataset - Change of plan... (diagram: Kubernetes Docs and GKE Docs feed a Custom Model)

Slide 26

Slide 26 text

Prepare the dataset - Export sentence pairs as TSV - Upload them to AutoML Translation
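
As a rough sketch, the TSV export could look like the following, assuming one pair per line with the source sentence, a tab, then the target sentence. The function names and the whitespace cleanup are choices made for this example.

```python
def to_tsv_line(source, target):
    """Return one 'source<TAB>target' line without the trailing newline."""

    def clean(text):
        # Collapse every whitespace run (tabs and newlines included) into one
        # space, so a sentence can never break the two-column layout.
        return " ".join(text.split())

    return f"{clean(source)}\t{clean(target)}"


def pairs_to_tsv(pairs):
    """Render all (source, target) pairs as TSV text."""
    return "".join(to_tsv_line(source, target) + "\n" for source, target in pairs)


def write_tsv(pairs, path):
    """Write the pairs to a UTF-8 file at the given path."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(pairs_to_tsv(pairs))
```

For example, `write_tsv(pairs, "k8s-en-ja.tsv")` would produce a file ready for upload; the file name is only a placeholder.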

Slide 27

Slide 27 text

Create the dataset - Choose languages...

Slide 28

Slide 28 text

Create the dataset - With little data, the files need to be uploaded separately

Slide 29

Slide 29 text

Prepare the dataset

Slide 30

Slide 30 text

Train - Just click “START TRAINING” - Can choose a base model - Google NMT (default) - https://ai.google/research/pubs/pub45610 - Another AutoML model

Slide 31

Slide 31 text

Training...

Slide 32

Slide 32 text

Wait for approximately 3 hours...

Slide 33

Slide 33 text

Prediction - After training is finished, the model can be used for prediction.

Slide 34

Slide 34 text

Prediction

Slide 35

Slide 35 text

Prediction - Looks good!

Slide 36

Slide 36 text

Prediction - Also looks good

Slide 37

Slide 37 text

Prediction - Some are not quite...

Slide 38

Slide 38 text

API - The model can be used via the API
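
A hedged sketch of what such an API call might look like. It only builds the HTTP request and does not send it. The endpoint path, the `us-central1` location, and the `textSnippet` payload shape are assumptions based on the AutoML REST `predict` method and should be checked against the current documentation; the project ID, model ID, and access token are placeholders.

```python
import json
import urllib.request


def build_predict_request(project_id, model_id, text, access_token,
                          location="us-central1"):
    """Build (but do not send) a predict request for a custom model."""
    # Assumed endpoint layout; verify against the current API reference.
    url = (
        "https://automl.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/models/{model_id}:predict"
    )
    # Assumed payload shape: the text to translate goes in a text snippet.
    body = {"payload": {"textSnippet": {"content": text}}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; an access token can be obtained with, for example, `gcloud auth application-default print-access-token`.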

Slide 39

Slide 39 text

Result - The result is available with scores - Refer to “Evaluating Model” - https://cloud.google.com/translate/automl/docs/evaluate

Slide 40

Slide 40 text

Interpretation - Evaluation Scores https://cloud.google.com/translate/automl/docs/evaluate

Slide 41

Slide 41 text

Training Result

Slide 42

Slide 42 text

Considerations - Need more samples? - By default, 10% of the sentences are used for validation and another 10% for test. - More data for training? - The model trained only on the GKE dataset scores quite high - Is Google’s official translation better than the community one?

Slide 43

Slide 43 text

Points for ML on GCP

Slide 44

Slide 44 text

Live Demo (if demanded...)

Slide 45

Slide 45 text

Takeaways - You can start ML easily with AutoML - Creating a model - Serving a model - Some updates at Next SF ’19?