Let’s start ML on GCP
with AutoML
@sakajunquality
2019.03.23 #gcpug #機械学習名古屋 (Machine Learning Nagoya)
Slide 2
Who am I?
- @sakajunquality
- Jun Sakata
- GDE (Google Developer Expert), Cloud
- SRE @ Ubie, Inc.
- Usually work on…
- #GKE #Kubernetes #containers #DevOps etc.
- Not an ML person...
Slide 3
Agenda
- AutoML
- Kubernetes Docs Translation with AutoML Translation
- Points for ML on GCP
Slide 4
AutoML
Slide 5
AutoML
- State-of-the-art performance
- Get up and running fast
- Generate high-quality training data
(from the official website...)
Slide 6
AutoML
https://cloud.google.com/automl/
Slide 7
AutoML
- Prepare the data
- Train
- Evaluate
- Use the model via an API
Slide 8
AutoML
- Vision
- Natural Language
- Translation
Slide 9
AutoML
- Vision
- Natural Language
- Translation
Slide 10
Japanese Translation of Kubernetes Docs
with AutoML Translation
Slide 11
About AutoML Translation
- Create a “domain-specific” translation model
- Over 100 languages
Slide 12
#kubernetes-docs-ja
- A community translation project of Kubernetes Docs into Japanese
- https://kubernetes.io/ja/docs/home/
- Slack
- http://slack.k8s.io/
- #kubernetes-docs-ja channel
Motivation
- Kubernetes is under active development, with frequent releases
- So the docs are frequently updated as well
Slide 16
#kubernetes-docs-ja
Slide 17
Prepare the dataset
- Use the existing Japanese translations paired with the original English
Translated sentence pairs (en/ja) → K8s-specific translation model
Slide 18
Prepare the dataset
Slide 19
Prepare the dataset
- Use the existing Japanese translations paired with the original English
- Some amendments, e.g.:
- Make 1:1 sentence pairs
- Use consistent terminology
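The two amendments above (1:1 sentence pairs, consistent terms) can be sketched roughly as follows. This is a minimal illustration, not the project's actual tooling: the sentence splitting is naive (it breaks on abbreviations such as "e.g."), paragraphs whose sentence counts differ are simply returned as empty so they can be aligned by hand, and `GLOSSARY_JA` is a hypothetical example, not the kubernetes-docs-ja style guide.

```python
import re

# Hypothetical glossary for illustration only (not the project's real style guide):
# maps spelling variants to one canonical term so the model sees consistent vocabulary.
GLOSSARY_JA = {"コンテナー": "コンテナ"}


def split_en(text):
    """Split English text after ., ! or ? followed by whitespace (naive)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_ja(text):
    """Split Japanese text after 。, ！ or ？ (no whitespace follows them)."""
    return [s.strip() for s in re.split(r"(?<=[。！？])", text.strip()) if s.strip()]


def normalize_terms(text, glossary):
    """Replace every variant spelling with its canonical form."""
    for variant, canonical in glossary.items():
        text = text.replace(variant, canonical)
    return text


def make_pairs(en_paragraph, ja_paragraph, glossary=GLOSSARY_JA):
    """Return 1:1 (en, ja) sentence pairs, or [] when the sentence counts differ.

    An empty result means the paragraph needs manual alignment.
    """
    en = split_en(en_paragraph)
    ja = split_ja(ja_paragraph)
    if len(en) != len(ja):
        return []
    return [(e, normalize_terms(j, glossary)) for e, j in zip(en, ja)]


pairs = make_pairs(
    "A Pod runs containers. It is scheduled onto a node.",
    "Podはコンテナーを実行します。Podはノードにスケジュールされます。",
)
for en, ja in pairs:
    print(en, "|", ja)
```

Rejecting mismatched paragraphs instead of guessing an alignment keeps wrong pairs out of the training data, at the cost of some manual work.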
Slide 20
Prepare the dataset
Slide 21
Prepare the dataset
Not enough sentences...
Slide 22
Official Document: Preparing Training Data
https://cloud.google.com/translate/automl/docs/prepare?hl=en
Slide 23
Prepare the dataset
- At least 100 sentence pairs each for
- Train
- Validation
- Test
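A quick way to check the per-set minimum above before uploading is sketched below. It only counts pairs; the threshold of 100 is taken from the slide, and the set names are assumptions for illustration.

```python
MINIMUM_PAIRS = 100  # per-set minimum stated on the slide


def missing_pairs(sets, minimum=MINIMUM_PAIRS):
    """Return how many more pairs each set needs to reach the minimum.

    `sets` maps a set name ("train", "validation", "test") to its list of
    (en, ja) pairs; sets that already meet the minimum are omitted.
    """
    return {
        name: minimum - len(pairs)
        for name, pairs in sets.items()
        if len(pairs) < minimum
    }


sets = {
    "train": [("en", "ja")] * 250,
    "validation": [("en", "ja")] * 100,
    "test": [("en", "ja")] * 60,
}
print(missing_pairs(sets))  # {'test': 40}
```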
Slide 24
Prepare the dataset
- Not enough sentences in the project yet
- Use some sentences from the GKE docs
- https://cloud.google.com/kubernetes-engine/docs/
- Topics and terminology are quite similar
- With some amendments to terminology
Slide 25
Prepare the dataset
Change of plan...
Kubernetes Docs + GKE Docs → Custom Model
Slide 26
Prepare the dataset
- Export sentence pairs as TSV
- And upload them to AutoML Translation
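The TSV export step might look like the sketch below, assuming the layout AutoML Translation expects is one pair per line with the source sentence, a tab, then the target sentence, and no header row (check the "Preparing Training Data" page linked earlier for the exact rules). Tabs or line breaks inside a sentence would break that layout, so they are collapsed to spaces.

```python
import re


def clean(sentence):
    """Collapse tabs and line breaks so each pair stays on one TSV line."""
    return re.sub(r"[\t\r\n]+", " ", sentence).strip()


def to_tsv(pairs):
    """Render (source, target) pairs as TSV text: one pair per line, no header."""
    lines = []
    for en, ja in pairs:
        en, ja = clean(en), clean(ja)
        if en and ja:  # skip pairs where either side is empty
            lines.append(f"{en}\t{ja}")
    return "".join(line + "\n" for line in lines)


def write_tsv(pairs, path):
    # UTF-8 is needed for the Japanese side; newline="\n" avoids CRLF on Windows.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(to_tsv(pairs))


print(to_tsv([("A Pod runs containers.", "Podはコンテナを実行します。")]), end="")
```

Writing the lines by hand rather than through the `csv` module avoids any quoting being added around fields.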
Slide 27
Create the dataset
Choose languages...
Slide 28
Create the dataset
With only a small amount of data, the training, validation, and test sets need to be uploaded separately
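Assuming "upload separately" means supplying the training, validation, and test sets as separate files, they can be cut by hand before upload. The sketch below is one way to do that: a seeded shuffle, then fixed-size validation and test sets with the remainder used for training. The sizes of 100 follow the per-set minimum from the earlier slide; the function name and defaults are hypothetical.

```python
import random


def split_for_upload(pairs, n_validation=100, n_test=100, min_train=100, seed=0):
    """Shuffle pairs and cut them into separate train/validation/test sets.

    Raises ValueError when there are not enough pairs for every set to meet
    its minimum. The fixed seed makes the split reproducible.
    """
    needed = n_validation + n_test + min_train
    if len(pairs) < needed:
        raise ValueError(f"need at least {needed} pairs, got {len(pairs)}")
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    validation = shuffled[:n_validation]
    test = shuffled[n_validation:n_validation + n_test]
    train = shuffled[n_validation + n_test:]
    return {"train": train, "validation": validation, "test": test}


pairs = [(f"en {i}", f"ja {i}") for i in range(350)]
sets = split_for_upload(pairs)
print({name: len(items) for name, items in sets.items()})
```

Each resulting list can then be written to its own TSV file.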
Slide 29
Prepare the dataset
Slide 30
Train
- Just click “START TRAINING”
- A base model can be chosen:
- Google NMT (default)
- https://ai.google/research/pubs/pub45610
- Another AutoML model
Slide 31
Training...
Slide 32
Wait approximately 3 hours...
Slide 33
Prediction
- Once training finishes, the model can be used for prediction.
Slide 34
Prediction
Slide 35
Prediction
Looks good!
Slide 36
Prediction
Also looks good
Slide 37
Prediction
Some are not quite...
Slide 38
API
The model can be used via the API
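A sketch of calling the trained model over REST with only the standard library. The endpoint path and JSON shapes are assumptions based on the AutoML API v1 `models.predict` method (request `{"payload": {"textSnippet": {"content": ...}}}`, translation returned under `payload[0].translation.translatedContent.content`); verify them against the current docs. The project ID, model ID, and token below are hypothetical placeholders. The request is only built here, not sent.

```python
import json
import urllib.request

# Assumed endpoint: the AutoML API v1 `models.predict` REST method.
# Check the current AutoML Translation docs before relying on it.
API_ROOT = "https://automl.googleapis.com/v1"


def build_predict_request(project_id, model_id, text, access_token,
                          location="us-central1"):
    """Build (but do not send) a POST request that asks the model to translate `text`."""
    url = (f"{API_ROOT}/projects/{project_id}/locations/{location}"
           f"/models/{model_id}:predict")
    body = {"payload": {"textSnippet": {"content": text}}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


def extract_translation(response_json):
    """Pull the translated text out of a predict response (assumed shape)."""
    return response_json["payload"][0]["translation"]["translatedContent"]["content"]


# Sending it for real (needs network access and credentials; the token could come
# from `gcloud auth application-default print-access-token`):
#   req = build_predict_request("my-project", "TRL123", "A Pod runs containers.", token)
#   with urllib.request.urlopen(req) as resp:
#       print(extract_translation(json.load(resp)))
```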
Slide 39
Result
- Evaluation results are available with scores (BLEU)
- Refer to “Evaluating Model”
- https://cloud.google.com/translate/automl/docs/evaluate
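The score AutoML Translation reports is BLEU. To give a feel for what it measures, here is a simplified corpus-level BLEU: clipped n-gram precision for n = 1..4, geometric mean, and a brevity penalty. This is a textbook sketch without smoothing, not Google's exact implementation; it assumes whitespace tokens, so Japanese output would first need a proper tokenizer.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates, references, max_n=4):
    """Simplified corpus BLEU with one reference per candidate and no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        cand_len += len(cand)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
            # Clip each n-gram's count by how often it appears in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
            totals[n - 1] += max(len(cand) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # some n-gram order has no match, so the geometric mean is 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: punish candidates shorter than the references.
    bp = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return bp * math.exp(log_precision)


cand = "the pod is running on the node".split()
ref = "the pod is running on a node".split()
print(round(corpus_bleu([cand], [ref]), 3))
```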
Considerations
- Need more samples?
- By default, 10% of the sentence pairs are used for validation and 10% for test.
- More datasets for training?
- A model trained only on the GKE docs scores quite high
- Is Google’s official translation better than the community one?
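Combining the default split (10% validation, 10% test) with the 100-pair minimum per set gives a rough lower bound on dataset size. The sketch below assumes the remaining 80% goes to training and that each set's size is simply its percentage of the total; how AutoML actually rounds the split is not covered here.

```python
def min_total_pairs(split_percent, minimum=100):
    """Smallest dataset size for which every split reaches `minimum` pairs.

    `split_percent` maps split names to integer percentages; integer ceiling
    division avoids floating-point rounding surprises.
    """
    return max(-(-minimum * 100 // pct) for pct in split_percent.values())


print(min_total_pairs({"train": 80, "validation": 10, "test": 10}))  # 1000
```

So roughly 1,000 sentence pairs are needed before the default split alone satisfies the minimum, which suggests more samples would help a small dataset like this one.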
Slide 43
Points for ML on GCP
Slide 44
Live Demo
(if demanded...)
Slide 45
Takeaways
- You can start ML easily with AutoML
- Creating models
- Serving models
- Some updates at Next SF ’19?