First Steps Towards Your First Machine Learning Project

Girls in ICT Brunei 2020

Galuh Sahid

April 26, 2020

Transcript

  1. 1.

    First Steps Towards Your First Machine Learning Project

    Galuh Sahid (Twitter: @galuhsahid)
    Data Scientist at Gojek
    Google Developer Expert in Machine Learning
  2. 8.

    I love…
    • Sports
      • Idea: build a model that predicts championship winners based on past history
    • Literature
      • Idea: text generator in the style of famous authors
    • Movies
      • Idea: classify movie review sentiments
  3. 10.

    It has to be… something you enjoy
    It does not have to be… something complicated or novel
  4. 13.

    Formulating an ML problem
    Supervised Learning

    Leaf Width   Leaf Length   Species
    2.7          4.9           small-leaf
    3.2          5.5           big-leaf
    2.9          5.1           small-leaf
    3.4          6.8           big-leaf

    Adapted from Google’s Machine Learning Problem Framing
  5. 14.

    Formulating an ML problem
    Supervised Learning

    Leaf Width   Leaf Length   Species
    2.7          4.9           small-leaf
    3.2          5.5           big-leaf
    2.9          5.1           small-leaf
    3.4          6.8           big-leaf

    Features: Leaf Width, Leaf Length. Label: Species.

    The features and their corresponding labels are fed into an algorithm in a process called training.

    What happens during training? The algorithm gradually determines the relationship between the features and their corresponding labels. This relationship is called the model.

    Adapted from Google’s Machine Learning Problem Framing
  6. 17.

    Formulating an ML problem
    Supervised Learning

    Leaf Width   Leaf Length   Species
    2.7          4.9           small-leaf
    3.2          5.5           big-leaf
    2.9          5.1           small-leaf
    3.4          6.8           big-leaf
    2.1          3.4           ? (our prediction: small-leaf)

    Adapted from Google’s Machine Learning Problem Framing
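The training and prediction steps on the leaf measurements above can be sketched in a few lines of plain Python. This is only an illustration: the slides do not name an algorithm, so a nearest-centroid classifier is assumed here, and the names `train`, `predict`, and `TRAINING_DATA` are made up for the example. The "model" is simply the average width and length of each species.

```python
import math
from collections import defaultdict

# Training examples from the slide: (leaf width, leaf length) -> species.
TRAINING_DATA = [
    ((2.7, 4.9), "small-leaf"),
    ((3.2, 5.5), "big-leaf"),
    ((2.9, 5.1), "small-leaf"),
    ((3.4, 6.8), "big-leaf"),
]


def train(examples):
    """Learn one centroid (average feature vector) per label."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    model = {}
    for label, rows in grouped.items():
        # zip(*rows) turns rows of (width, length) into columns.
        model[label] = tuple(sum(column) / len(rows) for column in zip(*rows))
    return model


def predict(model, features):
    """Return the label whose centroid is closest to the given features."""
    return min(model, key=lambda label: math.dist(model[label], features))


model = train(TRAINING_DATA)
prediction = predict(model, (2.1, 3.4))
print(prediction)  # small-leaf
```

In a real project a library such as scikit-learn would normally replace the hand-written `train` and `predict`, but the shape stays the same: fit on features and labels, then predict for unseen features.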
  7. 19.

    Formulating an ML problem
    Supervised Learning: Classification (Binary)

    • Is this spam or not spam?
    • Is the sentiment of this movie review positive or negative?
    • Is this a picture of a cat or a dog?
  8. 20.

    Formulating an ML problem
    Supervised Learning: Classification (Binary, Multi-class)

    Multi-class examples:
    • Is this a comedy, horror, or drama movie?
    • Is this the voice of a dog, a bird, a cat, or a grasshopper?
    • Is this a picture of a shirt, a book, or food?
  9. 21.

    Formulating an ML problem
    Supervised Learning: Classification (Binary, Multi-class), Regression

    Regression examples:
    • What is the price of the house if it has 2 floors, 4 bedrooms, and 2 bathrooms?
    • What is the temperature in Paris tomorrow?
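Where classification predicts a category, regression predicts a number. As a rough sketch, the house-price question can be reduced to one feature (number of bedrooms) and fitted with ordinary least squares. The prices below are invented toy values chosen to lie on a straight line, not real data, and `fit_line` is a hypothetical helper name.

```python
# Made-up toy data: (bedrooms, price in thousands). Not real house prices.
HOUSES = [(1, 75.0), (2, 100.0), (3, 125.0), (5, 175.0)]


def fit_line(points):
    """Ordinary least squares for one feature: price = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(HOUSES)
predicted_price = slope * 4 + intercept  # price estimate for 4 bedrooms
print(predicted_price)  # 150.0
```

Adding floors and bathrooms as further features turns this into multiple linear regression, which is usually left to a library rather than written by hand.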
  10. 26.

    Formulating an ML problem
    Identifying Data Sources

    • Use a ready-to-use dataset
    • Collect & build your own dataset from scratch
    • Extract the data from existing data sources
  11. 31.

    Formulating an ML problem
    Ready-to-use dataset

    • Oftentimes data cleansing, manipulation, and transformation are still necessary
    • You need to know the labels that you expect
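To make the two points about ready-to-use datasets concrete, here is a minimal cleaning sketch using only the standard library. The CSV content, the column names `review` and `sentiment`, and the helper `clean_rows` are all invented for illustration. It trims whitespace, normalises label casing, and drops rows with missing text or a label outside the expected set.

```python
import csv
import io

# Hypothetical raw file contents; a real dataset would be read from disk.
RAW_CSV = """review,sentiment
Great movie!,Positive
  Too long and boring ,negative
,positive
Loved the soundtrack,POSITIVE
It was fine,meh
"""

# Knowing the labels you expect lets you reject anything else.
EXPECTED_LABELS = {"positive", "negative"}


def clean_rows(raw_text):
    """Return rows with tidy text and a label from EXPECTED_LABELS."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        review = (row["review"] or "").strip()
        label = (row["sentiment"] or "").strip().lower()
        if not review or label not in EXPECTED_LABELS:
            continue  # skip empty reviews and unexpected labels
        cleaned.append({"review": review, "sentiment": label})
    return cleaned


rows = clean_rows(RAW_CSV)
for row in rows:
    print(row)
```

Of the five raw rows, three survive: the row with no review text and the row labelled `meh` are dropped.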
  12. 33.

    Formulating an ML problem
    Extract the data by yourself

    - Scraping Twitter data
      Example applications:
      - Sentiment analysis (must be labeled)
      - Topic detection, e.g. 50% of tweets: Chris Evans’ new TV series, 25% of tweets: Avengers, 25% of tweets: Golden Globes
    - Scraping news websites
  13. 35.

    Formulating an ML problem
    Build your own dataset from scratch

    • Might be time-consuming, especially if you need a lot of data
    • Need to ensure that the way your data is collected matches real-world conditions
      - Example: building a bird audio dataset by recording sounds of different birds
  14. 36.

    Formulating an ML problem

    Now you know:
    - The problem statement of your project (e.g. “Our problem is best framed as 3-class classification, which predicts whether a video will be in one of three classes: {very popular, somewhat popular, not popular}, 28 days after being uploaded”)
    - What data you need to process (text? images?)
    - Whether you need labeled data or not
    - Possible algorithms for your project

    Adapted from Google’s Machine Learning Problem Framing
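A problem statement like the video example also tells you what the labels in your training data must look like. As a sketch only, one way to derive them is to bucket each video's view count 28 days after upload. The thresholds and the function name `popularity_class` below are hypothetical; the slides do not say how popularity is measured.

```python
# Hypothetical cut-offs on view count 28 days after upload.
VERY_POPULAR_MIN_VIEWS = 100_000
SOMEWHAT_POPULAR_MIN_VIEWS = 1_000


def popularity_class(views_after_28_days):
    """Map a view count to one of the three classes in the problem statement."""
    if views_after_28_days >= VERY_POPULAR_MIN_VIEWS:
        return "very popular"
    if views_after_28_days >= SOMEWHAT_POPULAR_MIN_VIEWS:
        return "somewhat popular"
    return "not popular"


labels = [popularity_class(views) for views in (250_000, 4_200, 37)]
print(labels)  # ['very popular', 'somewhat popular', 'not popular']
```

These labels, paired with features known at upload time, would form the labeled dataset for the 3-class classifier.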
  15. 38.

    Tools & resources
    Programming languages

    • Python is usually the go-to programming language
    • However, you can now train your own machine learning models using JavaScript thanks to TensorFlow.js
  16. 39.

    Tools & resources
    Libraries

    • Data manipulation: numpy, pandas
    • NLP: NLTK, spaCy
    • Image processing: PIL, OpenCV
    • Machine learning: scikit-learn, TensorFlow, TensorFlow Lite
  17. 43.

    Tools & resources
    TensorFlow Hub

    Example: MobileBERT, a compressed version of BERT (Bidirectional Encoder Representations from Transformers)
  18. 45.

    Tools & resources
    Learning Resources

    • Deep Learning with Python (book) by François Chollet
    • Machine Learning Glossary
    • Machine Learning Crash Course
    • TensorFlow Tutorials
    • Teachable Machine Tutorials (1, 2, 3)
  19. 46.

    Takeaways

    • Building machine learning projects can be a great way to learn machine learning
    • ML projects don’t have to be super fancy or complicated; they just have to be something you enjoy :)
    • The process of formulating your ML problem can help you figure out your next steps (e.g. what data to collect, what tools to use, which algorithms are possible)
    • There are lots of resources & tools out there to help you!