predicts championship winners based on past history
• Literature
  • Idea: text generator in the style of famous authors
• Movies
  • Idea: classify movie review sentiments
features        label (Species)
2.7   4.9       small-leaf
3.2   5.5       big-leaf
2.9   5.1       small-leaf
3.4   6.8       big-leaf

The features and their corresponding labels are fed into an algorithm in a process called training.

What happens during training? The algorithm gradually determines the relationship between features and their corresponding labels. This relationship is called the model.

Adapted from Google’s Machine Learning Problem Framing
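To make the training step concrete, here is a minimal sketch in Python that uses the four rows from the table above. It assumes a nearest-centroid classifier, picked only because it fits in a few lines; the slides do not name an algorithm, and the function names `train` and `predict` are hypothetical. In this sketch the "model" is simply the average feature vector for each label.

```python
from collections import defaultdict

# The four training examples from the table: two features per row, one label each.
FEATURES = [(2.7, 4.9), (3.2, 5.5), (2.9, 5.1), (3.4, 6.8)]
LABELS = ["small-leaf", "big-leaf", "small-leaf", "big-leaf"]


def train(features, labels):
    """Learn one centroid (mean feature vector) per label; this dict is the model."""
    grouped = defaultdict(list)
    for row, label in zip(features, labels):
        grouped[label].append(row)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(model, row):
    """Return the label whose centroid is closest to the new example."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))

    return min(model, key=lambda label: squared_distance(model[label]))


model = train(FEATURES, LABELS)
print(predict(model, (2.6, 4.8)))  # an unseen leaf with small measurements
print(predict(model, (3.5, 6.5)))  # an unseen leaf with large measurements
```

With this toy data the two calls print `small-leaf` and `big-leaf`. Algorithms that learn gradually (for example, by gradient descent) follow the same pattern: features and labels go in, a model comes out, and the model is then used on new examples.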
Formulating an ML problem: Supervised Learning
Binary: movie?
Multi-class: Is this the voice of a dog, a bird, a cat, or a grasshopper?
Multi-class: Is this a picture of a shirt, a book, or food?
Scraping Twitter data
  Example applications:
  - Sentiment analysis (requires labeled data)
  - Topic detection, e.g. 50% of tweets: Chris Evans’ new TV series; 25% of tweets: Avengers; 25% of tweets: Golden Globes
- Scraping news websites
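The topic-detection percentages above can be illustrated with a small Python sketch. It is only a keyword lookup, not a real topic model, and everything in it is an assumption made for illustration: the `TOPIC_KEYWORDS` table, the made-up `TWEETS` list (not scraped data), and the helper names `detect_topic` and `topic_shares`.

```python
from collections import Counter

# Hypothetical keyword -> topic lookup; a real topic model would learn topics from text.
TOPIC_KEYWORDS = {
    "avengers": "Avengers",
    "golden globes": "Golden Globes",
    "tv series": "Chris Evans' new TV series",
}

# Made-up stand-ins for scraped tweets (not real data).
TWEETS = [
    "Can't wait for the new Chris Evans TV series",
    "That TV series trailer looks great",
    "Rewatching Avengers tonight",
    "Golden Globes red carpet is on",
]


def detect_topic(tweet):
    """Return the first topic whose keyword appears in the tweet, else 'other'."""
    text = tweet.lower()
    for keyword, topic in TOPIC_KEYWORDS.items():
        if keyword in text:
            return topic
    return "other"


def topic_shares(tweets):
    """Return the fraction of tweets assigned to each topic."""
    if not tweets:
        return {}
    counts = Counter(detect_topic(tweet) for tweet in tweets)
    return {topic: count / len(tweets) for topic, count in counts.items()}


shares = topic_shares(TWEETS)
for topic, share in shares.items():
    print(f"{share:.0%} of tweets: {topic}")
```

Unlike sentiment analysis, this kind of summary needs no labels on the tweets themselves, which is why only the sentiment item above is marked as requiring labeled data.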
• Might be time-consuming, especially if you need a lot of data
• Need to ensure that the way your data is collected matches real-world conditions
- Example: building a bird audio dataset by recording sounds of different birds
statement of your project (e.g. “Our problem is best framed as 3-class classification, which predicts whether a video will be in one of three classes, {very popular, somewhat popular, not popular}, 28 days after being uploaded”)
- What data you need to process (text? images?)
- Whether you need labeled data or not
- Possible algorithms for your project
Adapted from Google’s Machine Learning Problem Framing
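As a rough illustration of how the 3-class framing quoted above might turn into labels, here is a short Python sketch. It assumes popularity is measured by view count 28 days after upload, and the two thresholds are hypothetical; neither assumption comes from the slides or the quoted statement.

```python
# Hypothetical cut-offs; the slides do not define what counts as "popular".
VERY_POPULAR_MIN_VIEWS = 1_000_000
SOMEWHAT_POPULAR_MIN_VIEWS = 10_000

CLASSES = ("very popular", "somewhat popular", "not popular")


def popularity_label(views_after_28_days):
    """Map a view count measured 28 days after upload to one of the three classes."""
    if views_after_28_days >= VERY_POPULAR_MIN_VIEWS:
        return "very popular"
    if views_after_28_days >= SOMEWHAT_POPULAR_MIN_VIEWS:
        return "somewhat popular"
    return "not popular"


for views in (2_500_000, 40_000, 120):
    print(views, "->", popularity_label(views))
```

Writing the label rule down this explicitly also answers two of the planning questions: the project needs labeled data, and each label requires waiting 28 days after a video is uploaded.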
way to learn machine learning
• ML projects don’t have to be super fancy or complicated; they just have to be something you enjoy :)
• The process of formulating your ML problem can help you figure out your next steps (e.g. what data to collect, what tools to use, which algorithms to try)
• There are lots of resources & tools out there to help you!