▸ Self-driving cars
  ▸ Data: Terrain data (slope, roughness, etc.)
  ▸ Learned model: Function mapping terrain to speed
▸ Price prediction engine
  ▸ Data: Customer & market attributes and past prices
  ▸ Learned model: Function mapping customer and market attributes to prices
▸ Gene sequence identification
  ▸ Data: Lots and lots of genome data
  ▸ Learned model: Clusters of recurring gene sequence patterns
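To make "function mapping terrain to speed" concrete, here is a minimal sketch of learning such a function from examples. It assumes a single feature (slope) and a straight-line model fitted by ordinary least squares. The numbers and the names `fit_line` and `predict_speed` are hypothetical and chosen only for illustration; a real system would use many more features and a richer model.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


# Hypothetical training data (made up for this sketch):
# terrain slope in degrees -> speed in km/h.
slopes = [0.0, 5.0, 10.0, 15.0, 20.0]
speeds = [50.0, 42.0, 35.0, 26.0, 20.0]

# "Learning" means estimating the parameters a and b from the data.
a, b = fit_line(slopes, speeds)


def predict_speed(slope):
    """The learned model: a function mapping terrain slope to speed."""
    return a * slope + b


print(f"speed = {a:.2f} * slope + {b:.2f}")
```

The learned model is nothing more than the pair `(a, b)`; calling `predict_speed` on a slope that was not in the training data is what the slide means by applying the learned function.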
Unsupervised
User tastes
▸ User 1 likes The Clash
▸ User 23 likes Die Ärzte
▸ User 42 likes Helene Fischer
▸ User 1 likes The Sex Pistols
▸ User 42 likes Heino

Supervised
Rain    Wind    Umbrella?
heavy   light   yes
none    light   no
light   strong  no
light   light   yes
none    strong  no
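The umbrella table is supervised data because every row carries a known answer (the "Umbrella?" column), whereas the user-taste list has no such label column. As a minimal sketch of supervised learning, the code below uses the five rows from the slide as training data and answers new queries with a 1-nearest-neighbour rule. The choice of algorithm, the Hamming distance, and the names `hamming_distance` and `predict_umbrella` are assumptions made for illustration and do not come from the slides.

```python
# Training table from the slide: (rain, wind) -> umbrella?
TRAINING_DATA = [
    (("heavy", "light"), "yes"),
    (("none", "light"), "no"),
    (("light", "strong"), "no"),
    (("light", "light"), "yes"),
    (("none", "strong"), "no"),
]


def hamming_distance(a, b):
    """Count the positions where two feature tuples differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def predict_umbrella(rain, wind, data=TRAINING_DATA):
    """1-nearest-neighbour: return the label of the closest training row.

    Ties are broken in favour of the earliest row in the table.
    """
    query = (rain, wind)
    _, label = min(data, key=lambda row: hamming_distance(row[0], query))
    return label


# Every training row is reproduced, because its distance to itself is zero.
for (rain, wind), label in TRAINING_DATA:
    print(rain, wind, "->", predict_umbrella(rain, wind), "(known:", label + ")")
```

With only five rows, predictions for unseen combinations depend heavily on the tie-breaking rule, which is one reason real problems need more data and a held-out test set.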
LEARNING PROBLEMS
1. Understand the problem and context
2. Understand & clean the data, create some features
3. For supervised learning: Split into training and test data
4. Evaluate different algorithms with default parameters
5. Optimize the hyperparameters and compute the results
6. Interpret the results
7. Repeat with different features until you get useful results
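Steps 3 and 5 of the list above can be sketched in a few lines. The example below is an illustration under stated assumptions, not a recommended pipeline: the data is synthetic (two noisy 2-D clusters generated by the hypothetical helper `make_toy_data`), the algorithm is a hand-written k-nearest-neighbours classifier, and the only hyperparameter tuned is `k`. It uses only the Python standard library so that it runs without extra packages.

```python
import random


def make_toy_data(n, rng):
    """Hypothetical data: two noisy 2-D clusters labelled 0 and 1."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 0.0 if label == 0 else 2.0
        point = (rng.gauss(centre, 1.0), rng.gauss(centre, 1.0))
        data.append((point, label))
    return data


def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def knn_predict(train, point, k):
    """Majority vote among the k training points closest to `point` (k odd)."""
    nearest = sorted(train, key=lambda row: squared_distance(row[0], point))[:k]
    votes_for_one = sum(label for _, label in nearest)
    return 1 if 2 * votes_for_one > k else 0


def accuracy(train, evaluation, k):
    """Fraction of evaluation points whose predicted label is correct."""
    hits = sum(1 for point, label in evaluation
               if knn_predict(train, point, k) == label)
    return hits / len(evaluation)


rng = random.Random(0)  # fixed seed so the run is repeatable
data = make_toy_data(300, rng)

# Step 3: hold out test data that is not used for any modelling decision.
train, test = data[:200], data[200:]

# Step 5: choose the hyperparameter k on a validation slice of the training data.
fit_part, validation = train[:150], train[150:]
candidate_ks = [1, 3, 5, 7, 9]
best_k = max(candidate_ks, key=lambda k: accuracy(fit_part, validation, k))

# Final result: measured once, on the untouched test data.
test_accuracy = accuracy(train, test, best_k)
print(f"best k = {best_k}, test accuracy = {test_accuracy:.2f}")
```

The design point is the separation of roles: the validation slice is used to pick `k`, and the test data is touched only once at the end, so the reported accuracy is not inflated by the tuning itself.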
RECOMMENDATION)
▸ Tutorial for the “Kaggle Titanic Competition” (using R): http://trevorstephens.com/post/72916401642/titanic-getting-started-with-r
▸ Online courses (MOOCs):
  ▸ Udacity: Intro to Machine Learning: https://www.udacity.com/course/intro-to-machine-learning--ud120 (Excellent intro to applied ML using scikit-learn and Python)
  ▸ Coursera: Machine Learning: https://www.coursera.org/learn/machine-learning (Friendly intro to the theory behind common ML algorithms)
▸ Machine Learning Mastery: Lots of self-study guides for ML learners: http://machinelearningmastery.com/
▸ UCI ML Repository: Collection of “toy problems” for ML: http://archive.ics.uci.edu/ml/datasets.html
▸ Toolkits:
  ▸ Scikit-Learn (Python, great online documentation): http://scikit-learn.org/stable/
  ▸ stats package (R, pre-installed, many simple ML algorithms). Examples: http://www.statmethods.net/stats/regression.html
▸ Book: Abu-Mostafa, Magdon-Ismail, Lin: Learning From Data - A Short Course (AMLbook.com) (Good intro to more academic perspectives, notation and vocabulary on ML)