Introduction to Artificial Intelligence and Machine Learning

Intro to AI / Machine Learning
Ronny Bjarnason
Principal Data Scientist, Red Brain Labs
ronny@redbrainlabs.com

Intro to Me
Education: BYU (’00, ’02), Oregon State (’09)
Marathoner, father of five, scoutmaster
Principal Data Scientist at Red Brain Labs

Red Brain Labs
Draper, UT
★ Predictive Analytics / Business Intelligence
★ Team: Business, Devs, Statistics, Machine Learning
★ Always Looking for Like-Minded Individuals
★ Commercial Over

Why are you here?
★ Have a project in mind and interested in getting your hands dirty. (yes)
★ Some intuition for how Machine Learning algorithms work (YES!)
★ Curious about available libraries. (yes)
★ Nothing else to do before lunch?

What is Artificial Intelligence?
Actually, tons of stuff we take for granted.
★ Driving Directions
★ Natural Language Processing (Siri)
★ Computer Vision (Captcha solvers)
★ Robotics
★ Internet Search
★ Recommender Systems (Netflix Prize)
★ Solving Tic-Tac-Toe
★ Constraint Satisfaction (Sudoku)
★ Nest

What is Machine Learning?
★ Subset of Artificial Intelligence
★ All that other stuff, if it improves automatically as it gets more experience (data).
★ Unsupervised: Clustering
★ Reinforcement Learning (learns from rewards; usually treated as its own category)
★ Supervised: Regression, Classification
★ Data Mining

Data Mining
★ Example of Supervised Learning
★ Given a history (training data), predict an unseen variable
★ Which class does this belong to? Classification
★ What value will this be? Regression - where I spend most of my effort

Popular Languages
★ Python (SciPy, scikit-learn, pandas)
★ R
★ Java (Weka)
★ Lisp - John McCarthy
★ SQL?
August 2011

Machine Learning Algorithms
★ Decision Trees
★ Neural Nets
★ Linear and Logistic Regression
★ Naive Bayes / Bayes Nets
★ Random Forests
★ Support Vector Machines
★ Many, many more
More with less data? More data? UCI Repository

Prepping the Data
★ Most of the practical work with Machine Learning and Data Mining is getting the data right.
  Receive a client’s data, listen, pay attention, but assume they don’t really know what data they have and where it is - because they don’t.
★ Feature Discovery - what are the important attributes?
★ Data Visualization - big hints for prediction
★ Data Cleansing
  How many databases do you really need?
★ Data Leakage

Data Leakage
★ “Time Machining”
  Can only use data that you will have at the time of prediction
  ‣ Predicting Stock Values
  ‣ Attributes added after the fact
★ What attributes have been picked by your algorithm? UserID?
★ Cross-fold validation
★ Can we beat random? Does it make sense?
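
As a rough illustration of the "time machining" rule above, the sketch below drops any attribute that was not yet known on the prediction date. The record layout (each attribute stored with the date it became known), the attribute names, and the values are all hypothetical, chosen only to make the idea concrete.

```python
from datetime import date

def usable_features(record, prediction_date):
    """Keep only attributes whose values were known by prediction_date.

    record maps attribute name -> (value, date the value became known).
    This layout is an assumption made for the example.
    """
    return {
        name: value
        for name, (value, known_on) in record.items()
        if known_on <= prediction_date
    }

# Hypothetical stock record: the closing price is only known a day later,
# so using it to predict on March 1 would be leakage.
record = {
    "opening_price": (101.5, date(2012, 3, 1)),
    "closing_price": (103.2, date(2012, 3, 2)),  # added after the fact
}
features = usable_features(record, date(2012, 3, 1))
print(features)
```

A filter like this only helps if the "known on" date is recorded honestly, which is often the hard part in practice.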

Not Data Leakage
★ Early Kaggle winners
  odd/even, de-anonymization, outside data sources
★ Take every advantage you possibly can
  This is data mining at its finest (it only feels like cheating)

Decision Trees
★ Predicting membership in a class
★ Split the data on the single best attribute
  Repeat for each child node
  BigML
★ Easily interpretable
★ Scalable
  Easily handle streaming data - why might this be important?
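
The "split on the single best attribute" step can be sketched as below. The slide does not say how "best" is measured; this example assumes Gini impurity, which is one common choice, and the tiny dataset is made up. A full tree would call `best_split` again on each child node.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Return (attribute_index, threshold) with the lowest weighted impurity."""
    best = None
    best_score = float("inf")
    for attr in range(len(rows[0])):
        for threshold in sorted({row[attr] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[attr] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[attr] > threshold]
            if not left or not right:
                continue  # a split must send examples to both children
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best_score = score
                best = (attr, threshold)
    return best

# Hypothetical data: (age, owns_car) -> bought the product?
rows = [(25, 1), (32, 0), (47, 1), (51, 0)]
labels = ["no", "no", "yes", "yes"]
split = best_split(rows, labels)
print(split)  # (0, 32): age <= 32 separates the two classes cleanly
```

The result is easy to read back as a rule ("age <= 32"), which is the interpretability point made on the slide.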

Commentary on “Big Data”
★ Hadoop/MapReduce is Fabulous
★ But only if you need it
★ For most of what you do, it just isn’t necessary
  Really, how big is a hard drive these days?
★ There are other options
★ </rant>
★ GraphLab is awesome - Ben

Linear and Logistic Regression
★ Learn a weight for each of the features
★ Adjust the weights with each example (either individually or by batch)
★ Simple, Fast
★ Good for Regression, not so much for Classification
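
A minimal sketch of "learn a weight for each feature, adjust with each example" for the linear case, using per-example gradient steps on squared error. The learning rate, epoch count, and toy data are assumptions picked so this small example converges; they are not recommendations.

```python
def train_linear(examples, targets, learning_rate=0.05, epochs=500):
    """Fit one weight per feature plus a bias, updating after every example."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, target in zip(examples, targets):
            prediction = bias + sum(w * x for w, x in zip(weights, features))
            error = prediction - target
            # Nudge each weight against the error, scaled by its feature value.
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias

# Toy data generated from y = 2x + 1 (no noise).
examples = [(0.0,), (1.0,), (2.0,), (3.0,)]
targets = [1.0, 3.0, 5.0, 7.0]
weights, bias = train_linear(examples, targets)
print(weights, bias)  # close to [2.0] and 1.0
```

Passing the prediction through a sigmoid before computing the error gives the corresponding per-example update for logistic regression; the batch variant sums the adjustments over all examples before applying them.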

Neural Networks
★ Minsky (perceptron, XOR), Rumelhart
★ Input nodes, output nodes, hidden nodes
★ Assign weights to all connecting edges (backpropagation)
★ Learn much more complex relationships
★ Difficult to interpret results
★ Typically poor for regression
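
The XOR reference above is the classic case a single perceptron cannot represent but a network with hidden nodes can. The sketch below wires up two hidden nodes by hand to show that; note that the weights here are hand-picked for illustration, not learned by backpropagation, and the step activation is an assumption made to keep the arithmetic obvious.

```python
def step(value):
    """Threshold activation: fire (1) only when the weighted sum is positive."""
    return 1 if value > 0 else 0

def xor_network(x1, x2):
    """Two inputs -> two hidden nodes -> one output, with hand-set weights."""
    hidden_or = step(x1 + x2 - 0.5)    # fires if at least one input is on
    hidden_and = step(x1 + x2 - 1.5)   # fires only if both inputs are on
    return step(hidden_or - hidden_and - 0.5)  # "or, but not and"

table = {(a, b): xor_network(a, b) for a in (0, 1) for b in (0, 1)}
print(table)
```

Backpropagation is the procedure that would find weights like these from examples instead of having them written in by hand.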

Random Forests
★ Kaggle regular
  Averages results of very simple algorithms
★ Subselect the attributes
★ Build a simple decision tree
★ Create n trees
★ Aggregate results
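
The four steps on this slide can be sketched with one-split trees ("stumps") standing in for the simple decision trees. This is a simplified illustration, not a full random forest: it only subselects attributes (no resampling of rows), aggregates by majority vote, and assumes every attribute subset admits at least one valid split. The data and parameter values are made up.

```python
import random
from collections import Counter

def build_stump(rows, labels, attributes):
    """Best one-split tree over the given attributes, scored by training errors."""
    best = None
    best_errors = len(labels) + 1
    for attr in attributes:
        for threshold in sorted({row[attr] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[attr] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[attr] > threshold]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(lab != left_label for lab in left)
                      + sum(lab != right_label for lab in right))
            if errors < best_errors:
                best_errors = errors
                best = (attr, threshold, left_label, right_label)
    return best

def build_forest(rows, labels, n_trees=5, attrs_per_tree=2, seed=0):
    """Create n_trees stumps, each seeing a random subset of the attributes."""
    rng = random.Random(seed)
    all_attrs = list(range(len(rows[0])))
    return [build_stump(rows, labels, rng.sample(all_attrs, attrs_per_tree))
            for _ in range(n_trees)]

def predict(forest, row):
    """Aggregate the trees' answers by majority vote."""
    votes = [left if row[attr] <= threshold else right
             for attr, threshold, left, right in forest]
    return Counter(votes).most_common(1)[0][0]

rows = [(0.1, 0.2, 0.0), (0.2, 0.1, 0.3), (0.9, 0.8, 1.0), (0.8, 0.9, 0.7)]
labels = [0, 0, 1, 1]
forest = build_forest(rows, labels)
print(predict(forest, (0.15, 0.15, 0.1)), predict(forest, (0.85, 0.85, 0.9)))
```

For a regression target the vote would be replaced by an average of the trees' numeric predictions.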

Meta
★ Scaling for accuracy
  Platt, Isotonic
★ K-fold Models
  Extra training, decrease chances of overfitting
★ Ensemble Methods
  Random Forests, Boosted Forests
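
For the K-fold bullet, a small sketch of how the examples might be divided so that every example is held out exactly once. The interleaved assignment (example i goes to fold i mod k) is just one simple choice made for this illustration.

```python
def k_fold_indices(n_examples, k):
    """Yield (train_indices, test_indices) for each of k interleaved folds."""
    indices = list(range(n_examples))
    for fold in range(k):
        test = indices[fold::k]                        # every k-th example
        train = [i for i in indices if i % k != fold]  # everything else
        yield train, test

folds = list(k_fold_indices(10, 5))
for train, test in folds:
    print("train on", train, "test on", test)
```

One model is trained per fold and scored on its held-out examples. When the data has a time order, a split like this can reintroduce the leakage discussed earlier, so a chronological split may be the safer choice there.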

Ensemble Building
★ Can’t decide on a single algorithm?
★ Do you have to?
★ Benefits
  If your data changes slightly over time
  If you aren’t sure of the structure of the data
★ Problems: Time Constraints
  ‣ Training time
  ‣ Initial Tuning Time
★ Example: LMP
  500+ models, fight for your life, keep yesterday’s winners, add some extras randomly
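
The "keep yesterday's winners, add some extras randomly" idea might look something like the sketch below. The slide gives no details of the actual selection scheme, so the function name, the error-based ranking, and the model names and scores are all hypothetical.

```python
import random

def pick_todays_models(yesterday_errors, n_winners, n_extras, seed=None):
    """Keep the lowest-error models from yesterday, plus a few random others."""
    rng = random.Random(seed)
    ranked = sorted(yesterday_errors, key=yesterday_errors.get)  # best first
    winners = ranked[:n_winners]
    losers = ranked[n_winners:]
    extras = rng.sample(losers, min(n_extras, len(losers)))
    return winners + extras

# Hypothetical model names and yesterday's prediction errors.
yesterday_errors = {"tree": 4.1, "forest": 2.3, "linear": 3.0, "svm": 5.2, "nnet": 6.0}
todays_models = pick_todays_models(yesterday_errors, n_winners=2, n_extras=1, seed=1)
print(todays_models)
```

The random extras give a model that did poorly yesterday a chance to come back if the data drifts, which ties into the first benefit listed on the slide.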

Monte-Carlo
★ Unknown state of the world
★ Known transitions
★ Known probabilities
★ Roll the dice - what happened?
  Repeat until termination
★ Application: March Madness
★ Application: Solitaire - Persi Diaconis
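
A small sketch of the "roll the dice, repeat until termination" loop applied to a tournament bracket. The win-probability formula (a team's rating divided by the sum of both ratings), the team names, and the ratings are assumptions made for this example; the bracket size must be a power of two.

```python
import random
from collections import Counter

def play_game(team_a, team_b, ratings, rng):
    """Roll the dice for one game using an assumed rating-based probability."""
    p_a = ratings[team_a] / (ratings[team_a] + ratings[team_b])
    return team_a if rng.random() < p_a else team_b

def simulate_bracket(teams, ratings, rng):
    """Play rounds of adjacent pairings until one team remains."""
    remaining = list(teams)
    while len(remaining) > 1:
        remaining = [play_game(remaining[i], remaining[i + 1], ratings, rng)
                     for i in range(0, len(remaining), 2)]
    return remaining[0]

def champion_odds(teams, ratings, n_trials=2000, seed=0):
    """Estimate each team's title chances as its share of simulated wins."""
    rng = random.Random(seed)
    wins = Counter(simulate_bracket(teams, ratings, rng) for _ in range(n_trials))
    return {team: wins[team] / n_trials for team in teams}

teams = ["A", "B", "C", "D"]
ratings = {"A": 8.0, "B": 1.0, "C": 1.0, "D": 1.0}
odds = champion_odds(teams, ratings)
print(odds)
```

With these ratings team A wins each game with probability 8/9, so its estimated title odds should land near (8/9)² ≈ 0.79; more trials tighten the estimate.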

Where to Get Started?
★ kaggle.com
  Data mining competitions (prize money)
  Why hire an ML developer?
★ Andrew Ng’s Coursera course
★ Machine Learning for Hackers
  O’Reilly, examples in R
★ Wikipedia

Projects I’d Love to Work On
★ Predicting Race Times - more than just the standard race tables
  Attributes? Suunto Ambit predicts recovery time.
★ Cheating at Solitaire
  Instant recognition, faster prediction
★ NICU
  Lots of intuition going on

Questions?

Thank You.
