
Introduction to Artificial Intelligence and Machine Learning

Introduction to Artificial Intelligence and Machine Learning presented at Utah Code Camp March 23, 2013


ronny bjarnason

March 23, 2013


Transcript

  1. Intro to AI / Machine Learning. ronny bjarnason, principal data scientist, red brain labs. ronny@redbrainlabs.com
  2. Intro to Me. education: byu (‘00, ‘02), oregon state (‘09). marathoner, father of five, scoutmaster. principal data scientist at red brain labs
  3. red brain labs, draper, ut ★ Predictive Analytics / Business Intelligence ★ Team: Business, Devs, Statistics, Machine Learning ★ Always Looking for Like-Minded Individuals ★ Commercial Over
  4. Why are you here? ★ Have a project in mind and interested in getting your hands dirty (yes) ★ Want some intuition for how Machine Learning algorithms work (YES!) ★ Curious about available libraries (yes) ★ Nothing else to do before lunch?
  5. What is Artificial Intelligence? Actually, tons of stuff we take for granted: ★ Driving Directions ★ Natural Language Processing (Siri) ★ Computer Vision (Captcha solvers) ★ Robotics ★ Internet Search ★ Recommender Systems (Netflix prize) ★ Solving Tic-Tac-Toe ★ Constraint Satisfaction (Sudoku) ★ Nest
  6. What is Machine Learning? ★ Subset of Artificial Intelligence ★ All that other stuff, if it improves automatically as it gets more experience (data) ★ Unsupervised: Clustering, Reinforcement Learning ★ Supervised: Regression, Classification ★ Data Mining
  7. Data Mining ★ Example of Supervised Learning ★ Given a history (training data), predict an unseen variable ★ Which class does this belong to? Classification ★ What value will this be? Regression (where I spend most of my effort)
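The classification-vs-regression split on this slide can be sketched with a toy example. The dataset, feature names, and the two tiny "models" (1-nearest-neighbor and a least-squares line) are invented here for illustration; they are not from the deck.

```python
history = [
    # (hours_studied, passed_exam, exam_score) — made-up training data
    (1.0, "fail", 52),
    (2.0, "fail", 58),
    (3.5, "pass", 71),
    (5.0, "pass", 84),
]

def predict_class(hours):
    """Classification: which class does this belong to? (1-nearest neighbor)"""
    nearest = min(history, key=lambda row: abs(row[0] - hours))
    return nearest[1]

def predict_value(hours):
    """Regression: what value will this be? (simple least-squares line)"""
    n = len(history)
    xs = [r[0] for r in history]
    ys = [r[2] for r in history]
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return my + slope * (hours - mx)

print(predict_class(4.2))   # a class label
print(predict_value(4.2))   # a numeric value
```

Same history, two different questions: the classifier returns a label, the regressor a number.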
  8. Popular Languages ★ Python (SciPy, scikit-learn, Pandas) ★ R ★ Java (weka) ★ Lisp (John McCarthy) ★ SQL? August 2011
  9. Machine Learning Algorithms ★ Decision Trees ★ Neural Nets ★ Linear and Logistic Regression ★ Naive Bayes / Bayes Nets ★ Random Forests ★ Support Vector Machines ★ Many, many more. More with less data? More data? UCI Repository
  10. Prepping the Data ★ Most of the practical work with Machine Learning and Data Mining is getting the data right. Receive a client’s data, listen, pay attention, but assume they don’t really know what data they have and where it is, because they don’t. ★ Feature Discovery: what are the important attributes? ★ Data Visualization: big hints for prediction ★ Data Cleansing: how many databases do you really need? ★ Data Leakage
  11. Data Leakage ★ “Time Machining”: you can only use data that you will have at the time of prediction ‣ Predicting Stock Values ‣ Attributes added after the fact ★ What attributes have been picked by your algorithm? UserID? ★ Cross-fold validation ★ Can we beat random? Does it make sense?
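The cross-fold validation mentioned on this slide can be sketched in a few lines of plain Python. The "model" below is just a majority-class baseline (a stand-in, not from the deck) — exactly the kind of baseline you compare against when asking "can we beat random?".

```python
def k_fold_splits(n, k):
    """Yield (train_indices, test_indices) for k roughly equal folds."""
    fold_size = n // k
    indices = list(range(n))
    for i in range(k):
        test = indices[i * fold_size:(i + 1) * fold_size]
        train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        yield train, test

labels = ["a", "a", "a", "b", "a", "b", "a", "a", "b", "a"]

accuracies = []
for train, test in k_fold_splits(len(labels), k=5):
    train_labels = [labels[i] for i in train]
    # "Train": pick the most common class in the training fold.
    majority = max(set(train_labels), key=train_labels.count)
    correct = sum(labels[i] == majority for i in test)
    accuracies.append(correct / len(test))

print(sum(accuracies) / len(accuracies))  # mean held-out accuracy
```

Every prediction is scored on a fold the model never trained on, which is also a cheap way to surface leakage: if a baseline like this is suspiciously hard to beat, or your real model is suspiciously good, ask what attribute (UserID?) it is actually using.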
  12. Not Data Leakage ★ Early Kaggle winners: odd/even de-anonymization, outside data sources ★ Take every advantage you possibly can; this is data mining at its finest (it only feels like cheating)
  13. Decision Trees ★ Predicting membership in a class ★ Split the data on the single best attribute; repeat for each child node (BigML) ★ Easily interpretable ★ Scalable: easily handle streaming data. Why might this be important?
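"Split on the single best attribute" is usually scored with information gain (entropy reduction), one common choice among several. A minimal sketch, with an invented two-attribute dataset:

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    counts = {l: labels.count(l) for l in set(labels)}
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(rows, attr_index, labels):
    """Entropy reduction from splitting rows on one attribute."""
    total = entropy(labels)
    remainder = 0.0
    for value in set(r[attr_index] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return total - remainder

# Toy data: (outlook, windy) -> play?  Outlook separates the classes
# perfectly; windy tells you nothing.
rows = [("sunny", "yes"), ("sunny", "no"), ("rain", "yes"), ("rain", "no")]
labels = ["no", "no", "yes", "yes"]

gains = [information_gain(rows, i, labels) for i in range(2)]
best = max(range(2), key=lambda i: gains[i])
print(best)  # index of the attribute the tree would split on first
```

The tree splits on the highest-gain attribute, then repeats the same scoring inside each child node.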
  14. Commentary on “Big Data” ★ Hadoop/MapReduce is fabulous ★ But only if you need it ★ For most of what you do, it just isn’t necessary. Really, how big is a hard drive these days? ★ There are other options ★ </rant> ★ GraphLab is awesome -Ben
  15. Linear and Logistic Regression ★ Learn a weight for each of the features ★ Adjust the weights with each example (either individually or by batch) ★ Simple, fast ★ Good for regression, not so much for classification
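"Adjust the weight with each example" is stochastic gradient descent in its simplest form. A sketch for a one-feature linear model y = w·x + b; the data and learning rate are arbitrary illustrative choices:

```python
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # true relationship: y = 2x

w, b = 0.0, 0.0
lr = 0.1                        # learning rate
for _ in range(200):            # epochs
    for x, y in data:           # update per individual example
        pred = w * x + b
        err = pred - y
        w -= lr * err * x       # gradient of squared error w.r.t. w
        b -= lr * err           # gradient of squared error w.r.t. b

print(w, b)  # should approach w = 2, b = 0
```

Batch training is the same idea with the gradients summed over all examples before each update; per-example updates are what make this cheap enough to run over streaming data.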
  16. Neural Networks ★ Minsky (perceptron, xor), Rumelhart ★ Input nodes, output nodes, hidden nodes ★ Assign weights to all connecting edges (backpropagation) ★ Learn much more complex relationships ★ Difficult to interpret results ★ Typically poor for regression
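The "Minsky (perceptron, xor)" reference is easy to demonstrate: a single perceptron with no hidden nodes cannot learn XOR, no matter how long it trains, because XOR is not linearly separable. A minimal sketch:

```python
# The four XOR examples: ((x0, x1), target)
xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w0, w1, b = 0.0, 0.0, 0.0
for _ in range(100):                     # plenty of epochs
    for (x0, x1), y in xor_data:
        pred = 1 if w0 * x0 + w1 * x1 + b > 0 else 0
        w0 += (y - pred) * x0            # perceptron learning rule
        w1 += (y - pred) * x1
        b += (y - pred)

accuracy = sum((1 if w0 * x0 + w1 * x1 + b > 0 else 0) == y
               for (x0, x1), y in xor_data) / 4
print(accuracy)  # stuck at 0.75 or below, never 1.0
```

Adding a hidden layer (and backpropagation to train its edge weights) is what lets the network represent XOR and much more complex relationships.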
  17. Random Forests ★ Kaggle regular: averages the results of very simple algorithms ★ Subselect the attributes ★ Build a simple decision tree ★ Create n trees ★ Aggregate results
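The four-step recipe on this slide can be sketched end to end. The "simple decision tree" here is reduced to a one-level stump, and the toy data is invented (both features correlate with the label); real forests grow deeper trees and subselect attributes at every split.

```python
import random

random.seed(0)

# Made-up data: ((feature0, feature1), label); label is 1 when both
# features are large, 0 when both are small.
data = [((0.1, 0.2), 0), ((0.2, 0.3), 0), ((0.3, 0.1), 0),
        ((0.7, 0.8), 1), ((0.8, 0.9), 1), ((0.9, 0.7), 1)]

def build_stump(rows):
    """Subselect one attribute, split at its mean, predict majority per side."""
    attr = random.randrange(2)                   # subselect the attributes
    threshold = sum(x[attr] for x, _ in rows) / len(rows)
    left = [y for x, y in rows if x[attr] <= threshold]
    right = [y for x, y in rows if x[attr] > threshold]
    vote = lambda ys: max(set(ys), key=ys.count) if ys else 0
    return lambda x, a=attr, t=threshold, l=vote(left), r=vote(right): \
        l if x[a] <= t else r

# Create n trees, each on a bootstrap resample of the data.
forest = [build_stump(random.choices(data, k=len(data))) for _ in range(25)]

def predict(x):
    votes = [stump(x) for stump in forest]
    return max(set(votes), key=votes.count)      # aggregate results

print(predict((0.95, 0.85)), predict((0.05, 0.15)))
```

Each individual stump is weak and noisy; averaging 25 of them over different resamples and attributes is what makes the aggregate prediction stable.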
  18. Meta ★ Scaling for accuracy: Platt, isotonic ★ K-fold models: extra training, decreased chance of overfitting ★ Ensemble methods: Random Forests, boosted forests
  19. Ensemble Building ★ Can’t decide on a single algorithm? ★ Do you have to? ★ Benefits: if your data changes slightly over time; if you aren’t sure of the structure of the data ★ Problems: time constraints ‣ Training time ‣ Initial tuning time ★ Example: LMP. 500+ models, fight for your life, keep yesterday’s winners, add some extras randomly
  20. Monte-Carlo ★ Unknown state of the world ★ Known transitions ★ Known probabilities ★ Roll the dice: what happened? Repeat until termination ★ Application: March Madness
  21. Monte-Carlo ★ Unknown state of the world ★ Known transitions ★ Known probabilities ★ Roll the dice: what happened? Repeat until termination ★ Application: March Madness ★ Application: Solitaire (Persi Diaconis)
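The Monte-Carlo recipe on these two slides (known transitions and probabilities, roll the dice, repeat until termination) can be sketched as a tiny March-Madness-style bracket. The four teams and their win probabilities are invented for illustration:

```python
import random

random.seed(1)

# Known probabilities: win_prob[(a, b)] = chance that a beats b.
win_prob = {("A", "B"): 0.7, ("C", "D"): 0.6, ("A", "C"): 0.55,
            ("A", "D"): 0.65, ("B", "C"): 0.45, ("B", "D"): 0.5}

def play(a, b):
    """Roll the dice for one game (a known transition)."""
    p = win_prob.get((a, b), 1 - win_prob.get((b, a), 0.5))
    return a if random.random() < p else b

def simulate_bracket():
    """One full rollout: repeat until termination (a champion)."""
    return play(play("A", "B"), play("C", "D"))

trials = 20_000
wins = {}
for _ in range(trials):
    champ = simulate_bracket()
    wins[champ] = wins.get(champ, 0) + 1

for team, n in sorted(wins.items(), key=lambda kv: -kv[1]):
    print(team, n / trials)   # estimated championship probabilities
```

Summing the paths by hand gives A a championship probability of 0.7 × (0.6 × 0.55 + 0.4 × 0.65) = 0.413, and the simulated frequency converges to that; for real brackets the tree is too big to enumerate, which is exactly when rolling the dice many times beats exact computation.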
  22. Where to Get Started? ★ kaggle.com: data mining competitions (prize money). Why hire a ML developer? ★ Andrew Ng’s Coursera course ★ Machine Learning for Hackers (O’Reilly, examples in R) ★ Wikipedia
  23. Projects I’d Love to Work On ★ Predicting race times: more than just the standard race tables. Attributes? The Suunto Ambit predicts recovery time. ★ Cheating at Solitaire: instant recognition, faster prediction ★ NICU: lots of intuition going on