
GIDS 2014 - Learning from Data

Machine learning for busy software professionals. This session is targeted at folks who are curious about machine learning and want to get the gist by looking at examples rather than dry theory. It is a crisp presentation that takes various datasets and uses a bunch of tools. The intention is to share a way to comprehend, at a high level, what is involved in machine learning. Since the field is vast, the session focuses on applied machine learning, with demos using Excel, R, Scikit and others. You will walk out knowing what it means to create a model using simple algorithms and how to evaluate that model. The idea is to simplify the topic and create enough interest that attendees can follow up on it on their own using their favourite tool.

Govind Kanshi

April 25, 2014

Transcript

  1. Learning from Data
    Busy Professional’s guide to machine learning
    @govindk
    http://govindkanshi.wordpress.com


  2. Agenda
    • What we know
    • What we do not know
    • Process
    • What to measure
    • Challenge with Model
    • Challenge with Data
    • Resources
    • Software
    • Books


  3. What we know
    • Reports made from data
    • KPIs made of data
    • Dashboards made of data
    • They all measure known metrics and answer known questions


  4. What we do not know
    • Will this person turn delinquent in x years, based on their profile
    (age/income/background…)?
    • Which kind of process or machine will fail?
    • Which people/things are similar to each other – find me a pattern
    • Prevent people from being readmitted to hospital
    • Why? Because we do not know the question in advance, and
    databases/applications do not have out-of-the-box functionality for it.


  5. We are already using applied ML results
    • Mail gets de-spammed
    • Kinect recognizes our gestures
    • Facebook recognizes our photos
    • Siri/Cortana recognize our voice commands
    • Watson uses some of it
    • Search uses a lot of it
    • Recommendations are right there in our face


  6. So then
    • Learn from data
    • How
    • Create a model of the data
    • Test the model for error and use it


  7. Unsupervised
    • Clustering
    • Customer segmentation
    • Topic identification
    • A number of algorithms (see the sketch below):
    • Hierarchical (distance as the measure – generally Euclidean)
    • Agglomerative (start with n groups and keep merging them)
    • Single link (merge 2 at a time) vs. divisive (start with a single cluster and break it down)
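    The deck demos clustering in R and scikit; below is a minimal sketch in
    Python with scikit-learn. The toy "people" data is made up for
    illustration and is not from the deck.

    ```python
    # Agglomerative (hierarchical) clustering with Euclidean distance:
    # start with n singleton groups and keep merging the closest ones.
    import numpy as np
    from sklearn.cluster import AgglomerativeClustering

    # Each row is one (made-up) person: [height_cm, spice_preference]
    X = np.array([[150, 8], [155, 9], [180, 2], [178, 3], [165, 5], [168, 6]])

    model = AgglomerativeClustering(n_clusters=2, linkage="single")  # single link
    labels = model.fit_predict(X)
    print(labels)  # which cluster each person landed in, e.g. [0 0 1 1 0 0]
    ```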


  8. Simple way
    • Group folks on
    • Height
    • What you eat
    • Where you are from (state)
    • Next time a new person comes in – let us predict their group


  9. Demos
    • USArrests Data
    • Wine Data


  10. Challenges and next steps
    • How many groups/clusters? (see the sketch below)
    • How many mis-groupings? (evaluation)
    • Associating topics – and after clustering, what next?
    • Once clusters are formed, someone can name them
    • Now run supervised methods on the data to learn more
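    A common way to attack "how many clusters" is to score a few candidate
    values of k; here is a sketch using k-means and the silhouette score
    (both scikit-learn; the synthetic two-blob data is mine, not the deck's).

    ```python
    # Pick k by silhouette score: higher means tighter, better-separated clusters.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])  # 2 blobs

    for k in range(2, 6):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        print(k, round(silhouette_score(X, labels), 3))  # k=2 should score highest
    ```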


  11. Supervised learning
    • Given a label L for attributes (a1, a2, a3, …)
    • Learn a model which can predict the label from the attributes


  12. Simple way to understand Classification
    • Let us say we are labelled North Indian or South Indian
    • How?
    • Attributes (language, food, movie language, music …)
    • Basically, learning the link between
    • observed data X and
    • a variable y, usually called the target or label.


  13. Supervised
    • Data
    • One dataset for training, which has labels
    • One dataset for testing
    • Examples
    • Classification (spam, order data, disease data, Kinect gestures)
    • binary vs. multiclass
    • Regression (sales)
    • Ranking
    • Search
    • Predictive maintenance
    • Recommendation
    • Netflix (the Netflix Prize competition = SVD-style matrix factorization)


  14. Demos
    • Trees
    • DecisionTree – Python (show train/test split and validation; see the sketch below)
    • Decision tree – R
    • BigML (network dependent)
    • Challenge –
    • splits consider one input at a time
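    A minimal stand-in for the Python decision-tree demo, assuming
    scikit-learn and its bundled iris data (the deck's actual demo data may
    differ).

    ```python
    # Train a decision tree on one split, validate on the held-out split.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42)

    clf = DecisionTreeClassifier(max_depth=3, random_state=42)
    clf.fit(X_train, y_train)                           # learn from the training split
    print("test accuracy:", clf.score(X_test, y_test))  # evaluate on unseen data
    ```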


  15. A few more terms to overcome data issues
    • Bagging – (used with tree models) (variance reduction)
    • Train an ensemble of models from bootstrap* samples
    • Take a vote amongst the models
    • The class predicted by the majority of models wins
    • Take an average if the outputs are scores or probabilities
    • * Bootstrap – denotes a different random sample of the dataset
    • Boosting (bias reduction)
    • Like bagging, but penalizes & learns from misclassifications
    • Challenge of assigning "weights" to misclassified instances to penalize them
    • Start with equal weights and keep increasing the weight of misclassified
    instances until the error comes down (see the sketch below)
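    A sketch of both ideas with scikit-learn (the dataset choice is mine):
    BaggingClassifier votes over trees grown on bootstrap samples, while
    AdaBoostClassifier reweights misclassified instances each round.

    ```python
    # Bagging vs. boosting, compared with 5-fold cross-validation.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
    boost = AdaBoostClassifier(n_estimators=50, random_state=0)  # stumps by default

    print("bagging :", cross_val_score(bag, X, y, cv=5).mean())
    print("boosting:", cross_val_score(boost, X, y, cv=5).mean())
    ```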


  16. Demo
    • RandomForest
    • Each tree is trained on a bootstrap sample of n training examples out of N; at each
    decision node it randomly selects m input features from the total M input features
    (m ≈ sqrt(M)) and picks the best split among them. Finally, the trees in the forest
    vote for the result. (See the sketch below.)
    • Evaluation
    • Apply a loss function to the margins (penalize misclassification, reward positive ones)
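    A minimal sketch with scikit-learn's bundled wine data as a stand-in for
    the deck's wine demo.

    ```python
    # Random forest: many trees, each trained on a bootstrap sample, each split
    # chosen among a random subset of ~sqrt(M) features; the trees then vote.
    from sklearn.datasets import load_wine
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_wine(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
    rf.fit(X_train, y_train)
    print("test accuracy:", rf.score(X_test, y_test))
    ```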


  17. Regression
    • Explains the relationship between two variables (dependent vs.
    independent)
    • Simple linear: y = W0 + W1*x1 + W2*x2 + …
    • Estimate the weights to predict y (see the sketch below)
    • Multivariate
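    A sketch of estimating the weights, assuming scikit-learn; the true
    weights in the synthetic data below are made up so the fit can be checked.

    ```python
    # Recover W0, W1, W2 of y = W0 + W1*x1 + W2*x2 + noise from data.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))                        # columns are x1, x2
    y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 0.1, 200)

    model = LinearRegression().fit(X, y)
    print("W0:", model.intercept_)   # ~3.0
    print("W1, W2:", model.coef_)    # ~[2.0, -1.5]
    ```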


  18. Demos
    • Excel
    • SimpleLinear -R
    • RandomForest – Wine
    • Evaluate by applying a loss function to the residuals


  19. What to measure
    • Data
    • Cross-validation (see the sketch below)
    • n-fold cross-validation
    • Leave-one-out validation
    • Hold-out
    • End of the day: how much data is enough, and is there bias in the data (only certain kinds of labels)?
    • Model results
    • Contingency table (false negatives & false positives are bad)
    • ROC & AUC (coverage curve) (true positive rate vs. false positive rate)
    • Precision/Recall (from the search world)
    • F-measure
    • Lift (not interested in accuracy on the entire dataset; want it for the top 5%/10% of the dataset)
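    A sketch of n-fold and leave-one-out cross-validation with scikit-learn
    (the model and dataset are stand-ins, not from the deck).

    ```python
    # Same model, two validation schemes.
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import LeaveOneOut, cross_val_score

    X, y = load_iris(return_X_y=True)
    model = LogisticRegression(max_iter=1000)

    print("5-fold CV    :", cross_val_score(model, X, y, cv=5).mean())
    print("leave-one-out:", cross_val_score(model, X, y, cv=LeaveOneOut()).mean())
    ```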


  20. Is the model working right?
                    Predicted +ve   Predicted -ve   Total
    Actual +ve      40              15              55
    Actual -ve      5               40              45
    Total           45              55              100
    Precision = 40/45
    Recall = 40/55
    F-measure (harmonic mean) = 2/((1/precision) + (1/recall))
    Accuracy = (TP(40) + TN(40)) / (40+15+5+40) = 0.80 (worked below)
    How much accuracy is enough?
    Lift – how much better than random guessing
    Lift and accuracy are not correlated
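    The slide's numbers, worked through in plain Python:

    ```python
    # Confusion-matrix arithmetic from the table above.
    TP, FN = 40, 15   # actual +ve row: 40 + 15 = 55
    FP, TN = 5, 40    # actual -ve row: 5 + 40 = 45

    precision = TP / (TP + FP)                        # 40/45 ~ 0.889
    recall    = TP / (TP + FN)                        # 40/55 ~ 0.727
    f_measure = 2 / ((1 / precision) + (1 / recall))  # harmonic mean ~ 0.800
    accuracy  = (TP + TN) / (TP + FN + FP + TN)       # 80/100 = 0.80
    print(precision, recall, f_measure, accuracy)
    ```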


  21. Challenge with Model
    • Overfitting
    • Trade off bias against variance
    • Use regularization
    • L1 (Lasso)
    • L2 (Ridge)
    • If time permits, show the effect of alpha (see the sketch below)
    • Look for “overfitting model”, “bias and variance”
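    A sketch of the alpha effect with scikit-learn's Ridge and Lasso on
    synthetic data (the data is made up): as alpha grows, Ridge shrinks the
    weights while Lasso drives the useless ones exactly to zero.

    ```python
    # Stronger regularization (larger alpha) shrinks the learned weights.
    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))   # only the first two features matter
    y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, 100)

    for alpha in (0.01, 0.1, 1.0):
        ridge = Ridge(alpha=alpha).fit(X, y)
        lasso = Lasso(alpha=alpha).fit(X, y)
        print(alpha, np.round(ridge.coef_, 2), np.round(lasso.coef_, 2))
    ```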


  22. Challenge with Data
    • Categorical, ordinal, quantitative
    • Measures – mean, median, variance, std deviation, range, shape (skewness)
    • Always observe the data to get a “feel”/smell for it
    • Discretize/threshold (convert a quantitative feature)
    • Missing feature(s) –
    • What do you do? Impute with the median or average (see the sketch below)
    • Data encoding
    • Create new features from existing ones vs. encode them a different way
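    A sketch of median imputation and one-hot encoding, assuming pandas (the
    column names and values are made up).

    ```python
    import pandas as pd

    df = pd.DataFrame({
        "income": [30_000, None, 55_000, 42_000],  # quantitative, one missing
        "state":  ["KA", "TN", "KA", "MH"],        # categorical
    })

    df["income"] = df["income"].fillna(df["income"].median())  # impute the median
    df = pd.get_dummies(df, columns=["state"])                 # one-hot encode
    print(df)
    ```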


  23. Feature engineering
    • Feature selection
    • Intuition, testing correlation
    • Subset selection (start small and grow) based on some error function
    • Feature extraction
    • New k dimensions, as combinations of the older d dimensions
    • Linear
    • PCA (find directions of maximal variance by projecting – sensitive to outliers; see the sketch below)
    • LDA (supervised method for dimensionality reduction for classification)
    • FA (Factor Analysis), Multidimensional Scaling (distances between points)
    • IsoMap (geodesic distance) and Locally Linear Embedding (LLE)
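    A minimal PCA sketch on scikit-learn's bundled wine data (a stand-in),
    projecting the original d = 13 features down to k = 2 components.

    ```python
    # Project onto the two directions of maximal variance.
    from sklearn.datasets import load_wine
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    X, _ = load_wine(return_X_y=True)
    X = StandardScaler().fit_transform(X)    # PCA is sensitive to feature scale

    pca = PCA(n_components=2)
    X2 = pca.fit_transform(X)
    print(X2.shape)                          # (178, 2)
    print(pca.explained_variance_ratio_)     # variance captured per component
    ```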


  24. What we could not cover
    • Mechanisms
    • Reinforcement learning (punishments/rewards to learn better)
    • Algorithm types
    • Perceptron (backpropagation, SOM, …)
    • SVM
    • LDA and friends for the unstructured world
    • Regression (OLS, logistic, stepwise, MARS)
    • Regularization (ridge/lasso)
    • Trees (GBM, C4.5, ID3…)
    • Bayesian
    • Kernel (radial)
    • Deep learning (DBN, Boltzmann…)
    • Clustering (Expectation Maximization)
    • Recommendation
    • Probability (distributions) & linear algebra
    • Constraint solving and optimization (Solver, OpenSolver…)


  25. Tools
    • R
    • Scikit
    • Theano
    • Weka
    • KNIME
    • Recommender (.net….)
    • DataTau
    • BigML
    • WiseIO
    • Skytree
    • SAS/SPSS
    • YHatr


  26. Books
    • Bishop
    • Alpaydin
    • John Foreman
    • PyMC – search query: “Bayesian-Methods-for-Hackers”
    • Scikit –
    • jakevdp – “scikit jake 2014 tutorial”
    • Olivier – “scikit olivier grisel tutorial”
    • Recommender (http://mymedialite.net/) – Zeno Gantner


  27. What you will be doing
    • Data
    • Touch/feel it (visualize), breathe it in
    • Cleaning, scaling/normalization
    • Selecting
    • Algorithm (choose by the task)
    • Classification
    • Regression
    • Ranking (recommendation, search results)
    • Amongst candidates
    • Evaluate algorithms against each other & refine/calibrate
    • AUC, ROC, RMSE etc.


  28. If time & net permit: Yhatr demo
    • Because you need to deploy, test & use the model
    • Yhatr provides good hosting (theirs, or host your own)


  29. Thanks for your time
    • Please fill the evaluation form
    • See you next time


  30. Reference
