Scikit-Learn to "learn them all" @ EuroPython 2014

**Scikit-learn** is a powerful Python library, providing implementations of
many of the most popular machine learning algorithms.
This talk will provide a general overview of the main "batteries" included in
Scikit-learn, along with working code examples and comparisons with other
existing machine learning tools, in order to get the best from this wonderful
package for our machine learning code.

Valerio Maggio

July 24, 2014

Transcript

  1. SCIKIT-LEARN “to learn them all” MACHINE LEARNING
    Speaker: Valerio Maggio [email protected] @leriomaggio +ValerioMaggio
  2. PLEASE ANSWER THREE QUESTIONS
    • Do you already know what Machine Learning is?
    • Have you ever used Scikit-Learn?
    • How many of you also attend the Scikit-Learn training?
  3. WHAT IS MACHINE LEARNING?
    Machine learning teaches machines how to carry out tasks by themselves. It is that simple. The complexity comes with the details.
    W. Richert & L.P. Coelho, 2013, Building Machine Learning Systems with Python
  4. ML & DATA ANALYSIS
    • Data Analysis
    • Data Mining
    • Big Data
    • Data Science
    BUZZWORDS
  5. DATA SCIENCE
    Data science is the study of the generalizable extraction of knowledge from data.
    Drew Conway's Data Science Venn Diagram
  6. THE ESSENCE OF MACHINE LEARNING
    • A pattern exists
    • We cannot pin it down mathematically
    • We have data on it
    Learning by Examples
  7. PYTHON and DATA SCIENCE
    DATA SCIENCE PYTHON
    experfy.com/blog/python-data-science/
  8. ONE PYTHON to rule them all
    • Python: The language of choice for Data Science
    • Displacing R & Matlab
    http://goo.gl/17XA5J
    One of the biggest benefits of doing data science in Python is added efficiency of using one programming language across different applications.
    T. Yarkoni, Univ. Texas
    DATA SCIENCE PYTHON
  9. ML PYTHON POWERED
    MACHINE LEARNING PYTHON
    github.com/kevincobain2000/awesome-machine-learning
  10. MACHINE LEARNING PYTHON
    Scala early stage + C++ PyWrap + Python Powered
  11. INSTALLATION
    pip install numpy
    pip install scipy
    pip install ipython
    pip install scikit-learn
    pip install matplotlib
    https://store.continuum.io/cshop/anaconda/
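After running the install commands on the slide, a quick way to confirm that everything is importable is a check like the one below. This is a minimal sketch, not part of the talk: the `PACKAGES` mapping and the `check_installed` helper are hypothetical names introduced here, and the only assumption about the packages is their import names (scikit-learn is imported as `sklearn`, ipython as `IPython`).

```python
import importlib.util

# pip name -> import name for the packages listed on the slide.
# (Hypothetical helper data; scikit-learn is imported as "sklearn".)
PACKAGES = {
    "numpy": "numpy",
    "scipy": "scipy",
    "ipython": "IPython",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
}


def check_installed(packages=PACKAGES):
    """Return {pip_name: bool}, True when the module can be located."""
    return {
        pip_name: importlib.util.find_spec(module) is not None
        for pip_name, module in packages.items()
    }


if __name__ == "__main__":
    for name, ok in check_installed().items():
        print(f"{name}: {'installed' if ok else 'missing'}")
```

The check only locates the modules and does not import them, so it stays fast and has no side effects.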
  12. SCIKIT LEARN DESIGN PHILOSOPHY
    • Includes all the batteries necessary for (general purpose) Machine Learning Code
    • Data (and Datasets)
    • Feature Selection, Extraction algorithms
    • ML Algorithms (Classification, Regression, Clustering, …)
    • Evaluation functions (Cross Validation, Confusion Matrix)
    Algorithm Selection Philosophy: Try to keep the core as light as possible, including well-known and widely used ML methods
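To make the "batteries" concrete, here is a small sketch that touches each category from the slide: a bundled dataset, a feature selection step, a classifier, and a confusion matrix for evaluation. The specific choices (the Iris data, `SelectKBest` with `k=2`, a linear `SVC`, a 70/30 split) are assumptions made for illustration and are not taken from the talk. The imports follow recent scikit-learn releases; releases from around 2014 exposed `train_test_split` under `sklearn.cross_validation` instead.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Data: a small dataset bundled with scikit-learn.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Feature selection + ML algorithm chained in a single estimator.
# k=2 and the linear kernel are illustrative choices, not recommendations.
model = make_pipeline(SelectKBest(f_classif, k=2), SVC(kernel="linear"))
model.fit(X_train, y_train)

# Evaluation: rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, model.predict(X_test))
print(cm)
```

Every step shares the same `fit` / `predict` / `transform` interface, which is what lets the pieces be chained this way.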
  13. DATA REPRESENTATION MACHINE LEARNING
    N = Number of Samples in the Data Set
    D = Number of Features
    In SCIKIT: scipy.sparse matrices
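The N × D layout can be sketched as follows. The toy values are made up for illustration. The example shows the same data as a dense NumPy array and as a `scipy.sparse` CSR matrix, the sparse form being the one named on the slide; the choice of CSR among the available sparse formats is an assumption.

```python
import numpy as np
from scipy import sparse

# Toy data set: N = 4 samples (rows), D = 3 features (columns).
X_dense = np.array([
    [0.0, 1.5, 0.0],
    [2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 3.0, 4.5],
])
n_samples, n_features = X_dense.shape

# Same data in compressed sparse row (CSR) form:
# only the non-zero entries are stored explicitly.
X_sparse = sparse.csr_matrix(X_dense)

print("N =", n_samples, "D =", n_features)
print("stored non-zero values:", X_sparse.nnz)
```

Sparse storage pays off when most features are zero for most samples, as is typical for bag-of-words text features.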
  14. IRIS DATASET
    Goal: Design an algorithm that is able to automatically recognize IRIS species
  15. IRIS DATASET
    Goal: Design an algorithm that is able to automatically recognize IRIS species
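One possible way to approach the Iris goal is sketched below. The transcript does not say which classifier the talk used, so the k-nearest-neighbours model, `n_neighbors=5`, and the 70/30 split are assumptions picked for brevity. The imports follow recent scikit-learn releases, where `train_test_split` lives in `sklearn.model_selection`.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target  # X: 150 samples x 4 features, y: class ids 0..2

# Hold out 30% of the samples to estimate accuracy on unseen flowers.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Assumed model: k-nearest neighbours with k=5.
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
first_prediction = iris.target_names[clf.predict(X_test[:1])[0]]
print("test accuracy:", accuracy)
print("first test sample predicted as:", first_prediction)
```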
  16. CROSS VALIDATION
    Cross-validation is a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set.
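A minimal k-fold cross-validation sketch follows. The use of the Iris data, five folds, and a k-nearest-neighbours classifier are assumptions for illustration. `cross_val_score` is imported from `sklearn.model_selection`, its location in recent releases; older releases exposed it under `sklearn.cross_validation`.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: the data is split into 5 parts, and each part
# is used once as the held-out test set while the other 4 train the model.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)

mean_score = scores.mean()
print("accuracy per fold:", scores)
print("mean accuracy:", mean_score)
```

Reporting the mean together with the per-fold spread gives a more honest picture of generalization than a single train/test split.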
  17. SCIKIT BATTERIES
    LARGE SCALE out-of-the-box
    Clustering: KMeans(n_clusters=3, copy_x=True, n_jobs=-1)
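The `KMeans` call from the slide can be run roughly as shown below. Two things here are not from the talk: the Iris data is an assumed input, and the `n_jobs=-1` argument is left out because it was accepted by the 2014-era API but has since been removed from `KMeans` in recent scikit-learn releases. `n_init` and `random_state` are set explicitly only to make the sketch reproducible.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X, _ = load_iris(return_X_y=True)  # labels are ignored: clustering is unsupervised

# Same settings as the slide, minus n_jobs (removed in recent releases).
kmeans = KMeans(n_clusters=3, copy_x=True, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print("cluster sizes:", [int((labels == k).sum()) for k in range(3)])
print("centroid array shape:", kmeans.cluster_centers_.shape)
```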
  18. IN CONCLUSION MACHINE LEARNING PYTHON
    • Scikit-learn is not the only Machine Learning library available in Python
    • But it is powerful and easy to use
    • Very efficient implementations: numpy, scipy, cython
    • Highly integrated: NLTK, Scikit-Image