Introduction to Machine Learning

Karol Przystalski

May 10, 2017

Transcript

  1. What is machine learning?

    "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." (Tom M. Mitchell)
    "Machine learning is the training of a model from data that generalizes a decision against a performance measure." (Jason Brownlee)
    Source: Practical Machine Learning, S. Gollapudi, Packt 2016
    "A branch of artificial intelligence in which a computer generates rules underlying or based on raw data that has been fed into it." (Dictionary.com)
    "Machine learning is a scientific discipline that is concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as from sensor data or databases." (Wikipedia)
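Mitchell's definition can be made concrete with a toy sketch. It assumes a task T of classifying numbers in [0, 1] by a hidden rule (x >= 0.5), experience E given as randomly drawn labeled examples, and performance P measured as accuracy on a fixed grid of test points. The rule, the threshold learner and all function names are hypothetical illustrations, not taken from the slides.

```python
import random

def true_label(x):
    # Hypothetical hidden rule defining task T: label 1 when x >= 0.5, else 0.
    return 1 if x >= 0.5 else 0

def learn_threshold(samples):
    # Learn from experience E (a list of (x, label) pairs): put the decision
    # threshold halfway between the largest negative and smallest positive example.
    negatives = [x for x, y in samples if y == 0]
    positives = [x for x, y in samples if y == 1]
    low = max(negatives) if negatives else 0.0
    high = min(positives) if positives else 1.0
    return (low + high) / 2

def accuracy(threshold, test_points):
    # Performance measure P: fraction of test points classified correctly.
    correct = sum((1 if x >= threshold else 0) == true_label(x) for x in test_points)
    return correct / len(test_points)

if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is reproducible
    pool = [rng.random() for _ in range(1000)]
    test_points = [i / 1000 for i in range(1000)]
    for n in (2, 10, 100, 1000):
        # Use the first n examples as the experience gathered so far.
        experience = [(x, true_label(x)) for x in pool[:n]]
        t = learn_threshold(experience)
        print(f"E = {n:4d} examples: threshold = {t:.3f}, P = {accuracy(t, test_points):.3f}")
```

As n grows, the learned threshold tends toward 0.5 and P approaches 1.0, which is what "performance at T, as measured by P, improves with experience E" means here. The improvement is not guaranteed at every single step, because the threshold is the midpoint of two randomly placed extremes.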
  2. Short history

    Before 1950: Bayes’ theorem, Markov chains, (...)
    1957: Rosenblatt’s Perceptron
    1967: Nearest Neighbor
    1985: Sejnowski’s NetTalk
    1986: Backpropagation
    1989: Reinforcement learning
    1995: Random forest
    1995: SVM
    1997: Deep Blue beats Kasparov
    1998: MNIST database
    2006: Deep learning
    2006: Netflix challenge
    2010: Kaggle
    2011: Watson beats Jeopardy! competitors
    2012: Google X Lab
    2015: Stephen Hawking, Elon Musk et al. open letter
    Source: Forbes
  3. Taxonomy

    1. Supervised: the data set is labeled; the method learns from labeled examples and assigns a label when classifying new data
    2. Unsupervised: the data set is unlabeled; clustering is the typical example; the method is trained and tested on unlabeled data sets
    3. Reinforcement learning: the method learns without labeled data, but receives penalties or rewards depending on the outcome of its decisions
    4. Deep learning: a group of methods based on deep neural networks
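The difference between the first two categories can be sketched in plain Python, assuming tiny hand-made 2-D data sets. The supervised part uses a 1-nearest-neighbour rule, which needs labeled points; the unsupervised part uses k-means, a common clustering algorithm chosen here only for illustration, which groups points without ever seeing a label. The data and function names are hypothetical, and reinforcement and deep learning are not covered by this sketch.

```python
import math

def nearest_neighbor_predict(train, query):
    # Supervised: `train` holds (point, label) pairs; the query gets the label
    # of the closest training point (1-nearest-neighbour rule).
    _, label = min(train, key=lambda pair: math.dist(pair[0], query))
    return label

def kmeans(points, centroids, iterations=10):
    # Unsupervised: no labels are used. Alternate between assigning each point
    # to its nearest centroid and moving each centroid to the mean of its points.
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

if __name__ == "__main__":
    labeled = [((1.0, 1.0), "small"), ((1.5, 2.0), "small"),
               ((8.0, 8.0), "large"), ((9.0, 8.5), "large")]
    print(nearest_neighbor_predict(labeled, (2.0, 1.0)))  # small

    unlabeled = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
    centroids, clusters = kmeans(unlabeled, centroids=[(0.0, 0.0), (10.0, 10.0)])
    print(centroids)
```

The supervised function cannot run without labels, while `kmeans` finds the same two groups from the raw points alone, but it only returns cluster indices, not names such as "small" or "large".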
  4. The process

    It consists of a few steps:
    1. Assemble data
    2. Preprocessing
    3. Feature extraction
    4. Feature selection
    5. Training
    6. Prediction
    7. Validation
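The seven steps can be walked through end to end in a small plain-Python sketch. Everything in it is an assumption made for illustration: the hypothetical height/weight records, dropping incomplete rows as the preprocessing step, removing zero-variance columns as the feature-selection rule, a nearest-centroid classifier as the training method, and accuracy on a held-out split as the validation measure.

```python
import math
import statistics

# 1. Assemble data: hypothetical raw records; None marks a missing value.
RAW = [
    {"height_cm": 150, "weight_kg": 50, "units": "metric", "label": "S"},
    {"height_cm": 155, "weight_kg": None, "units": "metric", "label": "S"},
    {"height_cm": 160, "weight_kg": 55, "units": "metric", "label": "S"},
    {"height_cm": 158, "weight_kg": 52, "units": "metric", "label": "S"},
    {"height_cm": 180, "weight_kg": 85, "units": "metric", "label": "L"},
    {"height_cm": 185, "weight_kg": 90, "units": "metric", "label": "L"},
    {"height_cm": 178, "weight_kg": None, "units": "metric", "label": "L"},
    {"height_cm": 190, "weight_kg": 95, "units": "metric", "label": "L"},
]

def preprocess(records):
    # 2. Preprocessing: drop records with missing values.
    return [r for r in records if None not in r.values()]

def extract_features(record):
    # 3. Feature extraction: turn a record into a numeric vector.
    return [record["height_cm"], record["weight_kg"], 1.0 if record["units"] == "metric" else 0.0]

def select_features(vectors):
    # 4. Feature selection: keep only columns that vary (a constant column carries no information).
    return [j for j, column in enumerate(zip(*vectors)) if statistics.pvariance(column) > 0]

def train(vectors, labels, keep):
    # 5. Training: nearest-centroid model, i.e. one mean vector per class.
    model = {}
    for label in set(labels):
        rows = [[v[j] for j in keep] for v, y in zip(vectors, labels) if y == label]
        model[label] = [statistics.mean(col) for col in zip(*rows)]
    return model

def predict(model, vector, keep):
    # 6. Prediction: assign the class whose centroid is closest.
    x = [vector[j] for j in keep]
    return min(model, key=lambda label: math.dist(x, model[label]))

if __name__ == "__main__":
    data = preprocess(RAW)
    train_set, test_set = data[::2], data[1::2]  # simple hold-out split
    X_train = [extract_features(r) for r in train_set]
    keep = select_features(X_train)
    model = train(X_train, [r["label"] for r in train_set], keep)
    # 7. Validation: accuracy on records the model never saw during training.
    hits = [predict(model, extract_features(r), keep) == r["label"] for r in test_set]
    print("selected feature columns:", keep)  # [0, 1]
    print("hold-out accuracy:", sum(hits) / len(hits))  # 1.0
```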
  5. Daily usage

    Spam filter, Siri, Google Photos, CRUSH, Amazon/Netflix/Spotify/ecommerce recommendation systems, Google Search, Amazon Alexa, High Frequency Trading, DeepFace, Google Now, Medical diagnostics, Tesla Smart Car, Handwriting recognition, PayPal fraud detection, Customer segmentation
  6. Latest research

    Google X Lab, Microsoft Tay, Facebook DeepFace, Tesla Smart Car
    Source: YouTube, Twitter, Digital Trends
  7. Trends

    Recent Forrester research found that 58% of companies are researching AI solutions, but only 12% are using them.
    1. Bots: Howdy, Wit and more
    2. Libraries: TensorFlow, OpenAI Gym and more
    3. Robotics: personal, industrial and retail
    4. Autonomous vehicles: cars, drones and more
    5. AI for finance, health and security
    Source: Safari
  8. Issues

    There are a few issues related to AI:
    1. Ethics and privacy
    2. Job displacement
    3. AI becoming more intelligent than humans (as in Ex Machina, Westworld)
    4. Economy
  9. Data Scientist

    A Data Scientist is a developer who:
    ✓ knows machine learning methods
    ✓ knows how to collect and manipulate data
    ✓ is able to get business value from data
    ✓ knows the tools
    ✓ has domain knowledge
    ✓ nice to have: is familiar with Big Data solutions
  10. Data Science tools

    Libraries:
    → scikit-learn (scikit-learn.org)
    → numpy (numpy.org)
    → pandas (pandas.pydata.org)
    → matplotlib (matplotlib.org)
    Tools:
    → Jupyter (jupyter.org)
    → PyCharm (jetbrains.com/pycharm)
    → Spark (spark.apache.org)
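A minimal sketch of how two of the listed libraries fit together, assuming pandas and scikit-learn are installed: pandas holds a small hypothetical data set, and scikit-learn's `KNeighborsClassifier` (a nearest-neighbour method) is trained with `fit` and queried with `predict`. The data and column names are invented for illustration.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data set: two numeric features and a class label.
df = pd.DataFrame({
    "height_cm": [150, 160, 158, 180, 185, 190],
    "weight_kg": [50, 55, 52, 85, 90, 95],
    "size": ["S", "S", "S", "L", "L", "L"],
})

X = df[["height_cm", "weight_kg"]]  # feature columns
y = df["size"]                      # target labels

# Train a 1-nearest-neighbour classifier on the labeled rows.
model = KNeighborsClassifier(n_neighbors=1)
model.fit(X, y)

# Predict labels for new, unlabeled rows with the same columns.
new = pd.DataFrame({"height_cm": [152, 188], "weight_kg": [51, 92]})
predictions = model.predict(new)
print(list(predictions))  # ['S', 'L']
```

The same `fit`/`predict` pattern is shared by other scikit-learn estimators, so trying a different classifier usually means changing only the constructor line.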
  11. Data Science and Big Data

    1. Data Scientists always need data
    2. Data Science is more related to scientists, and Big Data to system administrators/DevOps
    3. There are many conferences on Big Data, but only a few non-academic ones on data science
    4. Having a lot of data makes it possible to use deep learning
  12. Research lab at Codete

    We are working on:
    1. Our own methods/algorithms/solutions in the Machine Learning area
    2. Building concepts
    3. Research
    4. Kaggle competitions
    5. Big Data solutions
  13. Latest proof of concept for automotive

    Technology stack:
    → Both 3rd-party systems store information in Oracle databases
    → These entries are converted into Apache Kafka messages with Kafka Connect
    → Kafka MirrorMaker maintains replicas
    → Data coming from system B is stored in Apache HBase to allow retrieving car information efficiently
    → All the processing is done by a set of Apache Spark jobs
    → Splunk frontend
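The data flow of this stack can be illustrated with a purely in-memory sketch. It does not use the real Oracle, Kafka, HBase, Spark or Splunk APIs: a `deque` stands in for a Kafka topic, a copied deque for the MirrorMaker replica, a dict keyed by VIN for the HBase table, and a plain function for a Spark job. The record fields, the assumption that system A produces car events, and all function names are hypothetical.

```python
from collections import deque

# Hypothetical rows from the two 3rd-party systems' Oracle tables.
SYSTEM_A_ROWS = [
    {"vin": "VIN001", "event": "service", "km": 15000},
    {"vin": "VIN002", "event": "recall_check", "km": 3200},
]
SYSTEM_B_ROWS = [
    {"vin": "VIN001", "model": "Model A", "year": 2015},
    {"vin": "VIN002", "model": "Model B", "year": 2016},
]

def connect(rows, source):
    # Stand-in for Kafka Connect: turn each database row into a message on a topic.
    return deque({"source": source, "payload": dict(row)} for row in rows)

def mirror(topic):
    # Stand-in for MirrorMaker: keep a replica of the topic's messages.
    return deque(dict(message) for message in topic)

def load_into_store(topic):
    # Stand-in for HBase: a table keyed by row key (the VIN) for fast lookups.
    table = {}
    for message in topic:
        row = message["payload"]
        table.setdefault(row["vin"], {}).update(row)
    return table

def enrich_events(events_topic, car_table):
    # Stand-in for a Spark job: join each system A event with car info from the store.
    return [
        {**message["payload"], **car_table.get(message["payload"]["vin"], {})}
        for message in events_topic
    ]

if __name__ == "__main__":
    events = connect(SYSTEM_A_ROWS, "system_a")
    cars = connect(SYSTEM_B_ROWS, "system_b")
    replica = mirror(events)
    car_table = load_into_store(cars)
    for record in enrich_events(replica, car_table):
        print(record)  # in the real stack, results would be shown in Splunk
```

Keying the store by VIN mirrors the reason given for HBase on the slide: car information is fetched by row key instead of by scanning every record.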
  14. Where to go next?

    1. Know at least one programming language. Recommended: Python, Java, R, Julia, Scala
    2. Take the Coursera Machine Learning course (by Andrew Ng) and the Neural Networks course (by Geoffrey Hinton)
    3. Read Machine Learning: An Algorithmic Perspective, Stephen Marsland, CRC 2014
    4. Learn Python libraries on www.datacamp.com
    5. Take part in one of the Kaggle challenges
    6. Apply the knowledge to a specific problem
    7. Learn
  15. Q&A