Introduction to Machine Learning

Machine learning Dr. Karol Przystalski

What is machine learning?
"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." - Tom M. Mitchell
"Machine learning is the training of a model from data that generalizes a decision against a performance measure." - Jason Brownlee
Source: Practical Machine Learning, S. Gollapudi, Packt 2016
"A branch of artificial intelligence in which a computer generates rules underlying or based on raw data that has been fed into it." - Dictionary.com
"Machine learning is a scientific discipline that is concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as from sensor data or databases." - Wikipedia
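Mitchell's definition can be made concrete with a small sketch. Here the task T is labeling a number as "low" or "high", the experience E is a list of labeled examples, and the performance measure P is accuracy on a fixed test set. The data, the function names and the choice of a nearest-neighbor rule are all assumptions made for this illustration; the numbers are contrived so that more experience helps, which real data does not guarantee.

```python
# Hypothetical toy setup: T = label a number "low"/"high", P = accuracy, E = labeled examples.

def predict(examples, x):
    """Return the label of the training example whose value is closest to x."""
    _, nearest_label = min(examples, key=lambda example: abs(example[0] - x))
    return nearest_label


def accuracy(examples, test_set):
    """Performance measure P: fraction of test items labeled correctly."""
    hits = sum(predict(examples, x) == label for x, label in test_set)
    return hits / len(test_set)


# Fixed test set for task T (numbers below 50 are "low").
test_set = [(10, "low"), (35, "low"), (40, "low"), (45, "low"), (60, "high"), (80, "high")]

# Little experience E: two examples that place the decision boundary badly.
small_experience = [(5, "low"), (55, "high")]

# More experience E: the same two examples plus three more.
large_experience = small_experience + [(30, "low"), (48, "low"), (70, "high")]

p_small = accuracy(small_experience, test_set)
p_large = accuracy(large_experience, test_set)
print(p_small, p_large)
```

In this contrived case P rises from 0.5 to 1.0 as E grows, which is the sense in which the program "learns".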

Short history
Before 1950 Bayes theorem, Markov chains, (...)
1957 Rosenblatt’s Perceptron
1967 Nearest Neighbor
1985 Sejnowski’s NetTalk
1986 Backpropagation
1989 Reinforcement learning
1995 Random forest
1995 SVM
1997 Deep Blue beats Kasparov
1998 MNIST database
2006 Deep learning
2006 Netflix challenge
2010 Kaggle
2011 Watson beats Jeopardy competitors
2012 Google Xlab
2015 Stephen Hawking, Elon Musk et al. letter
Source: Forbes

Taxonomy
1. Supervised - labeled data set; the method learns from labeled data and assigns a label when classifying
2. Unsupervised - unlabeled data set; clustering is the typical example; the method is trained and tested on unlabeled data sets
3. Reinforcement learning - the method learns on unlabeled data sets, but receives a penalty or a reward depending on the result of its decision
4. Deep learning - a group of methods based on deep neural networks
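The difference between the supervised and unsupervised entries can be sketched on one small data set. The example below is a minimal illustration using only the Python standard library; the height values, the labels and both function names are assumptions, and nearest neighbor and two-cluster k-means are picked only because they are short, not because the slides prescribe them.

```python
# Toy illustration of supervised vs. unsupervised learning on 1-D data.
# All data and names are hypothetical.

def nearest_neighbor_predict(train_points, train_labels, query):
    """Supervised: return the label of the training point closest to `query`."""
    distances = [abs(p - query) for p in train_points]
    return train_labels[distances.index(min(distances))]


def two_means(points, iterations=10):
    """Unsupervised: split 1-D points into two clusters (k-means with k=2)."""
    centers = [min(points), max(points)]  # simple deterministic initialization
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for p in points:
            # Assign each point to the nearer center (ties go to the first center).
            nearest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters


heights_cm = [150.0, 155.0, 160.0, 185.0, 190.0, 195.0]

# Supervised: labels come with the data, and the method assigns a label to a new point.
labels = ["short", "short", "short", "tall", "tall", "tall"]
predicted = nearest_neighbor_predict(heights_cm, labels, 188.0)

# Unsupervised: the same data without labels; the method only finds groups.
groups = two_means(heights_cm)

print(predicted)
print(groups)
```

The supervised call returns a named label because names were provided during training; the unsupervised call returns two unnamed groups that a person still has to interpret.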

The process
It consists of a few steps:
1. Assemble data
2. Preprocessing
3. Feature extraction
4. Feature selection
5. Training
6. Prediction
7. Validation
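The seven steps above can be sketched end to end on a tiny spam filter. This is a minimal sketch under stated assumptions: the messages, the word-voting "model" and every function name are hypothetical, and it uses only the Python standard library rather than a real machine learning library.

```python
# Hypothetical toy pipeline; not a production spam filter.

# 1. Assemble data: (message, label) pairs, label 1 = spam, 0 = not spam.
train_data = [
    ("WIN a FREE prize now", 1),
    ("free money, click now", 1),
    ("Meeting moved to Monday", 0),
    ("Lunch on Monday?", 0),
]
test_data = [
    ("claim your free prize", 1),
    ("see you at the meeting", 0),
]


# 2. Preprocessing: lowercase the text and drop punctuation.
def preprocess(text):
    return "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())


# 3. Feature extraction: represent a message as the set of its words.
def extract_features(text):
    return set(preprocess(text).split())


# 4. Feature selection: keep only words that occur in exactly one class.
def select_features(data):
    spam_words, ham_words = set(), set()
    for text, label in data:
        (spam_words if label == 1 else ham_words).update(extract_features(text))
    return spam_words ^ ham_words


# 5. Training: remember which class each selected word belongs to.
def train(data, selected):
    model = {}
    for text, label in data:
        for word in extract_features(text) & selected:
            model[word] = label
    return model


# 6. Prediction: known words vote; spam wins only with a strict majority.
def predict(model, text):
    votes = [model[w] for w in extract_features(text) if w in model]
    spam_votes = sum(votes)
    return 1 if spam_votes > len(votes) - spam_votes else 0


selected_words = select_features(train_data)
model = train(train_data, selected_words)
predictions = [predict(model, text) for text, _ in test_data]

# 7. Validation: accuracy on messages that were not used for training.
correct = sum(p == label for p, (_, label) in zip(predictions, test_data))
accuracy = correct / len(test_data)
print(predictions, accuracy)
```

Keeping the test messages out of training is the point of the last step: accuracy measured on the training messages themselves would say little about new mail.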

Daily usage
Spam filter, Siri, Google Photos, CRUSH, Amazon/Netflix/Spotify/ecommerce recommendation systems, Google Search, Amazon Alexa, High Frequency Trading, Deep Face, Google Now, Medical diagnostics, Tesla Smart Car, Handwriting recognition, PayPal fraud detection, Customer segmentation

Latest research
Google Xlab, Microsoft Tay, Facebook Deep Face, Tesla Smart Car
Source: YouTube, Twitter, Digital Trends

Trends
Recent Forrester research found that 58% of companies are researching AI solutions, but only 12% are using AI solutions.
1. Bots: Howdy, Wit and more
2. Libraries: TensorFlow, OpenAI Gym and more
3. Robotics: personal, industrial and retail
4. Autonomous vehicles: cars, drones and more
5. AI for finance, health and security
Source: Safari

Issues
There are a few issues related to AI:
1. Ethics and privacy
2. Job displacement
3. AI becoming more intelligent than humans, as in Ex Machina and Westworld
4. Economy

Data Scientists
A Data Scientist is a developer who:
✓ knows machine learning methods
✓ knows how to collect and manipulate data
✓ is able to get business value from data
✓ knows the tools
✓ has domain knowledge
✓ nice to have: is familiar with Big Data solutions

Data Science in practice

Data Science tools
Libraries:
→ scikit-learn (scikit-learn.org)
→ numpy (numpy.org)
→ pandas (pandas.pydata.org)
→ matplotlib (matplotlib.org)
Tools:
→ Jupyter (jupyter.org)
→ PyCharm (jetbrains.com/pycharm)
→ Spark (spark.apache.org)

Data Science and Big Data
1. Data Scientists always need data
2. Data Science is closer to the work of scientists, while Big Data is closer to the work of system administrators and DevOps
3. There are many conferences on Big Data, but only a few on data science that are not academic
4. Having a lot of data makes it possible to use deep learning

Research lab at Codete
We are developing:
1. Our own methods, algorithms and solutions in the Machine Learning area
2. Building concepts
3. Research
4. Kaggle
5. Big Data solutions

Latest proof of concept for automotive
Technology stack:
→ Both 3rd-party systems store information in Oracle databases
→ These entries are converted into Apache Kafka messages with Kafka Connect
→ Kafka MirrorMaker maintains replicas
→ Data coming from system B is stored in Apache HBase to allow retrieving car information efficiently
→ All the processing is done by a set of Apache Spark jobs
→ Splunk frontend

Where to go next?
1. Know at least one programming language. Recommended: Python, Java, R, Julia, Scala
2. Take the Coursera Machine Learning course (by Andrew Ng) and the Neural Networks course (by Geoffrey Hinton)
3. Read Machine Learning: An Algorithmic Perspective by Stephen Marsland (CRC, 2014)
4. Learn Python libraries on www.datacamp.com
5. Take part in one of the Kaggle challenges
6. Apply the knowledge to a specific problem
7. Learn

Q&A