Slide 1

Slide 1 text

Understanding All Data Jargon By Adityo Sanjaya

Slide 2

Slide 2 text

About Us

Slide 3

Slide 3 text

About Us
PACMANN AI is a research startup focusing on the application and development of machine learning algorithms. We have implemented several machine learning projects in different fields in Indonesia.
Recent Projects
● Crop Disease Prediction: In March 2017, we built a machine learning algorithm that detects crop disease using image recognition.

Slide 4

Slide 4 text

About Us
Recent Projects
● News Monitoring System: During the last quarter of 2017, we built a media monitoring and information extraction tool based on Natural Language Processing to identify sentiment towards specific topics.
● Credit Scoring: We built a credit scoring model based on machine learning algorithms to minimize credit default and systematic risk for PT Permodalan Nasional Madani (PT PNM).

Slide 5

Slide 5 text

About Us
Recent Projects
● Logistics Optimization: In 2019-2020 we worked for one of Sinarmas’ startups, Bizzy Indonesia, and built several services to optimize its core logistics business: vehicle routing optimization “Truck Way”, salesman optimization “Field Force”, a credit scoring system and recommendation system “Tokosmart”, Product-Toko visual recognition, and an internal machine learning platform.

Slide 6

Slide 6 text

He is the CTO of the ML startup Pacmann ai. He was a Senior Data Engineer at Bizzy. Relevant experience: Built a marketing platform for Sampoerna, Qubicle. Worked as a developer for Mivo, Broadcast Media TV. Built a decision optimization platform for Bizzy, Truck Routing.
Currently the CEO of Pacmann ai. He is a former Research ML Scientist at Bizzy. Relevant experience: Built a Recommender System for Bizzy TokoSmart. Built Face Recognition, Person Detection, and Age and Gender prediction for TokoSmart using Computer Vision.
ADITYO SANJAYA, RIYAD RIVANDI, BADARUDDIN R MOTIK
He is the COO of the ML startup Pacmann ai. He was an Independent Consultant and assisted in creating Digital Media Solutions for SMEs. Relevant experience: Co-Founder of Kitabisa.com. Worked as an Ads Content Specialist at Google via Adecco.

Slide 7

Slide 7 text

Contents
1. What is Data?
2. What is Statistics?
3. What is Machine Learning?
4. What is Artificial Intelligence?
5. What is Data Science?
6. Data Science Workflow
7. Do you need Statistics, ML, or Data Science?
8. Machine Learning Cases in Central Bank
9. Things you need to learn to apply good Statistics and Machine Learning in Central Bank

Slide 8

Slide 8 text

What is Data?

Slide 9

Slide 9 text

What is Data?
“Data are characteristics or information, usually numerical, that are collected through observation” -- OECD Statistical Definition
Data Examples: All Numerical
● Excel Sheet: The most common understanding of data is an Excel sheet. Most of the time, it is generated by business processes, surveys, or economic activities.

Slide 10

Slide 10 text

What is Data?
Data Examples
● Time Series: Just another “Excel” sheet, but with a time index.
● Sound: High-frequency time series data, much like a stock return plot.

Slide 11

Slide 11 text

What is Data?
Data Examples
● Images: The same as our Excel sheet, but with 3 levels (channels): Red, Green and Blue.
Source: https://cs231n.github.io/classification/

Slide 12

Slide 12 text

What is Data?
Data Examples
● Images: The same as our Excel sheet, but with 3 levels (channels): Red, Green and Blue.
Source: Bhupendra (2015)
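To make the "sheet with Red, Green and Blue levels" idea concrete, here is a minimal sketch in plain Python. The 2x2 image and its pixel values are invented for illustration; real images are normally loaded with an imaging library, but the resulting shape (rows, columns, 3 values per pixel) is the same.

```python
# A tiny 2x2 image: rows -> pixels -> (red, green, blue), each value 0-255.
# The pixel values are invented for illustration.
image = [
    [(255, 0, 0), (0, 255, 0)],      # red pixel, green pixel
    [(0, 0, 255), (255, 255, 255)],  # blue pixel, white pixel
]

def channel(img, index):
    """Return one colour channel as a plain 2-D grid (one 'sheet')."""
    return [[pixel[index] for pixel in row] for row in img]

red, green, blue = (channel(image, i) for i in range(3))
print(red)    # [[255, 0], [0, 255]]
print(green)  # [[0, 255], [0, 255]]
print(blue)   # [[0, 0], [255, 255]]
```

Each channel is an ordinary grid of numbers, which is why an image can be treated like three spreadsheets stacked on top of each other.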

Slide 13

Slide 13 text

What is Data?
Data Examples
● Videos: A sequence of images with a time index.
Source: Bhupendra (2015)

Slide 14

Slide 14 text

What is Data?
Data Examples
● Network Data: Data that shows relations between entities.
Source: https://transportgeography.org/?page_id=6969
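As a rough sketch of how relations between entities can be stored, the snippet below keeps a network as a list of edges and converts it to an adjacency list. The entity names A to D and the helper `to_adjacency` are hypothetical, chosen only for illustration.

```python
# Relations between entities stored as an edge list (names are hypothetical).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]

def to_adjacency(edge_list):
    """Build an undirected adjacency list: entity -> set of neighbours."""
    adjacency = {}
    for left, right in edge_list:
        adjacency.setdefault(left, set()).add(right)
        adjacency.setdefault(right, set()).add(left)
    return adjacency

graph = to_adjacency(edges)
# Degree = number of direct relations each entity has.
degree = {node: len(neighbours) for node, neighbours in graph.items()}
print(degree)  # {'A': 2, 'B': 2, 'C': 3, 'D': 1}
```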

Slide 15

Slide 15 text

What is Data?
Data Examples
● Text Data: Data that conveys linguistic meaning or understanding.
Source: https://gdcoder.com/nlp-transforming-tokens-into-features-tf-idf/
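Text becomes numerical data once tokens are counted. The sketch below builds a plain term-count table (a "bag of words"); TF-IDF, mentioned in the source link, is a reweighted version of the same table. The two documents and the helper `term_counts` are made up for illustration.

```python
from collections import Counter

# Two made-up documents used only for illustration.
documents = ["the bank raised the rate", "the rate fell"]

def term_counts(text):
    """Lower-case, split on whitespace, and count each token."""
    return Counter(text.lower().split())

# One column per distinct token, one row per document.
vocabulary = sorted({token for doc in documents for token in doc.lower().split()})
matrix = [[term_counts(doc)[token] for token in vocabulary] for doc in documents]

print(vocabulary)  # ['bank', 'fell', 'raised', 'rate', 'the']
print(matrix)      # [[1, 0, 1, 1, 2], [0, 1, 0, 1, 1]]
```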

Slide 16

Slide 16 text

Growth of Data and BigData

Slide 17

Slide 17 text

Growth of Data
Data keeps growing over time because our activities are increasing, storage is cheap, and the internet has boomed.
Data per minute, 2012

Slide 18

Slide 18 text

Growth of Data
Large amounts of data are generated from:
● Webpages (content, graph)
● Clicks (ad, page, social)
● Users (OpenID, FB Connect)
● E-mails (Hotmail, Y!Mail, Gmail)
● Photos, Movies (Flickr, YouTube, Vimeo ...)
● Installed apps (Android market etc.)
● Location (Latitude, Loopt, Foursquare)
● User generated content (Wikipedia & co)
● Ads (display, text, DoubleClick, Yahoo)
● Comments (Disqus, Facebook)
● Reviews (Yelp, Y!Local)
● Third party features (e.g. Experian)
● Social connections (LinkedIn, Facebook)
● Purchase decisions (Netflix, Amazon)
Data per minute, 2014

Slide 19

Slide 19 text

Growth of Data Data from industries

Slide 20

Slide 20 text

Growth of Data You need servers for all this BigData

Slide 21

Slide 21 text

Growth of Data
Big data is a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by traditional data-processing application software.
- 10+ terabytes of data

Slide 22

Slide 22 text

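Growth of Data
MapReduce is a programming model for processing data sets that are too large for one machine (for example, more than 10 terabytes) by splitting the work into map and reduce steps that run in parallel across a cluster.
Source: https://www.todaysoftmag.com/images/articles/tsm33/large/a11.png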
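MapReduce is usually described as three steps: map each chunk of data to key-value pairs, shuffle the pairs so equal keys end up together, then reduce each group to one result. The single-machine word count below is only a sketch of that idea; the function names and the two text chunks are illustrative assumptions, not the API of any real framework.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word, 1) for word in chunk.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: combine the values of each key, here by summing."""
    return {key: sum(values) for key, values in grouped.items()}

# Made-up chunks; on a real cluster each chunk would sit on a different machine.
chunks = ["big data big ram", "small data"]
pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(shuffle(pairs))
print(word_counts)  # {'big': 2, 'data': 2, 'ram': 1, 'small': 1}
```

Because each chunk is mapped independently, the map step can run on many machines at once, which is the point of the model.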

Slide 23

Slide 23 text

Small Data

Slide 24

Slide 24 text

Small Data
But do you need Big Data and MapReduce?

Slide 25

Slide 25 text

Your Data

Slide 26

Slide 26 text

Small Data
But do you need Big Data and MapReduce?

Slide 27

Slide 27 text

Small Data
Big RAM is eating big data – Size of datasets used for analytics
1. Is your data bigger than 10 terabytes? No? Then don’t use BigData.
2. Is your data smaller than 16 GB? Yes? Just use your laptop.
3. You can’t fit your data into your RAM? Yes? Buy more RAM, it’s cheaper.
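The three rules above can be written as a small decision function. This is only a sketch: the function name `choose_tooling`, the returned labels, and the use of 16 GB as the default RAM size are assumptions made for illustration, and 1 terabyte is taken as 1000 GB.

```python
def choose_tooling(data_gb, ram_gb=16):
    """Apply the three rules in order; the thresholds come from the slide."""
    if data_gb > 10_000:      # more than 10 TB (taking 1 TB = 1000 GB)
        return "big data tooling"
    if data_gb < ram_gb:      # fits in memory
        return "laptop"
    return "buy more RAM"     # too big for RAM, too small for a cluster

print(choose_tooling(4))       # laptop
print(choose_tooling(200))     # buy more RAM
print(choose_tooling(20_000))  # big data tooling
```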

Slide 28

Slide 28 text

Small Data
We use Statistics on small data, as in your daily cases and research problems. Next, we will discuss Statistics.

Slide 29

Slide 29 text

What is Statistics

Slide 30

Slide 30 text

What is Statistics?
Statistics is a method for inferring unknown population-level parameters from samples.
Diagram labels: Population, Samples

Slide 31

Slide 31 text

What is Statistics?
Statistics Examples
● Approximate Number of Fish: Use random sampling to infer the number of fish in the lake. If fish under- or overpopulate the lake, the ecological equilibrium is distorted.

Slide 32

Slide 32 text

What is Statistics?
Statistics Examples
● Approximate Number of Fish: Mark and recapture is a standard method to approximate a fish population.
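One simple way to turn mark-and-recapture counts into a population estimate is the Lincoln-Petersen estimator: if the marked fish have mixed evenly, the share of marked fish in the second catch should match their share in the whole lake. The slide does not say which estimator is used, so this choice and the numbers below are assumptions for illustration.

```python
def lincoln_petersen(marked_first, caught_second, recaptured):
    """Estimate population size as marked_first * caught_second / recaptured."""
    if recaptured == 0:
        raise ValueError("no marked fish recaptured; the estimate is undefined")
    return marked_first * caught_second / recaptured

# Hypothetical numbers: mark 100 fish, later catch 80, of which 20 carry a mark.
estimate = lincoln_petersen(100, 80, 20)
print(estimate)  # 400.0
```

Here 20 of 80 fish in the second catch are marked (25%), so the 100 marked fish are taken to be about 25% of the lake, giving roughly 400 fish.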

Slide 33

Slide 33 text

What is Statistics?
Statistics Examples
● Approximate Marginal Propensity to Consume (MPC): Use linear regression to estimate MPC as an indicator of economic health. A low MPC, i.e. a higher marginal propensity to save (MPS), might indicate uncertainty about the future.
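In a regression of consumption on income, the slope is the MPC, and MPS = 1 - MPC. The sketch below fits that line with ordinary least squares in plain Python. The income and consumption figures are made up for illustration and are not real survey data; the helper name `ols_line` is also an assumption.

```python
def ols_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    variance = sum((xi - mean_x) ** 2 for xi in x)
    slope = covariance / variance
    return mean_y - slope * mean_x, slope

# Made-up household data (same currency unit), not real survey figures.
income = [100, 200, 300, 400]
consumption = [90, 170, 250, 330]

intercept, mpc = ols_line(income, consumption)
mps = 1 - mpc  # marginal propensity to save
print(round(mpc, 2), round(mps, 2))  # 0.8 0.2
```

In this toy data each extra unit of income raises consumption by 0.8, so 0.2 of it is saved.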

Slide 34

Slide 34 text

What is Statistics?
Statistics Examples
● Approximate Growth Rate of COVID-19: Use Bayesian simulation to infer the COVID-19 growth rate and the possibility of a pandemic.

Slide 35

Slide 35 text

What is Statistical Bias?

Slide 36

Slide 36 text

Definition What is Statistical Bias? Source: https://mathigon.org/course/intro-statistics/point-estimation

Slide 37

Slide 37 text

Examples What is Statistical Bias? An estimator of theta has high or low bias depending on whether its mean is far from or close to theta. It has high or low variance depending on whether its mass is spread out or concentrated.
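A classic illustration of estimator bias is the sample variance: dividing by n gives an estimator whose mean is below the true variance, while dividing by n - 1 removes that bias. The simulation below is a sketch under stated assumptions (normal data with true variance 1, samples of size 5, a fixed random seed); the function names are illustrative.

```python
import random

def variance_estimates(sample):
    """Return (divide-by-n, divide-by-(n-1)) variance estimates for one sample."""
    n = len(sample)
    mean = sum(sample) / n
    squared_deviations = sum((x - mean) ** 2 for x in sample)
    return squared_deviations / n, squared_deviations / (n - 1)

def average_estimates(trials=20000, n=5, seed=0):
    """Average both estimators over many samples from a population with variance 1."""
    rng = random.Random(seed)
    total_biased = total_unbiased = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        biased, unbiased = variance_estimates(sample)
        total_biased += biased
        total_unbiased += unbiased
    return total_biased / trials, total_unbiased / trials

mean_biased, mean_unbiased = average_estimates()
print(round(mean_biased, 2), round(mean_unbiased, 2))
```

With n = 5, the divide-by-n average should land near (n - 1)/n = 0.8 of the true variance, while the divide-by-(n - 1) average should land near 1.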

Slide 38

Slide 38 text

What is Statistical Bias?
Problems
- High bias reduces our ability to know the truth.
- High bias makes our estimated parameter differ from the population value.
- High bias leads us to draw the wrong conclusion.

Slide 39

Slide 39 text

What is Causality?

Slide 40

Slide 40 text

What is Causality?
Spurious Correlation
Correlation does not imply causation, but if two variables are correlated, there might be a common factor.

Slide 41

Slide 41 text

What is Causality?
Spurious Correlation
Correlation does not imply causation, but if two variables are correlated, there might be a common factor.

Slide 42

Slide 42 text

Causality ● We say that X causes Y if… ● were we to intervene and change the value of X without changing anything else… ● then Y would also change as a result What is Causality?

Slide 43

Slide 43 text

What is Causality?
Case
● Alcohol consumption is correlated with lung cancer. Is it a causal relationship?
Diagram labels: Smoking, Lung Cancer, Drink alcohol

Slide 44

Slide 44 text

What is Causality?
Case
● Alcohol consumption is correlated with lung cancer. Is it a causal relationship?
Diagram labels: Smoking, Lung Cancer, Drink alcohol, Block
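One way to read "Block" is to hold the common factor fixed and compare again. The sketch below uses a hypothetical population in which smokers both drink more and get lung cancer more often, while drinking itself has no effect. All counts are invented for illustration, and the helper names are assumptions.

```python
# Hypothetical counts: (smoker, drinker) -> (people, lung cancer cases).
counts = {
    (True, True): (100, 30),
    (True, False): (50, 15),
    (False, True): (50, 5),
    (False, False): (100, 10),
}

def cancer_rate(rows):
    """Cancer cases divided by people over the given groups."""
    people = sum(p for p, _ in rows)
    cases = sum(c for _, c in rows)
    return cases / people

def rate_by_drinking(drinker, smoker=None):
    """Cancer rate among drinkers or non-drinkers, optionally within one smoking group."""
    rows = [value for (s, d), value in counts.items()
            if d == drinker and (smoker is None or s == smoker)]
    return cancer_rate(rows)

# Pooled: drinkers look worse off (35/150 versus 25/150).
pooled_gap = rate_by_drinking(True) - rate_by_drinking(False)
# Holding smoking fixed: the gap disappears within each group.
gap_smokers = rate_by_drinking(True, smoker=True) - rate_by_drinking(False, smoker=True)
gap_nonsmokers = rate_by_drinking(True, smoker=False) - rate_by_drinking(False, smoker=False)
print(round(pooled_gap, 3), gap_smokers, gap_nonsmokers)  # 0.067 0.0 0.0
```

The pooled correlation comes entirely from smoking: once smokers are compared with smokers and non-smokers with non-smokers, drinking makes no difference in this toy data.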

Slide 45

Slide 45 text

What is Machine Learning?

Slide 46

Slide 46 text

What is Machine Learning?
Machine Learning: “Field of study that gives computers the ability to learn without being explicitly programmed” -- Arthur Samuel (1959)
Machine learning focuses on accuracy; it doesn’t focus on inference or causality.

Slide 47

Slide 47 text

What is Machine Learning?
Examples
KNN: Find the closest neighbours, then vote or average according to them. It detects patterns in the data.
Visit: http://vision.stanford.edu/teaching/cs231n-demos/knn/
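A minimal sketch of KNN classification in plain Python: sort the training points by distance to the query, keep the k closest, and take a majority vote. The training points, labels, and the function name `knn_predict` are made up for illustration.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Made-up 2-D points with two classes.
train = [((1.0, 1.0), "red"), ((1.5, 2.0), "red"), ((2.0, 1.0), "red"),
         ((6.0, 6.0), "blue"), ((7.0, 5.5), "blue"), ((6.5, 7.0), "blue")]

print(knn_predict(train, (1.2, 1.4)))  # red
print(knn_predict(train, (6.4, 6.1)))  # blue
```

For regression, the vote would be replaced by an average of the neighbours' values.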

Slide 48

Slide 48 text

What is Machine Learning?
Examples
Deep Learning: A subset of machine learning algorithms; it is neural networks under a new name.

Slide 49

Slide 49 text

How to improve accuracy?

Slide 50

Slide 50 text

How to Improve Accuracy?
We can improve our model's accuracy by adding more bias in order to reduce variance, or vice versa.

Slide 51

Slide 51 text

How to Improve Accuracy?
We can improve our model's accuracy by adding more bias in order to reduce variance, or vice versa.
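A small worked example of trading bias for variance: estimate a mean with a shrunk sample mean, c times the average of n observations. Its bias is (c - 1) times the true mean, its variance is c squared times sigma squared over n, and the mean squared error is bias squared plus variance. The numbers below (true mean 1, variance 4, n = 4) and the function name are assumptions for illustration.

```python
def mse_of_shrunk_mean(shrink, true_mean, variance, n):
    """Return (mse, bias, estimator_variance) for the estimator shrink * sample_mean."""
    bias = (shrink - 1.0) * true_mean
    estimator_variance = shrink ** 2 * variance / n
    return bias ** 2 + estimator_variance, bias, estimator_variance

# Hypothetical setting: true mean 1, population variance 4, 4 observations.
unbiased_mse, _, _ = mse_of_shrunk_mean(1.0, true_mean=1.0, variance=4.0, n=4)
shrunk_mse, shrunk_bias, _ = mse_of_shrunk_mean(0.5, true_mean=1.0, variance=4.0, n=4)
print(unbiased_mse, shrunk_mse, shrunk_bias)  # 1.0 0.5 -0.5
```

In this setting the shrunk estimator is biased but has a lower total error, because the drop in variance outweighs the added bias. Shrinking too far reverses the gain.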

Slide 52

Slide 52 text

Machine Learning
● Machine Learning focuses on accuracy; it is pattern recognition.
● Machine Learning can’t infer unknown parameters.
● Machine Learning can’t detect causality.
● It is not that smart.

Slide 53

Slide 53 text

What is Artificial Intelligence?

Slide 54

Slide 54 text

No content

Slide 55

Slide 55 text

Can a machine ‘Think’? The Imitation Game, 2014

Slide 56

Slide 56 text

"Can machines think?"... The new form of the problem can be described in terms of a game which we call the 'imitation game. Alan Turing

Slide 57

Slide 57 text

"Can machines think?"... The new form of the problem can be described in terms of a game which we call the 'imitation game.' It is played with three people, a man (A), a woman (B), and an interrogator (C) who may be of either sex. The interrogator stays in a room apart from the other two. The object of the game for the interrogator is to determine which of the other two is the man and which is the woman. He knows them by labels X and Y, and at the end of the game he says either "X is A and Y is B" or "X is B and Y is A." The interrogator is allowed to put questions to A and B... We now ask the question, "What will happen when a machine takes the part of A in this game?" Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman? These questions replace our original, "Can machines think?" -- Alan Turing

Slide 58

Slide 58 text

What is Artificial Intelligence?
Goal: Imitate human intelligence.
How: We don’t know how yet, but academia seems to hypothesize that Machine Learning/Pattern Recognition is the solution to AI problems.

Slide 59

Slide 59 text

What is Artificial Shallow Intelligence?

Slide 60

Slide 60 text

What is Artificial Shallow Intelligence?
Goal: Imitate human intelligence.
Current state:
● Can only do pattern recognition.
● Cannot think.
● Does not infer causality.
● “It is a shallow AI” -- Andrew Ng

Slide 61

Slide 61 text

What is Data Science?

Slide 62

Slide 62 text

What is Data Science?
“The ability to take data – to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it – that’s going to be a hugely important skill in the next decades, not only at the professional level but even at the educational level for elementary school kids, for high school kids, for college kids. Because now we really do have essentially free and ubiquitous data.” – Hal Varian, Chief Economist, Google

Slide 63

Slide 63 text

What is Data Science? Nate Silver

Slide 64

Slide 64 text

What is Data Science? “Nate Silver won the election” -- Harvard Business Review

Slide 65

Slide 65 text

What is Data Science?

Slide 66

Slide 66 text

What is Data Science?

Slide 67

Slide 67 text

What is Data Science? Source: Harvard 109 Data science course

Slide 68

Slide 68 text

What is Data Science?
“A data scientist is someone who knows more statistics than a computer scientist and more computer science than a statistician.” - Josh Blumenstock
“Data Scientist = statistician + programmer + storyteller + artist” - Shlomo Argamon
+
● Machine Learning
● Subject Matter Expertise

Slide 69

Slide 69 text

How to do Data Sciences?

Slide 70

Slide 70 text

Source: Data Science 109, Harvard How to do Data Sciences?

Slide 71

Slide 71 text

Source: Data Science 109, Harvard DEMO

Slide 72

Slide 72 text

Do you need Statistics, Machine Learning or Data Sciences?

Slide 73

Slide 73 text

What do you need?

Statistics                    | Machine Learning                       | Data Sciences
Focus on parameter inference  | Focus on accuracy                      | We do ML and Stats
You need a valid conclusion   | You need accurate prediction           | We make separate models for inference and prediction
Small and noisy data          | Unstructured data                      | We do both noisy data and unstructured data
Need subject matter expertise | Does not need subject matter expertise | Need subject matter expertise
One-time run only             | Predict rapidly                        | Predict rapidly, with inference from time to time

Slide 74

Slide 74 text

Data Science Cases in Central Bank

Slide 75

Slide 75 text

Data Science Cases
“One approach which could make this process more efficient, but also more accurate, is to train a machine learning model on a set of validated supervisory alerts which indicate the need for closer scrutiny of a particular firm.”
“Our first case study for supervised learning is the prediction of alerts associated with balance sheet items of financial institutions which could be reason for concerns.”
“Regular close scrutiny of banks’ balance sheets has become a standard for financial supervisors following the financial crises. However, the manual inspection of hundreds or thousands of firms’ records can be inefficient. Most firms will be sound and spotting complex relations between items for firms which are not, can be difficult.”
Source: Machine Learning at Bank of England https://www.bankofengland.co.uk/-/media/boe/files/working-paper/2017/machine-learning-at-central-banks.pdf?la=en&hash=EF5C4AC6E7D7BDC1D68A4BD865EEF3D7EE5D7806

Slide 76

Slide 76 text

Data Science Cases
● Build a model that can predict the left or right position in the distribution from some variables.
● Use it as an alert system.
Source: Machine Learning at Bank of England https://www.bankofengland.co.uk/-/media/boe/files/working-paper/2017/machine-learning-at-central-banks.pdf?la=

Slide 77

Slide 77 text

Source: Machine Learning at Bank of England https://www.bankofengland.co.uk/-/media/boe/files/working-paper/2017/machine-learning-at-central-banks.pdf?la= Data Science Cases ● Predict inflation rate with machine learning ● Use it as an alert system.

Slide 78

Slide 78 text

Source: Machine Learning at Bank of England https://www.bankofengland.co.uk/-/media/boe/files/working-paper/2017/machine-learning-at-central-banks.pdf?la= Data Science Cases

Slide 79

Slide 79 text

Source: Machine Learning at Bank of England https://www.bankofengland.co.uk/-/media/boe/files/working-paper/2017/machine-learning-at-central-banks.pdf?la= Data Science Cases

Slide 80

Slide 80 text

Things you need to learn to apply ML and Statistics in Central Bank

Slide 81

Slide 81 text

You need to learn: Python

Slide 82

Slide 82 text

● Pandas Functionality You need to learn: Data Wrangling

Slide 83

Slide 83 text

You need to learn: Data Wrangling ● Data Cleansing Cases

Slide 84

Slide 84 text

You need to learn: Visualization ● Exploratory Data Analysis https://www.sciencedirect.com/science/article/pii/S2468502X18300561

Slide 85

Slide 85 text

You need to learn: Statistics Source: Alex Smola 701 ML Intro course

Slide 86

Slide 86 text

You need to learn: Machine Learning

Slide 87

Slide 87 text

Our Training

Slide 88

Slide 88 text

Our Training
Pacmann AI has run several public and corporate trainings in the past. Our focus is to teach good practice in Statistics, Machine Learning, Optimization and Algorithms in industry.
Last training
● 500+ alumni
○ 410 alumni with bachelor's degrees
○ 80 alumni with master's degrees
○ 30 alumni with doctoral degrees
● 50+ institutions

Slide 89

Slide 89 text

Our Training

Slide 90

Slide 90 text

Our Training

Slide 91

Slide 91 text

Current Public Trainings

Slide 92

Slide 92 text

No content

Slide 93

Slide 93 text

No content

Slide 94

Slide 94 text

No content

Slide 95

Slide 95 text

Current Public Training Generate your own curriculum

Slide 96

Slide 96 text

300 Students For 2 months

Slide 97

Slide 97 text

“The curriculum presented is very comprehensive, covering basic skills through very advanced ones, which are not even commonly taught at university.” -- Bimandra Djaafara, Researcher, Eijkman-Oxford Clinical Research Unit; PhD Student at Imperial College London

Slide 98

Slide 98 text

“One of the best machine learning classes I have ever attended, far better even than the machine learning class at my own campus (NUS, Computer Engineering Department).” -- Prasetya Dwicahya, Analyst, World Bank

Slide 99

Slide 99 text

bit.ly/brosurpacmannai

Slide 100

Slide 100 text

Adityo Sanjaya adit@pacmannai.com Thank You