Marcos Arancibia
July 21, 2020

# Machine Learning 101: Regression

Have you always been curious about what machine learning can do for your business problem, but could never find the time to learn the necessary practical skills? Do you wish to learn what Classification, Regression, Clustering and Feature Extraction techniques do, and how to apply them using the Oracle Machine Learning family of products? Join us for this third chapter of the series “Oracle Machine Learning Office Hours – Machine Learning 101”.

In this "ML Regression 101" we learned how to set up a data set for regression modeling, build machine learning models that predict numeric values such as home prices, evaluate model quality and compare algorithms, as well as use AutoML for Regression.

## Transcript

1. ### Oracle Machine Learning Office Hours: Machine Learning 101 – Regression

With Marcos Arancibia, Product Manager, Data Science and Big Data (@MarcosArancibia), and Mark Hornick, Senior Director, Product Management, Data Science and Machine Learning (@MarkHornick). oracle.com/machine-learning. Copyright © 2020, Oracle and/or its affiliates. All rights reserved.

6. ### Today’s Session: Machine Learning 101 - Regression

In this "ML Regression 101" we will learn how to set up a data set for regression modeling, build machine learning models that predict numeric values such as home prices, evaluate model quality and compare algorithms, and use AutoML for Regression.
7. ### Agenda

- What is machine learning?
- What is Regression?
- History of Regression
- Types of data needed for Regression
- Terminology
- Data Preparation
- Linear Regression Intuition
- Other Regression Algorithms
- Model evaluation
- AutoML
- Q&A

8. ### What is Regression?

Machine learning can be applied to a wide range of business problems. Regression is a subcategory of Supervised Learning (Machine Learning with a known past outcome) where the goal is to predict a continuous-valued outcome for new records based on past observations. Examples of Regression:

- Compute a "credit score" or a new "credit limit" for a person, based on income, past payment behavior, credit limits, late payments, debt-to-income and debt-to-credit-limit ratios, housing situation, etc.
- Predict "future stock prices" based on past behavior and the relationship to the rest of the market and other external economic influences.
- Predict the "value of a house" based on location, number of rooms, lot size, neighborhood crime levels, schools, etc.
- Estimate the "arrival delay" of a flight based on the departure delay, carrier, day of the week, day of the month, origin, destination, and other metrics.

9. ### What is Regression?

Machine learning can be applied to a wide range of business problems. A regression task begins with a data set in which the target values are known. For example, a regression model that predicts house values can be developed based on observed data for many houses over a period of time.

In the model build (training) process, a regression algorithm estimates the value of the target as a function of the predictors for each case in the build data. These relationships between predictors and target are summarized in a model, which can then be applied to a different data set in which the target values are unknown.

Regression models are tested by computing various statistics that measure the difference between the predicted values and the expected values. The historical data for a regression project is typically divided into two data sets: one for building the model, the other for testing the model.
10. ### History of Regression

Francis Galton (half-cousin of Charles Darwin) was the first to describe and explain the common phenomenon of regression toward the mean, which he first observed in his experiments on the size of the seeds of successive generations of sweet peas in 1875. Source: http://en.wikipedia.org/wiki/Francis_Galton
11. ### History of Regression: First Regression model on a computer system

One of the first Regression models ever created was a Credit Scoring model built in 1963. Machine storage and memory limitations restricted the data to 600 customers and 25 attributes.

[Pictures: IBM 7090 and 7094 mainframes from the early 1960s, from Wikimedia]

Source: James H. Myers and Edward W. Forgy, "The Development of Numerical Credit Evaluation Systems", Journal of the American Statistical Association, Volume 58, Issue 303 (Sep., 1963).
12. ### Data for Regression: Arrival Delays of Domestic Flights

Data that will be used for the current session:

- The ONTIME dataset contains scheduled and actual departure and arrival times, reasons for delay and other measurements reported by certified U.S. air carriers that account for at least one percent of domestic scheduled passenger revenues.
- The data is collected by the Office of Airline Information, Bureau of Transportation Statistics (BTS).
- A version of this dataset that became famous consists of flight arrival and departure details from October 1987 to April 2008 (123 million records). It was presented by the American Statistical Association Data Expo in 2009, and the data is still hosted at http://stat-computing.org/dataexpo/2009/
- Newer data can be found on the BTS site at https://www.bts.gov/browse-statistical-products-and-data/bts-publications/airline-service-quality-performance-234-time

13. ### What type of data is needed for a Regression problem?

“Garbage in, garbage out” is especially true for ML, but so is having the right data. To predict the "arrival delay" of a flight based on other information about the flight, we need historical data with known outcomes:

| ID | CARR | DEPDELAY | ORIGIN | DEST | … | DISTANCE | ARRDELAY |
|---|---|---|---|---|---|---|---|
| 633 | AA | -1 mins | SJU | DFW | … | 2,195 | 0 mins |
| 1184 | UA | 18.5 mins | JAX | IAD | … | 631 | 20 mins |
| 86 | NW | 8 mins | HNL | SEA | … | 2,677 | 15 mins |
| … | … | … | … | … | … | … | … |

14. ### What type of data is needed for a Regression problem?

“Garbage in, garbage out” is especially true for ML, but so is having the right data. To predict the "arrival delay" of a flight based on other information about the flight, we need historical data with known outcomes:

| ID | CARR | DEPDELAY | ORIGIN | DEST | … | DISTANCE | ARRDELAY |
|---|---|---|---|---|---|---|---|
| 633 | AA | -1 mins | SJU | DFW | … | 2,195 | 0 mins |
| 1184 | UA | 18.5 mins | JAX | IAD | … | 631 | 20 mins |
| 86 | NW | 8 mins | HNL | SEA | … | 2,677 | 15 mins |
| … | … | … | … | … | … | … | … |

New data with unknown outcomes – predict arrival delay:

| ID | CARR | DEPDELAY | ORIGIN | DEST | … | DISTANCE | PREDICTED ARRDELAY |
|---|---|---|---|---|---|---|---|
| 345 | AA | -1 mins | SJU | DFW | … | 2,195 | 0.5 mins |
| 1235 | UA | 18.5 mins | JAX | IAD | … | 631 | 23 mins |
| 342 | NW | 8 mins | HNL | SEA | … | 2,677 | 13.5 mins |
| … | … | … | … | … | … | … | … |

15. ### Machine Learning terminology

Several names are used for the same components, depending on the field of study.

- Data: Database Table or View; Data set (or dataset); Training data – to build a model; Test data – to evaluate a model
- Table Row: Record; Case; Instance; Example
- Table Columns (the inputs): Variable; Attribute; Field; Predictor
- Table Column (ARRDELAY): Target – what to predict; Response
- Table Column (ID): Case ID; Unique ID

Historical data with known outcomes:

| ID | CARR | DEPDELAY | ORIGIN | DEST | … | DISTANCE | ARRDELAY |
|---|---|---|---|---|---|---|---|
| 345 | AA | -1 mins | SJU | DFW | … | 2,195 | 0 mins |
| 1235 | UA | 18.5 mins | JAX | IAD | … | 631 | 20 mins |
| 342 | NW | 8 mins | HNL | SEA | … | 2,677 | 15 mins |
| … | … | … | … | … | … | … | … |

16. ### Machine Learning terminology

Several names are used for the same components, depending on the field of study.

- Data: Database Table or View; Scoring data – for predictions
- Table Column (PREDICTED ARRDELAY): Prediction

New data with unknown outcomes – predict arrival delay:

| ID | CARR | DEPDELAY | ORIGIN | DEST | … | DISTANCE | PREDICTED ARRDELAY |
|---|---|---|---|---|---|---|---|
| 345 | AA | -1 mins | SJU | DFW | … | 2,195 | 0.5 mins |
| 1235 | UA | 18.5 mins | JAX | IAD | … | 631 | 23 mins |
| 342 | NW | 8 mins | HNL | SEA | … | 2,677 | 13.5 mins |
| … | … | … | … | … | … | … | … |

17. ### Intuition: Data preparation

Split the Data into Train and Test/Validation sets

- You need to be able to build (train) the model on one set of data, and the model must generalize to new data coming in the future. We use a separate sample, called the Testing or Validation set, to check the expected model behavior.

Steps shown in the diagram:

- Build Model
- Keep Test Data aside
- Score the Test Data: pass the data for Scoring without the Actual Response
- Compare the Model Predictions with the Actual known Responses (Prediction vs. Target)
- Compute Goodness-of-fit Statistics

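
The split-and-score workflow on this slide can be sketched in plain Python. This is a minimal illustration, not OML4Py code: the `train_test_split` helper, the 70/30 ratio, the seed and the tiny `flights` list are all assumptions made for the example.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle rows reproducibly and split them into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical flight records: (id, departure delay, arrival delay), in minutes.
flights = [(i, float(i % 7), float(i % 7) + 1.0) for i in range(10)]
train, test = train_test_split(flights, test_fraction=0.3)

# Scoring data is the test set with the actual response removed ...
test_for_scoring = [(fid, dep) for fid, dep, _arr in test]
# ... while the actual responses are kept aside to compare with predictions.
test_actuals = {fid: arr for fid, _dep, arr in test}
```

Fixing the seed keeps the split reproducible, so two algorithms can be compared on exactly the same test rows.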
18. ### Data preparation

Depending on the Algorithm and Data:

- Data Transformation
- Standardization/Normalization of values
- Missing value Imputation

For example, what can be derived from a single date, such as 05/19/2020?

Basic Information

- 139 days since 1st Jan 2020
- Tuesday
- Third day of the week
- Second day of the workweek
- Sunrise was at 6:32AM in Miami
- Sun will set at 8:02PM in Miami
- It's an overcast day in Miami
- There were Flood Warnings in Miami

Domain Knowledge

- Has been a customer for 3.5 years
- Machine has been operating for 564 days
- Customer increased spending in the last 3 months
- Revenue last month declined vs. Avg previous 3 months
- Customer has declined usage 30% since last offer
- 6 months since last Contact

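
As a rough sketch of the "basic information" idea, the calendar attributes of a single date can be derived with Python's standard `datetime` module. The `date_features` helper and the attribute names are hypothetical, chosen only for this illustration; weather or sunrise data would need an external source and is not covered.

```python
from datetime import date

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

def date_features(d):
    """Derive simple calendar attributes from a single date."""
    return {
        "day_of_year": d.timetuple().tm_yday,
        "days_since_jan_1": (d - date(d.year, 1, 1)).days,
        "weekday_name": WEEKDAY_NAMES[d.weekday()],
        "iso_weekday": d.isoweekday(),      # Monday = 1 ... Sunday = 7
        "is_weekend": d.isoweekday() >= 6,
        "month": d.month,
        "quarter": (d.month - 1) // 3 + 1,
    }

features = date_features(date(2020, 5, 19))
```

Each derived value becomes an extra candidate predictor column alongside the original date.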
19. ### Data preparation: OML includes Automatic Data Preparation

- Most algorithms require some form of data transformation. During the model build process, Oracle Machine Learning can automatically perform the transformations required by the algorithm. You can choose to supplement the automatic transformations with additional transformations of your own, or you can choose to manage all the transformations yourself.
- In calculating automatic transformations, Oracle Machine Learning uses heuristics that address the common requirements of a given algorithm. This process results in reasonable model quality in most cases.
- Binning, normalization, and outlier treatment are transformations that are commonly needed by data mining algorithms.

20. ### OML automatic data preparation for Regression

Normalization

- Normalization is the most common technique for reducing the range of numerical data. Most normalization methods map the range of a single variable to another range (often [0, 1]).

Outlier Treatment

- A value is considered an outlier if it deviates significantly from most other values in the column. The presence of outliers can have a skewing effect on the data and can interfere with the effectiveness of transformations such as normalization or binning.
- Outlier treatment methods such as trimming or clipping can be implemented to minimize the effect of outliers.
- Outliers may represent problematic data, for example, a bad reading due to the abnormal condition of an instrument. However, in some cases, especially in the business arena, outliers are perfectly valid. For example, in census data, the earnings for some of the richest individuals can vary significantly from the general population. Do not treat this information as an outlier, since it is an important part of the data. You need domain knowledge to determine outlier handling.

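
To make the interaction between outliers and normalization concrete, here is a minimal sketch of min-max normalization with and without clipping. It is not how Oracle Machine Learning implements automatic data preparation; the helper names, the sample delays and the clipping bounds of -10 and 60 minutes are assumptions for illustration.

```python
def min_max_normalize(values):
    """Map values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]

def clip(values, lower, upper):
    """Replace values outside [lower, upper] with the nearest bound."""
    return [min(max(v, lower), upper) for v in values]

# Hypothetical departure delays in minutes; 600 is an extreme value.
delays = [-1.0, 8.0, 18.5, 20.0, 600.0]

raw_scaled = min_max_normalize(delays)
clipped_scaled = min_max_normalize(clip(delays, -10.0, 60.0))
```

Without clipping, the single 600-minute delay squeezes every other value into a narrow band near 0; after clipping, the typical delays spread over much more of the [0, 1] range. Whether clipping is appropriate still depends on domain knowledge, as the slide notes.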
21. ### Linear Regression Model Intuition

The simplest regression model is computed between a continuous target and a single attribute. Let's assume both the target and the predictor attribute are continuous. Let's look at Arrival Delay of a Flight in relation to the Departure Delay.

[Plot: Arrival Delay of Flight vs. Departure Delay of Flight]
22. ### Linear Regression Model Intuition

Our intuition says that a linear correlation exists between these two variables. But which line would be the correct one?

[Plot: Arrival Delay of Flight vs. Departure Delay of Flight, with candidate lines]
23. ### Linear Regression Model Intuition

Ordinary Least Squares to the rescue! We sum the squared distances from each point to the line, known as the residuals (the distance from the linear model to the actual data point), and select the line that minimizes that sum. Note: we need to square the residuals because they can be positive or negative, and otherwise they could cancel each other out in a plain sum.

[Plot: Arrival Delay of Flight vs. Departure Delay of Flight, with a fitted line and residuals of -0.5, 0.4, -0.6 and 0.7]
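
The least-squares idea can be sketched for a single predictor using the textbook closed-form solution. The data points below are hypothetical (not read off the slide's plot), and the function names are made up for this example.

```python
def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared vertical distances from each point to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

def fit_ols(xs, ys):
    """Closed-form ordinary least squares for one predictor."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

# Hypothetical departure and arrival delays, in minutes.
dep_delay = [0.0, 1.0, 2.0, 3.0]
arr_delay = [0.5, 2.4, 2.4, 4.7]

intercept, slope = fit_ols(dep_delay, arr_delay)
best = sum_squared_residuals(dep_delay, arr_delay, intercept, slope)

# Any other candidate line has a larger sum of squared residuals.
candidate = sum_squared_residuals(dep_delay, arr_delay, 1.0, 1.0)
```

Comparing `best` with `candidate` shows the selection criterion directly: among all lines, the fitted one has the smallest sum of squared residuals.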
24. ### Linear Regression Model Intuition

Regression Model expression:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \varepsilon$$

where $Y$ is the Target, $\beta_0$ is the Intercept Term, $\beta_1, \dots, \beta_n$ are the Attribute Coefficients, $X_1, \dots, X_n$ are the Attributes, and $\varepsilon$ is the Error.
25. ### Linear Regression Model Intuition

In our simple case:

$$Y = \beta_0 + \beta_1 X_1 + \varepsilon$$

- $Y$ = Arrival Delay
- Intercept Term $\beta_0$ = 1
- Attribute Coefficient $\beta_1$ = 1
- Attribute 1, $X_1$ = Departure Delay of Flight
- $\varepsilon$ = Error

When Departure is 0, Arrival is 1. When Departure is 1, Arrival is 2, leading to a 45-degree angle. The coefficient is the slope of the line.

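
Applying the model expression to a new record is just an intercept plus a weighted sum of the attributes. The sketch below uses the intercept of 1 and coefficient of 1 from the simple case; the `predict` helper is a hypothetical name, and the error term is left out because it is unknown at prediction time.

```python
def predict(intercept, coefficients, attributes):
    """Evaluate intercept + sum of coefficient * attribute for one record."""
    if len(coefficients) != len(attributes):
        raise ValueError("need exactly one coefficient per attribute")
    return intercept + sum(b * x for b, x in zip(coefficients, attributes))

# Simple case: one attribute (departure delay), intercept 1, coefficient 1.
arrival_when_on_time = predict(1.0, [1.0], [0.0])
arrival_when_one_late = predict(1.0, [1.0], [1.0])
```

The same function covers the general expression with several attributes by passing longer coefficient and attribute lists.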
26. ### Other algorithms used for Regression

There are many other Regression methods currently used in Machine Learning. Linear Regression is the simplest method and relatively easy to interpret. It makes many statistical assumptions about the attributes, including the distribution of their variance, but it is one of the fastest to compute. Other methods include:

- Support Vector Machines
- Neural Networks
- XGBoost*

27. ### Evaluation of Regression Models

There are a few methods to evaluate the "Goodness-of-fit" of a Regression model. In the case of a Linear Regression, the coefficient of determination, denoted R² or r² and pronounced "R squared", is the proportion of the variance in the dependent (target) variable (attribute) that is predictable from the independent (predictor) variables (attributes). In a simple linear regression that includes an intercept, r² is simply the square of the sample correlation coefficient (i.e., r) between the observed outcomes and the observed predictor values.

For models with more than one predictor, the "adjusted R²" compares the explanatory power of regression models that contain different numbers of predictors. It's a modified version of R² that has been adjusted for the number of predictors in the model. It is helpful when adding or removing predictors from the same model, since the adjusted R² increases only if a newly added attribute improves the model more than would be expected by chance. It decreases when a predictor improves the model by less than expected by chance. The adjusted R² can be negative, but it usually is not. It is never higher than R².
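
A small sketch of how these two statistics are commonly computed, using the textbook definitions R² = 1 − SS_res / SS_tot and adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). The function names and sample values are assumptions for illustration and are not taken from an OML report.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n_rows, n_predictors):
    """Penalize R-squared for the number of predictors in the model."""
    dof = n_rows - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more rows than predictors + 1")
    return 1.0 - (1.0 - r2) * (n_rows - 1) / dof

# Hypothetical actual arrival delays and model predictions, in minutes.
actual = [0.5, 2.4, 2.4, 4.7]
predicted = [1.0, 2.0, 3.0, 4.0]

r2 = r_squared(actual, predicted)
adj_r2 = adjusted_r_squared(r2, n_rows=len(actual), n_predictors=1)
```

With only four rows, the penalty for even one predictor is visible: `adj_r2` comes out noticeably below `r2`.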
28. ### Evaluation of Regression Models

Generic methods to evaluate the "Goodness-of-fit" of a Regression model:

1. Build the Model on the Training Data.
2. Score the New Data that we set aside for Test.
3. Compute the difference between the Predictions and the Actual response. Because these differences are positive and negative, we need to choose either the Absolute Values or the Squared Values:
   - The Mean Absolute Error (MAE) is potentially more intuitive, since it is on the same scale as the original Attribute.
   - The Mean Squared Error (MSE) is very much a standard as well, but it can suffer from outliers because the error terms are squared.
   - A well-known and widely used variation on the MSE, by analogy to the standard deviation, is the root-mean-square error (RMSE), the square root of the MSE. For an unbiased estimator, it is the square root of the variance, known as the standard error.

With $y_i$ the actual response, $\hat{y}_i$ the prediction and $n$ the number of scored records:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \qquad \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}}$$
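
The three error statistics can be computed in a few lines of plain Python. The `regression_errors` helper and the sample values are hypothetical, intended only to show how the measures relate to each other.

```python
import math

def regression_errors(actual, predicted):
    """Return (MAE, MSE, RMSE) for paired actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    mse = sum(d * d for d in diffs) / len(diffs)
    return mae, mse, math.sqrt(mse)

# Hypothetical actual and predicted arrival delays, in minutes.
actual = [0.0, 20.0, 15.0]
predicted = [0.5, 23.0, 13.5]

mae, mse, rmse = regression_errors(actual, predicted)
```

MAE and RMSE are both in minutes here, while MSE is in squared minutes; RMSE is never smaller than MAE and grows faster when a few errors are large.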
29. ### AutoML – new with OML4Py

Increase data scientist productivity – reduce overall compute time. Enables non-expert users to leverage Machine Learning.

[Diagram: Data Table → Auto Model Selection (much faster than exhaustive search) → Auto Feature Selection (>50% reduction in features) → AutoTune (significant score improvement) → ML Model]

- Auto Feature Selection
  - Reduce # of features by identifying most predictive
  - Improve performance and accuracy
- Auto Model Selection
  - Identify in-database algorithm that achieves highest model quality
  - Find best model faster than with exhaustive search
- Auto Tune Hyperparameters
  - Significantly improve model accuracy
  - Avoid manual or exhaustive search techniques

30. ### Demo

Demo on OML4Py, Advanced Model Evaluation and Comparison, Predictions and Statistics