Slide 1

Slide 1 text

Machine Learning & Its Application In Predictive Maintenance Arnab Biswas [email protected] arnabbiswas1

Slide 2

Slide 2 text

Table of Contents • Basics of Machine Learning • Classical Programming vs Machine Learning • Types of Machine Learning • Types of Supervised Learning • Application of ML in Predictive Maintenance (PdM) • Types of Maintenance • Goals & Use Cases for PdM • Data Science For PdM

Slide 3

Slide 3 text

What is Machine Learning? Task : Predict the price of an apartment in Bangalore

Slide 4

Slide 4 text

Classical Programming / Software 1.0 • Get help from a domain expert • Survey existing apartments in Bangalore • Identify factors contributing to the price of an apartment • Area • Size • Number of Bedrooms, Bathrooms • Name of the builder • etc. • Write a program which outputs the price based on the attributes identified Reference: https://medium.com/@karpathy/software-2-0-a64152b37c35

Slide 5

Slide 5 text

Classical Programming / Software 1.0 • Diagram: Data + Rule → Software 1.0 → Answer
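
A minimal sketch of what such a hand-written Software 1.0 program might look like. The function name, the builder name, and every rate below are hypothetical values chosen for illustration, not real Bangalore prices.

```python
# Hypothetical hand-written pricing rules (Software 1.0).
# All rates are made-up illustration values, not market data.
BASE_RATE_PER_SQFT = 5000        # assumed base price per square foot
BEDROOM_PREMIUM = 200_000        # assumed premium per bedroom
PREMIUM_BUILDERS = {"BuilderA"}  # hypothetical builder name


def rule_based_price(size_sqft, bedrooms, builder):
    """Return a price from rules a domain expert wrote by hand."""
    price = size_sqft * BASE_RATE_PER_SQFT + bedrooms * BEDROOM_PREMIUM
    if builder in PREMIUM_BUILDERS:
        price *= 1.10  # expert rule: premium builders cost 10% more
    return price
```

Every new factor (locality, floor, age of the building) means another rule that someone has to write and maintain by hand.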

Slide 6

Slide 6 text

Machine Learning/Software 2.0 • First Step: Collect data (as much as possible) Reference : https://www.kaggle.com/amitabhajoy/bengaluru-house-price-data

Slide 7

Slide 7 text

Machine Learning/Software 2.0 • Diagram: Data + Answer → Learning Algorithm → Rule • No explicit programming!

Slide 8

Slide 8 text

Software 1.0 vs 2.0 • Software 1.0: Data + Rule → Answer • Software 2.0: Data + Answer → Rule

Slide 9

Slide 9 text

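Features, Labels, Observation • Features: input attributes of an observation • Labels: the value to be predicted • Observation: one row of data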

Slide 10

Slide 10 text

Training vs Prediction • Training: Features + Labels → Learning Algorithm → Model • Prediction: Features → Model → Label
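
The two phases can be sketched with a deliberately tiny learner. This is an illustrative one-feature least-squares fit, not a method the slides prescribe; the names `train` and `predict` and the toy numbers are assumptions.

```python
def train(features, labels):
    """Training: learn slope and intercept from (feature, label) pairs."""
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(labels) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(features, labels))
    var = sum((x - mean_x) ** 2 for x in features)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return {"slope": slope, "intercept": intercept}  # the "model"


def predict(model, feature):
    """Prediction: apply the learned rule to an unseen feature."""
    return model["slope"] * feature + model["intercept"]


# Toy data (made up): apartment size -> price, lying exactly on
# the line price = 0.05 * size + 5.
sizes = [500, 1000, 1500, 2000]
prices = [30, 55, 80, 105]
model = train(sizes, prices)
```

Calling `predict(model, 1200)` then applies the learned slope and intercept to a size that was not in the training data.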

Slide 11

Slide 11 text

ML Works Better When… • The problem would require a long list of rules in classical programming, which is difficult to maintain. ML can simplify the code. • The data changes over time. ML “automatically” adapts to changes in the data, whereas classical programming needs manual updates to the rules. • The problem is complex (image, text, audio, etc.) • Humans want to gain insights from ML models

Slide 12

Slide 12 text

Humans can gain insights from ML models • Stages of Cancer • Medical textbooks decide the stage based on the number of “yes” answers to these questions: 1. Has the cancer affected more than one lymph node? 2. Are the cancerous lymph nodes both above & below the bottom of the rib cage? 3. Is the cancer found in organs outside the lymphatic system (in the patient's bone marrow)? • A 2018 research paper (University of Modena & Reggio Emilia) • Analyzed 15 variables, identifying 5 features • Due to limited cognitive ability, humans need a handful of the most obvious signifiers/features • ML/AI decides based on hundreds, if not thousands, of distinct features • May include traditional as well as less intuitive features

Slide 13

Slide 13 text

Machine Learning : Formal Definition • A Machine is Learning when it improves at a task based on experience at that task, but without explicit programming. Reference : https://cloud.google.com/products/ai/ml-comic-1/

Slide 14

Slide 14 text

AI vs ML • AI: Quest for developing non-biological systems that exhibit human-like forms of intelligence. Reference: https://sebastianraschka.com/blog/2020/intro-to-dl-ch01.html

Slide 15

Slide 15 text

Examples of Machine Learning • Recommending a video/song (Recommender System) • Detecting cancer based on an X-ray image (Computer Vision) • Forecasting a company’s revenue based on various factors (Time Series Forecasting) • Summarizing a long document into shorter, meaningful text (Language Processing) • Writing HTML, SQL, Unix code based on human language (Language Processing - GPT-3)

Slide 16

Slide 16 text

Types of ML Systems • Whether or not trained with human supervision • Supervised Learning • Unsupervised Learning • Reinforcement Learning • Whether learning is incremental • Online Learning • Batch Learning • Instance based vs Model based learning

Slide 17

Slide 17 text

Supervised Learning • User provides the algorithm with inputs (features) and desired outputs (labels) • The trained algorithm can then produce an output for an unseen input • User (teacher) supervises the algorithm as it learns

Slide 18

Slide 18 text

Unsupervised Learning • Only input data is known & passed to the algorithm • Output data is unknown • Often used to understand data better before solving a supervised learning problem • Usually harder to understand and evaluate • Applications • Segmenting readers based on their reading habits • Identifying topics of news articles • Anomaly Detection • Dimensionality Reduction • Clustering

Slide 19

Slide 19 text

Unsupervised Learning : Clustering • Each dot on plot represents a research article on COVID Reference: https://maksimekin.github.io/COVID19-Literature-Clustering/plots/t-sne_covid-19_interactive.html

Slide 20

Slide 20 text

Reinforcement Learning • Steps • Learning system (agent) observes an environment • Selects & performs actions • Gets rewarded or punished for actions • Learning system must learn by itself the best strategy (policy) to maximize reward over time. • Examples • Robotics • AlphaGo Program • Energy Efficiency Reference: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow

Slide 21

Slide 21 text

Supervised Machine Learning • Regression: Goal is to predict a continuous number (Label: Continuous Number) • Classification: Goal is to predict a class label (Label: Distinct Values) Reference: https://sebastianraschka.com/blog/2020/intro-to-dl-ch01.html

Slide 22

Slide 22 text

Predictive Maintenance

Slide 23

Slide 23 text

Types of Maintenance • Reactive Maintenance • Parts of the equipment are replaced only on failure • Doesn’t waste a part’s life, but results in downtime and unscheduled maintenance • Preventive Maintenance • Replaces a part after a pre-determined useful lifespan, before it fails • Avoids unscheduled maintenance • Under-utilization of parts • Predictive Maintenance • Replaces only the parts close to failure (just-in-time replacement) • Extends a part’s lifespan • Reduces unscheduled maintenance Reference: https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/predictive-maintenance-playbook https://arxiv.org/pdf/1912.07383.pdf

Slide 24

Slide 24 text

Predictive Maintenance (PdM) : Goals • Predict whether a piece of equipment is going to fail in the near future • Predict days to failure • Helps in scheduling maintenance • Predict the most probable root cause of a failure • Helps in identifying part(s) to repair/replace

Slide 25

Slide 25 text

Sample Use Cases • Failure of engine parts in an aircraft • HVAC equipment failure • Elevator door failure • Wind turbine failure • Train wheel failure

Slide 26

Slide 26 text

Data Science For Predictive Maintenance • Steps • Convert Business Problem into Data Science Problem • Understand Data • Prepare Data • Build Model • Evaluate Model • Deploy Model • Monitor/Maintain Model Reference: https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining

Slide 27

Slide 27 text

Business problem into Data Science problem • Binary Classification • Predict the probability that the equipment fails within a future time period • Regression • Predict the amount of time the equipment remains operational before its next failure • Multi-class classification • Predict the probability that the equipment fails within the next …, 3X, 2X, X units of time • Predict the probability that the equipment fails within a future time period due to a particular root cause

Slide 28

Slide 28 text

Binary Classification • Goal: Predict the probability of failure within the next X units of time • Labels (Discrete Number) • Failure within X time units (1) • Healthy (0) Reference: https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/predictive-maintenance-playbook
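
One way the binary labels might be derived from a failure log is sketched below. The function name, the integer day index, and the choice to count the failure day itself as positive are all assumptions made for illustration.

```python
def binary_labels(days, failure_days, horizon):
    """Label a day 1 if a failure occurs within `horizon` days after it
    (the failure day itself included), else 0 (healthy)."""
    return [
        1 if any(0 <= f - d <= horizon for f in failure_days) else 0
        for d in days
    ]


# Toy example: ten daily records, one failure on day 8, X = 3 days.
labels = binary_labels(days=range(1, 11), failure_days=[8], horizon=3)
```

Days 5 to 8 receive label 1 and all other days receive 0, so positives are typically a small minority of the rows.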

Slide 29

Slide 29 text

Regression • Goal: Predict the remaining useful life (RUL) of the equipment • Label: Time for which an asset remains operational before its next failure (RUL) • Continuous Number • Disadvantage • Equipment without any recorded failure cannot be used for modeling
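
A minimal sketch of how an RUL label could be computed, assuming integer day indices and a hypothetical helper name. It also makes the stated disadvantage visible: rows with no later failure get no label.

```python
def remaining_useful_life(days, failure_days):
    """For each day, return the number of days until the next failure,
    or None if no later failure is recorded (the row cannot be labeled)."""
    result = []
    for d in days:
        upcoming = [f - d for f in failure_days if f >= d]
        result.append(min(upcoming) if upcoming else None)
    return result


# Toy example: failures on day 3 and day 6, records for days 1..7.
rul = remaining_useful_life(days=range(1, 8), failure_days=[3, 6])
```

Day 7 comes after the last recorded failure, so its label is `None`; an asset with an empty failure list would yield `None` for every row.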

Slide 30

Slide 30 text

Multi-class Classification (1) • Goal: Predict the probability of failure within the next …, 3X, 2X, X units of time • Labels (Discrete Number) • Healthy (0) • Failure within 3X time units (3Z) • Failure within 2X time units (2Z) • Failure within X time units (Z)
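
The bucketed labels could be assigned from a time-to-failure value roughly as follows. The function name and the rule that each record gets the tightest bucket it falls into are assumptions; the class names Z, 2Z, 3Z follow the slide.

```python
def interval_label(days_to_failure, x):
    """Map time-to-failure to the classes 'Z' (within X), '2Z' (within 2X),
    '3Z' (within 3X), or 0 for healthy.

    `days_to_failure` may be None when no upcoming failure is known."""
    if days_to_failure is None or days_to_failure > 3 * x:
        return 0
    if days_to_failure <= x:
        return "Z"
    if days_to_failure <= 2 * x:
        return "2Z"
    return "3Z"
```

For example, with X = 5 days, a record 7 days before a failure falls into the 2Z class.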

Slide 31

Slide 31 text

Multi-class Classification (2) • Goal: Predict the probability of failure within the next X units of time due to root cause Pi • Labels • Failure due to different root causes (P1, P2, P3, …) • Healthy (0)

Slide 32

Slide 32 text

Time Series Classification • If business permits, Classification is preferred over Regression

Slide 33

Slide 33 text

Data Requirement • Relevant Data • Discuss with domain expert • Sufficient Data • Duration (Year, Month, Day..) • Larger number of failures • Different types of failures • Quality of data • Garbage In, Garbage Out Reference: Google : Hidden Technical Debt in Machine Learning Systems

Slide 34

Slide 34 text

Data Collection • Data Source • Temporal Data • Equipment’s Health • Example: Vibration, Voltage, Temperature, Humidity, Pressure etc. • Collected using IoT sensors • Temporal features reflecting aging pattern & anomalies • Represents normal & faulty behaviors over time • Maintenance history • Example: Dates of Repair activities, Components replaced etc. • Captures degradation patterns • Failure history • Weather • Usage (Load) of the equipment • Static Data • Equipment Metadata • Manufacturer, Make, Model • Manufacture Date, Installation Date, Age • Geographical Location

Slide 35

Slide 35 text

Data Exploration & Validation • Goal : Visualize & Validate • Data is relevant • Data includes expected patterns • In case of no obvious patterns, add more features Reference: https://cloud.google.com/blog/products/data-analytics/a-process-for-implementing-industrial-predictive-maintenance-part-ii

Slide 36

Slide 36 text

Data Pre-Processing • Structure data from various sources into a tabular format • Each row represents the state of a piece of equipment at a particular point in time, accompanied by a label • Up-Sampling/Down-Sampling • Data collection frequency may not match the prediction frequency • Data may be collected hourly, but failure may be predicted at the day level
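
As an illustration of down-sampling, the sketch below collapses hourly sensor readings into one row per day. Using the mean as the daily aggregate and the helper name `downsample_daily` are assumptions; the timestamps and values are made up.

```python
from collections import defaultdict
from datetime import datetime


def downsample_daily(readings):
    """Aggregate (timestamp, value) readings to one mean value per date."""
    by_day = defaultdict(list)
    for ts, value in readings:
        by_day[ts.date()].append(value)
    return {day: sum(vals) / len(vals) for day, vals in sorted(by_day.items())}


# Toy hourly readings spread over two days.
hourly = [
    (datetime(2021, 1, 1, 0), 10.0),
    (datetime(2021, 1, 1, 12), 20.0),
    (datetime(2021, 1, 2, 6), 30.0),
]
daily = downsample_daily(hourly)
```

The resulting dictionary has one entry per calendar day, which matches a model that predicts failure at the day level.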

Slide 37

Slide 37 text

Data Pre-Processing • Missing Value Handling • Temporal Data (Examples) • Forward Filling • Interpolation • Domain Specific • Fill a missing pressure value for a piece of equipment at 1 PM on Tuesday • with last Tuesday’s 1 PM value • with the Tuesday 1 PM value averaged over the last month • etc. • Strategy should be validated using cross-validation • Removal of duplicates
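
The first two strategies can be sketched in plain Python, with `None` standing in for a missing reading. The function names and the toy pressure series are assumptions for illustration.

```python
def forward_fill(values):
    """Replace each None with the most recent known value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def interpolate(values):
    """Linearly interpolate runs of None between two known values.
    Leading and trailing gaps are left as None."""
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (out[right] - out[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = out[left] + step * (i - left)
    return out


pressure = [10.0, None, None, 16.0, None]  # made-up readings
```

Forward filling repeats the last reading across the gap, while interpolation draws a straight line between the readings on either side, so the two can give noticeably different features.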

Slide 38

Slide 38 text

Feature Engineering • Goal: Extract valuable information from raw data which the algorithm can’t see on its own

Slide 39

Slide 39 text

Feature Engineering (Temporal Data) • Aggregation • Data over individual time units (e.g. days) is noisy • Needs to be smoothed by aggregating over time windows • Examples • Temperature: Fluctuates. Average value over a day may rise with degradation • Vibration: May increase drastically before failure. Max over a day could be a good feature Reference: https://cloud.google.com/blog/products/data-analytics/a-process-for-implementing-industrial-predictive-maintenance-part-ii https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/predictive-maintenance-playbook

Slide 40

Slide 40 text

Feature Engineering (Temporal Data) : Rolling Aggregate • “How far in the future the model has to predict” influences “how far in the past the model has to look back” to make predictions • Lag Features • “Looking back” period is called “Lag” • Rolling Aggregate (Examples) • Rolling average of temperature over last 7, 15, 21 days • Rolling max of vibration over last 7, 15, 21 days • Rolling count of alarms over last 1, 3, 5, 7 days
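
A rolling aggregate can be sketched with a fixed-length buffer. The helper names, the 3-day window, and the temperature values are assumptions; early positions use whatever shorter history is available, which is one of several possible conventions.

```python
from collections import deque


def rolling(values, window, func):
    """Apply `func` to the trailing `window` values at each position.
    Early positions use the shorter history available so far."""
    buf = deque(maxlen=window)
    out = []
    for v in values:
        buf.append(v)
        out.append(func(buf))
    return out


def mean(xs):
    return sum(xs) / len(xs)


temperature = [20, 22, 24, 26, 40]  # made-up daily readings
rolling_avg = rolling(temperature, window=3, func=mean)
rolling_max = rolling(temperature, window=3, func=max)
```

Because each value only looks backwards, the feature for a given day never uses readings from the future.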

Slide 41

Slide 41 text

Feature Engineering (Temporal Data) • Functions For Aggregation • Count • Average • Maximum • Minimum • Median • Standard Deviation • Variance • Sum • Cumulative Sum • Derivative • 2nd Derivative • Count of outliers

Slide 42

Slide 42 text

Feature Engineering • Date • Day • Week • Weekday/Weekend • Month • Quarter • Year • etc. • Maintenance Data • Days since last failure • Days since last failure because of specific root cause • Days since specific part replaced • Days since last maintenance • Static Data • Age of the equipment
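
A "days since" feature could be computed along these lines. The function name and the integer day index are assumptions; the same helper would apply to failures, part replacements, or maintenance visits.

```python
def days_since_last_event(days, event_days):
    """For each day, return days elapsed since the most recent event on or
    before that day; None if no event has happened yet."""
    events = sorted(event_days)
    out = []
    for d in days:
        past = [e for e in events if e <= d]
        out.append(d - past[-1] if past else None)
    return out


# Toy example: failures on day 2 and day 5, records for days 1..7.
since_failure = days_since_last_event(days=range(1, 8), event_days=[2, 5])
```

The counter resets to 0 on each event day and grows until the next one.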

Slide 43

Slide 43 text

Model Architecture & Algorithms • Binary Classification: RNN, LSTM; DNN; GBM, Random Forest, SVM (etc.) • Multi-class Classification: RNN, LSTM; DNN; GBM, Random Forest, SVM, Hidden Markov Chain (etc.) • Regression: RNN, LSTM; DNN; GBM, RF Regression (etc.) Reference: https://cloud.google.com/blog/products/data-analytics/a-process-for-implementing-industrial-predictive-maintenance-part-ii

Slide 44

Slide 44 text

Cross Validation • Goal • Validates a model during & at the end of training • Helps detect overfitting • Estimates how well the model generalizes to unseen data Reference: https://scikit-learn.org/stable/modules/cross_validation.html

Slide 45

Slide 45 text

Time Series Cross Validation • In PdM, data is ordered by time • Training, validation, and test data must be split in a time-dependent manner • Validation data must lie in the future relative to training data Reference: https://eng.uber.com/forecasting-introduction/
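
One common scheme that satisfies this constraint is an expanding training window. The sketch below is an assumption about how the folds might be built (function name, parameters, and fold sizes are illustrative), working on row indices that are already sorted by time.

```python
def time_series_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, validation_indices) with an expanding training
    window; validation always lies after training in time."""
    for k in range(n_splits):
        val_end = n_samples - (n_splits - 1 - k) * test_size
        val_start = val_end - test_size
        if val_start <= 0:
            raise ValueError("not enough samples for the requested splits")
        yield list(range(val_start)), list(range(val_start, val_end))


# Toy example: 10 time-ordered rows, 3 folds, 2 validation rows per fold.
splits = list(time_series_splits(n_samples=10, n_splits=3, test_size=2))
```

Each fold trains on everything before its validation block, so no fold is validated on data older than its training data.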

Slide 46

Slide 46 text

Split between Training & Test Data • Split by Time • Separate train & test data by the window size (“look-ahead time in the future”) • Split by Equipment • Evaluates how well the model performs on new, unseen equipment
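
Both splitting strategies can be sketched on a list of row dictionaries. The field names `equipment_id` and `day`, the function names, and the toy rows are assumptions for illustration.

```python
def split_by_time(rows, cutoff_day, gap):
    """Train on days <= cutoff_day, test on days > cutoff_day + gap.
    Rows inside the gap are dropped so that labels looking `gap` days
    ahead cannot leak test-period information into training."""
    train = [r for r in rows if r["day"] <= cutoff_day]
    test = [r for r in rows if r["day"] > cutoff_day + gap]
    return train, test


def split_by_equipment(rows, test_equipment_ids):
    """Put every row of a held-out machine in the test set, so the model is
    evaluated on equipment it never saw during training."""
    train = [r for r in rows if r["equipment_id"] not in test_equipment_ids]
    test = [r for r in rows if r["equipment_id"] in test_equipment_ids]
    return train, test


# Toy data: two machines, six daily rows each.
rows = [{"equipment_id": e, "day": d} for e in ("A", "B") for d in range(1, 7)]
time_train, time_test = split_by_time(rows, cutoff_day=3, gap=1)
equip_train, equip_test = split_by_equipment(rows, test_equipment_ids={"B"})
```

In the time-based split, day 4 is discarded as the gap; the gap would normally be set to the look-ahead window used when building the labels.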

Slide 47

Slide 47 text

Model Evaluation (Binary Classification) • Goal: What metric to optimize for? • Determining Factors • Imbalanced Data • High Cost of False Alarm • Performance Metrics • Accuracy: Not suitable for imbalanced data • Precision: Lower value corresponds to a higher rate of false alarms • Recall: Higher value corresponds to a larger share of true failures being identified • F1 Score: Harmonic mean of precision and recall • ROC (Receiver Operating Characteristic) Curve
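
The metrics above follow directly from the confusion-matrix counts. The sketch below uses made-up labels and a hypothetical function name to show why accuracy misleads on imbalanced data.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels
    (1 = failure, 0 = healthy)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Imbalanced toy data: 2 failures among 20 observations.
y_true = [1, 1] + [0] * 18
always_healthy = [0] * 20  # a model that never raises an alarm
metrics = classification_metrics(y_true, always_healthy)
```

The never-alarming model scores 90% accuracy while its recall is 0, meaning it misses every failure.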

Slide 48

Slide 48 text

Model Serving/Prediction • Goal: Deploy the model in production, so that it starts making predictions on new, unseen data • Need • Data must be pre-processed & feature-engineered exactly the same way as during model training • Suggested Approach : Batch Scoring • Model’s decision is not needed immediately • Example : Once a day, predict which equipment is going to fail in the next 7 days

Slide 49

Slide 49 text

Model Monitoring/Maintenance • Evaluate the model’s performance in production • Compare predictions vs ground truth • Did the failures really happen as predicted by the model? • Was the equipment healthy when predicted to be healthy? • Degradation of the model’s performance may indicate a need for retraining Reference: https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-mlconcepts.html
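
A monitoring check might compare production recall against the recall seen at validation time. The function name, the choice of recall as the monitored metric, and the 0.1 tolerance are all assumptions; a real setup would pick the metric and threshold to suit the business.

```python
def needs_retraining(predicted, actual, baseline_recall, tolerance=0.1):
    """Flag retraining when production recall falls more than `tolerance`
    below the recall measured at validation time.

    `predicted` and `actual` are parallel lists of 0/1 (1 = failure)."""
    true_pos = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    failures = sum(actual)
    if failures == 0:
        return False  # no ground-truth failures yet; nothing to compare
    production_recall = true_pos / failures
    return production_recall < baseline_recall - tolerance
```

For instance, if the model caught only one of three real failures while validation recall was 0.8, the check returns `True`.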

Slide 50

Slide 50 text

References • Machine Learning • A visual introduction to machine learning • Introduction to Machine Learning and Deep Learning by Sebastian Raschka • Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow • Predictive Maintenance • Azure AI guide for predictive maintenance solutions • A process for implementing industrial predictive maintenance • A Survey of Predictive Maintenance: Systems, Purposes and Approaches

Slide 51

Slide 51 text

Questions? arnabbiswas1 [email protected]