Basics of Machine Learning & Its Application In Predictive Maintenance
This presentation introduces the concept of Machine Learning and then discusses how it is used in the Predictive Maintenance domain.
Programming vs Machine Learning • Types of Machine Learning • Types of Supervised Learning • Application of ML in Predictive Maintenance (PdM) • Types of Maintenance • Goals & Use Cases for PdM • Data Science For PdM
domain expert • Survey existing apartments in Bangalore • Identify factors contributing to the price of an apartment • Area • Size • Number of Bedrooms, Bathrooms • Name of the builder • etc. • Write a program which outputs the price based on the attributes identified Reference : https://medium.com/@karpathy/software-2-0-a64152b37c35
requires a long list of rules, which is difficult to maintain. ML can simplify the code. • ML “automatically” picks up changes in data, whereas classical programming needs manual updates to the rules. • ML performs better on complex problems (image, text, audio, etc.) • Humans can gain insights from ML models
Cancer • Medical textbooks decide based on the number of “yes” answers to these questions: 1. Has the cancer affected more than one lymph node? 2. Are the cancerous lymph nodes both above & below the bottom of the rib cage? 3. Is the cancer found in organs outside the lymphatic system (in the patient's bone marrow)? • A 2018 research paper (University of Modena & Reggio Emilia) • Analyzed 15 variables, identifying 5 features • Due to limited cognitive ability, humans need a handful of the most obvious signifiers/features • ML/AI decides based on hundreds, if not thousands, of distinct features • May include traditional as well as less intuitive features
when it improves at a task based on experience at that task, but without explicit programming. Reference : https://cloud.google.com/products/ai/ml-comic-1/
• Detecting cancer based on an X-ray image (Computer Vision) • Forecasting a company’s revenue based on various factors (Time Series Forecasting) • Summarizing a long document into shorter, meaningful text (Language Processing) • Writing HTML, SQL, Unix code based on human language (Language Processing - GPT-3)
human supervision • Supervised Learning • Unsupervised Learning • Reinforcement Learning • Whether learning is incremental • Online Learning • Batch Learning • Instance based vs Model based learning
and desired outputs (labels) • The algorithm can create an output for an unseen input • User (Teacher) is supervising the algorithm to learn Input Output
to algorithm • Output data is unknown • Often used in understanding data better before solving a supervised learning problem • Usually harder to understand and evaluate • Applications • Segmenting readers based on their reading habits • Identifying topics of news articles • Anomaly Detection • Dimensionality Reduction • Clustering Input
environment • Selects & performs actions • Gets rewarded or punished for actions • Learning system must learn by itself the best strategy (policy) to earn the most reward over time. • Examples • Robotics • AlphaGo Program • Energy Efficiency Reference: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
continuous number • Classification: Goal is to predict a class label Label: Continuous Number Label: Distinct Values Reference: https://sebastianraschka.com/blog/2020/intro-to-dl-ch01.html
equipment are replaced only on failure • Doesn’t waste a part’s life, but results in downtime and unscheduled maintenance • Preventive Maintenance • Replaces a part after a pre-determined useful lifespan, before it fails • Avoids unscheduled maintenance • Underutilizes parts • Predictive Maintenance • Replaces only the parts close to failure (just-in-time replacement) • Extends a part’s lifespan • Reduces unscheduled maintenance Reference: https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/predictive-maintenance-playbook https://arxiv.org/pdf/1912.07383.pdf
is going to fail in the near future • Predict days to failure • Helps in scheduling maintenance • Predict the most probable root cause of a failure • Helps in identifying part(s) to repair/replace
Problem into Data Science problem • Understand Data • Prepare Data • Build Model • Evaluate Model • Deploy Model • Monitor/Maintain Model Reference: https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining
Predict the probability that a piece of equipment fails within a future time period • Regression • Predict the amount of time that a piece of equipment is operational before its next failure • Multi-class classification • Predict the probability that a piece of equipment fails within the next …, 3X, 2X, X units of time • Predict the probability that a piece of equipment fails within a future time period due to a particular root cause
X unit of time • Labels (Discrete Number) • Failure within X time unit (1) • Healthy (0) Reference: https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/predictive-maintenance-playbook
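The binary labeling scheme above can be sketched as follows. This is a minimal illustration, not the playbook's implementation: the function name `binary_labels`, the sample dates, and the choice to count the failure day itself as "failing within X" are all assumptions.

```python
from datetime import date, timedelta

def binary_labels(record_dates, failure_dates, horizon_days):
    """Return 1 for each record date followed by a failure within horizon_days, else 0."""
    horizon = timedelta(days=horizon_days)
    labels = []
    for d in record_dates:
        # Assumption: a failure on the record date itself counts as "within X".
        fails_soon = any(timedelta(0) <= f - d <= horizon for f in failure_dates)
        labels.append(1 if fails_soon else 0)
    return labels

# Hypothetical daily records for one machine that failed on 10 Jan 2021.
records = [date(2021, 1, day) for day in range(1, 11)]
failures = [date(2021, 1, 10)]
labels = binary_labels(records, failures, horizon_days=3)
print(labels)  # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
```

Here X is 3 days, so only the records from 7 Jan onward are labeled 1.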
equipment • Label: Time for which an asset is operational before next failure (RUL) • Continuous Number • Disadvantage • Equipment without any failures cannot be used for modeling
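A small sketch of how the RUL label might be derived, under the assumption that RUL is measured in whole days to the next recorded failure. The function name and sample dates are hypothetical.

```python
from datetime import date

def remaining_useful_life(record_dates, failure_dates):
    """Days until the next failure on or after each record date; None if there is none."""
    failures = sorted(failure_dates)
    rul = []
    for d in record_dates:
        upcoming = [f for f in failures if f >= d]
        rul.append((upcoming[0] - d).days if upcoming else None)
    return rul

# Hypothetical records for one machine with a single failure on 10 Jan 2021.
records = [date(2021, 1, 1), date(2021, 1, 5), date(2021, 1, 10), date(2021, 1, 12)]
rul = remaining_useful_life(records, [date(2021, 1, 10)])
print(rul)  # [9, 5, 0, None]
```

The trailing `None` illustrates the disadvantage noted above: a record with no later failure has no RUL label, so it cannot be used to train the regression model.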
within next …, 3X, 2X, X units of time • Labels (Discrete Number) • Healthy (0) • Failure within 3X time unit (3Z) • Failure within 2X time unit (2Z) • Failure within X time unit (Z)
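One possible way to build these multi-class labels from a days-to-failure value is sketched below. The integer class index k (1 for "within X", 2 for "within 2X", and so on) is an assumption standing in for the slide's label values, and `multiclass_label` is a hypothetical helper that expects a non-negative number or `None`.

```python
def multiclass_label(days_to_failure, x_days, n_windows=3):
    """Return the smallest k in 1..n_windows with days_to_failure <= k * x_days, else 0."""
    if days_to_failure is None:  # no upcoming failure recorded: healthy
        return 0
    for k in range(1, n_windows + 1):
        if days_to_failure <= k * x_days:
            return k
    return 0  # failure is further away than the largest window: healthy

# Hypothetical days-to-failure values with X = 7 days.
days = [2, 7, 8, 15, 21, 30, None]
classes = [multiclass_label(d, x_days=7) for d in days]
print(classes)  # [1, 1, 2, 3, 3, 0, 0]
```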
• Sufficient Data • Duration (Year, Month, Day..) • Larger number of failures • Different types of failures • Quality of data • Garbage In, Garbage Out Reference: Google : Hidden Technical Debt in Machine Learning Systems
Health • Example: Vibration, Voltage, Temperature, Humidity, Pressure etc. • Collected using IoT sensors • Temporal features reflecting aging pattern & anomalies • Represents normal & faulty behaviors over time • Maintenance history • Example: Dates of Repair activities, Components replaced etc. • Captures degradation patterns • Failure history • Weather • Usage (Load) of the equipment • Static Data • Equipment Metadata • Manufacturer, Make, Model • Manufacture Date, Installation Date, Age • Geographical Location
• Data is relevant • Data includes expected patterns • In case of no obvious patterns, add more features Reference: https://cloud.google.com/blog/products/data-analytics/a-process-for-implementing-industrial-predictive-maintenance-part-ii
format • Each row represents the state of a piece of equipment at a particular point in time, accompanied by a label • Up-Sampling/Down-Sampling • Data collection frequency may not match prediction frequency • Data may be collected hourly, but failure may be predicted at the day level
• Forward Filling • Interpolation • Domain Specific • Fill a missing pressure value of a piece of equipment at 1 PM on Tuesday • with last Tuesday’s 1 PM value • with the Tuesday 1 PM value averaged over the last month • etc. • Strategy should be validated using cross-validation • Removal of duplicates
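The first two strategies above, forward filling and interpolation, can be sketched in plain Python. Missing readings are represented as `None`, and the pressure values are made up for illustration.

```python
def forward_fill(values):
    """Replace each None with the most recent known value (leading Nones stay None)."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

def interpolate_linear(values):
    """Fill interior None gaps linearly between the nearest known neighbours."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled

# Hypothetical pressure readings with gaps.
pressure = [10.0, None, None, 16.0, None, 18.0]
print(forward_fill(pressure))        # [10.0, 10.0, 10.0, 16.0, 16.0, 18.0]
print(interpolate_linear(pressure))  # [10.0, 12.0, 14.0, 16.0, 17.0, 18.0]
```

Gaps before the first or after the last known reading are left untouched by `interpolate_linear`, since there is no neighbour on one side to interpolate from.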
time units (e.g. days) is noisy • Needs to be smoothed by aggregating over time windows • Examples • Temperature: Fluctuating. Average value over a day may rise with degradation • Vibration: May increase drastically before failure. Max over a day could be a good feature Reference: https://cloud.google.com/blog/products/data-analytics/a-process-for-implementing-industrial-predictive-maintenance-part-ii https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/predictive-maintenance-playbook
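As an illustration of the two examples above, the sketch below down-samples timestamped sensor readings to one value per day, using the mean for temperature and the max for vibration. The helper `aggregate_daily` and all readings are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

def aggregate_daily(readings, agg):
    """Group (timestamp, value) readings by calendar day and apply agg to each day."""
    by_day = defaultdict(list)
    for ts, value in readings:
        by_day[ts.date()].append(value)
    return {day: agg(vals) for day, vals in sorted(by_day.items())}

# Hypothetical readings taken twice a day over two days.
temperature = [
    (datetime(2021, 1, 1, 0), 70.0), (datetime(2021, 1, 1, 12), 74.0),
    (datetime(2021, 1, 2, 0), 75.0), (datetime(2021, 1, 2, 12), 79.0),
]
vibration = [
    (datetime(2021, 1, 1, 0), 0.3), (datetime(2021, 1, 1, 12), 0.5),
    (datetime(2021, 1, 2, 0), 0.4), (datetime(2021, 1, 2, 12), 1.9),
]

daily_mean_temp = aggregate_daily(temperature, mean)  # daily average smooths fluctuation
daily_max_vib = aggregate_daily(vibration, max)       # daily max keeps pre-failure spikes
print(list(daily_mean_temp.values()))  # [72.0, 77.0]
print(list(daily_max_vib.values()))    # [0.5, 1.9]
```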
influences “how far in past the model has to look back” to make predictions • Lag Features • “Looking back” period is called “Lag” • Rolling Aggregate (Examples) • Rolling Average of temperature over last 7, 15, 21 days • Rolling Max of vibration over last 7, 15, 21 days • Rolling count of alarms over last 1, 3, 5, 7 days Feature Engineering (Temporal Data) Rolling Aggregate
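Rolling aggregates and lag features of the kind listed above could be computed as in this sketch. It assumes one reading per day, a trailing window that includes the current day, and `None` until the window is full. The function names and values are illustrative.

```python
from statistics import mean

def rolling(values, window, agg):
    """Apply agg over the trailing window ending at each position; None until it is full."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(agg(values[i + 1 - window : i + 1]))
    return out

def lag(values, periods):
    """Shift values by `periods` steps so each row sees the reading from that many steps ago."""
    return ([None] * periods + list(values))[: len(values)]

# Hypothetical daily readings.
temperature = [70, 72, 74, 76, 78]
vibration = [0.2, 0.3, 0.9, 0.4, 0.5]

rolling_mean_temp = rolling(temperature, 3, mean)  # [None, None, 72, 74, 76]
rolling_max_vib = rolling(vibration, 3, max)       # [None, None, 0.9, 0.9, 0.9]
temp_lag_1 = lag(temperature, 1)                   # [None, 70, 72, 74, 76]
print(rolling_mean_temp, rolling_max_vib, temp_lag_1)
```

The same `rolling` helper with `sum` over a 0/1 alarm series gives a rolling count of alarms.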
• Month • Quarter • Year • etc. • Maintenance Data • Days since last failure • Days since last failure because of specific root cause • Days since specific part replaced • Days since last maintenance • Static Data • Age of the equipment
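A "days since last failure" style feature from the maintenance data above might be derived as follows. The sketch assumes only events strictly before the record date are counted, so a failure on the record date itself does not leak into that day's feature. The names and dates are hypothetical.

```python
from datetime import date

def days_since_last_event(record_dates, event_dates):
    """Days elapsed since the most recent event strictly before each record date."""
    events = sorted(event_dates)
    out = []
    for d in record_dates:
        past = [e for e in events if e < d]
        out.append((d - past[-1]).days if past else None)
    return out

# Hypothetical record dates and failure history for one machine.
records = [date(2021, 1, 1), date(2021, 1, 5), date(2021, 1, 10), date(2021, 1, 20)]
failures = [date(2021, 1, 3), date(2021, 1, 10)]
days_since_failure = days_since_last_event(records, failures)
print(days_since_failure)  # [None, 2, 7, 10]
```

Passing replacement or maintenance dates instead of failure dates yields the other "days since" features in the list.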
following time • Training, validation, and test data must be split in a time-dependent manner. • Validation data must lie in the future relative to the training data Reference: https://eng.uber.com/forecasting-introduction/
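A time-dependent split can be sketched as below, assuming each row carries its timestamp as the first element. The cut-off dates and the monthly rows are made up for illustration.

```python
from datetime import date

def time_based_split(rows, train_end, valid_end):
    """Split rows chronologically: train <= train_end < validation <= valid_end < test."""
    ordered = sorted(rows, key=lambda r: r[0])
    train = [r for r in ordered if r[0] <= train_end]
    valid = [r for r in ordered if train_end < r[0] <= valid_end]
    test = [r for r in ordered if r[0] > valid_end]
    return train, valid, test

# Hypothetical monthly rows of (timestamp, feature) for 2021.
rows = [(date(2021, month, 1), month) for month in range(1, 13)]
train, valid, test = time_based_split(
    rows, train_end=date(2021, 8, 31), valid_end=date(2021, 10, 31)
)
print(len(train), len(valid), len(test))  # 8 2 2
```

Unlike a random shuffle, every validation row is later than every training row, and every test row is later than both.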
for? • Determining Factors • Imbalanced Data • High Cost of False Alarm • Performance Metrics • Accuracy: Not suitable • Precision: A lower value corresponds to a higher rate of false alarms • Recall: A higher value corresponds to more true failures being identified • F1 Score: Harmonic mean of precision and recall • ROC (Receiver Operating Characteristic) Curve
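The metrics above can be computed by hand as in this sketch, which also shows why accuracy is misleading on imbalanced data. The labels are synthetic: 95 healthy records and 5 failures.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive class (1 = failure)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Synthetic imbalanced data: 95 healthy (0), 5 failures (1).
y_true = [0] * 95 + [1] * 5

# A useless model that always predicts "healthy" still reaches 95% accuracy.
always_healthy = [0] * 100
print(accuracy(y_true, always_healthy))             # 0.95
print(precision_recall_f1(y_true, always_healthy))  # (0.0, 0.0, 0.0)

# A model with 2 false alarms that catches 4 of the 5 failures.
y_pred = [0] * 93 + [1] * 2 + [1] * 4 + [0] * 1
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

For the second model, precision is 4/6 and recall is 4/5, so the two false alarms and the one missed failure are both visible, which accuracy alone would hide.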
that it starts making predictions on new, unseen data • Need • Data must be pre-processed & engineered exactly the same way as during model training • Suggested Approach: Batch Scoring • Model’s decision is not needed immediately • Example: Once a day, predict which equipment is going to fail in the next 7 days
predictions vs ground truths • Did the failures really happen as predicted by the model? • Was the equipment healthy when predicted to be? • Degradation of the model’s performance may indicate a need for retraining Reference: https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-mlconcepts.html
learning • Introduction to Machine Learning and Deep Learning by Sebastian Raschka • Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow • Predictive Maintenance • Azure AI guide for predictive maintenance solutions • A process for implementing industrial predictive maintenance • A Survey of Predictive Maintenance: Systems, Purposes and Approaches