Slide 1

Slide 1 text

Multilabel Classification for Inflow Profile Monitoring
Dmitry I. Ignatov, Pavel Spesivtsev, Dmitry Kurgansky, Ivan Vrabie, Svyatoslav Elizarov, Vladimir Zyuzin
22.03.2019

Slide 2

Slide 2 text

The inflow zones (sources) problem

[Figure: well schematic with an active and an inactive inflow source, and the measured BHP, WHP, and Q]

Goal:
• Determine which inflow sources are active and which are inactive.

Effects:
• Decrease in production efficiency.

Applications:
• Decision making: reperforating.
• Understanding performance: a small number of active inflow sources or low productivity of the sources.

Slide 3

Slide 3 text

The machine learning approach

Target variable: a binary vector of size N indicating active and inactive inflow sources.
Features: time series of surface flowrates, BHP, and WHP.
Models: Random Forest, XGBoost, SVM, kNN, CNN, and LSTM.

Auxiliary methods:
• Feature engineering
• Dimensionality reduction
• Ensembles of algorithms
• Cascade algorithms
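As a minimal sketch of this setup, the target can be encoded as a multilabel indicator matrix and evaluated with the subset 0/1 loss; the number of sources (8) and the example vectors below are purely illustrative, and scikit-learn is assumed only for the metric.

    import numpy as np
    from sklearn.metrics import zero_one_loss

    # Hypothetical targets for two wells with N = 8 inflow sources:
    # 1 = active source, 0 = inactive source.
    y_true = np.array([[1, 0, 1, 1, 0, 0, 1, 0],
                       [0, 1, 1, 0, 0, 1, 0, 0]])
    y_pred = np.array([[1, 0, 1, 1, 0, 0, 1, 0],
                       [0, 1, 0, 0, 0, 1, 0, 0]])

    # Subset 0/1 loss: a sample counts as an error unless all N labels match.
    print(zero_one_loss(y_true, y_pred))  # 0.5, since the second sample is wrong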

Slide 4

Slide 4 text

Data

Numerical simulator input parameters:
• Wellbore geometry
• Distribution of volume fractions of phases
• Choke size
• …

Numerical simulator output variables:
• Surface flowrates (Qo, Qg, Qw)
• Bottomhole pressure (BHP)
• Wellhead pressure (WHP)
• Vector of active and non-active sources (target vector)

5000 numerical simulations; train/test split 4:1.
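A minimal sketch of the 4:1 split over the simulated dataset, assuming scikit-learn; the feature and target matrices below are random placeholders standing in for the simulator outputs.

    import numpy as np
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5000, 1200))        # placeholder for the extracted features
    Y = rng.integers(0, 2, size=(5000, 8))   # placeholder binary target vectors

    # 4:1 split as on the slide: 4000 simulations for training, 1000 for testing.
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=0.2, random_state=42)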

Slide 5

Slide 5 text

Feature generation

Features obtained from the time series using TSFresh:
• Average
• Standard deviation
• Median
• Dispersion
• Min/max value
• Trend
• Number of min/max values
• …

In total, ~1200 features are used for solving the multi-label classification problem.
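A minimal sketch of this extraction step with the tsfresh library, assuming the simulated signals are stored in a long-format DataFrame with an id column per simulation and a time column; the column names and values are illustrative.

    import pandas as pd
    from tsfresh import extract_features

    # Long-format table: one row per time step per simulation.
    df = pd.DataFrame({
        "id":   [0, 0, 0, 1, 1, 1],
        "time": [0, 1, 2, 0, 1, 2],
        "qo":   [10.0, 11.0, 10.5, 8.0, 8.2, 8.1],          # illustrative flowrate
        "bhp":  [200.0, 198.0, 199.0, 210.0, 209.0, 208.5],  # illustrative pressure
    })

    # tsfresh computes a large set of statistics (mean, std, median, trend
    # coefficients, number of peaks, ...) for every signal and every id.
    features = extract_features(df, column_id="id", column_sort="time")
    print(features.shape)  # (n_simulations, n_extracted_features)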

Slide 6

Slide 6 text

Experiment 1

Independent classifiers, one for each of the inflow sources.
Normalization: standard, min-max
Dimensionality reduction: PCA, ICA
Methods: RF, SVM, kNN, and XGBoost

Best result: XGBoost + standard normalization + PCA, 0/1 loss: 0.36
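A minimal sketch of the best configuration in this experiment (standard normalization, PCA, one XGBoost classifier per source), assuming scikit-learn and xgboost; the synthetic data, component count, and booster settings are illustrative, not taken from the slides.

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.multioutput import MultiOutputClassifier
    from sklearn.metrics import zero_one_loss
    from xgboost import XGBClassifier

    rng = np.random.default_rng(0)
    X_train, Y_train = rng.normal(size=(400, 50)), rng.integers(0, 2, size=(400, 8))
    X_test, Y_test = rng.normal(size=(100, 50)), rng.integers(0, 2, size=(100, 8))

    # Standardize, reduce dimensionality with PCA, then fit one independent
    # XGBoost binary classifier per inflow source.
    model = make_pipeline(
        StandardScaler(),
        PCA(n_components=20),                       # illustrative component count
        MultiOutputClassifier(XGBClassifier(n_estimators=200)),
    )
    model.fit(X_train, Y_train)
    print(zero_one_loss(Y_test, model.predict(X_test)))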

Slide 7

Slide 7 text

Experiment 2

Ensemble approach: top 10 algorithms + majority voting. 0/1 loss: 0.31

Algorithms:
1. XGBoost + PCA + standard norm.
2. RF + PCA + standard norm.
3. XGBoost + PCA + min-max norm.
4. RF + PCA + min-max norm.
5. XGBoost + PCA
6. RF + ICA
7. SVM + ICA + standard norm.
8. RF + PCA
9. SVM + PCA + min-max norm.
10. kNN + ICA + standard norm.

[Figure: accuracy matrix]
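A minimal sketch of the per-source majority vote over the ten fitted pipelines, assuming each pipeline has a scikit-learn-style predict returning a binary (n_samples, n_sources) matrix; the top10_models name in the usage comment is a hypothetical placeholder.

    import numpy as np

    def majority_vote(predictions):
        """predictions: list of (n_samples, n_sources) binary arrays, one per model.
        A source is marked active if more than half of the models vote for it."""
        stacked = np.stack(predictions)         # (n_models, n_samples, n_sources)
        votes = stacked.sum(axis=0)
        return (votes > stacked.shape[0] / 2).astype(int)

    # Usage: Y_ens = majority_vote([m.predict(X_test) for m in top10_models])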

Slide 8

Slide 8 text

Experiment 3

Cascade classifier approach:
1. Predict the number of active sources (100% accuracy).
2. Obtain the probability of class 1 for each source separately.
3. Sort the probabilities in descending order.
4. Select the sources with the highest probabilities.

0/1 loss: 0.44
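A minimal sketch of steps 2-4 of this cascade, assuming the number of active sources has already been predicted per sample and the per-source probabilities come from predict_proba-style outputs; the function name and example values are illustrative.

    import numpy as np

    def cascade_select(proba, n_active):
        """proba: (n_samples, n_sources) probabilities of each source being active.
        n_active: (n_samples,) predicted number of active sources per sample.
        Marks the n_active most probable sources of each sample as active."""
        labels = np.zeros_like(proba, dtype=int)
        for i, k in enumerate(n_active):
            top = np.argsort(proba[i])[::-1][:k]    # indices of the k highest probabilities
            labels[i, top] = 1
        return labels

    proba = np.array([[0.9, 0.2, 0.7, 0.1],
                      [0.3, 0.8, 0.4, 0.6]])
    print(cascade_select(proba, n_active=np.array([2, 3])))
    # [[1 0 1 0]
    #  [0 1 1 1]]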

Slide 9

Slide 9 text

Experiment 4

Enlarged feature space approach:
• 300 ICA components over the time series
• 300 PCA components over the new features
• Number of active sources

Method: XGBoost. 0/1 loss: 0.26
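A minimal sketch of assembling this enlarged feature space, assuming scikit-learn's FastICA and PCA; the flattened time series, engineered features, and predicted source counts below are random placeholders, and how the source count is obtained is not shown here.

    import numpy as np
    from sklearn.decomposition import FastICA, PCA

    rng = np.random.default_rng(0)
    X_ts = rng.normal(size=(500, 2000))            # placeholder: flattened time series
    X_feat = rng.normal(size=(500, 1200))          # placeholder: engineered features
    n_active = rng.integers(0, 9, size=(500, 1))   # placeholder: predicted source count

    # 300 ICA components over the raw series, 300 PCA components over the
    # engineered features, plus the number of active sources, concatenated.
    ica = FastICA(n_components=300, random_state=0).fit_transform(X_ts)
    pca = PCA(n_components=300, random_state=0).fit_transform(X_feat)
    X_enlarged = np.hstack([ica, pca, n_active])
    print(X_enlarged.shape)                        # (500, 601)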

Slide 10

Slide 10 text

Summary and Conclusions

• The problem of inflow profile monitoring is hard to handle.
• The XGBoost method over the initial data plus extracted features achieved a 0/1 loss of 0.26.
• The sources that are closer to the surface are easier to predict.
• The results are better than a random guess and show potential for further improvement.

Approach                      0/1 loss
XGBoost + PCA                 0.36
Ensemble of 10 algorithms     0.31
Cascade classifier            0.44
XGBoost + PCA + ICA           0.26