TMPA-2021: Early Detection of Tasks With Uncommonly Long Run Duration in Post-Trade Systems

Maxim Nikiforov, Danila Gorkavchenko, Murad Mamedov, Andrey Novikov and Nikita Pushchin, Exactpro

TMPA is an annual International Conference on Software Testing, Machine Learning and Complex Process Analysis. The conference will focus on the application of modern methods of data science to the analysis of software quality.

To learn more about Exactpro, visit our website https://exactpro.com/

Follow us on
LinkedIn https://www.linkedin.com/company/exactpro-systems-llc
Twitter https://twitter.com/exactpro

Exactpro

November 26, 2021

Transcript

  1. Software Testing, Machine Learning and Complex Process Analysis (25-27 November)

    Early Detection of Tasks With Uncommonly Long Run Duration in Post-Trade Systems
    Maxim Nikiforov, Danila Gorkavchenko, Andrey Novikov, Murad Mamedov, Nikita Pushchin
  2. Target Testing System

    Build Software to Test Software exactpro.com
    • 20M trades per day
    • 200+ running components
    • Components are deployed across 30 servers
    • 100+ scheduled activities
    [Figure: components on application servers, split into active components and components inactive during the BAU (business-as-usual) run]
  3. Goal

    • To develop an automated approach that predicts deviations before they become obvious
    • The approach should be adaptable to other systems
    • Logs with statistical parameters will be used for the analysis
    • Ideally, it should point the operational user (QA) to the root cause of the problem
  4. Scheduled Events

    Attributes which describe scheduled events:
    • Unique identifier;
    • Type of activity;
    • Start time;
    • End time;
    • Completion status
  5. Telemetry Logs

    Raw log format:
      08:01:00 : Comp1: Group1: Param1=10, Param2=99, Param3=4
      08:01:01 : Comp1: Group1: Param1=11, Param2=98, Param3=4
    • Stored in textual format
    • Not structured
    • Hundreds of gigabytes of logs

    CSV format:
      Time     | Comp1 - Group1 - Param1 | Comp1 - Group1 - Param2 | Comp1 - Group1 - Param3
      08:01:00 | 10                      | 99                      | 4
      08:01:01 | 11                      | 98                      | 4
    • The data is now structured
    • Log size is reduced by 93%
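The raw-to-CSV step above can be sketched as a pivot: each raw line contributes values for one timestamp, and each Comp - Group - Param triple becomes a column. This is a minimal sketch, assuming every line follows the exact layout of the two sample lines on the slide; the function names, the regular expression and the "Time" header are hypothetical, not the authors' tooling.

```python
import csv
import io
import re

# Assumed raw line layout, inferred from the two sample lines on the slide:
# "HH:MM:SS : <component>: <group>: Name1=value1, Name2=value2, ..."
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2})\s*:\s*(?P<comp>[^:]+):\s*(?P<group>[^:]+):\s*(?P<params>.+)$"
)


def parse_line(line):
    """Return (time, {"Comp - Group - Param": value}) or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    prefix = f"{m['comp'].strip()} - {m['group'].strip()}"
    values = {}
    for pair in m["params"].split(","):
        name, _, value = pair.strip().partition("=")
        values[f"{prefix} - {name}"] = value  # keep the raw text; cast later if needed
    return m["time"], values


def logs_to_csv(lines):
    """Pivot raw lines into CSV text: one row per timestamp, one column per Comp - Group - Param."""
    rows = {}     # time -> {column: value}
    columns = {}  # dict used as an insertion-ordered set of column names
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines that do not follow the assumed layout
        time, values = parsed
        columns.update(dict.fromkeys(values))
        rows.setdefault(time, {}).update(values)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Time", *columns])
    for time in sorted(rows):  # zero-padded HH:MM:SS sorts correctly as text
        writer.writerow([time, *(rows[time].get(c, "") for c in columns)])
    return out.getvalue()
```

Lines that do not match are silently skipped here; on hundreds of gigabytes of logs a real pipeline would probably want to count them, and would stream the output rather than build it in memory.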
  6. Dataset Preparation

    [Figure: timeline of telemetry parameters with Activity 1, Activity 2, …, Activity N, each marked by its start and end]
    • Activity parameters: Start time, End time, ID, Status
    • Activity-based data set columns: Run id, Activity, Duration, and parameters aggregated from the telemetry logs
    • All statistical parameters are aggregated using a set of functions (min, mean, max)

    Pipeline for reducing data size, with the dataset shape (rows x columns) after each step:
    1. Data collected from CSV files: 2.5M x 7.5k
    2. Dataset with numbers only (handle object columns, handle unique ids): 2.5M x 8.3k
    3. Activity-based data set: 11k x 25k
    4. Drop constant columns: 11k x 4.5k
    5. Drop correlated columns: 11k x 2.5k
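Steps 3 to 5 of the pipeline can be sketched in plain Python: aggregate every telemetry column over each activity's [start, end] window with min, mean and max (roughly 8.3k columns x 3 functions gives the ~25k features shown), then drop constant columns and correlated columns. This is a sketch under stated assumptions, not the authors' code: timestamps are numeric seconds, every window contains at least one sample of every parameter, and the function names, the 0.95 correlation threshold and the greedy keep-first strategy are all hypothetical choices.

```python
from statistics import fmean

META = ("run_id", "activity", "duration")  # hypothetical non-feature column names


def aggregate_window(telemetry, start, end):
    """Aggregate every telemetry column over samples with start <= t <= end (min, mean, max)."""
    window = {}
    for t, values in telemetry:
        if start <= t <= end:
            for col, v in values.items():
                window.setdefault(col, []).append(v)
    features = {}
    for col, vs in window.items():
        features[f"{col}_min"] = min(vs)
        features[f"{col}_mean"] = fmean(vs)
        features[f"{col}_max"] = max(vs)
    return features


def build_activity_dataset(telemetry, activities):
    """One row per activity: identifiers, the duration target and aggregated telemetry features."""
    rows = []
    for a in activities:
        row = {"run_id": a["run_id"], "activity": a["activity"], "duration": a["end"] - a["start"]}
        row.update(aggregate_window(telemetry, a["start"], a["end"]))
        rows.append(row)
    return rows


def drop_constant_columns(rows, columns):
    """Keep only columns that take more than one distinct value across rows."""
    return [c for c in columns if len({r[c] for r in rows}) > 1]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input has zero variance."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5


def drop_correlated_columns(rows, columns, threshold=0.95):
    """Greedily keep a column only if |corr| with every already-kept column is below threshold."""
    kept = []
    for c in columns:
        xs = [r[c] for r in rows]
        if all(abs(pearson(xs, [r[k] for r in rows])) < threshold for k in kept):
            kept.append(c)
    return kept
```

A typical call chain would be `rows = build_activity_dataset(telemetry, activities)`, then `features = [c for c in rows[0] if c not in META]`, then `drop_correlated_columns(rows, drop_constant_columns(rows, features))`. The pairwise correlation loop is quadratic in the number of columns, so at thousands of columns a vectorized correlation matrix would be the practical choice.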
  7. Training a Model for One Activity Type

    • Data: one activity type
    • Model: Decision Tree
    • Metric: Root Mean Squared Error (RMSE)
    • Results: RMSE = 202 sec, STD = 45 sec
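The metric used on this slide and the next is the standard root mean squared error, RMSE = sqrt(mean((prediction - actual)^2)), which stays in the units of the target (seconds here). A minimal plain-Python version; the function name is hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target (e.g. seconds)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))
```

Because the errors are squared before averaging, a few very long activities that are badly mispredicted can dominate the score, which is one reason an absolute RMSE in seconds is hard to compare across activity types.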
  8. Training a Single Model for All Activities

    • Data: all activities
    • Model: Decision Tree
    • Metric: Root Mean Squared Error (RMSE)
    • Results: RMSE = 767 sec
  9. Model Performance Improvements

    • Use the logarithm of the target value
    • Exclude rare activities
    • Stop using the absolute RMSE in seconds and calculate it relative to the target value instead
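The three improvements can be sketched as small helpers. This is a minimal sketch under explicit assumptions: the `min_count=30` threshold is hypothetical (the slide gives none), `log1p`/`expm1` are used instead of a plain logarithm so zero-length activities do not break the transform, and "RMSE in relation to the target value" is read as the RMSE of per-sample relative errors; dividing the absolute RMSE by the mean duration would be another plausible reading.

```python
import math
from collections import Counter


def exclude_rare_activities(rows, min_count=30):
    """Drop rows whose activity type occurs fewer than min_count times (hypothetical threshold)."""
    counts = Counter(r["activity"] for r in rows)
    return [r for r in rows if counts[r["activity"]] >= min_count]


def to_log_target(durations):
    """Train on log(1 + duration); log1p also tolerates zero-length activities."""
    return [math.log1p(d) for d in durations]


def from_log_target(log_predictions):
    """Invert to_log_target so predictions are reported in seconds again."""
    return [math.expm1(p) for p in log_predictions]


def relative_rmse(y_true, y_pred):
    """RMSE of per-sample relative errors (pred - true) / true, as a fraction.

    Assumed interpretation of "RMSE in relation to the target value"; requires y_true > 0.
    """
    errors = [(p - t) / t for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

In log space an error is approximately a ratio, so a 60-second miss on a 2-minute activity weighs far more than the same miss on a 2-hour one; pairing a log target with a relative metric keeps training and evaluation on the same footing.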
  10. Comparison of Different Models

    • A model trained with Random Forest is always more accurate than one trained with a Decision Tree
  11. Checking if It Is Possible to Predict the Duration of an Activity

    [Figure: activity timeline (sec) with markers at 25%, 50%, 75% and 100% of the average activity duration]
  12. Prediction Based on a Joint Dataset with a Time Marker

    [Figure: activity timeline (sec) with markers at 25%, 50%, 75% and 100% of the average activity duration]
    • The dataset becomes 3 times bigger
    • The performance improvement is due to the time reference field
    • Results: RMSE = 25.6%, STD = 0.58%
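One way to build such a joint dataset is to emit several rows per activity, each describing the activity as seen at a time marker: telemetry is aggregated only up to the marker, an elapsed-time field serves as the time reference, and the target stays the full duration. The slides do not spell out the construction, so this sketch rests on assumptions: markers at 25%, 50% and 75% of the average duration of the activity's type (three rows per activity, which would match "3 times bigger"), windows clipped at the activity's real end, and hypothetical function and field names.

```python
from statistics import fmean

MARKERS = (0.25, 0.5, 0.75)  # hypothetical marker positions (fractions of average duration)


def aggregate_window(telemetry, start, end):
    """Aggregate every telemetry column over samples with start <= t <= end (min, mean, max)."""
    window = {}
    for t, values in telemetry:
        if start <= t <= end:
            for col, v in values.items():
                window.setdefault(col, []).append(v)
    features = {}
    for col, vs in window.items():
        features[f"{col}_min"] = min(vs)
        features[f"{col}_mean"] = fmean(vs)
        features[f"{col}_max"] = max(vs)
    return features


def augment_with_time_markers(telemetry, activities, markers=MARKERS):
    """Emit one row per (activity, marker) with features seen up to the marker."""
    durations = {}
    for a in activities:
        durations.setdefault(a["activity"], []).append(a["end"] - a["start"])
    avg = {kind: fmean(ds) for kind, ds in durations.items()}

    rows = []
    for a in activities:
        for m in markers:
            # Look only at telemetry up to the marker, clipped to the activity's actual end.
            cutoff = min(a["start"] + m * avg[a["activity"]], a["end"])
            row = {
                "run_id": a["run_id"],
                "activity": a["activity"],
                "marker": m,
                "elapsed": cutoff - a["start"],     # time reference field
                "duration": a["end"] - a["start"],  # prediction target
            }
            row.update(aggregate_window(telemetry, a["start"], cutoff))
            rows.append(row)
    return rows
```

Keeping the elapsed time as an explicit feature lets a single model distinguish "early in the run" rows from "almost finished" rows; the slide attributes the performance gain to exactly this kind of time reference field.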
  13. Results

    • An approach to preparing a dataset of reasonable size was found
    • An approach to data augmentation was developed
    • The experiments showed that the Random Forest Regressor model predicts activity duration with acceptable performance (RMSE = 25.6%, STD = 0.58%), but there is room for improvement

    Future work
    • Find out how much data is needed to start making acceptable predictions in production runs
    • Prove that similar performance can be reached for a time-based dataset
    • Find other ways to aggregate data to improve model performance
    • Predict failures of the activities
  14. Thank you!