
TMPA-2021: Data Stream Processing in Reconciliation Testing: Industrial Experience

Exactpro
November 26, 2021


Iosif Itkin, Nikolay Dorofeev, Stanislav Glushkov, Alexey Yermolayev and Elena Treshcheva, Exactpro


TMPA is an annual International Conference on Software Testing, Machine Learning and Complex Process Analysis. The conference will focus on the application of modern methods of data science to the analysis of software quality.

To learn more about Exactpro, visit our website https://exactpro.com/

Follow us on
LinkedIn https://www.linkedin.com/company/exactpro-systems-llc
Twitter https://twitter.com/exactpro


Transcript

  1. 25-27 NOVEMBER. SOFTWARE TESTING, MACHINE LEARNING AND COMPLEX PROCESS ANALYSIS. Data Stream Processing in Reconciliation Testing: Industrial Experience. Iosif Itkin, Nikolay Dorofeev, Stanislav Glushkov, Alexey Yermolayev, Elena Treshcheva. Exactpro

  2. The author team: Iosif Itkin, CEO and co-founder; Nikolay Dorofeev, Senior DocOps Engineer; Stanislav Glushkov, DocOps Engineer; Alexey Yermolayev, QA Project Manager; Elena Treshcheva, Program Manager.

  3. Research paper overview:
    - Problem statement: the importance of real-time data reconciliation in testing
    - Related work: data reconciliation approaches and data stream processing tools
    - Business context: previous experience of data reconciliation at Exactpro
    - Industrial example: check2recon, a stream analytics module for reconciliation testing
    - Lessons learned / conclusion

  4. Requirements for state-of-the-art reconciliation tools:
    ➢ Accuracy
    ➢ Consistency
    ➢ Speed (real-time, i.e. stream, processing)
    ➢ Rule-based processing (dynamic queries against real-time data)

  5. Data reconciliation approaches:
    • Data model:
      ◦ Fixed model (persistent data)
      ◦ Stream model (continuously changing, i.e. modified or appended, data)
    • Data format:
      ◦ Tuples (sets of key-value pairs)
      ◦ Objects (as in object-oriented languages or databases)
      ◦ XML documents
    • Matching type (exact vs. approximate matching is sketched below):
      ◦ Exact matching ('basic field matching algorithm')
      ◦ Recursive field matching (for recursive structures)
      ◦ Approximate matching
    • Matching procedure:
      ◦ SQL-like queries/requests
      ◦ Programmatic queries/requests

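    To make the matching types concrete, here is a minimal Python sketch of exact vs. approximate field matching over tuple-style (key-value) records; all field names and values are illustrative assumptions, not taken from the paper:

        # Exact vs. approximate field matching over key-value records.
        # All field names and values here are illustrative assumptions.
        def exact_match(a: dict, b: dict, keys) -> bool:
            # 'Basic field matching': all selected fields must be equal.
            return all(a.get(k) == b.get(k) for k in keys)

        def approximate_match(a: dict, b: dict, key: str, tol: float) -> bool:
            # Approximate matching: numeric fields may differ within a tolerance.
            return abs(a[key] - b[key]) <= tol

        left = {"order_id": 1, "price": 100.00, "qty": 5}
        right = {"order_id": 1, "price": 100.01, "qty": 5}

        assert exact_match(left, right, ["order_id", "qty"])
        assert not exact_match(left, right, ["price"])
        assert approximate_match(left, right, "price", tol=0.05)
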
  6. Stream Processing tools:
    • Stream data integration: Apache Kafka, Flume, Kinesis, Esper
    • Stream analytics: Apache Kafka, Apache Flink, Spark Streaming, Esper

  7. Business context (implemented approaches):
    TVR ("Trade VeRification"):
    • Post-transactional tool
    • Analysis of functional and non-functional test results
    • Supports credit matrix
    • Flexible configuration (reference data, connectivity, flows, etc.)
    • Analysis of matching activity on the market
    Shsha:
    • Passive post-transactional tool
    • SQL-based
    • Supports various industry-standard and proprietary protocols
    • Analyzes clients' activity and forecasts system response
    • Parses and displays logs in a user-friendly way
    • Processes massive amounts of heterogeneous client connection data
    • Allows making summarized reports
    MD Analyzer / Book Checker:
    • Parses dump files
    • Builds order books
    • Checks correctness of price and quantity
    • Verifies timestamps, checksums, pulse restrictions, and price levels
    • Supports different protocols (FIX and NAT, in different versions)

  8. Business context (implemented approaches), continued:
    Mini-Robots:
    • Active functional and non-functional testing tool
    • Fast (thousands of messages, millisecond precision)
    • Supports multiple trading flows (including Market Data)
    • Real-time adaptation with a smart algorithm
    • Supports various industry-standard and proprietary protocols
    • Multi-threaded Java code specifying different liquidity profiles
    • Concurrent emulation of multiple participants
    NFA Analyzer:
    • Passive post-transactional tool
    • Reconciles parameters from different API flows (OE and MD)
    • Generates NFA reports from API messages
    • Reconciles NFA reports generated from API flows with the NFA report generated by the system
    • Supports various industry-standard and proprietary protocols
    • Based on XSD schemas received from the National Futures Association (configurable and changeable)
    • Flexible configuration structure allowing usage of common programming structures

  9. check2recon as a th2 module:
    • th2 is a Kubernetes-based framework for testing complex distributed transactional systems
    • check2recon config on the pod level (sketched below):
      ◦ name
      ◦ maximum number of stored events
      ◦ cache size
      ◦ events sending interval
      ◦ rules:
        ▪ name
        ▪ timeout

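    For illustration only, a pod-level configuration with the fields listed above could look like the following Python dictionary; the concrete th2 configuration format is not shown in the deck, so every name and value here is an assumption:

        # Hypothetical pod-level check2recon configuration mirroring the
        # slide's field list; the real th2 format may differ.
        recon_config = {
            "name": "recon-demo",
            "max_stored_events": 10_000,     # maximum number of stored events
            "cache_size": 5_000,
            "events_sending_interval_s": 1,  # events sending interval
            "rules": [
                {"name": "order-vs-execution", "timeout_s": 30},
            ],
        }
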
  10. Rules:
    • Rules are special Python classes that contain the logic of comparing events and committing actions on them (a rule-class sketch follows below)
    • Group types:
      ◦ Single: messages in this group must have unique hashes (i.e. keys)
      ◦ Multiple: the group can store several messages with the same hash

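    A minimal sketch of such a rule class, assuming hypothetical class, method, and field names (the actual th2 check2recon API may differ):

        # Illustrative rule sketch; names are assumptions, not the real API.
        from enum import Enum

        class GroupType(Enum):
            SINGLE = 1    # each hash (key) is unique within the group
            MULTIPLE = 2  # several messages may share the same hash

        class OrderVsExecutionRule:
            """Illustrative rule: compares order messages with executions."""

            # Each rule declares its message groups and their types.
            groups = {"orders": GroupType.SINGLE,
                      "executions": GroupType.MULTIPLE}

            def hash(self, message: dict) -> str:
                # The hash key acts as the join key between groups.
                return str(message["order_id"])

            def check(self, message: dict, matched: list) -> bool:
                # Compare the message against stored messages from the
                # other group(s) that share the same hash key.
                return all(message["price"] == m["price"] for m in matched)
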
  11. Defining keys for messages:
    • hash() generates the hash key for the message to be processed by the JOIN-like query (an example follows below)

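    A possible hash() implementation, with hypothetical field names; the key combines the fields that identify the same business event in both message streams:

        # Hypothetical hash() method of a rule; field names are assumptions.
        class MyRule:
            def hash(self, message: dict) -> str:
                # Messages with equal (session_id, order_id) pairs fall into
                # the same bucket and become candidates for the JOIN-like
                # comparison across groups.
                return f"{message['session_id']}:{message['order_id']}"
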
  12. Matching and checking messages:
    • check() compares a given message with all messages from different groups having the same hash key (an example follows below)

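    A possible check() implementation along the same lines; the field names and the return convention (a list of mismatches) are assumptions:

        # Hypothetical check() method: called for messages from different
        # groups that share one hash key.
        class MyRule:
            def check(self, message: dict, candidates: list) -> list:
                mismatches = []
                for other in candidates:
                    for field in ("price", "quantity", "side"):
                        if message.get(field) != other.get(field):
                            mismatches.append(
                                (field, message.get(field), other.get(field)))
                return mismatches  # an empty list means the messages reconcile
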
  13. check2recon data reconciliation parameters:
    • Data model:
      ◦ Fixed model
      ◦ Stream model
    • Data format:
      ◦ Highly customizable on input
      ◦ Unified format inside
    • Matching type:
      ◦ Defined by the user
    • Matching procedure:
      ◦ Programmatic

  14. Lessons learnt:
    • Python for data reconciliation:
      ◦ Advantages: simplicity, flexibility, popularity, relatively low entry threshold
      ◦ Disadvantages: single-threaded execution and, as a result, relatively low performance
    • The idea of check2recon contributes to the development of the software testing professional domain
    • Our contribution is in outlining the lessons learnt from comparing this custom-made component against existing tools
    • The described experience will help to improve data reconciliation tools