
TMPA-2021: Data Stream Processing in Reconciliation Testing: Industrial Experience

Exactpro
November 26, 2021


Iosif Itkin, Nikolay Dorofeev, Stanislav Glushkov, Alexey Yermolayev and Elena Treshcheva, Exactpro


TMPA is an annual International Conference on Software Testing, Machine Learning and Complex Process Analysis. The conference will focus on the application of modern methods of data science to the analysis of software quality.

To learn more about Exactpro, visit our website https://exactpro.com/

Follow us on
LinkedIn https://www.linkedin.com/company/exactpro-systems-llc
Twitter https://twitter.com/exactpro


Transcript

  1. 25-27 NOVEMBER. SOFTWARE TESTING, MACHINE LEARNING AND COMPLEX PROCESS ANALYSIS. Data Stream Processing in Reconciliation Testing: Industrial Experience. Iosif Itkin, Nikolay Dorofeev, Stanislav Glushkov, Alexey Yermolayev, Elena Treshcheva. Exactpro

  2. The author team: Iosif Itkin, CEO and co-founder; Nikolay Dorofeev, Senior DocOps Engineer; Stanislav Glushkov, DocOps Engineer; Alexey Yermolayev, QA Project Manager; Elena Treshcheva, Program Manager.

  3. Research paper overview:
    - Problem statement: the importance of real-time data reconciliation in testing
    - Related work: data reconciliation approaches and data stream processing tools
    - Business context: previous experience of data reconciliation at Exactpro
    - Industrial example: check2recon, a stream analytics module for reconciliation testing
    - Lessons learned / conclusion

  4. Requirements for state-of-the-art reconciliation tools:
    ➢ Accuracy
    ➢ Consistency
    ➢ Speed (real-time, i.e. stream, processing)
    ➢ Rule-based processing (dynamic queries against real-time data)

  5. Data reconciliation approaches:
    • Data model:
      ◦ Fixed model (persistent data)
      ◦ Stream model (continuously changing, i.e. modified or appended, data)
    • Data format:
      ◦ Tuples (sets of key-value pairs)
      ◦ Objects (as in object-oriented languages or databases)
      ◦ XML documents
    • Matching type (exact vs. approximate matching is sketched below):
      ◦ Exact matching ('basic field matching algorithm')
      ◦ Recursive field matching (for recursive structures)
      ◦ Approximate matching
    • Matching procedure:
      ◦ SQL-like queries/requests
      ◦ Programmatic queries/requests

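    To make the matching types concrete, here is a minimal Python sketch of exact vs. approximate field matching over tuple-style (key-value) records; all field names and values are illustrative assumptions, not taken from the paper:

        # Exact vs. approximate field matching over key-value records.
        # All field names and values here are illustrative assumptions.
        def exact_match(a: dict, b: dict, keys) -> bool:
            # 'Basic field matching': all selected fields must be equal.
            return all(a.get(k) == b.get(k) for k in keys)

        def approximate_match(a: dict, b: dict, key: str, tol: float) -> bool:
            # Approximate matching: numeric fields may differ within a tolerance.
            return abs(a[key] - b[key]) <= tol

        left = {"order_id": 1, "price": 100.00, "qty": 5}
        right = {"order_id": 1, "price": 100.01, "qty": 5}

        assert exact_match(left, right, ["order_id", "qty"])
        assert not exact_match(left, right, ["price"])
        assert approximate_match(left, right, "price", tol=0.05)
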
  6. Stream Processing tools:
    • Stream data integration: Apache Kafka, Flume, Kinesis, Esper
    • Stream analytics: Apache Kafka, Apache Flink, Spark Streaming, Esper

  7. Business context (implemented approaches):
    TVR ("Trade VeRification"):
    • Post-transactional tool
    • Analysis of functional and non-functional test results
    • Supports credit matrix
    • Flexible configuration (reference data, connectivity, flows, etc.)
    • Analysis of matching activity on the market
    Shsha:
    • Passive post-transactional tool
    • SQL-based
    • Supports various industry-standard and proprietary protocols
    • Analyzes clients' activity and forecasts system response
    • Parses and displays logs in a user-friendly way
    • Processes massive amounts of heterogeneous client connection data
    • Allows making summarized reports
    MD Analyzer / Book Checker:
    • Parses dump files
    • Builds order books
    • Checks correctness of price and quantity
    • Verifies timestamps, checksums, pulse restrictions, and price levels
    • Supports different protocols (FIX and NAT, in different versions)

  8. Business context (implemented approaches), continued:
    Mini-Robots:
    • Active functional and non-functional testing tool
    • Fast (thousands of messages, millisecond precision)
    • Supports multiple trading flows (including Market Data)
    • Real-time adaptation with a smart algorithm
    • Supports various industry-standard and proprietary protocols
    • Multi-threaded Java code specifying different liquidity profiles
    • Concurrent emulation of multiple participants
    NFA Analyzer:
    • Passive post-transactional tool
    • Reconciles parameters from different API flows (OE and MD)
    • Generates NFA reports from API messages
    • Reconciles NFA reports generated from API flows with the NFA report generated by the system
    • Supports various industry-standard and proprietary protocols
    • Based on XSD schemas received from the National Futures Association (configurable and changeable)
    • Flexible configuration structure allowing usage of common programming structures

  9. check2recon as a th2 module:
    • th2 is a Kubernetes-based framework for testing complex distributed transactional systems
    • check2recon config on the pod level (sketched below):
      ◦ name
      ◦ maximum number of stored events
      ◦ cache size
      ◦ events sending interval
      ◦ rules:
        ▪ name
        ▪ timeout

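    For illustration only, a pod-level configuration with the fields listed above could look like the following Python dictionary; the concrete th2 configuration format is not shown in the deck, so every name and value here is an assumption:

        # Hypothetical pod-level check2recon configuration mirroring the
        # slide's field list; the real th2 format may differ.
        recon_config = {
            "name": "recon-demo",
            "max_stored_events": 10_000,     # maximum number of stored events
            "cache_size": 5_000,
            "events_sending_interval_s": 1,  # events sending interval
            "rules": [
                {"name": "order-vs-execution", "timeout_s": 30},
            ],
        }
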
  10. Rules:
    • Rules are special Python classes that contain the logic of comparing events and committing actions on them (a rule-class sketch follows below)
    • Group types:
      ◦ Single: messages in this group must have unique hashes (i.e. keys)
      ◦ Multiple: the group can store several messages with the same hash

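    A minimal sketch of such a rule class, assuming hypothetical class, method, and field names (the actual th2 check2recon API may differ):

        # Illustrative rule sketch; names are assumptions, not the real API.
        from enum import Enum

        class GroupType(Enum):
            SINGLE = 1    # each hash (key) is unique within the group
            MULTIPLE = 2  # several messages may share the same hash

        class OrderVsExecutionRule:
            """Illustrative rule: compares order messages with executions."""

            # Each rule declares its message groups and their types.
            groups = {"orders": GroupType.SINGLE,
                      "executions": GroupType.MULTIPLE}

            def hash(self, message: dict) -> str:
                # The hash key acts as the join key between groups.
                return str(message["order_id"])

            def check(self, message: dict, matched: list) -> bool:
                # Compare the message against stored messages from the
                # other group(s) that share the same hash key.
                return all(message["price"] == m["price"] for m in matched)
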
  11. Defining keys for messages:
    • hash() generates the hash key for the message to be processed by the JOIN-like query (an example follows below)

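    A possible hash() implementation, with hypothetical field names; the key combines the fields that identify the same business event in both message streams:

        # Hypothetical hash() method of a rule; field names are assumptions.
        class MyRule:
            def hash(self, message: dict) -> str:
                # Messages with equal (session_id, order_id) pairs fall into
                # the same bucket and become candidates for the JOIN-like
                # comparison across groups.
                return f"{message['session_id']}:{message['order_id']}"
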
  12. Matching and checking messages:
    • check() compares a given message with all messages from different groups having the same hash key (an example follows below)

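    A possible check() implementation along the same lines; the field names and the return convention (a list of mismatches) are assumptions:

        # Hypothetical check() method: called for messages from different
        # groups that share one hash key.
        class MyRule:
            def check(self, message: dict, candidates: list) -> list:
                mismatches = []
                for other in candidates:
                    for field in ("price", "quantity", "side"):
                        if message.get(field) != other.get(field):
                            mismatches.append(
                                (field, message.get(field), other.get(field)))
                return mismatches  # an empty list means the messages reconcile
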
  13. check2recon data reconciliation parameters:
    • Data model:
      ◦ Fixed model
      ◦ Stream model
    • Data format:
      ◦ Highly customizable on input
      ◦ Unified format inside
    • Matching type:
      ◦ Defined by the user
    • Matching procedure:
      ◦ Programmatic

  14. Lessons learnt:
    • Python for data reconciliation:
      ◦ Advantages: simplicity, flexibility, popularity, relatively low entry threshold
      ◦ Disadvantages: single-threaded execution and, as a result, relatively low performance
    • The idea of check2recon contributes to the development of the software testing professional domain
    • Our contribution is in outlining the lessons learnt from comparing this custom-made component against existing tools
    • The described experience will help to improve data reconciliation tools