Reconciliation Testing Aspects of Trading Systems Software Failures

Preliminary proceedings of the 8th Spring/Summer Young Researchers' Colloquium on Software Engineering (SYRCoSE 2014), Saint Petersburg - ISBN 978-5-91474-020-4, pp. 125-129

Anna-Maria Kriger, Kostroma State Technological University
Alyona Pochukalina, Obninsk Institute for Nuclear Power Engineering
Vladislav Isaev, Yuri Gagarin State Technical University of Saratov
Exactpro Systems



June 05, 2014


  1. Reconciliation Testing Aspects of Trading Systems Software Failures

    Anna-Maria Kriger, Kostroma State Technological University
    Alyona Pochukalina, Obninsk Institute for Nuclear Power Engineering
    Vladislav Isaev, Yuri Gagarin State Technical University of Saratov
    Exactpro Systems
    ANALYSIS OF IMAGES, SOCIAL NETWORKS, AND TEXTS, April 10-12, Yekaterinburg
  2. Introduction

    • Our team performs functional and non-functional testing for electronic trading and post-trade platforms
    • We collect available information about software outages and problems in our knowledge base to improve our test coverage and design more efficient ways to do our work
    • This paper describes the concept of reconciliation testing
    • We studied two major software disasters in financial services that led to substantial losses: Knight Capital and the Facebook IPO
    • This paper focuses only on the reconciliation controls and testing procedures relevant to these events
    • We plan to continue researching the applicability of data reconciliation tools in software testing and to develop a reference implementation of a scalable real-time reconciliation testing tool based on the proprietary market surveillance platform
  3. Data Reconciliation

    Reconciliation is a process of finding discrepancies in data obtained from different sources. In accounting, reconciliation refers to the process of ensuring that two sets of records, usually account balances, match each other. In the financial markets, data reconciliation systems help asset managers to reconcile trades, cash and security flows, balances and positions between different systems, e.g. internal data stored by the trading participant vs. external data received from counterparties, brokers, clearers, custodians, etc.
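    The definition above can be illustrated with a minimal sketch that compares an internal trade list with one received from a counterparty. The `Trade` fields, the `reconcile` function and the three discrepancy buckets are assumptions made for illustration; they are not taken from any tool discussed in the slides.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Trade:
    # Hypothetical record layout; real feeds carry many more fields.
    trade_id: str
    symbol: str
    quantity: int
    price: Decimal


def reconcile(internal, external):
    """Compare two trade lists keyed by trade_id and report discrepancies."""
    ours = {t.trade_id: t for t in internal}
    theirs = {t.trade_id: t for t in external}
    return {
        # Trades we hold that the other side does not report.
        "missing_external": sorted(ours.keys() - theirs.keys()),
        # Trades the other side reports that we do not hold.
        "missing_internal": sorted(theirs.keys() - ours.keys()),
        # Trades present on both sides whose details differ.
        "mismatched": sorted(
            k for k in ours.keys() & theirs.keys() if ours[k] != theirs[k]
        ),
    }
```

    Prices are held as `Decimal` so that equality comparison is exact; a real tool would also need tolerances and field-level diff reporting.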
  4. Reconciliation Testing

    Reconciliation testing is a process of using data reconciliation tools to validate the system in parallel with other activities
  5. Data Reconciliation Tools

    The main aspects of data reconciliation tools are:
    • Passive test tools
    • Serve as test oracles
    • Can be used with HiVAT methods
    • Should be used during negative test cycles
    • Require extra resources
  6. Knight Capital Events

    • 1 August 2012, USA
    • Knight Capital: one of the most successful HFT firms
    • Implemented changes related to the Retail Liquidity Program at NYSE
    • SMARS: an ultra-fast order router
    • Source code responsible for the legacy PowerPeg functionality
    • 212 parent orders, millions of child orders
    • Accumulated loss: $460m, or about $170k/sec
    • Incorrectly configured risk systems
    • Deployment on 7 servers instead of 8…
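    The last bullet points at a simple kind of reconciliation check: comparing the build each server reports with the intended release. The sketch below assumes hypothetical server names and version labels; nothing here reflects Knight's actual environment or tooling.

```python
def find_stale_servers(deployed_versions, expected_version):
    """Return the names of servers whose reported build is not the expected one.

    deployed_versions: mapping of server name -> version string (hypothetical).
    """
    return sorted(
        name
        for name, version in deployed_versions.items()
        if version != expected_version
    )


def deployment_is_consistent(deployed_versions, expected_version):
    """True only when every server reports the expected build."""
    return not find_stale_servers(deployed_versions, expected_version)
```

    Run as a gate after each release, a check like this would flag the one server left on the old build before trading starts.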
  7. Knight Capital Events

  8. Knight Capital Events

  9. Knight Capital Stock Chart

  10. Regulated Exchange

  11. Facebook IPO on NASDAQ

    • 18 May 2012, NASDAQ; one of the largest IPOs in history
    • Secondary trading is preceded by a designated Display Only Period (DOP)
    • Multi-component architecture that included the Matching Engine, the IPO Cross Application and the Execution Application
    • At the end of the DOP, NASDAQ’s “IPO Cross Application” analyzes all of the buy and sell orders to determine the price at which the largest number of shares will trade; NASDAQ’s matching engine then matches the buy and sell orders at that price. This usually takes 1-2 ms
    • NASDAQ allowed orders to be cancelled at any time up until the end of the DOP, including the very brief interval during which the IPO cross price is calculated. After the calculation was completed, the system performed an order validation check between the matching engine (ME) and the “IPO Cross Application”. If any orders had been cancelled after the start of the cross, the system had to repeat the calculation
  12. Facebook IPO on NASDAQ

    • Over 496k orders participated in the cross, and its duration exceeded 20 ms
    • An order cancellation arrived during this period, and the application had to repeat the calculation. Two more cancellations arrived during the second iteration, and four more during the third
    • The IPO Cross Application went into an infinite loop at 11:05
    • The NASDAQ team switched off the validation check on the secondary system and performed a failover 25 minutes after the start of the loop
    • Unknown at that moment, 38k orders submitted between 11:11 and 11:30 were stuck and did not participate in the uncross. This created another discrepancy, this time between the Execution Application and the members, who did not receive confirmations for orders executed in the cross until 13:50
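    The recalculation loop described above can be sketched as a toy model: the cross is repeated whenever the validation check finds cancellations that arrived during the calculation. The function name, its inputs and the `max_attempts` bound are assumptions added for illustration; the bound is not a remedy taken from the slides, and the model ignores timing and order data entirely.

```python
def run_cross(cancels_per_iteration, max_attempts):
    """Repeat the cross while cancellations arrive during the calculation.

    cancels_per_iteration: number of cancellations arriving during each
        successive calculation (hypothetical input; 0 once exhausted).
    max_attempts: escalation bound, an assumption of this sketch.
    Returns (attempts_used, completed).
    """
    arrivals = iter(cancels_per_iteration)
    for attempt in range(1, max_attempts + 1):
        cancelled_during_calc = next(arrivals, 0)
        if cancelled_during_calc == 0:
            # Validation check passes: both components agree on the order set.
            return attempt, True
        # Validation check fails: the calculation has to be repeated.
    return max_attempts, False
```

    With the arrival pattern from the slide (one, two, then four cancellations) the loop only ends if an iteration sees no cancellations; a steady stream never lets it finish, which is why the untested manual override mattered.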
  13. Failover Proposal

  14. Failover Proposal

  15. Facebook IPO Stock Chart

  16. Comparison

    • Both Knight and NASDAQ had a set of reconciliation controls
    • In both cases they were not properly covered by testing:
      – Knight: the reconciliation check had stopped working years earlier
      – NASDAQ: the control was working, but the operational procedure for the case when it fails had not been tested
    • In both cases, monitoring systems notified the operational team about a problem, but the information provided was not enough to identify its source and react appropriately
    • In neither case was there a good balance between automatic and manual processing:
      – Some of Knight's controls were not connected to automatically block real-time processing and prevent erroneous orders from being sent
      – Conversely, NASDAQ's controls halted processing automatically, but the option to unblock the control had not been tested
  17. Market Surveillance System

  18. Market Surveillance System

    It is possible and beneficial to use a market surveillance system as a reconciliation testing tool for the following reasons:
    • all required data is collected from the system and is available both in real time and in the database;
    • most surveillance systems are configured as a downstream component and do not affect the main transactional path;
    • the rules engine allows creating data reconciliation checks that raise alerts when they fail;
    • order book replay allows studying the exact source of a discrepancy.
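    A rules-engine check of the kind described above might look like the following sketch: a passive, downstream monitor that consumes order events and raises an alert when a reconciliation rule fails. The class, its event methods and the rule itself (child order quantity must not exceed the parent quantity) are hypothetical and do not represent the proprietary surveillance platform.

```python
from collections import defaultdict


class ReconciliationMonitor:
    """Passive monitor: observes events and records alerts, never blocks flow."""

    def __init__(self):
        self.parent_qty = {}                # parent_id -> ordered quantity
        self.child_qty = defaultdict(int)   # parent_id -> summed child quantity
        self.alerts = []

    def on_parent_order(self, parent_id, quantity):
        self.parent_qty[parent_id] = quantity

    def on_child_order(self, parent_id, quantity):
        self.child_qty[parent_id] += quantity
        limit = self.parent_qty.get(parent_id)
        if limit is None:
            self.alerts.append(f"child order for unknown parent {parent_id}")
        elif self.child_qty[parent_id] > limit:
            total = self.child_qty[parent_id]
            self.alerts.append(
                f"parent {parent_id}: child quantity {total} exceeds {limit}"
            )
```

    Because the monitor only reads events, it can run alongside other test activities without touching the transactional path, which matches the passive role the slides assign to reconciliation tools.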
  19. Questions and Answers

    Thank you! We look forward to seeing you there!