FRONT Compute Engine by DBS ITT Front Arena Team

Steve Teo
August 12, 2014

DBS ITT Lunch and Learn Tech Sharing August 2014

Topic: FRONT Compute Engine
Speaker: DBS ITT Front Arena Team

For the Front Arena system, we face a constant challenge: balancing varied workloads across server hardware so that all reporting demands are turned around before the next business day starts, or fed into downstream systems before a stipulated cut-off. The workload demand is always increasing: Production PL, PL Explain, PL Sensitivity, Historical VaR PV Series, JTD, intra-day runs, re-runs, more portfolios, more Sensi measures. How do we address this challenge with one solution?

Transcript

  1. 5 Years Ago
     - 160 TWS jobs on Trading Servers
     - 350 TWS jobs on Finance Servers
     Workload: Production PL, PL Explain, Market Risk Sensi, JTD, Ops Position/Cash/Recon.
     Situation:
     1. TWS jobs are triggered on each of the Reporting Servers.
     2. Workloads are manually balanced via TWS changes.
     3. Frequent re-runs due to server resource contention and out-of-memory job failures.
  2. 3 Years Ago
     - 180 TWS jobs on Trading Servers
     - 16 TWS jobs on Finance Servers
     Developed FaRptAgent/FaTransfer. FaRptAgent processes are triggered via TWS on Reporting Servers.
     Features:
     1. Centralized place for report run parameters
     2. Balances report jobs according to weight
     3. Auto rerun once upon failure
     4. Auto trigger of C:D file transfer
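The weight-based balancing and single automatic rerun described for FaRptAgent can be illustrated with a minimal sketch. This is not the team's code: the function names, job names, weights and server names are all hypothetical, and the greedy "heaviest job to the least-loaded server" rule is one common way to balance by weight, assumed here for illustration.

```python
import heapq


def balance_by_weight(jobs, servers):
    """Assign each job (heaviest first) to the currently least-loaded server.

    jobs: dict of job name -> weight (hypothetical relative cost).
    servers: list of server names (hypothetical).
    """
    heap = [(0, name) for name in servers]  # (accumulated weight, server)
    heapq.heapify(heap)
    assignment = {name: [] for name in servers}
    ordered = sorted(jobs.items(), key=lambda kv: (-kv[1], kv[0]))
    for job, weight in ordered:
        load, name = heapq.heappop(heap)
        assignment[name].append(job)
        heapq.heappush(heap, (load + weight, name))
    return assignment


def run_with_one_rerun(job, runner):
    """Run a job; if it fails, rerun it exactly once, then give up."""
    for attempt in (1, 2):
        try:
            return runner(job), attempt
        except Exception:
            if attempt == 2:
                raise
```

With weights such as `{"PL": 5, "Sensi": 4, "PL_Explain": 3, "JTD": 2}` and two servers, this rule gives each server a total weight of 7. Limiting the rerun to a single attempt keeps a persistently failing job from occupying a server indefinitely.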
  3. 2 Years Ago
     - 180 TWS jobs on Trading Servers
     - 24 TWS jobs on Finance Servers
     Developed FaRemote/FaCombine. FaRptAgent processes are centrally triggered via TWS on the Trading Application Server and run remotely on various Reporting Servers.
     Features:
     1. Centralized remote job trigger
     2. Auto combine of split jobs
  4. 1 Year Ago
     - Introduction of FA4 Risk API on Python
     - Consolidation of HsVaR PV Series into FA system
  5. The Task
     - To develop a smarter job scheduling solution
     - Well integrated with Front Arena Python Extensions
     - Single solution that is usable for P&L, Sensi and HsVaR PV Series generation
     Why do we need to do this?
     - Work-Life Balance
     - Productivity: doing more with the same resources
     - To grow our technology skill sets
  6. FCE - Features
     - Dynamic Mapping of Jobs to Configurable Batch Size
     - Dynamic Scheduling of Jobs to Cluster of Compute Nodes
     - Ability to define multiple Compute Clusters
     - Fully Script-based, i.e. no code compilation
     - Web UI for Monitoring and Operations (Rerun, Prioritise, Ad-hoc Run)
     - Zero Deployment of Source Codes on Compute Nodes
     - Auto Error Recovery
     - Ability to handle multiple payload/job types (API or Cmd file)
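The last feature on this slide, handling more than one payload type (API or Cmd file), can be pictured as a small dispatcher. The sketch below is an assumption about how such a dispatch might look, not FCE's actual interface: the `run_payload` function and the payload dictionary keys (`type`, `func`, `args`, `argv`) are hypothetical.

```python
import subprocess
import sys


def run_payload(payload):
    """Execute one payload according to its declared type.

    "api" payloads call a Python function in-process.
    "cmd" payloads run an external command and return its stdout.
    """
    kind = payload["type"]
    if kind == "api":
        return payload["func"](*payload.get("args", ()))
    if kind == "cmd":
        done = subprocess.run(
            payload["argv"], capture_output=True, text=True, check=True
        )
        return done.stdout.strip()
    raise ValueError(f"unknown payload type: {kind!r}")


# Hypothetical payloads: one in-process call, one external command.
api_result = run_payload({"type": "api", "func": sum, "args": ([1, 2, 3],)})
cmd_result = run_payload(
    {"type": "cmd", "argv": [sys.executable, "-c", "print('ok')"]}
)
```

Keeping both types behind a single entry point means the scheduler does not need to know how a given job is executed, only that it can be run and yields a result or an error.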
  7. FCE – Architecture
     - Engine based on Python
     - UI based on node.js
     - Payload or Computation Codes are dynamically distributed to Compute Nodes
     [Architecture diagram. Labels: Head Node, Compute Nodes, FA Client, FCE Node, Payload(s), FCE Server, Scheduler, Cluster Pool, Submit, Monitor Service, UI Server, Node.js, Bootstrap, LDAP, NET/EJS]
  8. FCE – Algorithm
     - Engine based on Map-Reduce
     - Both Map and Reduce happen at the Head Node
     - No data or code on Compute Node
     [Algorithm diagram. Labels: Map, Payload, Batch, Batch, Batch, Schedule, Reduce, Result, Result, Result]
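The Map-Reduce flow on this slide (split a payload into batches at the head node, schedule the batches onto compute nodes, combine the results back at the head node) can be sketched as follows. Everything here is illustrative: the function names are hypothetical, a thread pool stands in for the cluster of compute nodes, and `compute_batch` is a placeholder for a real valuation call.

```python
from concurrent.futures import ThreadPoolExecutor


def map_payload(portfolios, batch_size):
    """Map step (head node): split the work into batches of a configurable size."""
    return [
        portfolios[i:i + batch_size]
        for i in range(0, len(portfolios), batch_size)
    ]


def compute_batch(batch):
    """Placeholder for work done on a compute node (not a real valuation)."""
    return {name: float(len(name)) for name in batch}


def reduce_results(results):
    """Reduce step (head node): merge the per-batch results into one result."""
    combined = {}
    for partial in results:
        combined.update(partial)
    return combined


def run_job(portfolios, batch_size=2, nodes=3):
    batches = map_payload(portfolios, batch_size)
    # The pool stands in for the compute nodes; batches run as workers free up.
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        results = list(pool.map(compute_batch, batches))
    return reduce_results(results)
```

Calling `run_job(["FX", "RATES", "CREDIT", "EQ", "COMM"])` splits the five names into three batches and returns a single merged dictionary. Because both the split and the merge happen in the caller, the workers only ever see one batch at a time, which mirrors the slide's point that the compute nodes hold no data or code of their own.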
  9. FCE - Effect
     - Eliminated use of TWS on Reporting Servers/Compute Nodes
     - Efficient use of Server Resources (JIT Scheduling)
     - Eliminated use of Platform LSF Scheduler
     - Unified all Reporting Servers/Compute Nodes into a single resource pool
     - One solution for all reporting needs
     - Ease of Support
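Why just-in-time scheduling over a single resource pool uses servers more efficiently than a fixed, up-front assignment can be seen in a toy simulation. The sketch below is only an illustration under stated assumptions: the batch durations are invented, round-robin stands in for a static pre-assignment, and neither function is taken from FCE.

```python
import heapq


def static_makespan(durations, n_nodes):
    """Pre-assign batches round-robin; return the time the last node finishes."""
    loads = [0] * n_nodes
    for i, duration in enumerate(durations):
        loads[i % n_nodes] += duration
    return max(loads)


def jit_makespan(durations, n_nodes):
    """Hand each batch to whichever node becomes free first."""
    free_at = [0] * n_nodes  # time at which each node is next idle
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + duration)
    return max(free_at)


# Invented batch durations (arbitrary time units) with two long batches.
durations = [9, 1, 9, 1, 1, 1]
static_total = static_makespan(durations, 2)  # 19: both long batches land on one node
jit_total = jit_makespan(durations, 2)        # 11: the idle node picks up the next batch
```

In this example the static assignment leaves one node idle for most of the run, while the just-in-time rule keeps both nodes busy until the queue is empty.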