Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Scala For Business Automation

Scala For Business Automation

Solving Real Business Problems with Streams, POI, and a Dash of BDD

Avatar for conor10

conor10

May 27, 2015
Tweet

More Decks by conor10

Other Decks in Technology

Transcript

  1. Scala For Business Automation Solving Real Business Problems with Streams,

    POI and a Dash of BDD Conor Svensson conor10@gmail.com
  2. The London Whale …the model operated through a series of

    Excel spreadsheets, which had to be completed manually, by a process of copying and pasting data from one spreadsheet to another… Report of JPMorgan Chase & Co. Management Task Force Regarding 2012 CIO Losses
  3. A brief history… • 1979 VisiCalc - The first killer

    app for PCs on Apple II • 1983 Lotus 1-2-3 • 1985 Excel 1.0 on Apple • 1987 Excel 2.0 on MS-DOS • 1992 Excel 3.0 on Windows • 1994 StarCalc (Star Office) • 2001 Calc (Open Office) • 2006 Google Spreadsheets • 2007 iWork Numbers
  4. Use cases • Traditional • Accounting/inventory • Project management •

    Simple data analysis • Reporting • Exotic • Business process workflows • Risk management • Transaction capture • Reconciliations
  5. Why? Initially… “I can solve this problem with a spreadsheet”

    • Avoid more expensive build via a proper solution (bureaucratic change processes don’t help here) • Lack of development knowledge And then… Management/audit/ regulators arrive… • “Why do we have these decentralised business critical spreadsheets?” • “We need a proper solution” Then… It becomes business critical
  6. Complex spreadsheets • Multiple lookup tables with reference data •

    Data from multiple data sources • Series of complex calculations • Output reports • Centralised via a file share (if lucky), multiple copies tend to exist due to e-mail
  7. 1. Apache POI • Poor Obfuscation Implementation • The Java

    API for Microsoft Documents • Work with Office files without a license (Excel, Word, PowerPoint, Outlook, Visio, Publisher) • Text extraction applications - web spiders, index builders, content management systems • Business automation…
  8. 2. Scala • Expressive and concise • Streams, for comprehensions,

    and case classes • Many organisations have not yet made the switch to Java 8 • Slots easily into existing Java dominant environments (e.g. financial institutions)
  9. Our model • Reorder data • Create named ranges •

    Merge price series by date • Calculate: 1. Returns of our price series 2. Mean 3. Standard deviation 4. Covariance 5. Correlation
  10. The Code ExcelReader • Use named ranges or sheet/row/col indices

    Stats • Statistical library Correlation • Incremental extraction of business logic with Streams/for-comprehensions
  11. Building on this • Rapid prototyping/MVP • Spreadsheet used as

    initial datasource • Micro-spreadsheet-services, etc… (sorry)
  12. Limitations • VBA evaluations are not supported, only formula evaluations

    are • Some Excel functions are not yet implemented (but you can contribute easily) • It’s not a silver bullet for avoiding spreadsheets, merely a tool to ease the pain of moving off them • Additional overhead of more granular stream processing
  13. In summary • Useful tools for rapid prototyping/MVP without risking

    data quality • Reduced implementation error • Spreadsheet can be modelled incrementally • BDD test cases verified against original source • Use FP techniques • Very well suited to streams • Promoting immutability of our data structures
  14. Further resources • https://github.com/conor10/spreadsheets - source code and example code

    supporting this talk • https://www.quandl.com - Quandl for data • https://poi.apache.org/ - Apache POI Project • http://scalatest.org/ - ScalaTest • Power, D. J., "A Brief History of Spreadsheets", DSSResources.COM, World Wide Web, http://dssresources.com/history/sshistory.html, version 3.6, 08/30/2004 • http://blogs.office.com/2012/09/13/introducing-spreadsheet-controls-in-the- new-office/ - Controls for critical spreadsheets • http://fsprojects.github.io/ExcelProvider/ - F# Excel Type Provider