Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Scala For Business Automation

Scala For Business Automation

Solving Real Business Problems with Streams, POI, and a Dash of BDD

conor10

May 27, 2015
Tweet

More Decks by conor10

Other Decks in Technology

Transcript

  1. The London Whale …the model operated through a series of

    Excel spreadsheets, which had to be completed manually, by a process of copying and pasting data from one spreadsheet to another… Report of JPMorgan Chase & Co. Management Task Force Regarding 2012 CIO Losses
  2. A brief history… • 1979 VisiCalc - The first killer

    app for PCs on Apple II • 1983 Lotus 1-2-3 • 1985 Excel 1.0 on Apple • 1987 Excel 2.0 on MS-DOS • 1992 Excel 3.0 on Windows • 1994 StarCalc (Star Office) • 2001 Calc (Open Office) • 2006 Google Spreadsheets • 2007 iWork Numbers
  3. Use cases • Traditional • Accounting/inventory • Project management •

    Simple data analysis • Reporting • Exotic • Business process workflows • Risk management • Transaction capture • Reconciliations
  4. Why? Initially… “I can solve this problem with a spreadsheet”

    • Avoid more expensive build via a proper solution (bureaucratic change processes don’t help here) • Lack of development knowledge And then… Management/audit/ regulators arrive… • “Why do we have these decentralised business critical spreadsheets?” • “We need a proper solution” Then… It becomes business critical
  5. Complex spreadsheets • Multiple lookup tables with reference data •

    Data from multiple data sources • Series of complex calculations • Output reports • Centralised via a file share (if lucky), multiple copies tend to exist due to e-mail
  6. 1. Apache POI • Poor Obfuscation Implementation • The Java

    API for Microsoft Documents • Work with Office files without a license (Excel, Word, PowerPoint, Outlook, Visio, Publisher) • Text extraction applications - web spiders, index builders, content management systems • Business automation…
  7. 2. Scala • Expressive and concise • Streams, for comprehensions,

    and case classes • Many organisations have not yet made the switch to Java 8 • Slots easily into existing Java dominant environments (e.g. financial institutions)
  8. Our model • Reorder data • Create named ranges •

    Merge price series by date • Calculate: 1. Returns of our price series 2. Mean 3. Standard deviation 4. Covariance 5. Correlation
  9. The Code ExcelReader • Use named ranges or sheet/row/col indices

    Stats • Statistical library Correlation • Incremental extraction of business logic with Streams/for-comprehensions
  10. Building on this • Rapid prototyping/MVP • Spreadsheet used as

    initial datasource • Micro-spreadsheet-services, etc… (sorry)
  11. Limitations • VBA evaluations are not supported, only formula evaluations

    are • Some Excel functions are not yet implemented (but you can contribute easily) • It’s not a silver bullet for avoiding spreadsheets, merely a tool to ease the pain of moving off them • Additional overhead of more granular stream processing
  12. In summary • Useful tools for rapid prototyping/MVP without risking

    data quality • Reduced implementation error • Spreadsheet can be modelled incrementally • BDD test cases verified against original source • Use FP techniques • Very well suited to streams • Promoting immutability of our data structures
  13. Further resources • https://github.com/conor10/spreadsheets - source code and example code

    supporting this talk • https://www.quandl.com - Quandl for data • https://poi.apache.org/ - Apache POI Project • http://scalatest.org/ - ScalaTest • Power, D. J., "A Brief History of Spreadsheets", DSSResources.COM, World Wide Web, http://dssresources.com/history/sshistory.html, version 3.6, 08/30/2004 • http://blogs.office.com/2012/09/13/introducing-spreadsheet-controls-in-the- new-office/ - Controls for critical spreadsheets • http://fsprojects.github.io/ExcelProvider/ - F# Excel Type Provider