Slide 1

Slide 1 text

Scala For Business Automation Solving Real Business Problems with Streams, POI and a Dash of BDD Conor Svensson [email protected]

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

The Spreadsheet The PC’s first killer app

Slide 5

Slide 5 text

The London Whale …the model operated through a series of Excel spreadsheets, which had to be completed manually, by a process of copying and pasting data from one spreadsheet to another… Report of JPMorgan Chase & Co. Management Task Force Regarding 2012 CIO Losses

Slide 6

Slide 6 text

A brief history… • 1979 VisiCalc - The first killer app for PCs on Apple II • 1983 Lotus 1-2-3 • 1985 Excel 1.0 on Apple • 1987 Excel 2.0 on MS-DOS • 1992 Excel 3.0 on Windows • 1994 StarCalc (Star Office) • 2001 Calc (Open Office) • 2006 Google Spreadsheets • 2007 iWork Numbers

Slide 7

Slide 7 text

Today

Slide 8

Slide 8 text

Use cases • Traditional • Accounting/inventory • Project management • Simple data analysis • Reporting • Exotic • Business process workflows • Risk management • Transaction capture • Reconciliations

Slide 9

Slide 9 text

Why? Initially… “I can solve this problem with a spreadsheet” • Avoid more expensive build via a proper solution (bureaucratic change processes don’t help here) • Lack of development knowledge And then… Management/audit/ regulators arrive… • “Why do we have these decentralised business critical spreadsheets?” • “We need a proper solution” Then… It becomes business critical

Slide 10

Slide 10 text

Complex spreadsheets • Multiple lookup tables with reference data • Data from multiple data sources • Series of complex calculations • Output reports • Centralised via a file share (if lucky), multiple copies tend to exist due to e-mail

Slide 11

Slide 11 text

The result

Slide 12

Slide 12 text

Automation

Slide 13

Slide 13 text

1. Apache POI • Poor Obfuscation Implementation • The Java API for Microsoft Documents • Work with Office files without a license (Excel, Word, PowerPoint, Outlook, Visio, Publisher) • Text extraction applications - web spiders, index builders, content management systems • Business automation…

Slide 14

Slide 14 text

2. Scala • Expressive and concise • Streams, for comprehensions, and case classes • Many organisations have not yet made the switch to Java 8 • Slots easily into existing Java dominant environments (e.g. financial institutions)

Slide 15

Slide 15 text

Streams

Slide 16

Slide 16 text

For comprehension Instead of: We can use:

Slide 17

Slide 17 text

Case classes Scala provides: • field accessors • constructors • equals • hashCode • toString • copy

Slide 18

Slide 18 text

3. BDD in Scala Examples use ScalaTest’s FlatSpec with Matchers

Slide 19

Slide 19 text

An example Q: Is there a correlation between Bitcoin and commodity returns?

Slide 20

Slide 20 text

Source data • Bitcoin prices • ASX SPI 200 (Australian commodities index)

Slide 21

Slide 21 text

Our model • Reorder data • Create named ranges • Merge price series by date • Calculate: 1. Returns of our price series 2. Mean 3. Standard deviation 4. Covariance 5. Correlation

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

The Code ExcelReader • Use named ranges or sheet/row/col indices Stats • Statistical library Correlation • Incremental extraction of business logic with Streams/for-comprehensions

Slide 29

Slide 29 text

ExcelReader

Slide 30

Slide 30 text

Stats

Slide 31

Slide 31 text

Correlation

Slide 32

Slide 32 text

End to end validation Tests that all can understand!!!

Slide 33

Slide 33 text

No content

Slide 34

Slide 34 text

Clear test output

Slide 35

Slide 35 text

Building on this • Rapid prototyping/MVP • Spreadsheet used as initial datasource • Micro-spreadsheet-services, etc… (sorry)

Slide 36

Slide 36 text

Limitations • VBA evaluations are not supported, only formula evaluations are • Some Excel functions are not yet implemented (but you can contribute easily) • It’s not a silver bullet for avoiding spreadsheets, merely a tool to ease the pain of moving off them • Additional overhead of more granular stream processing

Slide 37

Slide 37 text

In summary • Useful tools for rapid prototyping/MVP without risking data quality • Reduced implementation error • Spreadsheet can be modelled incrementally • BDD test cases verified against original source • Use FP techniques • Very well suited to streams • Promoting immutability of our data structures

Slide 38

Slide 38 text

Further resources • https://github.com/conor10/spreadsheets - source code and example code supporting this talk • https://www.quandl.com - Quandl for data • https://poi.apache.org/ - Apache POI Project • http://scalatest.org/ - ScalaTest • Power, D. J., "A Brief History of Spreadsheets", DSSResources.COM, World Wide Web, http://dssresources.com/history/sshistory.html, version 3.6, 08/30/2004 • http://blogs.office.com/2012/09/13/introducing-spreadsheet-controls-in-the- new-office/ - Controls for critical spreadsheets • http://fsprojects.github.io/ExcelProvider/ - F# Excel Type Provider

Slide 39

Slide 39 text

Questions?