Slide 1

Slide 1 text

1 Agile Analytics in Mobile Gaming: lessons learned. Volodymyr (Vlad) Kazantsev, Head of Data Science at Product Madness, 2015. © All rights reserved.

Slide 2

Slide 2 text

2 What do we do?

Slide 3

Slide 3 text

3 Heart of Vegas in charts: iPad rankings, US; iPad rankings, Australia

Slide 4

Slide 4 text

4 Data Impact Team (team of 6)
● Ad-hoc analytics and daily fires
● Deep dive analysis; Predictive analytics
● Data Engineering; R&D

Slide 5

Slide 5 text

5 A Few Examples: A/B Tests (variants A and B); Customer Lifetime Value (chart of $ value over days); Segmentation (groups 1 to 4)

Slide 6

Slide 6 text

6 Technology Stack (diagram): C++; ETL orchestration; Transformation & Aggregation; SQL; Data Products; Reports; Dashboards

Slide 7

Slide 7 text

7 Lessons

Slide 8

Slide 8 text

8 Lesson 1: Agile Philosophy for Data Science

Slide 9

Slide 9 text

9 Agile Manifesto
● Individuals and interactions over processes and tools
● Working software over comprehensive documentation
● Customer collaboration over contract negotiation
● Responding to change over following a plan
* agilemanifesto.org

Slide 10

Slide 10 text

10 Agile Data Science Manifesto
● Individuals and interactions over processes and tools
● Actionable insights over comprehensive reports
● Customer collaboration over project negotiation
● Responding to change over following a plan

Slide 11

Slide 11 text

11 Individuals and interactions over processes and tools
“If a building doesn’t encourage [collaboration], you’ll lose a lot of innovation and the magic that’s sparked by serendipity” - Steve Jobs

Slide 12

Slide 12 text

12 Individuals and interactions over processes and tools: Standing Desks + Easily Available Whiteboard

Slide 13

Slide 13 text

13 Agile Principles
● Iterative, incremental and evolutionary: iterations, timeboxed estimates
● Efficient and face-to-face communication: say no to tasks sent by email (with no face-to-face)
● Very short feedback loop and adaptation cycle: daily standups, pair analysis
● Quality focus: verifiable, reproducible findings

Slide 14

Slide 14 text

14 Data Science Board

Slide 15

Slide 15 text

15 Scrum-Ban in Data Science @ProductMadness
● Weekly cycle
● Daily standup meeting @10am
● ToDo/WIP/Waiting buckets are kept small
● Disruptions to weekly plan are expected
● On-demand planning

Slide 16

Slide 16 text

16 Lesson 1: Agile methods in Data Science
1. Co-location matters; keep a whiteboard next to your desk
2. Work with the decision maker; share preliminary findings
3. Make a research plan; pivot early
4. Book a “Findings” meeting before the project starts
5. MVP for Data Products
6. Do daily stand-ups!

Slide 17

Slide 17 text

17 Lesson 2: Agile Velocity vs. Acceleration

Slide 18

Slide 18 text

18 What is Agile Acceleration? Waterfall vs. Scrum
Velocity = Units of Work / Time Interval
ΔVelocity = Acceleration × ΔTime

Slide 19

Slide 19 text

19 a = F / m
“I run SQL, copy-paste data to Excel and send it by email”
“I created a deep neural network to predict high spenders”

Slide 20

Slide 20 text

20 Case Study: to Git or not to Git. Scripts (ruby, bash, python); Python Apps; Python Modules; IPython Notebooks; Research Documents (word); Presentations (powerpoint); Spreadsheets (excel)

Slide 21

Slide 21 text

21 Case Study: to Git or not to Git. Scripts (ruby, bash, python); Python Apps; Python Modules; IPython Notebooks (?); Research Documents (word); Slides (powerpoint); Spreadsheets (excel)

Slide 22

Slide 22 text

22 Case Study: to Git or not to Git. Scripts (ruby, bash, python); Python Apps; Python Modules; IPython Notebooks; Research Documents (word); Slides (powerpoint); Spreadsheets (excel)

Slide 23

Slide 23 text

23 Remove unnecessary weight

Slide 24

Slide 24 text

24 Lesson 2: find the lightest suitable tool
1. IPython notebooks: Dropbox over Git
2. Google Slides over PowerPoint; Google Slides over email with images (>2 images)
3. Google Spreadsheets over Excel (for analytics)
4. Podio over Jira (for analytics)
5. Data transformations in DWH in SQL over Hadoop
6. Don’t copy-paste code in IPython notebooks, use functions; don’t copy-paste functions between notebooks, use modules
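The last point of Lesson 2 (functions instead of pasted code, modules instead of pasted functions) could look roughly like the sketch below. The module name `analytics_utils`, the function names and the numbers are all hypothetical, chosen only for illustration; the talk does not show the team's actual library.

```python
# analytics_utils.py (hypothetical module name): shared helpers live here,
# so notebooks import them instead of carrying pasted copies.


def retention_rate(cohort_size, retained_users):
    """Return the share of a cohort that is still active, as a float."""
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    return retained_users / cohort_size


def retention_curve(cohort_size, retained_by_day):
    """Apply retention_rate to each day's retained-user count."""
    return [retention_rate(cohort_size, n) for n in retained_by_day]


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(retention_curve(1000, [1000, 400, 250]))
```

In a notebook the pasted cell then shrinks to `from analytics_utils import retention_curve`, and a fix made in the module reaches every notebook that imports it.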

Slide 25

Slide 25 text

25 Lesson 3: Focus on Closing the Loop

Slide 26

Slide 26 text

26 Analytics Loop (diagram): Spot Opportunity; Ask the Right Question; Make Decision; Improve the Business; Data Science @work

Slide 27

Slide 27 text

27 Analytics Spiral (diagram): Ideas & Questions; Data Analysis; Insights; Impact

Slide 28

Slide 28 text

28 Data Science Value Pyramid (axes: complexity, value)
● Store & Query: Record what happened
● Reports: Was it good or bad?
● Descriptive Analytics: Why did it happen?
● Predictive Analytics: What will happen?
● Data Products: Affect the outcome
* inspired by Agile Data Science, Russell Jurney, O'Reilly Media 2013

Slide 29

Slide 29 text

29 Data Science Value Loop: Record what happened; Was it good or bad?; Why did it happen?; What will happen?; Affect the outcome

Slide 30

Slide 30 text

30 Limit the number of Open Loops
Task completion: 90%, 90%, 75%, 80%, 80%, 60% vs. 100%, 100%, 100%, 100%, 0%, 0%
Always prefer to have 90% of tasks 100% complete over 100% of tasks 90% complete
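The two sets of percentages on this slide can be compared with a few lines of arithmetic. The sketch below takes the first six values as the "many open loops" case and the last six as the "few open loops" case; the variable and function names are made up for illustration.

```python
# Completion percentages as they appear on the slide.
many_open_loops = [90, 90, 75, 80, 80, 60]
few_open_loops = [100, 100, 100, 100, 0, 0]


def summarize(tasks):
    """Return (average completion in %, number of finished tasks)."""
    average = sum(tasks) / len(tasks)
    finished = sum(1 for pct in tasks if pct == 100)
    return average, finished


for name, tasks in [("many open loops", many_open_loops),
                    ("few open loops", few_open_loops)]:
    average, finished = summarize(tasks)
    print(f"{name}: {average:.0f}% average, {finished} of {len(tasks)} finished")
```

The first set has the higher average completion (about 79% against 67%) but no finished task, while the second has four finished tasks that can already be acted upon.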

Slide 31

Slide 31 text

31 Lesson 3: Focus on Closing the Loop
1. Don’t build predictive models that you can’t act upon. Don’t analyse stuff that does not help to make a decision.
2. The best way to deal with the Analytics Spiral is to avoid the spiral. Practise the “Crack a Case” and “what if” methods.
3. Climb the Data Value Pyramid fast. Once climbed, optimise the Data Value Loop.
4. Limit the number of “open loops”.

Slide 32

Slide 32 text

32 Lesson 4: Reproducibility Matters

Slide 33

Slide 33 text

33 To the and back!

Slide 34

Slide 34 text

34 Why? Because: Boss: “Great! Can you run this for all monthly cohorts?”

Slide 35

Slide 35 text

35 Why? Because: Boss: “Sam is on holiday. Can you re-run his analysis?”

Slide 36

Slide 36 text

36 A Few IPython Tips

Slide 37

Slide 37 text

37 Import all commonly used tools in one line. All access and security is abstracted away. Focus on SQL, not data access. Formatting and publishing a .png in one line of code. PyCharm has a great SQL editor.

Slide 38

Slide 38 text

38 Lesson 4: Reproducibility
● Get rid of Windows and you get rid of Excel
● ipynb are always shared and versioned; prefer simple cloud sharing to VCS
● Streamline data access functions
● Cache long-running code and queries
● Develop a common library
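As one possible reading of "streamline data access" and "cache long-running queries", the sketch below wraps query execution in a single helper that stores results on disk, keyed by a hash of the SQL text. Everything here is an assumption made for illustration: `cached_query` is a hypothetical name, and an in-memory SQLite database stands in for the data warehouse, which the talk does not name.

```python
import hashlib
import pickle
import sqlite3
import tempfile
from pathlib import Path


def cached_query(conn, sql, cache_dir):
    """Run sql on conn, reusing a pickled result if the same SQL ran before.

    The cache key is the SQL text only, so entries are never invalidated
    when the underlying data changes; clear cache_dir to force a refresh.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(sql.encode("utf-8")).hexdigest()
    cache_file = cache_dir / f"{key}.pkl"
    if cache_file.exists():
        return pickle.loads(cache_file.read_bytes())
    rows = conn.execute(sql).fetchall()
    cache_file.write_bytes(pickle.dumps(rows))
    return rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the real warehouse
    conn.execute("CREATE TABLE purchases (user_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO purchases VALUES (?, ?)",
                     [(1, 4.99), (2, 9.99), (1, 1.99)])
    sql = ("SELECT user_id, COUNT(*) FROM purchases "
           "GROUP BY user_id ORDER BY user_id")
    with tempfile.TemporaryDirectory() as tmp:
        first = cached_query(conn, sql, tmp)   # hits the database
        second = cached_query(conn, sql, tmp)  # served from the cache file
        print(first == second)
```

Keeping such a helper in the common library means a re-run of someone else's notebook reuses stored results instead of repeating every long query.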

Slide 39

Slide 39 text

39 In Summary...

Slide 40

Slide 40 text

40 Summary
● Agile approach works well in Data Science
● Find the lightest suitable tool for a task
● Reproducibility is not negotiable
● Focus on closing the loop(s)

Slide 41

Slide 41 text

41 Questions? We are hiring! volodymyrk