Agile Data Science

Is Agile Data Science just two buzzwords put together? I argue that agile is a practical methodology that works well in the real world for all sorts of Analytics and Data Science workflows. Here is what we've learned.

VolodymyrK

May 11, 2015

Transcript

  1. 1 Agile Analytics in Mobile Gaming: lessons learned. Volodymyr (Vlad) Kazantsev, Head of Data Science at Product Madness, 2015. © All rights reserved.
  2. 4 Data Impact Team
    • Ad-hoc analytics and daily fires
    • Deep dive analysis; Predictive analytics
    • Data Engineering; R&D
    Team of 6
  3. 5 A Few Examples
    • A/B Tests (variants A and B)
    • Customer Lifetime Value (chart: $ value over days)
    • Segmentation (groups 1 to 4)
  4. 9 Agile Manifesto
    • Individuals and interactions over processes and tools
    • Working software over comprehensive documentation
    • Customer collaboration over contract negotiation
    • Responding to change over following a plan
    * agilemanifesto.org
  5. 10 Agile Data Science Manifesto
    • Individuals and interactions over processes and tools
    • Actionable insights over comprehensive reports
    • Customer collaboration over project negotiation
    • Responding to change over following a plan
  6. 11 Individuals and interactions over processes and tools
    “If a building doesn’t encourage [collaboration], you’ll lose a lot of innovation and the magic that’s sparked by serendipity” - Steve Jobs
  7. 13 Agile Principles
    • Iterative, incremental and evolutionary - iterations, timeboxed estimates
    • Efficient and face-to-face communication - no to tasks by email (with no face-to-face)
    • Very short feedback loop and adaptation cycle - daily standups, pair analysis
    • Quality focus - verifiable, reproducible findings
  8. 15 Scrum-Ban in Data Science @ProductMadness
    • Weekly cycle
    • Daily standup meeting @10am
    • ToDo/WIP/Waiting buckets are kept small
    • Disruptions to weekly plan are expected
    • On-demand planning
  9. 16 Lesson 1: Agile methods in Data Science
    1. Co-location matters; keep a whiteboard next to your desk
    2. Work with the decision maker; share preliminary findings
    3. Make a research plan; pivot early
    4. Book a “Findings” meeting before the project starts
    5. MVP for Data Products
    6. Do daily stand-ups!
  10. 18 What is Agile Acceleration: Waterfall vs. Scrum
    Velocity = Units of Work / Time Interval
    ΔVelocity = Acceleration * ΔTime
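
As a rough illustration of the two definitions on this slide, the Python sketch below computes velocity as units of work per time interval and acceleration as the change in velocity over time. The function names and the weekly figures are hypothetical placeholders, not data from the talk.

```python
def velocity(units_of_work, time_interval):
    """Velocity = units of work completed / time interval."""
    return units_of_work / time_interval


def acceleration(v_start, v_end, delta_time):
    """Rearranged from delta_velocity = acceleration * delta_time."""
    return (v_end - v_start) / delta_time


# Hypothetical units of work closed in three consecutive weekly cycles.
weekly_units = [8, 10, 13]
velocities = [velocity(units, 1) for units in weekly_units]

# Change in velocity between the first and the last week, per week elapsed.
team_acceleration = acceleration(velocities[0], velocities[-1], len(velocities) - 1)
print(velocities, team_acceleration)
```

A team with positive acceleration gets faster from cycle to cycle, which is the quantity the slide contrasts between Waterfall and Scrum.
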
  11. 19 a = F / m
    “I run SQL, copy-paste data to Excel and send it by email” vs. “I created a deep neural network to predict high spenders”
  12. 20 Case Study: to Git or not to Git
    Scripts (Ruby, Bash, Python); Python Apps; Python Modules; IPython Notebooks; Research Documents (Word); Presentations (PowerPoint); Spreadsheets (Excel)
  13. 21 Case Study: Git or not to Git
    Scripts (Ruby, Bash, Python); Python Apps; Python Modules; IPython Notebooks (?); Research Documents (Word); Slides (PowerPoint); Spreadsheets (Excel)
  14. 22 Case Study: Git or not to Git
    Scripts (Ruby, Bash, Python); Python Apps; Python Modules; IPython Notebooks; Research Documents (Word); Slides (PowerPoint); Spreadsheets (Excel)
  15. 24 Lesson 2: find the lightest suitable tool
    1. IPython notebooks: Dropbox over Git
    2. Google Slides over PowerPoint; Google Slides over email with images (>2 images)
    3. Google Spreadsheets over Excel (for analytics)
    4. Podio over Jira (for analytics)
    5. Data transformations in the DWH in SQL over Hadoop
    6. Don’t copy-paste code in IPython notebooks, use functions; don’t copy-paste functions in notebooks, use modules
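
Point 6 of Lesson 2 can be sketched in Python as follows. The metric, the function name `conversion_rate` and the segment numbers are hypothetical examples; in practice the function would live in a shared module (for instance a hypothetical `team_utils.py`) that every notebook imports, rather than being pasted into each notebook.

```python
def conversion_rate(payers, users):
    """Share of users who made a purchase; 0.0 when the segment is empty."""
    if users == 0:
        return 0.0
    return payers / users


# Hypothetical segments: (number of payers, number of users).
segments = {
    "group 1": (50, 1000),
    "group 2": (30, 400),
    "group 3": (0, 0),
}

# One function call per segment instead of one copy-pasted cell per segment.
rates = {name: conversion_rate(p, u) for name, (p, u) in segments.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
```

Once the logic sits in one function, a fix such as the empty-segment guard is made in one place and reaches every analysis that uses it.
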
  16. 26 Analytics Loop (Data Science @work)
    Spot Opportunity; Ask the Right Question; Make Decision; Improve the Business
  17. 28 Data Science Value Pyramid (complexity and value increase towards the top)
    • Store & Query - Record what happened
    • Reports - Was it good or bad?
    • Descriptive Analytics - Why did it happen?
    • Predictive Analytics - What will happen?
    • Data Products - Affect the outcome
    * inspired by Agile Data Science, Russell Jurney, O'Reilly Media 2013
  18. 29 Data Science Value Loop
    Record what happened; Was it good or bad?; Why did it happen?; What will happen?; Affect the outcome
  19. 30 Limit the number of Open Loops
    Task completion: 90%, 90%, 75%, 80%, 80%, 60% vs. 100%, 100%, 100%, 100%, 0%, 0%
    Always prefer to have 90% of tasks 100% complete over 100% of tasks 90% complete
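
The comparison on this slide can be checked with a few lines of Python. The percentages are the ones shown on the slide; the function names and the definition of an open loop (a task that has been started but not finished) are assumptions made for this sketch.

```python
# Completion percentages for six tasks, as shown on the slide.
many_open = [90, 90, 75, 80, 80, 60]
few_open = [100, 100, 100, 100, 0, 0]


def closed_loops(progress):
    """Number of tasks that are fully complete."""
    return sum(1 for pct in progress if pct == 100)


def open_loops(progress):
    """Number of tasks that were started but not finished (assumed definition)."""
    return sum(1 for pct in progress if 0 < pct < 100)


for name, progress in [("many open loops", many_open), ("few open loops", few_open)]:
    print(name, "closed:", closed_loops(progress), "open:", open_loops(progress))
```

The first plan has more total effort invested but nothing that a decision maker can use yet, while the second has four finished tasks and no work in progress.
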
  20. 31 Lesson 3: Focus on Closing the Loop
    1. Don’t build predictive models that you can’t act upon. Don’t analyse stuff that does not help to make a decision
    2. The best way to deal with the Analytics Spiral is to avoid the spiral. Practise the Crack a Case and “what if” methods
    3. Climb the Data Value Pyramid fast. Once climbed, optimise the Data Value Loop
    4. Limit the number of “open loops”
  21. 37 Import all commonly used tools in one line
    • All access and security is abstracted away
    • Focus on SQL, not data access
    • Formatting and publishing a .png in one line of code
    • PyCharm has a great SQL editor
  22. 38 Lesson 4: Reproducibility
    • Get rid of Windows and you get rid of Excel
    • ipynb files are always shared and versioned; prefer simple cloud sharing to VCS
    • Streamline data access functions
    • Cache long-running code and queries
    • Develop a common library
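
A minimal sketch of the "cache long-running code and queries" point, assuming a disk cache keyed by a hash of the SQL text. The helper `cached_query`, the stand-in `fake_run_query` and the query string are hypothetical; the talk does not show the team's actual data access functions.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def cached_query(sql, run_query, cache_dir):
    """Return the result of run_query(sql), reusing a pickled copy if one exists."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(sql.encode("utf-8")).hexdigest()
    cache_file = cache_dir / f"{key}.pkl"
    if cache_file.exists():
        with cache_file.open("rb") as fh:
            return pickle.load(fh)
    result = run_query(sql)
    with cache_file.open("wb") as fh:
        pickle.dump(result, fh)
    return result


calls = []


def fake_run_query(sql):
    """Stand-in for a slow warehouse query; records each real execution."""
    calls.append(sql)
    return [("2015-05-01", 42)]


demo_dir = tempfile.mkdtemp()
query = "SELECT day, payers FROM daily_stats"  # hypothetical table
first = cached_query(query, fake_run_query, demo_dir)
second = cached_query(query, fake_run_query, demo_dir)  # served from disk
print(first == second, len(calls))
```

Keying the cache on the SQL text means an edited query is re-run automatically, but a cached result goes stale when the underlying data changes, so the cache directory needs to be cleared when fresh data is required.
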
  23. 40 Summary
    • Agile approach works well in Data Science
    • Find the lightest suitable tool for a task
    • Reproducibility is not negotiable
    • Focus on closing the loop(s)