Visual Content's Journey to Agile Data Science

How to apply agile software engineering methods to a team of data scientists. A field study from trivago's (former) images team.

Pascal Cremer

March 18, 2019

Transcript

  1. Replace the main image on the item element on SEM Landing Pages for Spa & Wellness ("Spa Botox") and Pool & Beach ("Flamingo").
  2. Spa Botox 2.0: Validate the potential of contextual main images by leveraging our own custom tagging solution for images displayed on SEM Landing Pages for Spa & Wellness.
  3. How to not create value: Soaking yourself in overly long research phases. Working backwards. Not documenting your results. And with all that: Not shipping anything.
  4. “The intrinsically probabilistic and non-deterministic characteristics of Data Science do not make it an easy or a natural fit for organizations accustomed to predominantly linear and reasonably predictable development models.” Source: Applying Agile to Data Science - Medium.
  5. So, is Data Science even a good fit for an agile (and cross-functional) team?
  6. “Our Data Scientists are using Scrum, too. They work on their own 8 story point tasks each sprint.” Source: An actual Product Owner!
  7. Putting imaginary points on your JIRA tasks does not make you agile. That is not how that works!
  8. Agile is about empowering teams to deliver value with an emphasis on close collaboration between team members and business stakeholders using short feedback cycles.
  9. Agile Data Science in a Nutshell: Embrace the MVP. Follow the Data Value Pyramid. Document everything! ⚗ Experiments, not tasks.
  10. Embrace the MVP: The MVP in a Data Science initiative is whatever deliverable addresses some narrowly scoped business requirements using a minimal set of resources and tasks. "[An] MVP can simply be a predictive model that’s more accurate than random guessing." Source: Agile Development in Team Data Science - Wikibon Research.
  11. Document Everything: Assets like tables, charts, reports, and predictions emerge as artifacts while iteratively climbing the Data Value Pyramid. And while they might not be "shippable" in a software sense, they provide a strong basis for the dialog with business stakeholders.
  12. Experiments, Not Tasks: While iterating, we want to gain insights based on data, and that work is best described as experiments. Experiments should be reproducible, with all their artifacts documented.
  13. Embrace the MVP
      SEM Landing Page powered by Clarifai (3rd party) tags.
      MVP model with >80% accuracy on average for all classes. Training & validation set of 1000 images per class.
      Model with >80% overall accuracy for each class. Training & validation set of 2000 images per class.
  14. Experiments & Variants: Experiments are the counterpart to User Stories. They come with their own DoD, such as being reproducible (Jupyter Notebook checked into GitHub) and having all artifacts documented in a Dropbox Paper document. Variants are related to "parent" experiments and can be run in parallel. Variants can differ from each other in their choice of model architecture, learning rate, batch size, loss function and other hyperparameters.
  15. One Agile Board to rule them all! Columns: Backlog, Defined, In Progress, In Review, Blocked, Done!
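
The ideas on slides 13 to 15 could be modelled roughly as follows. This is only a minimal sketch, not the team's actual tooling: the class names (`Column`, `Variant`, `Experiment`), the two DoD flags, and the helper functions are hypothetical, and reading slide 13 as "mean accuracy above 80%" for the MVP versus "every class above 80%" for the follow-up model is an assumption about what the slide means.

```python
from dataclasses import dataclass, field
from enum import Enum


class Column(Enum):
    """Board columns as named on slide 15."""
    BACKLOG = "Backlog"
    DEFINED = "Defined"
    IN_PROGRESS = "In Progress"
    IN_REVIEW = "In Review"
    BLOCKED = "Blocked"
    DONE = "Done!"


@dataclass
class Variant:
    """A variant of a parent experiment (hypothetical structure)."""
    name: str
    hyperparameters: dict  # e.g. architecture, learning rate, batch size
    per_class_accuracy: dict = field(default_factory=dict)


@dataclass
class Experiment:
    """Counterpart to a user story, with a simple Definition of Done."""
    title: str
    notebook_committed: bool = False    # assumption: notebook is in version control
    artifacts_documented: bool = False  # assumption: results are written up
    column: Column = Column.BACKLOG
    variants: list = field(default_factory=list)

    def meets_dod(self) -> bool:
        # Both DoD conditions from slide 14 must hold.
        return self.notebook_committed and self.artifacts_documented

    def move_to_done(self) -> None:
        # Refuse to close an experiment that is not reproducible/documented.
        if not self.meets_dod():
            raise ValueError("Definition of Done not met")
        self.column = Column.DONE


def mean_accuracy_ok(acc: dict, threshold: float = 0.8) -> bool:
    """Assumed MVP criterion: accuracy averaged over classes exceeds threshold."""
    return sum(acc.values()) / len(acc) > threshold


def every_class_ok(acc: dict, threshold: float = 0.8) -> bool:
    """Assumed stricter criterion: each class individually exceeds threshold."""
    return all(value > threshold for value in acc.values())


# Illustrative numbers only, not results from the talk.
baseline = Variant(
    name="baseline",
    hyperparameters={"learning_rate": 0.001, "batch_size": 32},
    per_class_accuracy={"spa": 0.90, "pool": 0.75},
)
experiment = Experiment(title="Contextual main images", variants=[baseline])
```

With these made-up accuracies, the variant would pass the averaged criterion but fail the per-class one, which is the gap the second iteration on slide 13 appears to close. Keeping the DoD check inside `move_to_done` means a card cannot reach the last column without a committed notebook and documented artifacts.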