Data Science Teams: Lessons Learned

A collection of quotes on people's experiences building and managing data science teams

Annabelle Rolland

May 09, 2016

Transcript

  1. Why is data important?

    “Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it…” (Dan Ariely, Duke University)

    “I’m a data janitor. That’s the sexiest job of the 21st century. It’s very flattering, but it’s also a little baffling” (Josh Wills, Cloudera)

    LOTS of hype around
  2. Data is the voice of our customers

    “A datum is a record of an action or event, which in most cases reflects a decision made by a person. If you can recreate the sequence of events leading up to that decision, you can learn from it; it’s an indirect way of the person telling you what they like and don’t like [...]. Data is the (aggregated) voice of our customers. And wherever we go next, wherever we belong next, will be driven by those voices.” (Riley Newman, Airbnb)
  3. Data Science Teams: The Ideal Scenario

    The title Data Scientist is very broad, and it is very difficult to find an individual who masters all the required skills (the infamous “Unicorn”).

    A data team should be a mix of people with complementary skills and different roles: Data Engineers, Business Intelligence Analysts, Data Scientists, Technical Project Managers…

    “Before the data team is strong enough across all three areas, make sure they have strong support for the skills they lack, and don't expect them to work autonomously.” (Cheng-Tao Chu, Codecademy)
  4. The Data Science Team as a Product Team

    The data team should be built like a product team so that it can execute on its ideas and be independent.

    “Our solution was to make the data group a full product team responsible for designing, implementing, and maintaining products. As a product team, data scientists could experiment, build, and add value directly to the company.” (DJ Patil, LinkedIn)

    The results: Who’s Viewed My Profile, Skills, Career Explorer, People You May Know…
  5. Challenges of Building Data Products

    “If you're not thinking about how to keep your data clean from the very beginning, you're fucked. I guarantee it.” (DJ Patil, LinkedIn)

    •  Data is MESSY; data cleaning can easily represent 80% of the work.
      ◦  Data quality needs to be monitored on a constant basis
    •  Deciding how much and what data to expose to people
      ◦  Should not be creepy, and the user should feel in control
      ◦  Too much data can paralyze the user
    •  Data products take time to mature
      ◦  It takes time to collect the data that makes them better
      ◦  People You May Know took 2 years to drive reasonable growth
  6. Interactions with other teams

    Data Scientists should work closely with decision makers to ensure that their recommendations are being used

    “when data scientists are pressed for time, they have a tendency to toss the results of an analysis ‘over the wall’ and then move on to the next problem. [...] when decision-makers don’t understand the ramifications of an insight, they don’t act on it. When they don’t act on it, the value of the insight is lost.” (Riley Newman, Airbnb)

    Data Scientists should be embedded with cross-functional, feature-centric teams but “partnerships between scientists, allowing them to share best practices, ideas, and solutions with other people they enjoy working with, are what keep very talented people engaged and growing” (Gordon Rios, Pandora)

    Democratize data and empower other teams

    “Empowering teams is about removing the burden of reporting and basic data exploration from the shoulders of data scientists so they can focus on more impactful work.” (Riley Newman, Airbnb)
  7. Requests and Prioritisation

    “Interaction between the data science teams and the rest of corporate culture is another key factor. It’s easy for a data team (any team, really) to be bombarded by questions and requests. But not all requests are equally important. How do you make sure there’s time to think about the big questions and the big problems? How do you balance incoming requests (most of which are tagged ‘as soon as possible’) with long-term goals and projects? It’s important to have a culture of prioritization: everyone in the group needs to be able to ask about the priority of incoming requests. Everything can’t be urgent” (DJ Patil, LinkedIn)

    Everything can’t be