everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it…” (Dan Ariely, Duke University) “I’m a data janitor. That’s the sexiest job of the 21st century. It’s very flattering, but it’s also a little baffling” (Josh Wills, Cloudera) LOTS of hype around
which in most cases reflects a decision made by a person. If you can recreate the sequence of events leading up to that decision, you can learn from it; it’s an indirect way of the person telling you what they like and don’t like [...]. Data is the (aggregated) voice of our customers. And wherever we go next – wherever we belong next – will be driven by those voices.” (Riley Newman, Airbnb) Data is the voice of our customers:
is very broad, and it is very difficult to find an individual who masters all the required skills (the infamous “Unicorn”). A data team should be a mix of people with complementary skills and different roles: Data Engineers, Business Intelligence analysts, Data Scientists, Technical Project Managers… “Before the data team is strong enough across all three areas, make sure they have strong support for the skills they lack, and don't expect them to work autonomously.” (Cheng-Tao Chu, Codecademy)
team should be built like a product team, so that it can execute on its own ideas and work independently. “Our solution was to make the data group a full product team responsible for designing, implementing, and maintaining products. As a product team, data scientists could experiment, build, and add value directly to the company.” (DJ Patil, LinkedIn) The results: Who’s Viewed My Profile, Skills, Career Explorer, People You May Know…
how to keep your data clean from the very beginning, you're fucked. I guarantee it.” (DJ Patil, LinkedIn)
• Data is MESSY: data cleaning can easily represent 80% of the work.
  ◦ Data quality needs to be monitored continuously.
• Deciding how much data, and which data, to expose to people
  ◦ It should not feel creepy, and users should feel in control.
  ◦ Too much data can paralyze the user.
• Data products take time to mature
  ◦ It takes time to collect the data that makes them better.
  ◦ People You May Know took 2 years to drive reasonable growth.
decision makers to ensure that their recommendations are actually used. “when data scientists are pressed for time, they have a tendency to toss the results of an analysis ‘over the wall’ and then move on to the next problem.[...] when decision-makers don’t understand the ramifications of an insight, they don’t act on it. When they don’t act on it, the value of the insight is lost.” (Riley Newman, Airbnb)
Data Scientists should be embedded with cross-functional, feature-centric teams, but “partnerships between scientists — allowing them to share best practices, ideas, and solutions with other people they enjoy working with — are what keep very talented people engaged and growing” (Gordon Rios, Pandora)
Democratize data and empower other teams: “Empowering teams is about removing the burden of reporting and basic data exploration from the shoulders of data scientists so they can focus on more impactful work.” (Riley Newman, Airbnb)
the rest of corporate culture is another key factor. It’s easy for a data team (any team, really) to be bombarded by questions and requests. But not all requests are equally important. How do you make sure there’s time to think about the big questions and the big problems? How do you balance incoming requests (most of which are tagged “as soon as possible”) with long-term goals and projects? It’s important to have a culture of prioritization: everyone in the group needs to be able to ask about the priority of incoming requests. Everything can’t be urgent” (DJ Patil, LinkedIn) Everything can’t be