Building Data-Intensive Teams

Elias
June 14, 2021

Presented at Berlin Buzzwords 2021 https://2021.berlinbuzzwords.de/session/building-data-intensive-teams

Nowadays, users expect your app to be not only fast and reliable, but also smart. As a consequence, more and more teams are becoming data-intensive, relying on data to build their solutions. It’s a common belief that putting models into production is one of the biggest bottlenecks on the journey to becoming more data-driven. While true, this step is only the beginning of that journey.

I believe that a much broader transformation is required in how we think about the product development lifecycle, as well as the communication flows between business, engineering, and data.
In this talk, I’ll show how we are building data-grounded solutions in the domain of search and recommendations and instilling an experimental culture in one of the biggest online marketplaces.

Transcript

  1. What is data-intensive?

    “We call an application data-intensive if data is its primary challenge (the quantity of data, the complexity of data, or the speed at which it is changing), as opposed to compute-intensive, where CPU cycles are the bottleneck.” Martin Kleppmann
  2. How about teams?

    Data-intensive teams: teams for whom running data-intensive applications and managing data products are the main challenges.
  3. A bit of history

    [Two excerpts: 1974 (source) and 1976 (source)]
    UNIX and System R are both "information management" systems.
  4. A bit of history

    [Two excerpts: 1974 (source) and 1976 (source)]
    The OS views its role as presenting hardware to computer programmers. The DBMS views its role as managing data for application programmers.
  5. Two challenges with data products

    Data products generally require a larger development investment, with higher uncertainty about the return on that investment.
  6. Costs

    Source: The Architectural Implications of Facebook’s DNN-based Personalized Recommendation
    Recommendation models in general comprise over 79% of AI inference cycles.
  7. Uncertainty of ROI

    Source: Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing
    [Figure: Before / After]
    It’s hard to evaluate an idea.
  8. Uncertainty of ROI

    Source: Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing
    [Figure: Before / After]
    > $100M/year
  9. Typical development flow

    [Diagram: Idea, Implementation]
    • Bad for data products, where success doesn’t depend only on code or system quality, and it’s hard to predict results during initial project planning.
    • Optimizes for the wrong metric: real feedback is often collected only at the end, once the implementation is done.
  10. Why hypothesis?

    • Hypothesis: something that carries no certainty; it can be right, it can be wrong.
    • Idea: saying an idea is bad or good implies judgment and might hurt openness.
    • User story: presumes we know our users.
    [Cycle diagram: Hypothesize, Prototype, Experiment, Productionize (source)]
  11. Prototyping

    • Design your prototype to verify the hypothesis as fast as possible.
    • Decompose to the point where you spend minimum effort to test your assumptions.
    [Cycle diagram: Hypothesize, Prototype, Experiment, Productionize]
  12. Experiment

    • We often see code, systems, and infrastructure as our medium. But the real medium is human interaction with real products in the wild. So the biggest mistake is to sit in the ivory tower: go into the real world and collect feedback!
    • It’s not only about learning, it’s about avoiding costly mistakes.
    [Cycle diagram: Hypothesize, Prototype, Experiment, Productionize]
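One way to turn collected feedback into a decision is a simple A/B comparison of conversion rates. The sketch below is only an illustration and not something shown in the talk: it assumes a two-sided two-proportion z-test with a pooled standard error, and the function name and example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (A) and variant (B).

    Returns (absolute lift, z statistic, two-sided p-value).
    Uses the pooled normal approximation, which assumes reasonably
    large samples and at least one conversion overall.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return rate_b - rate_a, z, p_value


if __name__ == "__main__":
    # Hypothetical counts: 10,000 users per group.
    lift, z, p_value = two_proportion_z_test(200, 10_000, 260, 10_000)
    print(f"lift={lift:.4f} z={z:.2f} p={p_value:.4f}")
```

A small p-value only says the difference is unlikely to be noise; whether the lift justifies productionizing remains a product decision.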
  13. Productionize

    • Having the results from the experiment and all other information, you can now either go all-in on implementation or … go for another hypothesis!
    [Cycle diagram: Hypothesize, Prototype, Experiment, Productionize]
  14. Hedge your risks

    [Cycle diagram: Hypothesize, Prototype, Experiment, Productionize, annotated with 💰 markers]
    Lowers the variance of the outcome: you “earn” by failing*
    * actually, by converging to the working solution faster
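The variance claim can be checked on a toy model. The sketch below is not from the talk, and every number in it is a hypothetical assumption: each hypothesis works with the same independent probability, a full build costs far more than a prototype plus experiment, and an experiment is assumed to reveal perfectly whether the hypothesis works. It enumerates outcomes exactly instead of sampling.

```python
from math import comb


def mean_and_variance(outcomes):
    """outcomes: list of (probability, net_value) pairs."""
    mean = sum(p * v for p, v in outcomes)
    variance = sum(p * (v - mean) ** 2 for p, v in outcomes)
    return mean, variance


def all_in(p, payoff, build_cost, budget):
    """Spend the whole budget on full builds of untested ideas."""
    n = budget // build_cost
    return [
        (comb(n, k) * p**k * (1 - p) ** (n - k), k * payoff - n * build_cost)
        for k in range(n + 1)
    ]


def hedged(p, payoff, build_cost, test_cost, budget):
    """Test cheap prototypes one by one; build only the first that works."""
    max_tests = (budget - build_cost) // test_cost
    outcomes = [
        ((1 - p) ** (k - 1) * p, payoff - build_cost - k * test_cost)
        for k in range(1, max_tests + 1)
    ]
    # No hypothesis worked: only the testing cost is lost.
    outcomes.append(((1 - p) ** max_tests, -max_tests * test_cost))
    return outcomes


if __name__ == "__main__":
    # Hypothetical parameters: 20% hit rate, payoff 30, build 6, test 1, budget 12.
    for name, outcomes in [
        ("all-in", all_in(0.2, 30, 6, 12)),
        ("hedged", hedged(0.2, 30, 6, 1, 12)),
    ]:
        mean, variance = mean_and_variance(outcomes)
        print(f"{name}: mean={mean:.2f} variance={variance:.2f}")
```

Under these assumed numbers the hedged strategy has both a higher mean and a lower variance; with noisy experiments or different cost ratios the gap would shrink or change.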
  15. Chameleon’s recap

    • Have a constant flow of hypotheses.
    • Minimize investment into your first prototype.
    • Collect feedback from real users.
    • Invest engineering effort into something that has proven its initial potential.
  16. Learning from football

    “We provide those [attacking] patterns and show them on video. We don’t practice them much, really, we almost exclusively concentrate on our defensive movements in training. Our attacks arise from different patterns, at different times, against different opponents.”
  17. Learning from football

    Loosely translated as: to not be afraid to experiment in “attack” (exploring new opportunities), you need to have your “defense” solved (infrastructure/ops).
  18. Prototyping and experimenting

    • Start with a SQL query, but go through the full cycle (meaning: get user feedback).
    • You’ll learn about many components that are missing in your system, and you’ll understand their priorities.
    • You’ll know what’s worth investing in and what your “budget” for it is.
    [Diagram label: Experimentation / prototyping layer]
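To make “start with a SQL query” concrete, here is a minimal sketch of what such a first prototype could look like for recommendations. It is an assumption for illustration, not the query used in the talk: the `views` table, its columns, and the sample rows are hypothetical, and SQLite stands in for whatever warehouse a team actually has.

```python
import sqlite3

# "Users who viewed this item also viewed ..." as a single self-join.
ALSO_VIEWED_SQL = """
    SELECT other.item_id, COUNT(DISTINCT other.user_id) AS shared_users
    FROM views AS seed
    JOIN views AS other
      ON other.user_id = seed.user_id
     AND other.item_id != seed.item_id
    WHERE seed.item_id = ?
    GROUP BY other.item_id
    ORDER BY shared_users DESC, other.item_id ASC
    LIMIT ?
"""


def also_viewed(conn, item_id, limit=3):
    """Return (item_id, shared_users) pairs co-viewed with item_id."""
    return conn.execute(ALSO_VIEWED_SQL, (item_id, limit)).fetchall()


def demo_connection():
    """In-memory database with a hypothetical view log."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE views (user_id TEXT, item_id TEXT)")
    conn.executemany(
        "INSERT INTO views VALUES (?, ?)",
        [
            ("u1", "bike"), ("u1", "helmet"), ("u1", "lock"),
            ("u2", "bike"), ("u2", "helmet"),
            ("u3", "bike"), ("u3", "lamp"),
            ("u4", "sofa"), ("u4", "lamp"),
        ],
    )
    return conn


if __name__ == "__main__":
    print(also_viewed(demo_connection(), "bike"))
```

A query like this is cheap to write and cheap to throw away, which is the point: the result can be exported and shown to real users before any serving infrastructure exists.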
  19. 10x thinking

    When you have 1 “data product (model)” in production, think about what you will need to run 10* of them smoothly.
    * 10 is a magical-empirical number, often mentioned in the industry: you want to build systems that can survive an order of magnitude of growth. More is expensive; with less, you’ll have to rewrite too often.
    [Diagram labels: 10x, Foundational layer, Experimentation / prototyping layer]
  20. But don’t stop: experiment more …

    But you don’t need to stop and wait: continue experimenting and prototyping.
    [Diagram labels: 10x, Foundational layer, Experimentation / prototyping layer]
  21. … and more …

    • You are ready to invest in another order of magnitude of growth.
    • And again, this all comes from the real requirements, not “cargo culting”.
    [Diagram labels: 10x, 10x, Foundational layer, Experimentation / prototyping layer]
  22. Elephant’s recap

    • Infrastructure and operations are the most important things for innovation.
    • Build systems to survive at least an order of magnitude of growth.
    • Build for your needs and when those needs arise.