Features and Platform MLEs

In this talk, we go through the lessons learnt over the last couple of years about organising a Data Science team and the Machine Learning Engineering efforts at Bumble Inc. We cover how different "engineering flavors" emerged and what their responsibilities are in building, scaling and maintaining data/ML products at global scale.

Massimo Belloni

June 14, 2023

Transcript

  1. Features & Platform MLEs
    A scalable and product-centric approach for high performing data products
    Massimo Belloni, 14th June 2023
  2. Who am I?
    Massimo Belloni, Data Science Manager @ Bumble Inc
    • Machine Learning Engineer - lived and worked in Rotterdam and London
    • Currently Data Science Manager at Bumble Inc for Integrity & Safety and MLOps
    @massibelloni /massibelloni [email protected]
  3. Machine Learning Engineering
    • “Machine Learning Engineers are technically proficient programmers who research, build, and design self-running software to automate predictive models.”
    • “Operationalizing machine learning while building scalable, robust, and secure products in a commercial environment.”
    • “ML engineers act as a bridge between data scientists who focus on statistical and model-building work and the construction of machine learning and AI systems.”
    • “Foundational work about the reality of building machine learning models in production”
  4. What does an MLE actually do?
    • Align: Understand the legacy systems and processes; features’ availabilities and collection
    • Design: Solution architecture for the inference service, support model development and training
  5. What does an MLE actually do?
    • Align: Understand the legacy systems and processes; features’ availabilities and collection
    • Design: Solution architecture for the inference service, support model development and training
    • Deploy: Model’s deployment at scale, infrastructure management
  6. What does an MLE actually do?
    • Align: Understand the legacy systems and processes; features’ availabilities and collection
    • Design: Solution architecture for the inference service, support model development and training
    • Deploy: Model’s deployment at scale, infrastructure management
    • Support: Proactive and reactive monitoring and overall service improvement. Miscellaneous
  7. Platform and Features
    • Machine Learning Platform: Infrastructural layer or components to abstract computing resources and enhance collaboration
    • Computing Resources: Processing (CPU/GPU) and memory (RAM/GPU), agnostic to technology and provider
  8. Platform and Features
    • Services: Personalisation, Safety, Marketing
    • Machine Learning Platform: Infrastructural layer or components to abstract computing resources and enhance collaboration
    • Computing Resources: Processing (CPU/GPU) and memory (RAM/GPU), agnostic to technology and provider
  9. Platform and Features
    • Product: Application layer, users, business metrics, downstream impact
    • Services: Personalisation, Safety, Marketing
    • Machine Learning Platform: Infrastructural layer or components to abstract computing resources and enhance collaboration
    • Computing Resources: Processing (CPU/GPU) and memory (RAM/GPU), agnostic to technology and provider
  10. Platform and Features
    • Machine Learning Platform: Infrastructural layer or components to abstract computing resources and enhance collaboration
  11. Machine Learning Platform - Foundations
    • Abstract access to computing resources in an easy, flexible and scalable way (not infinite, though 🙂)
    • Provide solid frameworks for performing common ML operations (e.g. inference, training)
    • High performance, integrated and safe collaboration environment for analysis and model design
  12. Machine Learning Platform - Frameworks
    • Centralise deployment and maintenance of high impact services and frameworks
    • Ensure best practices enabling and facilitating tools’ adoption
    • Unlock and support features’ work with new foundational capabilities
  13. ML Platform Team - Objectives
    • Compute: Abstract the technicalities and the pains in working with distributed GPU resources. Unlock scalability while aligning with broader company compute strategy. Design and enforce policies to allow fair and effective use of resources.
  14. ML Platform Team - Objectives
    • Compute: Abstract the technicalities and the pains in working with distributed GPU resources. Unlock scalability while aligning with broader company compute strategy. Design and enforce policies to allow fair and effective use of resources.
    • Frameworks: Centralise research, deployment and maintenance of high impact frameworks. Anticipate needs and follow wider industry trends for efficient support.
  15. ML Platform Team - Objectives
    • Compute: Abstract the technicalities and the pains in working with distributed GPU resources. Unlock scalability while aligning with broader company compute strategy. Design and enforce policies to allow fair and effective use of resources.
    • Frameworks: Centralise research, deployment and maintenance of high impact frameworks. Anticipate needs and follow wider industry trends for efficient support.
    • Alignment: Advocate for, onboard and support the adoption of tools and practices. Continuous improvement and balance between proactive and reactive work.
  16. Platform and Features (again)
    • Product: Application layer, users, business metrics, downstream impact
    • Services: Personalisation, Safety, Marketing
    • Machine Learning Platform: Infrastructural layer or components to abstract computing resources and enhance collaboration
    • Computing Resources: Processing (CPU/GPU) and memory (RAM/GPU), agnostic to technology and provider
  17. Features MLEs
    • Discovery: Deeply understand use cases and how data flows. Build domain expertise in order to build a meaningful PoC and a reasonable definition of success.
  18. Features MLEs
    • Discovery: Deeply understand use cases and how data flows. Build domain expertise in order to build a meaningful PoC and a reasonable definition of success.
    • Design: Orchestrate the model’s design process with strict acceptance criteria on features readiness and overall validation approach.
  19. Features MLEs
    • Discovery: Deeply understand use cases and how data flows. Build domain expertise in order to build a meaningful PoC and a reasonable definition of success.
    • Design: Orchestrate the model’s design process with strict acceptance criteria on features readiness and overall validation approach.
    • Deploy: Leverage platform knowledge and frameworks to deploy and maintain the ML service. Align with best practices on monitoring, support and alerting.
  20. Organizational Intuition
    [Diagram. Labels: Data Science, Service, Service, Platform MLEs, MLE, MLS, DS]
  21. Wrapping up: Platform and Features MLEs
    • Plenty of ways to be an Engineer in Machine Learning
    • ML Platforms’ challenges are mainly HW-related
    • Data products and “economies of scale”: solve efficiently and apply appropriately