Data Science: From Proof of Concept to Production, by Mahzad Kalantari and Marie Degen

A large majority of data science projects never get past the proof-of-concept stage, which is a real shame! We will share the lessons we learned from all the steps that allowed us to put a data science project into production in 6 months.

Mahzad KALANTARI
Montréal / Ubisoft Montreal
Data @ Ubisoft Montreal
Mahzad is a data scientist and lead of the Data Ops team at Ubisoft Connect. She obtained her PhD in computer vision in 2009. Her area of expertise is the deployment and scaling of machine learning algorithms. Before joining Ubisoft in 2019, Mahzad worked for more than 10 years in various data-related fields.

Marie Degen
Montréal / Ubisoft
Data Scientist @Ubisoft
Marie works as a Data Scientist in the Data Ops team at Ubisoft Connect. She works on various production projects such as friend recommendations, content recommendations and game recommendations. Her main focus in the team is to explore and develop machine learning models, maintain the production pipeline and deploy scalable models for several million players.

Women Techmakers Montreal

March 15, 2021

Transcript

  1. Data Science: proof of concept to production (Marie Degen, Mahzad Kalantari - Data Scientist)

    Data Science: From proof of concept to production. Mahzad Kalantari & Marie Degen
  2. ABOUT US

    Marie works as a Data Scientist in the Data Ops team at Ubisoft Connect. She works on various production projects involving recommender systems. Her focus in the team is to explore and develop machine learning models, maintain the production pipeline and deploy scalable models for millions of players.

    Mahzad is a Data Scientist and Lead of the Data Ops team at Ubisoft Connect. She obtained her Ph.D. in 2009 in the field of computer vision. Her area of expertise is the deployment and scaling up of data science and machine learning algorithms. Before joining Ubisoft in 2019, Mahzad worked for more than 10 years in various data-related fields.
  3. SUMMARY

    PART 1 Data Science Project: The Theory and The Reality
    PART 2 Machine Learning Pipeline in Production
    PART 3 Conclusion
  4. PART 1 Data Science Project: The Theory and The Reality
  5. Data Science and ML Project Cycle

    Source: Data Science Project Life Cycle – Data Science Projects – Edureka
  6. Answering a specific question

    • Be close to the business or team that will use your results
    • Get feedback throughout the process
  7. Finding the right data and understanding it!

    • Real-life data is not a Kaggle challenge
    • Between 60% and 70% of the time on a project goes to finding, accessing, understanding and cleaning the data
    (Image caption: "Do you know where I can find the data?" "Yes, it's here!")
  8. Architecture and Pipeline

    • Consider your architecture and pipeline from the beginning
    • Start with a simple architecture adapted to your needs
    • You can easily get lost in all the technologies on the market
  9. Data Exploration and Analysis

    (Image caption: "When I open my folder with a lot of notebooks!")
    • Jupyter Notebook is very useful for data analysis and exploration
    Two common problems:
    • Versioning
    • Code review
    → Move your code into a clean project quickly and version it with Git
    → Start quick code reviews
  10. Modelling

    • Start small, think big!
    • If a linear regression can answer your problem, start there; you can always improve on it later, with deep learning for example!
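
A minimal sketch of the "start small" advice on slide 10, assuming a toy one-feature dataset invented purely for illustration (it is not data from the talk). It fits an ordinary least-squares line with the closed-form formulas and compares its error against a predict-the-mean baseline. The names `fit_line` and `mse` are hypothetical helpers, not something the speakers describe.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must contain at least two distinct values")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mse(ys, preds):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Toy data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]

slope, intercept = fit_line(xs, ys)
baseline_mse = mse(ys, [mean(ys)] * len(ys))  # always predict the mean
linear_mse = mse(ys, [slope * x + intercept for x in xs])
print(f"baseline MSE={baseline_mse:.2f}, linear MSE={linear_mse:.2f}")
```

Any more advanced model then has a concrete number to beat: if it cannot improve on the simple line, the extra complexity is probably not worth deploying.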
  11. Deployment

    • Classical software engineering steps (Git, code review, ...)
    • ML/Data Ops is a very important part (model versioning, CI/CD, containers)
    • Monitoring the results and the different jobs (KPI definition, performance metrics)
    • Once your model is deployed, the work is not finished!
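
A minimal sketch of the monitoring bullet on slide 11, assuming the KPI being tracked is a suggestion acceptance rate. The daily counts, the 30% floor, and the names `acceptance_rate` and `check_kpi` are all hypothetical; the talk does not describe how its monitoring is implemented.

```python
def acceptance_rate(accepted, sent):
    """Share of sent suggestions that were accepted."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return accepted / sent


def check_kpi(value, minimum):
    """Return 'ok' when the KPI meets its floor, 'alert' otherwise."""
    return "ok" if value >= minimum else "alert"


MIN_RATE = 0.30  # hypothetical alert threshold

# Hypothetical (accepted, sent) counts per day.
daily_counts = {"day_1": (420, 1000), "day_2": (250, 1000)}

statuses = {
    day: check_kpi(acceptance_rate(accepted, sent), MIN_RATE)
    for day, (accepted, sent) in daily_counts.items()
}
for day, status in statuses.items():
    print(day, status)
```

In a real pipeline a check like this would typically run as a scheduled job after each model run, so a drop in the KPI is noticed before users report it.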
  12. PART 2 Machine learning pipeline in production
  13. PART 2 Machine learning pipeline in production
  14. Machine learning project in production: starter pack

    • A data storage place
    • Computing resources
    • A scheduler for different jobs
    • An API to send the result to the client
  15. Generic architecture

    (Diagram labels) One/multiple data sources; Jobs, Transformers; Read data; CI/CD generates a .jar; Client; Result sent to the API
    Architecture made with
  16. Example: Friends of Friends algorithm

    We produce ~20 billion recommendations for ~20 million distinct profiles
    (Diagram label) Suggested Friendship
    Data up to date as of 2021/01/20
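
Slide 16 names the algorithm but does not show how it is implemented. As a rough illustration of the general friends-of-friends idea, the sketch below ranks the people a user is not yet friends with by the number of mutual friends. It assumes a small in-memory graph with invented names, and the function `friends_of_friends` is hypothetical. A system serving millions of profiles would need a distributed implementation rather than this toy loop.

```python
from collections import Counter


def friends_of_friends(graph, user, top_n=3):
    """Rank non-friends of `user` by number of mutual friends."""
    direct = graph.get(user, set())
    counts = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the user themselves and people who are already friends.
            if candidate != user and candidate not in direct:
                counts[candidate] += 1
    # Most mutual friends first; ties broken by name for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Toy undirected friendship graph, invented for illustration only.
graph = {
    "ana": {"bob", "cleo"},
    "bob": {"ana", "dan", "eve"},
    "cleo": {"ana", "dan"},
    "dan": {"bob", "cleo"},
    "eve": {"bob"},
}
print(friends_of_friends(graph, "ana"))  # [('dan', 2), ('eve', 1)]
```

Here "dan" ranks first for "ana" because both "bob" and "cleo" know him, while "eve" is reachable only through "bob".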
  17. Some results

    • +1.7M add-friend actions
    • ~40% acceptance rate
  18. Classical steps of software engineering

    (Image captions: "What I really do …" / "What my mother thinks I do …")
    • CI/CD (GitLab)
    • Code review
    • Code guidelines
    • Unit, integration and regression tests
    • Versioning and packaging
    • SonarQube
  19. PART 3 Conclusion
  20. Conclusion

    • Think about your architecture first: it should be sturdy and sustainable over time.
    • Focus on a chosen set of technologies.
    • Start with simple models to get a baseline, then gradually experiment with more advanced models.
    • Stay close to the business people, or whoever will use your data, to get feedback throughout the process.
  21. THANK YOU