
Shipping ML at scale


According to TechRepublic, 85% of Machine Learning projects fail. Among the reasons behind this scary statistic, the most prominent are lack of leadership support, strategy or engineering skills. In this talk I examine the main pain points and explain best practices for overcoming these challenges, bringing to the table real-world examples, personal experiences and actual insights that allowed me and the teams I worked with to successfully deploy ML at scale and drive real business impact. More than 15% of the time.

Massimo Belloni

July 14, 2021

Transcript

  1. Shipping ML at scale (for real)
     Massimo Belloni, Senior Data Scientist @ Bumble
     [email protected] | LinkedIn: in/massibelloni/ | Medium: @massibelloni
     Lessons learnt and best practices
  2. Massimo Belloni, Senior Data Scientist @ Bumble
     • MLOps, NLP and miscellaneous
     • Interests in philosophy of science, consciousness, Strong vs Weak AI
       ◦ Coding consciousness
       ◦ Interpretability and trust
     • MSc Computer Science and Engineering (Politecnico di Milano)
  3. A lot of numbers, one outcome
     • Nearly 50% of CIOs have plans to develop AI solutions [1]
     • 87% of Data Science projects never make it to production [2]
     • 85% of AI solutions will provide erroneous results [1]
     [1] Gartner [2] VentureBeat
  4. Everyone wants to do machine learning, but doing it successfully is complex.
     Machine learning deployments require engineering skills, an experimentation
     mindset and very clear, measurable goals.
  5. The recipe for failure
     • Unclear goals and objectives: are we actually improving an existing business
       process in a measurable and quantifiable way?
     • Lack of engineering skills: how far in the ML lifecycle are we able to go with
       internal resources? Are we able to communicate with other systems?
     • Poor experiments' management: are all the experiments tracked and replicable?
       Can we trace back the history (datasets, code, parameters) of all the artifacts
       we are generating?
  6. Process understanding and model's design
     Deployment infrastructure and first PoC
     Model improvement and experimentation strategy
  7. Process understanding and model's design
     What are we optimising?
     • Every Machine Learning project should aim to improve an existing business process.
     • The business impact and the teams involved have to be clear from the early steps.
     • Where does the ROI stand? Which metrics are relevant?
     Tracking and observability
     • Know the baseline performance of your process.
     • Make sure to have monitoring pipelines in place before designing the model.
     • ML metrics (precision, recall, etc.) have to be associated with their business
       counterparts; see the sketch after this slide.
     Data quality and availability
     • Make sure to have reliable historical data about the process you want to
       optimise/automate.
     • Is the same data available in production? Is the same data available in real time?
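
One way to read the "business counterparts" bullet above: compute the ML metrics on a labelled sample of production traffic and translate them into quantities the business already tracks. A minimal sketch, assuming a binary classifier and scikit-learn; the function name, the daily-volume extrapolation and the specific business metrics are illustrative, not from the talk.

# Hypothetical monitoring helper: ML metrics plus their business counterparts.
from sklearn.metrics import precision_score, recall_score

def monitoring_report(y_true, y_pred, daily_volume):
    # y_true / y_pred: 0/1 labels and predictions for a labelled production sample.
    precision = precision_score(y_true, y_pred)
    recall = recall_score(y_true, y_pred)
    predicted_positive_rate = sum(y_pred) / len(y_pred)
    actual_positive_rate = sum(y_true) / len(y_true)
    return {
        "precision": precision,
        "recall": recall,
        # Illustrative business counterparts, extrapolated from the sample rates:
        # items wrongly actioned per day, and positive items missed per day.
        "false_actions_per_day": daily_volume * predicted_positive_rate * (1 - precision),
        "missed_items_per_day": daily_volume * actual_positive_rate * (1 - recall),
    }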
  8. A crucial step for successful ML deployments is becoming an expert in the
     processes we want to optimise. Preliminary exploratory analysis is an
     underestimated key step for success and expectations management.
  9. Deployment infrastructure and first PoC (or: where everyone thinks the failure happens)
     • It is easier than it looks! Deploying Machine Learning models isn't inherently
       different from delivering any classical software engineering product. The vast
       majority of models can be deployed as a Python function inside a Flask application,
       without any need for GPUs! (train != inference) See the sketch after this slide.
     • A philosophical topic: engineering skills in DS. Having engineering skills and
       infrastructural knowledge is also important for designing the correct prediction
       pipeline and for ensuring that all the necessary features are available in production.
  10. CPU inferencing vs GPU inferencing
      CPU inferencing (single sample, multi-core)
      • A service (Flask, FastAPI, Sanic) wraps the inference model.
      • Requests arrive from multiple clients, usually one sample per request.
      • No batching applied on CPU; multiple inference engines (1 per CPU core).
      • Optimise thread usage (1 per core). Reference: Optimise BERT inference on CPU.
      GPU inferencing (optimise for big batches)
      • A service (TF Serving, TorchServe) wraps the inference model.
      • The service applies dynamic batching to optimise GPU usage.
      • Batch size is GPU (hardware) and time constrained.
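
A sketch of the CPU-side setup above, assuming a PyTorch model: each worker process runs a single-threaded inference engine, and parallelism comes from the application server running one worker per CPU core (for example gunicorn with one worker per core). The TorchScript artifact and function names are illustrative.

# Hypothetical per-worker setup for CPU inferencing: one engine per process, one thread each.
import torch

torch.set_num_threads(1)          # one intra-op thread per worker process
torch.set_num_interop_threads(1)  # avoid oversubscribing the cores

model = torch.jit.load("model.pt")  # illustrative TorchScript model
model.eval()

@torch.no_grad()
def infer(features: torch.Tensor) -> torch.Tensor:
    # One sample per request, no batching on CPU (as on the slide).
    return model(features.unsqueeze(0)).squeeze(0)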
  11. Experiment tracking (diagram)
      • Data sources (Data Warehouse, external sources, photo storage, ...) plus query
        code produce a versioned dataset, identified by a dataset_id.
      • An Experiment pairs a dataset with a model (model_id: architecture,
        hyperparameters, validation strategy, ...): experiment_id = dataset_id + model_id.
      • Each experiment produces artifacts: model, features, metadata, performances.
      • Experiments have to be easy and handy to compare.
      • Version datasets, not queries!
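
A minimal sketch of the identifiers on the slide: hash the materialised dataset (not the query) and the model specification, then compose the experiment id from the two. The hashing scheme and function names are illustrative assumptions.

# Hypothetical id scheme: dataset_id from data content, model_id from the model spec,
# experiment_id = dataset_id + model_id.
import hashlib
import json

def dataset_id(dataset_path: str) -> str:
    # Hash the materialised dataset, so the same query run on drifted data yields a
    # different id ("version datasets, not queries!").
    with open(dataset_path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()[:12]

def model_id(architecture: str, hyperparameters: dict, validation_strategy: str) -> str:
    spec = json.dumps(
        {"arch": architecture, "hparams": hyperparameters, "val": validation_strategy},
        sort_keys=True,
    )
    return hashlib.sha256(spec.encode()).hexdigest()[:12]

def experiment_id(ds_id: str, m_id: str) -> str:
    return f"{ds_id}-{m_id}"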