Marvin Platform – Empowering Machine Learning Teams

Presentation by Lucas Bonatto Miguel and Daniel Takabayashi at Big Data Week São Paulo 2017 [http://sao-paulo.bigdataweek.com].

Marvin is an ambitious open-source project focused on helping teams deliver machine learning solutions in an agile way. The platform offers a standardized, language-agnostic architecture with high scalability and low latency, while simplifying the exploration and modeling process of AI projects.

Big Data Week São Paulo

October 21, 2017
Transcript

  1. B2W Digital: e-commerce leader in LatAm. The Digital Platform that connects People, Businesses, Products and Services. Total GMV: R$ 12,458 MM. Market share: 26.2%. Source: 2016 results from ri.b2w.digital.
  2. Outline • Context • Data-driven culture • Artificial Intelligence • Domains of knowledge • Problem Statement • Marvin • Main components • Architecture • DASFE pattern • General features • Case • Roadmap
  3. Context: data-driven culture. Why is it important to be data-driven? • Single source of truth • Data dictionary • Broad data access • Data literacy • Decision making
  4. Context: Artificial Intelligence. Areas: Machine Learning, NLP, Computer Vision. Use cases: • Buy Box optimization • Demand forecasting • Fraud detection • Ad spend optimization • Feature extraction from product descriptions • Product category classification • Image matching to find associated products
  5. Problem statement: How can we abstract away the complexity of creating an AI application? Building AI projects is not a simple task; it requires advanced knowledge in several different domains.
  6. Marvin Artificial Intelligence Platform: empowers data science teams to deliver AI applications, simplifying the process of exploration and modeling.
  7. Marvin: main components (diagram). Blocks: Engine Executor, Engine, Toolbox. Engine actions: Data acquisitor, Training preparator, Trainer, Evaluator, Prediction preparator, Predictor, Feedback.
  8. Marvin: DASFE pattern (diagram). Online: Prediction Preparation, Model Prediction. Batch: Data Acquisition & Cleaning, Training Preparation, Model Training, Model Evaluation.
  9. Marvin: DASFE pattern (diagram, with feedback added). Online: Prediction Feedback, Prediction Preparation, Model Prediction. Batch: Data Acquisition & Cleaning, Training Preparation, Model Training, Model Evaluation.
  10. Marvin: general features • Training pipeline REST interface • Experiment and artifacts versioning • Engine project scaffold generator • Data sampling and import CLI • Engine test framework (unit, functional, dry run) • Toolbox: Python support • Artifacts persistence layer: HDFS support • Remote provisioning and deployment
  11. Case: risk analysis model • XGBoost in Python • Dataset: 1.0 M orders • Training pipeline: 15 min • REST HTTP predictions: 15 ms • Load test: 100 rps with 15 ms mrt (mean response time)
  12. Case: how did Marvin help the team? “... Jupyter notebook integration with Spark through Marvin’s toolbox lib was very helpful during prototyping phase...” “... the data importation utility speeds up data collection and sampling...” “... it was easy to do feature engineering, feature selection and model choice using the DASFE model...” “... we automated the training and deployment phase without having a dev/ops in our team...”
  13. Marvin: roadmap • Admin module • Toolbox: Java and Scala support • Feedback server • Artifacts persistence layer: S3 and local FS support • Remote provisioning and deployment: Azure, AWS and GCP • Customized notebook kernel • Automated feature engineering • Hyperparameter support • ML for non-data scientists • …
  14. Artificial Intelligence Platform. Fork me on GitHub.com/marvin-ai and feel free to contribute! Thank you! GitHub.com/marvin-ai, twitter.com/_marvin_ai, [email protected]
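
The batch and online stages named on the DASFE slides (8 and 9) can be pictured as a chain of small functions. What follows is a minimal Python sketch under stated assumptions: every function name, the toy threshold "model", and the `amount`/`label` fields are hypothetical and chosen only for illustration. None of it is Marvin's actual API, and the slides do not show how the real engine implements these stages.

```python
"""Illustrative sketch of the batch/online split named on the DASFE slides.

All names and the toy threshold model are hypothetical, not Marvin's API.
"""
from statistics import mean


def acquire_and_clean(raw_rows):
    # Batch: data acquisition & cleaning. Drop rows with missing values.
    return [r for r in raw_rows
            if r["amount"] is not None and r["label"] is not None]


def prepare_training(rows, holdout_every=3):
    # Batch: training preparation. Every Nth row is held out for evaluation.
    train = [r for i, r in enumerate(rows) if i % holdout_every != 0]
    test = [r for i, r in enumerate(rows) if i % holdout_every == 0]
    return train, test


def train_model(train):
    # Batch: model training. Toy model: threshold midway between class means.
    risky = [r["amount"] for r in train if r["label"] == 1]
    safe = [r["amount"] for r in train if r["label"] == 0]
    return {"threshold": (mean(risky) + mean(safe)) / 2}


def prepare_prediction(request):
    # Online: prediction preparation. Turn a request into model input.
    return float(request["amount"])


def predict(model, amount):
    # Online: model prediction. 1 means "risky", 0 means "safe".
    return 1 if amount > model["threshold"] else 0


def evaluate_model(model, test):
    # Batch: model evaluation. Plain accuracy on the held-out rows.
    hits = sum(predict(model, prepare_prediction(r)) == r["label"] for r in test)
    return hits / len(test)


def feedback(store, request, true_label):
    # Online: prediction feedback. Keep the labeled example for the next batch run.
    store.append({"amount": float(request["amount"]), "label": true_label})
    return store


# Made-up sample data, only to exercise the stages end to end.
raw = [
    {"amount": 10.0, "label": 0},
    {"amount": 12.0, "label": 0},
    {"amount": None, "label": 0},
    {"amount": 95.0, "label": 1},
    {"amount": 90.0, "label": 1},
    {"amount": 15.0, "label": 0},
    {"amount": 99.0, "label": 1},
    {"amount": 8.0, "label": 0},
    {"amount": 85.0, "label": 1},
]

rows = acquire_and_clean(raw)
train, test = prepare_training(rows)
model = train_model(train)
accuracy = evaluate_model(model, test)

request = {"amount": "70"}
decision = predict(model, prepare_prediction(request))
feedback_store = feedback([], request, true_label=1)
```

Keeping each stage as a separate function mirrors the split shown on the slides: the batch stages can run on a schedule, while the online stages only need the trained model artifact and a request.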