AI/ML on Amazon SageMaker

Easily add AI to your next project



June 30, 2021


  1. AI/ML on Amazon SageMaker 2021-06-30 Easily add AI to your next project
  2. Who am I? Luca Bianchi, PhD, CTO @ Neosperience, AWS Hero, passionate about serverless and machine learning
    github.com/aletheia
    https://it.linkedin.com/in/lucabianchipavia
    https://speakerdeck.com/aletheia
    www.ai4devs.io
    @bianchiluca
  3. AWS ML Stack

  4. The AWS machine learning stack: Broadest and most complete set of Machine Learning capabilities
  5. how to start a new AI project?

  6. Real-Life Machine Learning Workflow

  7. Real-Life Machine Learning Workflow

  8. ML problem framing

  9. Real-Life Machine Learning Workflow

  10. Real-Life Machine Learning Workflow

  11. data collection and exploration

  12. ML problem framing
    In this phase, the business problem is framed as a machine learning problem: what is observed and what should be predicted (known as a label or target variable). Determining what to predict and how performance and error metrics need to be optimized is a key step in ML.
    For example, imagine a scenario where a manufacturing company wants to identify which products will maximize profits. Reaching this business goal partially depends on determining the right number of products to produce. In this scenario, you want to predict the future sales of the product, based on past and current sales. Predicting future sales becomes the problem to solve, and using ML is one approach that can be used to solve it.
  13. ML problem framing
    • Define criteria for a successful outcome of the project
    • Establish an observable and quantifiable performance metric for the project, such as accuracy, prediction latency, or minimizing inventory value
    • Formulate the ML question in terms of inputs, desired outputs, and the performance metric to be optimized
    • Evaluate whether ML is a feasible and appropriate approach
    • Create a data sourcing and data annotation objective, and a strategy to achieve it
    • Start with a simple model that is easy to interpret, and which makes debugging more manageable
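The sales-forecasting scenario and the advice to start with a simple, interpretable model and a quantifiable metric can be sketched roughly as below. This is only an illustration, not part of the deck: the moving-average baseline, the mean absolute error metric, the function names, and the sales figures are all assumptions made up for the example.

```python
from statistics import mean


def moving_average_forecast(sales, window=3):
    """Forecast each period as the mean of the previous `window` periods."""
    if window < 1 or len(sales) <= window:
        raise ValueError("need more observations than the window size")
    return [mean(sales[i - window:i]) for i in range(window, len(sales))]


def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed and forecast values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical monthly unit sales (made-up numbers, not from the deck).
monthly_sales = [100, 110, 120, 130, 140, 150]
window = 3

forecast = moving_average_forecast(monthly_sales, window)
mae = mean_absolute_error(monthly_sales[window:], forecast)

print("forecast:", forecast)
print("MAE:", mae)
```

A baseline like this gives a number that any later, more complex model has to beat on the same metric.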
  14. AWS Glue: Managed ETL service with Spark support
    • uses crawlers to retrieve data and discover its schema
    • maps data to databases and tables that represent a logical data container
    • uses jobs to manage ETL tasks, with support for the Apache Spark computational model
    • execution can be triggered on demand or on a specific schedule
    • supports continuous data ingestion through streaming jobs and data schema updates through dynamic data frames
  15. Amazon Athena: Serverless data exploration tool
    • Serverless, no ETL: no servers or data warehouses to set up and manage.
    • Only pay for the data that is scanned.
    • You can ensure better performance by compressing, partitioning, and converting your data into columnar formats.
    • Can also handle complex analysis, including large joins, window functions, and arrays.
    • Athena automatically executes queries in parallel.
    • You provide a path to the S3 folder; when new files are added, they are automatically reflected in the table.
    • Supports CSV, JSON, Parquet, ORC, and Avro data formats
    • Complex joins and data types
    • View creation
  16. Data collection: Start exploring our dataset

  17. Real-Life Machine Learning Workflow

  18. Real-Life Machine Learning Workflow

  19. SageMaker Data Wrangler: A workflow management tool for data analysis and preparation
  20. Real-Life Machine Learning Workflow

  21. Real-Life Machine Learning Workflow

  22. “Data scientists are duplicating work because they don’t have a centralized feature store. Everybody I talk to really wants to build or even buy a feature store… if an organization had a feature store, the ramp-up period for Data Scientists can be much faster.”
  23. Amazon SageMaker Feature Store: Create, share, and manage features for machine learning (ML) development
    • a fully managed repository to store, update, retrieve, and share machine learning (ML) features in S3
    • an online feature set to support inference tasks
    • Data Wrangler pushes engineered features into a feature store
    • both online and offline stores can be fed by a separate feature engineering pipeline via the SDK
    • streaming sources can directly ingest features into the online feature store for inference or feature creation
    • Feature Store automatically builds an AWS Glue Data Catalog when Feature Groups are created
  24. Real-Life Machine Learning Workflow

  25. Real-Life Machine Learning Workflow

  26. SageMaker Autopilot: Low-code machine learning
    • Add the training input and output data paths
    • Set the label to predict and enable the auto-deployment of the model
    • SageMaker deploys the best model and creates an endpoint after the successful training.
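The Autopilot inputs listed on this slide (input and output data paths, the label to predict, and auto-deployment) could be assembled into a request roughly as follows. This is a sketch under assumptions, not the deck's own code: the helper `build_autopilot_request`, the bucket paths, the job name, the column name, and the role ARN are hypothetical, and the dictionary keys follow the SageMaker `CreateAutoMLJob` request shape as best understood here. The sketch only builds the request; it does not call AWS.

```python
def build_autopilot_request(job_name, input_s3_uri, output_s3_uri,
                            target_column, role_arn, auto_deploy=True):
    """Build a request dict shaped like SageMaker's CreateAutoMLJob input.

    Hypothetical helper for illustration; nothing is sent to AWS here.
    """
    request = {
        "AutoMLJobName": job_name,
        # Training input path and the label (target column) to predict.
        "InputDataConfig": [
            {
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": input_s3_uri,
                    }
                },
                "TargetAttributeName": target_column,
            }
        ],
        # Output path for generated artifacts.
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "RoleArn": role_arn,
    }
    if auto_deploy:
        # Ask Autopilot to deploy the best model to an endpoint after training.
        request["ModelDeployConfig"] = {"AutoGenerateEndpointName": True}
    return request


# All values below are placeholders, not real resources.
request = build_autopilot_request(
    job_name="sales-forecast-demo",
    input_s3_uri="s3://example-bucket/train/",
    output_s3_uri="s3://example-bucket/output/",
    target_column="future_sales",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
)
print(sorted(request))
```

With AWS credentials configured, a dict like this would typically be passed as keyword arguments to the boto3 SageMaker client's `create_auto_ml_job` call; check the current API reference before relying on the exact field names.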
  27. SageMaker Processing: Offload SageMaker tasks to external workers

  28. Real-Life Machine Learning Workflow

  29. Real-Life Machine Learning Workflow

  30. SageMaker Experiments: Execute parameter trials and compare results

  31. SageMaker Debugger and Model Monitor: Optimize ML models with real-time monitoring of training metrics
    • Track model metrics and send alerts when anomalies are detected
    • Profile used resources to inspect anomalies
    • Provide built-in analytics
    • Trigger AWS Lambda functions
  32. SageMaker Clarify: Provides support for explainable AI
    • Identifies imbalances in data
    • Monitors bias drift for models in production
    • Provides components that help AWS customers build less biased and more understandable machine learning models
    • Provides explanations for individual predictions, available via API
    • Helps in establishing model governance for ML applications
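To make "imbalances in data" concrete, here is a small sketch of a class imbalance measure of the kind Clarify reports before training: CI = (n_a - n_d) / (n_a + n_d), where n_a and n_d are the member counts of the two groups being compared. The function name, the group labels, and the sample data are assumptions for illustration; the sketch does not use the Clarify API.

```python
from collections import Counter


def class_imbalance(facet_values, advantaged):
    """Return (n_a - n_d) / (n_a + n_d) for one group against all others.

    The result lies in [-1, 1]; 0 means both groups are equally represented.
    """
    if not facet_values:
        raise ValueError("facet_values must not be empty")
    counts = Counter(facet_values)
    n_a = counts[advantaged]           # members of the chosen group
    n_d = len(facet_values) - n_a      # everyone else
    return (n_a - n_d) / (n_a + n_d)


# Hypothetical column with 8 records from group "a" and 2 from group "b".
groups = ["a"] * 8 + ["b"] * 2
ci = class_imbalance(groups, advantaged="a")
print("class imbalance:", ci)
```

Values near 1 or -1 suggest one group dominates the dataset, which is a cue to inspect the data before training.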
  33. Real-Life Machine Learning Workflow

  34. Real-Life Machine Learning Workflow

  35. Wrap Up: A complete platform for Machine Learning

  36. None
  37. thank you