Upgrade to Pro — share decks privately, control downloads, hide ads and more …

AI/ML on Amazon SageMaker

AI/ML on Amazon SageMaker

Easily add AI to your next project

Aletheia

June 30, 2021
Tweet

More Decks by Aletheia

Other Decks in Technology

Transcript

  1. Who am I? Luca Bianchi, PhD CTO @ Neosperience AWS

    Hero, passionate about serverless and machine learning github.com/aletheia https://it.linkedin.com/in/lucabianchipavia https://speakerdeck.com/aletheia www.ai4devs.io @bianchiluca
  2. In this phase, the business problem is framed as a

    machine learning problem: what is observed and what should be predicted (known as a label or target variable). Determining what to predict and how performance and error metrics need to be optimized is a key step in ML. For example, imagine a scenario where a manufacturing company wants to identify which products will maximize pro f its. Reaching this business goal partially depends on determining the right number of products to produce. In this scenario, you want to predict the future sales of the product, based on past and current sales. Predicting future sales becomes the problem to solve, and using ML is one approach that can be used to solve it. ML problem framing
  3. • De f ine criteria for a successful outcome of

    the project • Establish an observable and quanti f iable performance metric for the project, such as accuracy, prediction latency, or minimizing inventory value • Formulate the ML question in terms of inputs, desired outputs, and the performance metric to be optimized • Evaluate whether ML is a feasible and appropriate approach • Create a data sourcing and data annotation objective, and a strategy to achieve it • Start with a simple model that is easy to interpret, and which makes debugging more manageable ML problem framing
  4. • crawlers to retrieve data and schema discovery • maps

    data to databases and table that represent a logical data container • uses jobs to manage ETL tasks with support fort Apache Spark computational model • execution can be triggered on-demand or on a speci f ic schedule • supports continuous data ingestion through streaming jobs and data schema updates through dynamic data frames Managed ETL service with Spark support AWS Glue
  5. • Serverless. No ETL — Not having to set up

    and manage any servers or data warehouses. • Only pay for the data that is scanned. • You can ensure better performance by compressing, partitioning, and converting your data into columnar formats. • Can also handle complex analysis, including large joins, window functions, and arrays. • Athena automatically executes queries in parallel. • Need to provide a path to the S3 folder and when new f iles added automatically re f lects in the table. • Support CSV, Json, Parquet, ORC, Avro data formats • Complex Joins and datatypes • View creation Serverless data exploration tool Amazon Athena
  6. A work f low management tool for data analysis and

    preparation SageMaker Data Wrangler
  7. “Data scientists are duplicating work because they don’t have a

    centralized feature store. Everybody I talk to really wants to build or even buy a feature store…… if an organization had a feature store, the ramp-up period for Data Scientists can be much faster.”
  8. • a fully managed repository to store, update, retrieve, and

    share machine learning (ML) features in S3. • online feature set to support inference tasks • Data Wrangler pushes engineered features into a feature store • both online and o ff line stores can be ingested via separate Engineering Pipeline via SDK • Streaming sources can directly ingest features to the online feature store for inference or feature creation • Feature Store automatically builds an Amazon Glue Data Catalog when Feature Groups are created Create, share, and manage features for machine learning (ML) development Amazon SageMaker FeatureStore
  9. • Add the training input and output data paths •

    Label to predict and enable the auto-deployment of the model • SageMaker deploys the best model and creates an endpoint after the successful training. LowCode Machine Learning SageMaker AutoPilot
  10. • Track model metrics and send alerts when anomalies are

    detected • Pro f ile used resources to inspect anomalies • Provide built-in analytics • Trigger AWS 
 Lamda functions Optimize ML models with real-time monitoring of training metrics SageMaker Debugger and Model Monitor
  11. • Identify imbalances in data • Monitors Bias Drift for

    Models in Production • Provides components that help AWS customers build less biased and more understandable machine learning models • Provides explanations for individual predictions available via API • Helps in establishing the model governance for ML applications Provides support for explainable AI SagerMaker Clarify