Introduction to Amazon SageMaker - AWS Meetup

Amazon SageMaker is a fully managed service that provides every developer and
data scientist with the ability to build, train, and deploy machine learning models quickly.

Agenda
• Overview of Data Science
• Machine Learning
• Amazon SageMaker
• Intro to some basic Machine Learning Algorithms
• Demo
• Activity

What really is Data Science?

A Brief History of Data Science
• 1996: KDD refers to the overall process of discovering useful knowledge from data
• Data Mining
• 2001: William S. Cleveland: Computer Science + Data Mining = Data Science
• Rise of Web 2.0, 2003 - 2005: Myspace, Facebook, YouTube = interactive, shared experience, millions of users
• Big Data, 2006 - 2009: Lots of data = Big Data; parallel computing tech: MapReduce, Hadoop, Spark
• Machine Learning, 2010: Data-driven approaches rather than knowledge-driven approaches
• Data Science Teams: Data Engineers, Data Scientists, Data Architects, ML Researchers
• 2011: Data Scientist: The Sexiest Job of the 21st Century
• 2012 - Present: Problem-Solver, Strategist; complex problems; guide the company

“Being a good Data Scientist is not about how ADVANCED your models are, it’s about how much IMPACT you can have with your work” -- A Data Scientist at Facebook

So, what is Data Science?
“Almost everything that has something to do with data: collecting, analyzing, modelling… yet the most important part is its applications, all sorts of applications.”

Machine Learning
• Formal definition (Tom M. Mitchell): "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E."
• A subfield of artificial intelligence.
• The goal is to enable computers to learn on their own.
• A machine’s learning algorithm enables it to identify patterns in observed data, build models that explain the world, and predict things without explicit pre-programmed rules and models.

Types of Machine Learning
• Supervised Learning – Train Me!
  • The learning is guided by a teacher.
  • The labeled dataset acts as the teacher, and its role is to train the model or the machine.
  • Once trained, the model can make a prediction or decision when new data is given to it.
• Unsupervised Learning – I am self sufficient in learning
  • The model learns through observation and finds structures in the data.
  • It automatically finds patterns and relationships in the dataset by creating clusters in it.
  • What it cannot do is add labels to the clusters: it cannot say "this is a group of apples" or "this is a group of mangoes", but it will separate all the apples from the mangoes.
• Reinforcement Learning – My life My rules! (Hit & Trial)
  • An agent interacts with the environment and finds out which actions lead to the best outcome. It follows the trial-and-error ("hit and trial") method.
  • The agent is rewarded with a point for a correct answer and penalized for a wrong one, and the model trains itself on the basis of the reward points gained.
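To make the unsupervised bullet concrete, here is a minimal sketch of clustering without labels. It uses a tiny 1-D k-means, which is one common clustering method and not something the slides name; the function `kmeans_1d`, the fruit weights, and the starting centroids are all hypothetical choices made for illustration.

```python
def kmeans_1d(values, k=2, iterations=10):
    """Group 1-D values into k clusters without using any labels."""
    # Assumption: k >= 2, and centroids start at evenly spaced sorted values.
    ordered = sorted(values)
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return clusters, centroids


# Hypothetical fruit weights in grams; no "apple" or "mango" labels are given.
weights = [150, 160, 155, 400, 420, 410]
clusters, centroids = kmeans_1d(weights)
print(clusters)   # two groups of similar weights
print(centroids)  # the mean weight of each group
```

The output separates the light fruits from the heavy ones, but nothing in it says which group is apples and which is mangoes; naming the clusters still needs a human or a labeled dataset.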

Machine Learning Workflow

Amazon SageMaker
Build
• Collect & prepare training data
  • Data labeling & pre-built notebooks for common problems
• Choose & optimize your ML algorithm
  • Built-in, high-performance algorithms and hundreds of ready-to-use algorithms in AWS Marketplace
Train
• Set up & manage environments for training
  • One-click training on the highest performing infrastructure
• Train & tune model
  • Train once, run anywhere & model optimization
Deploy
• Deploy model in production
  • One-click deployment
• Scale & manage the production environment
  • Fully managed with auto-scaling for 75% less

Sentiment Analysis
Sentiment analysis inspects a text and determines whether the tone of that text is positive, negative, or neutral.
Common use cases:
• Track customer sentiment vs. time
• Determine which customer segments have the strongest opinions
• Plan product improvements
• Determine the most effective communication channels
• Prioritize customer service issues
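As a rough illustration of the positive / negative / neutral decision, the sketch below counts words from two small word lists. This is a deliberately naive lexicon approach chosen for brevity, not the method used in the demo or by any AWS service; the word lists and the `sentiment` function are hypothetical.

```python
# Hypothetical, tiny word lists; a real system would learn these from data.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "slow"}


def sentiment(text):
    """Return 'positive', 'negative', or 'neutral' for a piece of text."""
    # Lowercase each word and strip common trailing punctuation.
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("I love this product, it is great!"))
print(sentiment("Terrible support and slow delivery."))
print(sentiment("The package arrived on Tuesday."))
```

A trained model replaces the hand-written word lists with weights learned from labeled examples, which is where the supervised algorithms on the following slides come in.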

Linear Learner Algorithm
• A supervised learning algorithm used for solving either classification or regression problems
• Input data: x is a high-dimensional vector, y is a numeric label
• The algorithm learns a linear function (or, for classification problems, a linear threshold function) and maps a vector x to an approximation of the label y
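The idea of learning a linear function and then thresholding it can be sketched in plain Python. This is not SageMaker's Linear Learner implementation; it is a minimal batch gradient descent on squared error, and the names `fit_linear`, `predict`, `classify`, the learning rate, the 0.5 threshold, and the toy data are all assumptions made for illustration.

```python
def predict(w, b, x):
    """Linear function: weighted sum of the features plus a bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def fit_linear(xs, ys, lr=0.1, steps=2000):
    """Fit weights w and bias b by batch gradient descent on squared error."""
    n, d = len(xs), len(xs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        errors = [predict(w, b, x) - y for x, y in zip(xs, ys)]
        grad_w = [sum(e * x[j] for e, x in zip(errors, xs)) / n for j in range(d)]
        grad_b = sum(errors) / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def classify(w, b, x, threshold=0.5):
    """Linear threshold function: class 1 if the linear output clears the threshold."""
    return 1 if predict(w, b, x) >= threshold else 0


# Toy data: one feature per example, numeric labels 0 or 1.
xs = [[0.0], [1.0], [3.0], [4.0]]
ys = [0, 0, 1, 1]
w, b = fit_linear(xs, ys)
labels = [classify(w, b, x) for x in xs]
print(w, b, labels)
```

For regression, `predict` is used directly; for classification, `classify` applies the threshold to the same linear output.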

XGBoost (eXtreme Gradient Boosting)
• A popular and efficient open-source implementation of the gradient boosted trees algorithm
• A supervised learning algorithm that attempts to accurately predict a target variable by combining an ensemble of estimates from a set of simpler, weaker models
• Known for its flexibility and its ability to robustly handle a variety of data types
• Widely used in data science competitions such as those on Kaggle
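The phrase "combining an ensemble of estimates from simpler, weaker models" can be sketched as follows. This is a bare-bones gradient boosting loop for squared error with one-split trees (stumps) on a single feature, not the XGBoost library or its API; `fit_stump`, `boost`, `predict`, `mse`, the shrinkage value, and the data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Weak learner: the single split on x that best fits the residuals."""
    # Assumption: xs contains at least two distinct values.
    best = None
    for split in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        if not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, split, lm, rm)
    return best[1:]


def boost(xs, ys, rounds=20, lr=0.5):
    """Start from the mean, then repeatedly fit a stump to what is still wrong."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        split, lm, rm = fit_stump(xs, residuals)
        stumps.append((split, lm, rm))
        preds = [p + lr * (lm if x <= split else rm) for x, p in zip(xs, preds)]
    return base, lr, stumps


def predict(model, x):
    """Sum the base value and the shrunken output of every stump."""
    base, lr, stumps = model
    return base + sum(lr * (lm if x <= split else rm) for split, lm, rm in stumps)


def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9]
model = boost(xs, ys)
baseline_mse = mse([sum(ys) / len(ys)] * len(ys), ys)
boosted_mse = mse([predict(model, x) for x in xs], ys)
print(baseline_mse, boosted_mse)
```

Each stump on its own is a poor model, but because every round fits the remaining residuals, the combined prediction error on the training data keeps shrinking.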

K-Nearest Neighbors
• A supervised, non-parametric algorithm (it makes no assumptions about the distribution of the data)
• Commonly used for classification or regression
• For classification, the output is a class membership: an object is assigned the class that is most common among its K nearest neighbors
• K = number of neighbors; K is always a positive integer (K > 0)
• Common use cases: concept searching (document searching), recommender systems
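The majority vote among the K nearest neighbors can be sketched in a few lines. This is a brute-force illustration using Euclidean distance, which the slides do not specify; `knn_classify` and the sample points are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Assign the class most common among the k training points nearest to query."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # Sort all training points by Euclidean distance to the query, keep the k closest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors with class labels.
train = [
    ((1.0, 1.0), "apple"), ((1.2, 0.8), "apple"), ((0.9, 1.1), "apple"),
    ((3.0, 3.2), "mango"), ((3.1, 2.9), "mango"), ((2.8, 3.0), "mango"),
]
print(knn_classify(train, (1.1, 1.0), k=3))
print(knn_classify(train, (2.9, 3.1), k=3))
```

There is no training step here: the "model" is the stored dataset, which is why the algorithm is called non-parametric. Choosing an odd K for two classes avoids tied votes.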

Process Flow

Demo

Activity

Kyle Escosia
