Data Science @ FAANG

Ankit Sirmorya
June 07, 2024

Transcript

  1. ML OPPORTUNITIES @ MAANG

    Ankit Sirmorya, ML Engineering Manager @ Amazon
    Disclaimer: All content in this presentation reflects my own views and does not represent my employer.
  2. ABOUT ME

    • 9+ years of industry experience. Currently working as:
      • ML Engineering Manager, Amazon
        • Alexa Voice Shopping
        • Amazon Devices
      • Advisor/Investor, Startups
      • Advisory Board Member, AI For Good
      • Technical Reviewer, Packt Publication
    • LinkedIn
    • Education
      • Masters in Machine Learning, University of Florida
      • Bachelor in Computer Science, NIT Raipur
  3. CULTURE

    • Data Science Archetypes
      • Applied Scientist: Uses advanced analytics technologies, including machine learning and predictive modeling, to collect, analyze, and interpret large amounts of data and produce actionable insights.
      • Machine Learning Engineer: Runs machine learning experiments using programming languages such as Python, Java, and Scala with the appropriate machine learning libraries.
      • Some other career paths: NLP Scientist, Business Intelligence Engineer, ML Data Associate, Data Lawyer, AI Ethicist
  4. TYPES OF PROBLEMS

    • Start with a customer problem and work backwards
    • Different types of problems depending upon the business domain
      • Increasing the clicks on recommended products
      • Creating an optimal product page for customers
      • Increasing the read rate for notifications sent to customers
      • Virtual Language Assistant
    • Iterative Process
      • Start with a heuristics-based project
      • Create ML models for solving those business problems
  5. MACHINE LEARNING ENGINEER VS DATA SCIENTIST

    • Data Scientist
      • Works more on the modeling side
      • Focuses on the ins and outs of the ML algorithms
      • Skillsets: Python/R, Jupyter Notebook, SQL
      • Educational Background: PhD/Masters in computer science
    • Machine Learning Engineer (MLE)
      • Tends to focus on the deployment of that model
      • Works to ship the model in a production environment
      • Skillsets: Python, Deployment Tools, MLOps
      • Educational Background: Masters in computer science
  6. RESPONSIBILITIES OF A DATA SCIENTIST

    • Identifying relevant data sources for business needs / Collecting structured and unstructured data
    • Sourcing missing data / Organizing data into usable formats
    • Building predictive models / Building machine learning algorithms
    • Processing, cleansing, and verifying data / Setting up data infrastructure
    • Analyzing data for trends and patterns and to find answers to specific questions
    • Developing, implementing, and maintaining databases
    • Assessing the quality of data and removing or cleaning data
    • Generating information and insights from data sets and identifying trends and patterns
    • Preparing reports for executive and project teams
    • Creating visualizations of data
  7. REQUIRED SKILLSETS FOR DATA SCIENTISTS

    • Fundamentals of Data Science: difference between machine learning and deep learning, common tools and terminology, supervised vs. unsupervised learning, classification vs. regression problems
    • Statistics: descriptive statistics (mean, median, mode, variance, standard deviation), probability distributions, sample vs. population, the central limit theorem (CLT), skewness and kurtosis, inferential statistics
    • Programming knowledge: Python, R
    • Data Manipulation and Analysis: missing value imputation, outlier treatment, correcting data types, scaling, and transformation
    • Data Visualization: familiarity with plots such as histograms, bar charts, pie charts, waterfall charts, thermometer charts, etc.
    • Machine Learning: regression models, ensemble models, hyperparameter tuning
    • Deep Learning: models like DNN, CNN, RNN, and more; libraries like TensorFlow, Keras, and PyTorch
    • Big Data: frameworks such as Hadoop, Spark, Apache Storm, Flink, and Hive
    • Model Deployment: SageMaker ML pipeline, Flask, etc.
    • Communication Skills / Structured Thinking
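The data manipulation items on this slide (missing value imputation, outlier treatment, scaling) can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the Python standard library rather than pandas, the sample values are made up, and the specific choices (median imputation, Tukey's IQR rule with k=1.5, min-max scaling) are one common option each, not something the slides prescribe. The function names are hypothetical.

```python
import statistics


def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def clip_outliers(values, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up sample with two missing entries and one obvious outlier.
raw = [12.0, 15.0, None, 14.0, 13.0, 250.0, 16.0, None, 11.0]

imputed = impute_missing(raw)      # None -> median of observed values
clipped = clip_outliers(imputed)   # 250.0 is pulled back to the upper fence
scaled = min_max_scale(clipped)    # values now lie in [0, 1]

# Descriptive statistics from the same slide, computed on the cleaned data.
summary = {
    "mean": statistics.mean(clipped),
    "median": statistics.median(clipped),
    "stdev": statistics.stdev(clipped),
}
```

In practice the same steps are usually done with pandas and scikit-learn, which the later slides recommend; the standard-library version just keeps the logic visible.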
  8. RESPONSIBILITIES OF AN MLE

    • Designing ML systems
    • Researching and implementing ML algorithms and tools
    • Selecting appropriate data sets / Picking appropriate data representation methods / Verifying data quality
    • Identifying differences in data distribution that affect model performance
    • Transforming and converting data science prototypes
    • Performing statistical analysis
    • Running machine learning tests / Using results to improve models
    • Training and retraining systems when needed
    • Extending machine learning libraries
    • Developing machine learning apps according to client requirements
  9. SKILLSETS FOR MLE (REQUIRED)

    • Strong analytical, problem-solving, and teamwork skills
    • Software engineering skills
    • Experience in data science
    • Coding and programming languages, including Python, Java, C++, C, R, and JavaScript
    • Experience working with ML frameworks
    • Experience working with ML libraries and packages
    • Understanding of data structures, data modeling, and software architecture
    • Knowledge of computer architecture
  10. SKILLSETS FOR MLE (PREFERRED)

    • Advanced math and statistics skills, covering subjects such as linear algebra, calculus, and Bayesian statistics
    • Advanced degree in computer science, math, statistics, or a related field
    • Master's degree in machine learning, neural networks, deep learning, or related fields
  11. HOW TO GET STARTED?

    • Basic topics you need to know for both MLE and Data Scientist roles
    • Must know for Data Scientist roles
    • Must know for MLE roles
    • Advanced topics
  12. UNDERSTAND MACHINE LEARNING ALGORITHMS

    Machine learning is about machine learning algorithms. You need to know what algorithms are available for a given problem, how they work, and how to get the most out of them. Here’s how to get started with machine learning algorithms:
    • Step 1: Discover the different types of machine learning algorithms.
      • A Tour of Machine Learning Algorithms
    • Step 2: Discover the foundations of machine learning algorithms.
      • How Machine Learning Algorithms Work
      • Parametric and Nonparametric Algorithms
      • Supervised and Unsupervised Algorithms
    • Step 3: Discover how top machine learning algorithms work.
      • Machine Learning Algorithms Mini-Course
  13. PYTHON MACHINE LEARNING

    Python is one of the fastest growing platforms for applied machine learning. You can use the same tools, such as pandas and scikit-learn, in both the development and the operational deployment of your model. Below are the steps that you can use to get started with Python machine learning:
    • Step 1: Discover Python for machine learning.
      • A Gentle Introduction to Scikit-Learn: A Python Machine Learning Library
    • Step 2: Discover the ecosystem for Python machine learning.
      • Crash Course in Python for Machine Learning Developers
      • Python Ecosystem for Machine Learning
      • Python is the Growing Platform for Applied Machine Learning
    • Step 3: Discover how to work through problems using machine learning in Python.
      • Your First Machine Learning Project in Python Step-By-Step
      • Python Machine Learning Mini-Course
  14. MATHEMATICS YOU NEED TO KNOW

    • Probability for Machine Learning
      • Discover what Probability is -> Basics of Mathematical Notation for Machine Learning
      • Dive into Probability topics -> Probability for Machine Learning Mini-Course
    • Statistics for Machine Learning
      • Discover what Statistical Methods are -> What is Statistics (and why is it important in machine learning)?
      • Discover why Statistical Methods are important for machine learning -> The Close Relationship Between Applied Statistics and Machine Learning
    • Linear Algebra for Machine Learning
      • Step 1: Discover what Linear Algebra is -> Basics of Mathematical Notation for Machine Learning
      • Step 2: Discover why Linear Algebra is important for machine learning -> 5 Reasons to Learn Linear Algebra for Machine Learning
      • Step 3: Dive into Linear Algebra topics (Vector, Matrix) -> Linear Algebra for Machine Learning Mini-Course
  15. OPTIMIZATION FOR MACHINE LEARNING

    You can get familiar with optimization for machine learning in 3 steps, fast.
    • Step 1: Discover what Optimization is.
      • A Gentle Introduction to Applied Machine Learning as a Search Problem
      • A Gentle Introduction to Function Optimization
    • Step 2: Discover the Optimization Algorithms.
      • Function Optimization With SciPy
      • Basin Hopping Optimization in Python
      • How to Implement Gradient Descent Optimization from Scratch
    • Step 3: Dive into Optimization Topics.
      • How to Manually Optimize Machine Learning Model Hyperparameters
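To make the "gradient descent from scratch" item on this slide concrete, here is a minimal sketch. It is not taken from the referenced tutorial: the objective f(x) = (x - 3)^2, the learning rate, the step count, and all function names are assumptions chosen purely for illustration.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    x = start
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


def objective(x):
    """Toy objective (assumed for illustration): minimum at x = 3."""
    return (x - 3.0) ** 2


def objective_grad(x):
    """Analytic derivative of the toy objective."""
    return 2.0 * (x - 3.0)


# Start far from the minimum and let the updates converge toward x = 3.
minimum = gradient_descent(objective_grad, start=0.0)
```

With this learning rate each step shrinks the distance to the minimum by a constant factor of 0.8, so the iterate converges geometrically. A learning rate of 1.0 or more would make this particular objective oscillate or diverge, which is why step size is usually the first hyperparameter to tune.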
  16. ENGINEERING CONCEPTS AN MLE MUST KNOW

    This list is prepared using AWS as the underlying cloud compute framework; there are alternatives in Google Cloud and Azure.
    • SageMaker essentials
      • Launch training jobs within SageMaker
      • Deploy an endpoint that can perform inference on live data
      • Evaluate datasets with batch transform jobs
      • Perform custom processing jobs on raw data
    • Designing Your Own Workflow
      • Create Lambda functions
      • Trigger Lambda functions utilizing both the SDK and other AWS services
      • Design and execute a workflow utilizing State Machines
      • Learn about the use cases for SageMaker Pipelines
  17. ENGINEERING CONCEPTS AN MLE MUST KNOW (CONTD.)

    • Monitoring an ML Workflow
      • Use SageMaker Feature Store to serve and monitor model data
      • Configure SageMaker Model Monitor to generate and track metrics about our models
      • Use Clarify to explain model predictions and surface biases in models
  18. RECOMMENDED COURSES & BOOKS

    • Courses
      • Data Scientist: Become a Data Scientist
      • MLE: AWS Machine Learning Engineer
    • Books
      • Ace the Data Science Interview
      • Machine Learning Mastery With Python
  19. Q&A