Breaking into Data Science

Iqbal Hanif
November 28, 2020

• Introducing the basics of Data Science
• How to start a career as a Data Scientist
• Roles and career levels for a Data Scientist
• What skills are needed to become a Data Scientist
• Projects a Data Scientist works on

Transcript

  1. Working in the Data Field at a Digital Telco Company

    Photo by Tomas Sobek on Unsplash. This document created by Iqbal Hanif. Not for sale & distribution without permission from its creator.
  2. Hold On! Please Enter This Survey

    Use this link: https://www.menti.com/bfa5vt4tjs
    OR visit menti.com and enter this code: 85 04 66 8
  3. Iqbal Hanif

    Institut Pertanian Bogor (2011 - 2015): Bachelor's degree (S1) in Statistics, Minor: Economics & Development Studies
    Telkom Indonesia (2016 - now):
    • Officer 2 / Junior Data Scientist (2020)
    • Officer 3 Data Scientist (2017)
    • Trainee - GPTP IV (2016)
  4. Outline

    Big Data • Organization Structure • Roles • Products/Services
    Use Cases • Data • Tools • Machine Learning • Type of Use Cases
    Challenges • Competitions • Working Style • Apply for a Job • Learn The Organization • Growing Up Skills
    Icons made by Flat Icons from www.flaticon.com.
  5. How We Work in Big Data

    Photo by Stephen Dawson on Unsplash.
  6. Big Data Organizations

    Big Data Platform
    • Responsible for managing the big data platform
    • Develops the capability and capacity of the big data platform
    Big Data Management
    • Responsible for managing the data acquisition, data integration, data mart, and data mining functions
    • Manages data quality, data security & governance
    Big Data Analytics
    • Builds analytics as a service
    • Builds models to improve the quality of internal and external programs
    • Develops text, voice, and video analytics
    Roles: Platform Engineer, Data Engineer, Data Scientist?
  7. Data Science Definition

    "Data Science intends to analyze and understand actual phenomena with 'data'. In other words, the aim of data science is to reveal the features or the hidden structure of complicated natural, human and social phenomena with data from a different point of view from the established or traditional theory and method." - Chikio Hayashi, The Institute of Statistical Mathematics Japan
    "Data Science is a multi-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data." - Shyue Ping Ong, UC San Diego
    Source: UC San Diego
  8. Data Science Roles

    Data Analyst • Knowledge of database • Ability to query data (SQL, NoSQL) • Ability to describe data (Trends, changes, etc.) • Ability to use visualisation tools (Tableau, PBI) • Fluent at spreadsheet (Excel, sheet) • Ability to present data (Slides, Dashboards) • Business Acumen
    ++Business: Business Intelligence
    ++Product: Product Analyst
    ++Statistics/Math: Data Scientist • Data Modeling (Statistical/Machine Learning) • Conduct Research (Statistics) • Experiment (A/B testing) • Extract insights, Tell Story • Programming (Python, R, ..)
    ++Data Engineering: Data Engineer • Knowledge of database architecture • Knowledge of cloud platforms • Data pipeline (ETL) • Programming (Python, Scala, Java..)
    ++Software Engineering: Machine Learning Engineer • State-of-the-art machine learning models • Deep Learning • Computer Vision, NLP • Model deployment
    Icons made by Eucalyp, Becris, Mynamepong, and Catkuro from www.flaticon.com.
  9. Type of Data Analytics

    Source: Gartner
    Icons made by Flat Icons from www.flaticon.com.
  10. Big Data Analytics Skill Group

    Digital Skill: Data Analyst
    Description: Analyzes data using statistics and identifies correlations or patterns contained in the data.
    Skill Set: Data Analytic, Data Engineering, Data Management, Data Visualization, Programming Language.

    Digital Skill: Data Scientist
    Description: Processes data from collection and validation through modeling to visualization.
    Skill Set: Big Data, Data Analytic, Data Engineering, Data Management, Data Visualization, Programming Language.

    Digital Skill: Machine Learning Engineer
    Description: Implements machine learning algorithms as a system that can be integrated with other systems.
    Skill Set: Big Data, Data Management, Data Visualization, Machine Learning, Cloud Engineer, Programming, Computer Vision.

    Digital Skill: AI Engineer
    Description: Manages large data and builds it into artificial intelligence inside an application.
    Skill Set: Big Data, Data Management, Data Visualization, Programming Language, Machine Learning, Robotic Framework.

    Digital Skill: Database Administrator
    Description: Implements security procedures for databases, and cleans and transforms data.
    Skill Set: Data Engineering, Database Architecture, Algorithm & Data Structure, Programming Language.
  11. Big Data Analytics Position Level

    • Officer 3 Data Scientist
    • Junior Data Scientist
    • (Middle) Data Scientist
    • Senior Data Scientist
    • Principal Data Scientist
    Determined based on • Working Experience • Education Level • Technical Skills • Portfolio • etc.
  12. Big Data Analytics Products/Services

    CRM & Marketing • Customer Relationship Management (CRM) • Customer 360 & Segmentation • Recommendation System
    Computer Vision • Optical Character Recognition (OCR) • Face Recognition • Video Analytics • Satellite Imagery Analytics
    Text, Web, and Social Media • Web Analytics • Social Media Analytics • Natural Language Processing (NLP) / Text Analytics
    Operations & Performance • Robotic Process Automation (RPA) • Executive/Operational Dashboard • Geographic Information System (GIS)
    Analytic as a Service • Media Rating • Risk Scoring • Marketing Campaign
  13. Type of Projects / Use Cases

    Photo by Markus Spiske on Unsplash.
  14. Data

    • Demography • Purchased Products • Billing/Transactions • Infrastructure/Asset • Service Quality • Users Behavior
    Icons made by Freepik from www.flaticon.com.
  15. Tools

    • Databases • ETL • Visualization • Programming & Analytics • Others
  16. Machine Learning Algorithms

    Source: github.com/trekhleb/homemade-machine-learning
  17. Types of Use Cases

    • Classification • Regression • Clustering • Statistical Analysis/Method • Visualization
  18. Classification

    Use Cases • Risk Scoring • Cross-sell & Up-sell • Prospect Planning • Age-Gender Prediction • Churn Prevention
    Algorithms • Logistic Regression • Naïve Bayes • Random Forest • Gradient Boosting
    Steps • Data Pre-processing • Feature Selection & Engineering • Modeling (Train Set) • Evaluation (Test Set)
    Characteristics • Predict Label/Category • Mostly Binary Label • Evaluation Metrics: Accuracy, Sensitivity, ROC-AUC, F1-Score
    Icons made by Flat Icons from www.flaticon.com.
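To make the evaluation metrics on the Classification slide concrete, here is a minimal sketch in plain Python that computes accuracy, sensitivity, F1-score, and ROC-AUC for a binary label. The labels, scores, threshold, and function names are all invented for illustration; they are not from the deck, and in practice a library such as scikit-learn would normally be used instead.

```python
# Hypothetical test-set labels and model scores, invented for this sketch.
Y_TRUE = [1, 0, 1, 1, 0, 0, 1, 0]
Y_SCORE = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_score, threshold=0.5):
    """Threshold the scores, then compute accuracy, sensitivity, and F1."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # also called recall
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity, "f1": f1}


def roc_auc(y_true, y_score):
    """Share of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win. Assumes both classes are present.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(classification_metrics(Y_TRUE, Y_SCORE))
print("roc_auc:", roc_auc(Y_TRUE, Y_SCORE))
```

Accuracy, sensitivity, and F1 depend on the chosen threshold, while ROC-AUC is computed from the raw scores, which is one reason the slide lists several metrics rather than one.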
  19. Regression

    Use Cases • COVID-19 Prediction • TV Rating Prediction
    Algorithms • ARIMA • FB Prophet • Random Forest • Extreme Gradient Boosting • SEIR Model
    Steps • Data Pre-processing • Feature Selection & Engineering • Modeling (Train Set) • Evaluation (Test Set)
    Characteristics • Predict Number / Continuous • Mostly Time Series • Evaluation Metrics: RMSE, MAE, MAPE
    Icons made by catkuro from www.flaticon.com.
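The train/test evaluation step and the three error metrics on the Regression slide can be sketched as follows. The series values are invented, and the "model" is a naive last-value forecast used only as a stand-in; it is not one of the algorithms listed on the slide. Function names are hypothetical.

```python
import math

# Hypothetical time series, invented for this sketch (e.g. a daily count).
SERIES = [100, 120, 130, 150, 160, 180]


def chronological_split(series, test_size):
    """Split a time series so the test set is the most recent observations."""
    return series[:-test_size], series[-test_size:]


def naive_forecast(train, horizon):
    """Stand-in model: repeat the last training value for every future step."""
    return [train[-1]] * horizon


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


train, test = chronological_split(SERIES, test_size=2)
forecast = naive_forecast(train, horizon=len(test))
print("RMSE:", rmse(test, forecast))
print("MAE:", mae(test, forecast))
print("MAPE (%):", mape(test, forecast))
```

Splitting by time rather than at random keeps future observations out of the training set, which matters for the time-series use cases the slide mentions.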
  20. Clustering

    Use Cases • TV Audience Segmentation • Customer Segmentation
    Algorithms • K-Means Clustering • K-Modes Clustering
    Steps • Data Pre-processing • Feature Selection & Engineering • Clustering (iterative) • Evaluation
    Characteristics • Generate Cluster for Non-Labelled Data • Evaluation Metrics: Elbow Method (WCSS), Silhouette Method
    Icons made by fjstudio from www.flaticon.com.
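The iterative clustering step and the elbow method (WCSS) named on the Clustering slide can be illustrated with a small K-Means written in plain Python. The points, the seed, and the function names are assumptions made for this sketch; K-Modes and the silhouette method are not covered here.

```python
import random

# Hypothetical 2-D points forming two visibly separate groups.
POINTS = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=100, seed=0):
    """Basic K-Means: assign points to the nearest centroid, then recompute.

    Returns (centroids, clusters, wcss), where wcss is the
    within-cluster sum of squares.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: squared_distance(point, centroids[i]))
            clusters[nearest].append(point)
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    wcss = sum(
        squared_distance(p, centroids[i])
        for i, cluster in enumerate(clusters)
        for p in cluster
    )
    return centroids, clusters, wcss


def elbow_curve(points, max_k):
    """WCSS for k = 1..max_k; the 'elbow' is where the drop levels off."""
    return {k: k_means(points, k)[2] for k in range(1, max_k + 1)}


for k, wcss in elbow_curve(POINTS, 3).items():
    print(k, round(wcss, 3))
```

On data like this, WCSS falls sharply from k = 1 to k = 2 and much less afterwards, which is the pattern the elbow method looks for.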
  21. Statistical Analysis/Method

    Use Cases • Sampling & Survey • Campaign Evaluation
    Algorithms/Method • A/B Testing • Statistical Test (T-Test, ANOVA, Chi-Square) • Stratified Sampling • Cluster Sampling
    Steps • Data Pre-processing • Do statistical testing or sampling based on those data
    Characteristics • Comparing values (statistically significant or not). • Creating a suitable sampling frame for survey projects.
    Icons made by Freepik from www.flaticon.com.
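As an illustration of "comparing values (statistically significant or not)" in an A/B test, the sketch below runs a two-proportion z-test in plain Python. This particular test is chosen here as a common option for comparing conversion rates; it is not named on the slide, although without a continuity correction it is equivalent to the Chi-Square test on a 2x2 table. The campaign counts and function name are invented.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for the difference between two proportions.

    Returns (z, p_value). Uses the pooled proportion for the standard
    error, which is the usual choice under the null hypothesis that both
    groups share the same rate.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical campaign: variant A converts 100 of 1000, variant B 130 of 1000.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print("z =", round(z, 3), "p =", round(p_value, 4))
print("significant at 5%:", p_value < 0.05)
```

The 5% significance level used in the last line is a conventional default, not something the deck specifies.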
  22. Visualization

    Use Cases • Reporting • Online Dashboard
    Algorithms/Method • Descriptive Chart (Pie, Bar, Line) • Statistical Chart (Histogram, Boxplot)
    Steps • Data Pre-processing • Create Visualization • Publish (for online dashboard)
    Characteristics • Creating Real Time Monitoring Dashboard (Executive/Operational) • Creating report or presentation for meetings
    Icons made by Eucalyp from www.flaticon.com.
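The statistical charts on the Visualization slide are built from simple summaries of the data. As a rough sketch, the code below computes the five-number summary that a boxplot draws and prints a text histogram, using only the Python standard library. The values, bin width, and function names are made up; a real dashboard would use a charting tool instead.

```python
import statistics
from collections import Counter

# Hypothetical measurements, invented for this sketch.
VALUES = [12, 15, 17, 21, 22, 25, 31, 34, 48]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers a boxplot displays."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


def text_histogram(values, bin_width):
    """Count integer values per bin and draw one '#' per observation."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in sorted(bins):
        label = f"{start}-{start + bin_width - 1}"
        lines.append(f"{label:>7} {'#' * bins[start]}")
    return "\n".join(lines)


print(five_number_summary(VALUES))
print(text_histogram(VALUES, bin_width=10))
```

`statistics.quantiles` uses its default "exclusive" method here, so the quartiles can differ slightly from those reported by other tools.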
  23. Challenges to Be Data Professionals

    Photo by Tim van der Kuip on Unsplash.
  24. Challenges

    • Competitions • Working Style • Apply for The Job • Working Organization • Growing Up Skills
  25. Competitions

    Education Level: Bachelor's Degree, Master's Degree, Ph.D., Others (chart values: 21%, 44%, 29%, 6%)

    Bachelor
    Field of Study                  Percentage
    Computer Science                3.053%
    Business Analytics              0.977%
    Physics                         0.855%
    Information Technology          0.855%
    Statistics                      0.733%
    Electrical Engineering          0.733%
    Applied Mathematics             0.733%
    Economics                       0.611%
    Actuarial Science               0.611%

    Master
    Field of Study                  Percentage
    Business Analytics              4.274%
    Computer Science                3.175%
    Knowledge Engineering           2.930%
    Analytics                       2.808%
    Statistics                      2.564%
    Enterprise Business Analytics   2.564%
    Economics                       0.733%
    Information Technology          0.611%
    Computer Engineering            0.611%

    Source: https://towardsdatascience.com/i-wasnt-getting-hired-as-a-data-scientist-so-i-sought-data-on-who-is-c59afd7d56f5
    Tips:
    1. Try to stand out more, or
    2. Find jobs with specific education requirements.
  26. Competitions
  27. Working Style

    Routine Based • Stable Time of Work • Less Mobility • Consistent Type of Work • Intensive communication with peers
    Project Based • Fluctuating Time of Work • More Mobility • Various Types of Work • Broad Connection
    Tips
    1. Find a suitable job with a suitable working style.
    2. Keep in mind that not all companies have a flexible exit system.
    Icons made by Freepik from www.flaticon.com.
  28. Applying for a Job

    Administration / Document Selection
    1. Prepare required documents (CV, TOEFL, Degree Cert., etc.)
    2. Contact recruiters / PIC if you have questions about the documents
    3. Don't lie! Your documents will be verified
    Computer / Online Test
    1. Check the test schedule
    2. Prepare computer and connection for the test
    3. Study, but don't forget to get enough rest
    Psychology Assessment
    1. Check the assessment schedule and location (if offline)
    2. Do benchmarking about psychology assessment
    3. Take a good rest, it will be exhausting
    Interview
    1. Check the interview schedule and location (if offline)
    2. Guess the possible questions
    3. Prepare the best introduction and closing statement (talk, gesture, etc.)
    Medical Check Up
    1. Check the medical check up schedule and location
    2. Follow the instructions (e.g., last time to eat)
    3. Don't forget to exercise and eat healthy food before the check up
    Icons made by Smashicons from www.flaticon.com.
  29. Organizational Structure

    Tribe • Data Scientist • Data Engineer • Developer • UI/UX Designer • Platform Engineer
    Tips
    1. Don't hesitate to ask about the structure.
    2. Then, please ask the right person if you need any help.
    3. Be polite!
  30. Growing Up Skills

    Data Science Specialist
    Rank  2015               Change in rank  2018
    1     Python             [+1]            Machine Learning
    2     Machine Learning   [+2]            Data Science
    3     R                  [-2]            Python
    4     Data Science       [-1]            R
    5     Apache Spark       [+5]            SQL
    6     Data Mining        [+3]            Statistics
    7     Hadoop             [new]           Tableau
    8     Data Analysis      [new]           Data Visualization
    9     Statistics         [new]           NLP
    10    SQL                [-5]            Apache Spark

    Machine Learning Engineer
    Rank  2015               Change in rank  2018
    1     Machine Learning   [new]           TensorFlow
    2     Python             [-1]            Machine Learning
    3     Apache Spark       [+1]            Deep Learning
    4     Deep Learning      [new]           Keras
    5     Algorithms         [-2]            Apache Spark
    6     Java               [new]           NLP
    7     Big Data           [new]           Computer Vision
    8     Hadoop             [-6]            Python
    9     Data Science       [--]            Data Science
    10    C++                [new]           AWS

    Source: http://www3.weforum.org/docs/WEF_Data_Science_In_the_New_Economy.pdf
    Tips:
    1. Don't stop learning; find a learning platform that suits you.
    2. Try to apply your new skills to real-world problems.
  31. Want to Know More???

    Photo by Bruce Mars on Unsplash.
  32. You’re invited to join our event!

    Follow us on Instagram: @dscsummittelkom
    AND visit our website: www.dscsummit.com
  33. That’s All

    My Contacts: https://www.linkedin.com/in/iqbal-hanif-a7599662/ [email protected]
    My Articles/Writings: https://www.researchgate.net/profile/Iqbal_Hanif https://medium.com/@iqbalhannif
    My Portfolios: https://github.com/iqbalhanif https://speakerdeck.com/iqbalhanif
    Thank you & see you in Data Scientist Summit 2020