Zahrina
February 24, 2025

DevCoach 185: Machine Learning | Supervised Learning

This session covers Sentiment Analysis as part of Natural Language Processing (NLP), from the basic concepts to a practical implementation. Participants are introduced to the NLP techniques used to understand and process text, including tokenization, stemming, and word embeddings. The session then explains modeling approaches based on traditional Machine Learning, such as Naïve Bayes and SVM, and explores Deep Learning models such as LSTM and Transformers for improving sentiment analysis accuracy. By the end of the session, participants will understand how to build an effective Sentiment Analysis pipeline that can be applied across many industries.

Transcript

  1. Hi, I’m M. Fikry Rizal 👋

    Latest Work Experiences:
    • AI/ML Curriculum Developer, Dicoding (2024 - present)
    • Data Engineer Intern, Torche Education (2022 - 2023)
    Education:
    • UIN Syarif Hidayatullah Jakarta (2020 - 2024): Bachelor's Degree in Physics
    • Bangkit Academy 2023 (2023): Machine Learning
    About Me: Muhammad Fikry Rizal, https://github.com/mfikryrz
    Machine Learning
  2. Machine Learning Overview

    1. Hi, Machine Learning!
    2. Machine Learning Workflow
    3. Supervised Learning: Classification
    4. Supervised Learning: Regression
    5. Unsupervised Learning: Clustering
    6. Feature Engineering Techniques
    7. Overfitting and Underfitting
    8. Model Optimization
  3. Hi, Machine Learning!

    1. Hi, Machine Learning!
    2. Machine Learning Workflow
    3. Supervised Learning: Classification
    4. Supervised Learning: Regression
  4. “A field of study that gives computers the ability to learn without being explicitly programmed.”

    Arthur Samuel
  5. Understanding Data: Sources and Formats

    Using Datasets from Selected Sources:
    • UC Irvine Machine Learning Repository
    • Kaggle Datasets
    • Google Dataset Search
    • TensorFlow Datasets
    • Satu Data Indonesia
    Common Formats:
    • CSV (Comma-Separated Values)
    • Excel Files
    • JSON (JavaScript Object Notation)
    • HTML
    • SQL Database
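
The same records can arrive in several of the formats listed above. The sketch below is a dependency-free illustration using only Python's standard library (`csv`, `json`, `sqlite3`); the sample records and the `people` table are hypothetical. In practice, pandas offers `read_csv`, `read_excel`, `read_json`, `read_html`, and `read_sql` for these formats. Note that CSV carries no type information, so every CSV value comes back as a string, while JSON and SQL preserve numeric types.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample records, written once per format
CSV_TEXT = "id,age,city\n1,25,Jakarta\n2,31,Bandung\n"
JSON_TEXT = '[{"id": 1, "age": 25, "city": "Jakarta"}, {"id": 2, "age": 31, "city": "Bandung"}]'


def load_csv(text):
    """Parse CSV text into a list of dicts (all values come back as strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def load_json(text):
    """Parse a JSON array of objects into a list of dicts (types are preserved)."""
    return json.loads(text)


def load_sql(rows):
    """Load rows into an in-memory SQLite table (hypothetical `people`), then query them back."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (id INTEGER, age INTEGER, city TEXT)")
    conn.executemany("INSERT INTO people VALUES (?, ?, ?)", rows)
    conn.row_factory = sqlite3.Row  # rows can then be converted to dicts by column name
    result = [dict(r) for r in conn.execute("SELECT id, age, city FROM people ORDER BY id")]
    conn.close()
    return result


csv_rows = load_csv(CSV_TEXT)
json_rows = load_json(JSON_TEXT)
sql_rows = load_sql([(1, 25, "Jakarta"), (2, 31, "Bandung")])
print(csv_rows[0]["age"], json_rows[0]["age"], sql_rows[0]["age"])  # → 25 25 25
```

Although the three values print identically, the CSV one is the string `"25"`, so numeric columns from CSV need an explicit conversion before analysis.
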
  6. Exploratory & Explanatory Data Analysis

    • Exploratory Data Analysis (EDA)
      ◦ Understand the structure, characteristics, and patterns in the data.
    • Explanatory Data Analysis (ExDA)
      ◦ Communicate the findings or insights obtained to a broader audience.
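
A first EDA pass usually starts with summary statistics for numeric columns and frequency counts for categorical ones. The following is a minimal standard-library sketch; the toy rows and the helper names `describe_numeric` and `frequency` are hypothetical. With pandas, `DataFrame.describe()` and `Series.value_counts()` cover the same ground.

```python
import statistics
from collections import Counter

# Hypothetical toy dataset: one dict per row
rows = [
    {"age": 25, "income": 4.5, "city": "Jakarta"},
    {"age": 31, "income": 6.0, "city": "Bandung"},
    {"age": 42, "income": 8.5, "city": "Jakarta"},
    {"age": 29, "income": 5.0, "city": "Surabaya"},
]


def describe_numeric(rows, column):
    """Summary statistics for one numeric column."""
    values = [r[column] for r in rows]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def frequency(rows, column):
    """How often each category occurs, most common first."""
    return Counter(r[column] for r in rows).most_common()


print(describe_numeric(rows, "age"))
print(frequency(rows, "city"))  # → [('Jakarta', 2), ('Bandung', 1), ('Surabaya', 1)]
```
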
  7. Refining Raw Data: The Art of Data Cleaning

    • Identifying and Handling Missing Values
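
Identifying missing values means counting them per column. Handling them usually means either imputing them (for example, with the column mean or median) or dropping the affected rows. The sketch below shows both approaches on hypothetical records where `None` marks a missing value. The helper names are illustrative; pandas provides `isna()`, `fillna()`, and `dropna()` for the same tasks, and scikit-learn provides `SimpleImputer`.

```python
import statistics

# Hypothetical records; None marks a missing value
rows = [
    {"age": 25, "income": 4.5},
    {"age": None, "income": 6.0},
    {"age": 42, "income": None},
    {"age": 29, "income": 5.0},
]


def count_missing(rows):
    """Number of missing (None) values per column."""
    counts = {}
    for row in rows:
        for col, val in row.items():
            counts[col] = counts.get(col, 0) + (val is None)
    return counts


def impute(rows, column, strategy="median"):
    """Return new rows where missing values in `column` are filled with the column median or mean."""
    present = [r[column] for r in rows if r[column] is not None]
    fill = statistics.median(present) if strategy == "median" else statistics.mean(present)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def drop_missing(rows):
    """Alternative strategy: drop every row that has at least one missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


print(count_missing(rows))            # → {'age': 1, 'income': 1}
print(impute(rows, "age")[1]["age"])  # → 29
print(len(drop_missing(rows)))        # → 2
```

The median is often preferred over the mean when a column contains outliers, because a single extreme value does not shift it.
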
  8. Refining Raw Data: The Art of Data Cleaning

    Feature Scaling
    • Normalization (when to use):
      ◦ Features with different scales.
      ◦ Distance-based models.
      ◦ Data that is not normally distributed.
    • Standardization (when to use):
      ◦ Normally distributed data.
      ◦ Regression-based models.
      ◦ Data with different scales.
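
The two scaling methods above follow simple formulas. Min-max normalization maps a feature to [0, 1] with x' = (x - min) / (max - min). Standardization (z-score) uses x' = (x - mean) / std, which gives a feature mean 0 and standard deviation 1. The sketch below implements both in plain Python. The constant-feature guard is a design choice, not part of the formulas. scikit-learn's `MinMaxScaler` and `StandardScaler` are the usual library equivalents.

```python
import statistics


def min_max_normalize(values):
    """Rescale values to [0, 1]: x' = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


def standardize(values):
    """Z-score: x' = (x - mean) / std, using the population standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:  # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


ages = [20, 30, 40, 50, 60]
print(min_max_normalize(ages))  # → [0.0, 0.25, 0.5, 0.75, 1.0]
print(standardize(ages))
```

In a real pipeline, the min/max or mean/std should be computed on the training set only and then reused to transform the validation and test sets. Otherwise, information from those sets leaks into training.
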
  9. Data Splitting

    • Training Set
      ◦ A subset of data used to train the model.
      ◦ Common Percentage: Typically 60-80% of the total dataset.
    • Validation Set
      ◦ A subset of data used for validation during the training process.
      ◦ Common Percentage: Typically 10-20% of the total dataset.
    • Test Set
      ◦ A subset of data used for final testing after the model has been trained and tuned.
      ◦ Common Percentage: Typically 10-20% of the total dataset.
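
A three-way split can be sketched as follows: shuffle once with a fixed seed so the split is reproducible, then cut the data at the chosen fractions. The 70/15/15 default is an assumption that sits inside the ranges listed above, and `train_val_test_split` is a hypothetical helper. scikit-learn's `train_test_split` achieves the same result when called twice.

```python
import random


def train_val_test_split(data, train=0.7, val=0.15, seed=42):
    """Shuffle a copy of `data`, then cut it into train/validation/test portions.

    The test fraction is whatever remains: 1 - train - val.
    """
    if not 0 < train < 1 or not 0 <= val < 1 or train + val >= 1:
        raise ValueError("need 0 < train < 1, 0 <= val < 1, and train + val < 1")
    items = list(data)
    random.Random(seed).shuffle(items)  # fixed seed -> reproducible split
    n_train = int(len(items) * train)
    n_val = int(len(items) * val)
    return (
        items[:n_train],
        items[n_train:n_train + n_val],
        items[n_train + n_val:],
    )


train_set, val_set, test_set = train_val_test_split(range(100))
print(len(train_set), len(val_set), len(test_set))  # → 70 15 15
```

For imbalanced classification data, a stratified split, which keeps class proportions equal across the three sets, is usually safer than a plain shuffle.
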
  10. Quiz #1 DevCoach 185

    During data preprocessing, you find that several features have missing values. Which of the following options is the most appropriate way to handle missing values in the context of a machine learning model?
    a) Delete the feature
    b) Fill the missing values with the mean or median of that feature
  11. Algorithm: Decision Tree

    Advantages
    • Can Handle Both Categorical and Numerical Data
    • No Need for Feature Scaling
    • Flexible and Customizable
    Disadvantages
    • Can Be Too Flexible (Prone to Overfitting)
    • Sensitive to Noise
    • Can Grow Too Large (Complex Trees)
    How It Works
    • Step 1: Initial Data Splitting
    • Step 2: Feature Selection and Data Partitioning
    • Step 3: Branch and Node Formation
    • Step 4: Creation of Leaf Nodes
    • Step 5: Using the Model for Prediction
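
The five steps above can be sketched as a tiny CART-style classifier. At each node, the code tries every (feature, threshold) pair, keeps the split with the lowest weighted Gini impurity (feature selection and partitioning), and recurses to form branches. It stops at a pure node or at `max_depth` (a leaf holding the majority class), and it predicts by walking from root to leaf. This is a simplified sketch, not the slide's exact algorithm. It handles numeric features only, so categorical features would need encoding first. The toy pass/fail data is hypothetical. scikit-learn's `DecisionTreeClassifier` is the production equivalent.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Try every (feature, threshold) pair and keep the lowest weighted Gini."""
    best = None  # (score, feature, threshold)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Recursively partition the data; a leaf stores the majority class."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    _, f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow branches from the root until reaching a leaf (a class label)."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


# Hypothetical data: [hours_studied, hours_slept] -> exam result
X = [[1, 4], [2, 8], [3, 5], [6, 7], [7, 6], [8, 8]]
y = ["fail", "fail", "fail", "pass", "pass", "pass"]
tree = build_tree(X, y)
print(tree)                   # → {'feature': 0, 'threshold': 3, 'left': 'fail', 'right': 'pass'}
print(predict(tree, [5, 6]))  # → pass
```

Limiting `max_depth` is one simple guard against the overfitting and tree-size problems listed among the disadvantages.
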
  12. Algorithm: Random Forest

    Advantages
    • High Accuracy
    • Robust Against Overfitting
    • Ability to Handle Imbalanced Data
    • Handles Missing Data
    • Identifies Important Features
    Disadvantages
    • High Memory Requirements
    • Low Interpretability
    • Slow Prediction Speed
    • Less Effective on Small Datasets
    • Long Training Time
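
A random forest combines many trees, each trained on a bootstrap sample (rows drawn with replacement) and a random subset of features, and predicts by majority vote. The sketch below deliberately simplifies each tree to a depth-1 "stump" and samples features once per tree. Real random forests grow deeper trees and resample features at every split, as scikit-learn's `RandomForestClassifier` does through `max_features`. All helper names and the toy data are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X, y, features):
    """Depth-1 tree: the best single (feature, threshold) split among `features`."""
    majority = Counter(y).most_common(1)[0][0]
    best = (gini(y), None, None, majority, majority)  # (score, feature, threshold, left, right)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best[1:]


def stump_predict(stump, row):
    f, t, left_label, right_label = stump
    if f is None:  # no useful split was found: predict the majority class
        return left_label
    return left_label if row[f] <= t else right_label


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        feats = rng.sample(range(len(X[0])), max_features)    # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def forest_predict(forest, row):
    votes = Counter(stump_predict(s, row) for s in forest)
    return votes.most_common(1)[0][0]  # majority vote across trees


# Hypothetical, linearly separable toy data
X = [[1, 1], [2, 2], [3, 3], [7, 7], [8, 8], [9, 9]]
y = ["fail", "fail", "fail", "pass", "pass", "pass"]
forest = fit_forest(X, y)
print(forest_predict(forest, [9, 9]), forest_predict(forest, [1, 1]))
```

Averaging many trees that have seen different rows and features is what makes the ensemble more robust to overfitting than a single tree. The price is the memory, training-time, and interpretability costs listed above.
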
  13. Model Evaluation

    Suppose we have a model's predictions for a dataset of 100 emails, each classified as spam or not spam (ham).
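
Given the four confusion-matrix counts (true positives, false positives, false negatives, and true negatives, with spam as the positive class), the standard metrics follow directly. The counts below are hypothetical and do not come from the slide's table, which is not in this transcript. `classification_metrics` is an illustrative helper. scikit-learn offers `confusion_matrix`, `precision_score`, `recall_score`, and `f1_score`.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard metrics from the four confusion-matrix cells (spam = positive class)."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0  # of emails flagged as spam, share that are spam
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual spam emails, share that were caught
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for 100 emails (NOT the values from the slide)
m = classification_metrics(tp=40, fp=10, fn=5, tn=45)
print({k: round(v, 3) for k, v in m.items()})
# → {'accuracy': 0.85, 'precision': 0.8, 'recall': 0.889, 'f1': 0.842}
```

For spam filtering, precision is often weighted more heavily, because a false positive sends a legitimate email to the spam folder.
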
  14. Quiz #2 DevCoach 185

    A patient takes a test to detect cancer. The test result says the patient does not have cancer, but the patient actually does have cancer. What type of classification error is this?
    a) True Positive
    b) True Negative
    c) False Positive
    d) False Negative
  15. Quiz #3 DevCoach 185

    If a model outputs a probability as a percentage, for example an 85% chance of rain, is it more likely a classification model or a regression model?