Rakamin Data Science DigiFest - Homework

This is my very first task with data

Agustina Sri Wardani

October 29, 2022

Transcript

  1. Dataset

    A. Health Insurance: predict which health insurance owners will be interested in vehicle insurance.
    B. Used Car Auction Prices: predict the price of a used car based on its specifications and condition.
  2. 1A. Health Insurance Cross Sell Prediction

    Problem: An insurance company with a health insurance product wants to offer car insurance to its customers from last year. The company has many customers from last year, but not all of them want car insurance.
    Business metrics: The number of customers who take the car insurance out of the number of customers who receive the car insurance offer.
    Solutions: We can use a model to predict who is interested in car insurance. With this model, we will know which customers are likely to take the car insurance, and the company can approach those customers to offer its new product.
  3. 1B. Used Car Auction Prices

    Problem: An influencer's company wants to grow its business, which is a used car auction. There are many kinds of cars, and specific cars attract more interest from bidders in the auction.
    Business metrics: The car features that make bidders interested in buying the car.
    Solutions: We can use a model to predict which features a car needs in order to attract the highest interest in the auction. With this model, we will know the features that draw the highest interest from bidders.
  4. 2A. Health Insurance Cross Sell Prediction

    Variables used:
    - Driver's license
    - Car insurance that customers already have
    - Age of the car
    - The car's damage
    - Annual premium

    Data source:
    - Online: Kaggle
    - Offline: Request data from the company

    Data Understanding methods:
    - Descriptive analysis
      a. A plot of the car's age against the car's damage. We hope to get insight into the correlation between the car's age and the amount of car damage.
      b. A bar plot describing the distribution of the car insurance that customers already have.
  5. 2B. Used Car Auction Prices

    Variables used:
    - The year of car production
    - Car brand
    - Car model
    - Car body
    - Car transmission
    - Car condition

    Data source:
    - Online: Kaggle, survey

    Data Understanding methods:
    - Descriptive analysis
      a. A plot of the year of car production against the car body. We hope to get insight into the correlation between the year of car production and the car body.
      b. A bar plot describing the distribution of used car sales for each car brand.
  6. 3. Analysis Thinking

    1. "Dirty" data significantly affects the performance of the machine learning models we create. For example, if there is missing data, many models cannot work when columns contain null values.
    2. If a column has many null values, the column can be deleted. If a column has only a few null values, we can fill in the null values so that the column does not need to be deleted. If there are duplicate rows, the extra copies must be deleted, leaving one.
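
The cleaning rules on the Analysis Thinking slide can be sketched in plain Python. This is only a minimal illustration, not the method used in the homework: the 50% threshold for "many null values", the choice of median (numbers) or most common value (everything else) as the fill value, and the column names `age`, `vehicle_damage` and `region` are all assumptions made for the example.

```python
from collections import Counter
from statistics import median

# Hypothetical cut-off: a column with more than half its values missing is dropped.
NULL_DROP_THRESHOLD = 0.5


def clean_rows(rows, threshold=NULL_DROP_THRESHOLD):
    """Drop mostly-null columns, fill the remaining nulls, then remove duplicate rows."""
    if not rows:
        return []
    columns = list(rows[0].keys())

    # 1. Measure the share of missing values (None) in each column.
    null_share = {
        col: sum(row[col] is None for row in rows) / len(rows) for col in columns
    }
    kept = [col for col in columns if null_share[col] <= threshold]

    # 2. Choose a fill value per kept column (assumed rule: median or most common value).
    fill = {}
    for col in kept:
        present = [row[col] for row in rows if row[col] is not None]
        if not present:
            fill[col] = None  # nothing to learn a fill value from
        elif all(isinstance(v, (int, float)) for v in present):
            fill[col] = median(present)
        else:
            fill[col] = Counter(present).most_common(1)[0][0]

    # 3. Fill the nulls and keep only the first copy of each duplicate row.
    seen, cleaned = set(), []
    for row in rows:
        filled = tuple(fill[col] if row[col] is None else row[col] for col in kept)
        if filled not in seen:
            seen.add(filled)
            cleaned.append(dict(zip(kept, filled)))
    return cleaned


# Made-up sample rows, only to show the call.
rows = [
    {"age": 44, "vehicle_damage": "Yes", "region": None},
    {"age": None, "vehicle_damage": "No", "region": None},
    {"age": 30, "vehicle_damage": "Yes", "region": "A"},
    {"age": 44, "vehicle_damage": "Yes", "region": None},
]
cleaned = clean_rows(rows)
```

In this sample, `region` is missing in three of four rows, so it is dropped; the single missing `age` is filled; and the fourth row is removed as a duplicate of the first. In practice the threshold and the fill rule should be chosen per dataset, and a data-frame library would usually replace the hand-written loops.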