Kaggle_meetup_3rd LT ( Sberbank Russian Housing Market )

Kaggle Tokyo Meetup #3 Lightning Talk Oct.28.2017 Maxwell_110

Predict House Prices in the Russian Housing Market
Sberbank is a Russian bank.

Messy data competition
01 Many NAs (an ocean of NAs).
02 Invalid data: inaccurate longitude and latitude.
03 About 10% abnormal target values (fake prices): in Russia, houses are sometimes sold at an abnormally low price compared with their natural price.
04 Macro-economic dependency: the price data is time-series data, so we had to account for macro-economic change over time. But the data was limited...
Kaggler's voice: he took 1st place, but was critical of this competition.

It's tough to predict fake prices, because they follow no rule.
About 10% of the train data are outliers...
Exclude them, or they will harm model accuracy.
There is no indicator for fakes.
How to exclude fake prices? Not simple outliers, but fake prices!
[Figures: histogram of house price; price change with time (mean price, median price); histogram of logarithmic house price; logarithmic price change with time]
Price localizing

The evaluation metric is RMSLE. If fake prices were not included, it would be approximately 0.1XX based on my simulation. But without any cleansing, local CV was approximately 0.4XX and LB was around 0.3XX.
How about the test data? The test data must also include fake prices, because scores on the public LB are much higher than 0.1XX.
Fake-free RMSLE: ~ 0.1XX
Public LB RMSLE: ~ 0.3XX
[Figure labels: Actual, Prediction; w/ fake price, w/o fake price]
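To make the effect of fake prices on the metric concrete, here is a minimal sketch of RMSLE in plain Python. The prices are made-up illustration values, not competition data, and the function name `rmsle` is just a label chosen for this example.

```python
import math

def rmsle(actual, predicted):
    """Root mean squared logarithmic error over paired price lists."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    squared = [
        (math.log1p(p) - math.log1p(a)) ** 2
        for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical prices: the model predicts something close to the natural price.
predicted = [3_300_000, 4_600_000, 8_500_000]
actual_clean = [3_000_000, 5_000_000, 8_000_000]
score_clean = rmsle(actual_clean, predicted)

# Same houses, but one recorded price is a hypothetical fake (abnormally low).
actual_with_fake = [3_000_000, 1_000_000, 8_000_000]
score_with_fake = rmsle(actual_with_fake, predicted)

print(f"without fake price: {score_clean:.3f}")
print(f"with fake price:    {score_with_fake:.3f}")
```

Because the error is taken on the log scale, a single price recorded at a fraction of its natural value contributes a very large term, which is consistent with the gap between the fake-free simulation and the public LB scores.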

Algorithm for excluding fake prices
Step 1. Fit XGBoost on the initial train data and predict prices.
Step 2. Compare the actual and predicted prices, then remove the rows where the predicted price deviates largely from the actual price. Throw them away!
Step 3. Train the model again on the data remaining after Step 2.
Step 4. Predict prices with XGBoost and go back to Step 2. If there is no more data to remove, stop.
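The loop in Steps 1 to 4 can be sketched as follows. This is only an illustration under several assumptions: a one-feature least-squares fit stands in for XGBoost so the sketch runs without dependencies, the deviation is measured on the log-price scale, and the threshold of 1.0, the round limit, and the toy data are all hypothetical choices rather than values from the talk.

```python
import math

def fit_line(xs, ys):
    """One-feature ordinary least squares; a stand-in for the XGBoost fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x

def drop_fake_prices(features, prices, threshold=1.0, max_rounds=20):
    """Return the indices kept after iteratively removing large deviations."""
    log_prices = [math.log1p(p) for p in prices]
    keep = list(range(len(prices)))
    for _ in range(max_rounds):
        # Steps 1 and 3: fit on the rows that are still kept.
        slope, intercept = fit_line(
            [features[i] for i in keep], [log_prices[i] for i in keep]
        )
        # Step 2: keep rows whose prediction is close to the actual log price.
        survivors = [
            i for i in keep
            if abs(log_prices[i] - (slope * features[i] + intercept)) <= threshold
        ]
        # Step 4: stop when nothing was removed (or too little data remains).
        if len(survivors) == len(keep) or len(survivors) < 2:
            break
        keep = survivors
    return keep

# Toy data: price is proportional to area, except the last two rows,
# which carry a hypothetical fake (abnormally low) price.
areas = [30, 40, 50, 60, 70, 80, 90, 100, 55, 75]
prices = [a * 100_000 for a in areas[:8]] + [1_000_000, 1_000_000]
kept = drop_fake_prices([math.log(a) for a in areas], prices)
print(kept)
```

With a real model, the fit and predict calls inside the loop would be replaced by the regressor of choice. The threshold then becomes the main tuning knob: too tight and genuine but unusual sales are thrown away, too loose and fake prices survive.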

Invitation to the pre-seminar
https://www.datarobot.com/jp/AI-experience-tokyo- 2017/?utm_source=database&utm_campaign=JPDREXP
Date: Oct.8.2017, 18:30 – 21:00
Location: Tokyo Station Shin-Maru-building
Join the private seminar of Datarobot! Legendary Kaggler and more...

Happy Kaggling!
