◦ Sales - The turnover for any given day (the variable to be predicted)
◦ Customers - The number of customers on a given day
◦ Open - An indicator for whether the store was open: 0 = closed, 1 = open
◦ Promotion
i. Promo - Indicates whether a store is running a promo on that day
ii. Promo2 - A continuing and consecutive promotion for some stores: 0 = store is not participating, 1 = store is participating
iii. Promo2Since [Year/Week] - Describes the year and calendar week when the store started participating in Promo2
iv. PromoInterval - Describes the consecutive intervals at which Promo2 is started, naming the months in which the promotion starts anew
• Holiday Information
◦ StateHoliday - Indicates a state holiday
◦ SchoolHoliday - Indicates whether the (Store, Date) was affected by the closure of public schools
• Store Information
◦ StoreID - Unique Id for each store
◦ Assortment - Describes an assortment level: a = basic, b = extra, c = extended
◦ StoreType - Differentiates between 4 different store models
• Store Competitor Information
◦ CompetitionDistance - Distance in meters to the nearest competitor store
◦ CompetitionOpenSince [Month/Year] - Approximate year and month in which the nearest competitor opened
yesterday
◦ last week
◦ ...
• Measures of centrality and measures of variability over:
◦ the last quarter
◦ the last half year
◦ the last year
◦ ...
• Recursive prediction. It gave me much worse results, but the 30th-place finisher says it works,
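A minimal sketch of the trailing-window statistics above, in plain Python so it runs without pandas. The function name `window_stats`, the dict-of-dates input, and the 91/182/365-day approximations of "quarter", "half year" and "year" are assumptions for illustration, not taken from any competitor's code.

```python
from datetime import date, timedelta
from statistics import mean, median, pstdev

# Approximate window lengths in days (an assumption, not from the source).
WINDOWS = {"quarter": 91, "half_year": 182, "year": 365}


def window_stats(history, as_of, windows=WINDOWS):
    """Centrality (mean, median) and variability (population std) of one
    store's sales over trailing windows ending the day before `as_of`.

    history: dict datetime.date -> sales for a single store.
    Only dates strictly before `as_of` are used, so the day being
    predicted never leaks into its own features.
    """
    features = {}
    for name, days in windows.items():
        start = as_of - timedelta(days=days)
        values = [s for d, s in history.items() if start <= d < as_of]
        if not values:
            continue  # no history in this window yet
        features[f"mean_{name}"] = mean(values)
        features[f"median_{name}"] = median(values)
        features[f"std_{name}"] = pstdev(values)
    return features


toy = {date(2015, 1, 1) + timedelta(days=i): float(i) for i in range(10)}
print(window_stats(toy, date(2015, 1, 11), {"week": 7}))
# {'mean_week': 6.0, 'median_week': 6.0, 'std_week': 2.0}
```

For the six-week test period the actual sales after the last training day are unknown, so `as_of` would usually stay at the first test day for every test row; feeding predictions back in as if they were history is what the recursive approach does instead.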
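days before, after or within the event.” -- Gert
Features I extracted:
• how long until the next event, and how long since the last event;
• how long the current state has lasted, and how long until it ends;
• how long the state lasts in total.
The features: Temporal information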
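The event and state features above could be computed roughly as follows. This sketch assumes one entry per consecutive calendar day for a single store (real store histories can have gaps that would need filling first); `event_distances` and `run_features` are hypothetical helper names. An "event" might be a state holiday, and a "state" a run of identical values such as consecutive promo days.

```python
def event_distances(flags):
    """Days since the last event and days until the next event, per day.

    flags: one truthy/falsy value per consecutive calendar day.
    An event day gets 0 for both; None means no event in that direction.
    """
    n = len(flags)
    since, until = [None] * n, [None] * n
    last = None
    for i in range(n):  # forward pass: distance back to the last event
        if flags[i]:
            last = i
        if last is not None:
            since[i] = i - last
    nxt = None
    for i in range(n - 1, -1, -1):  # backward pass: distance to the next event
        if flags[i]:
            nxt = i
        if nxt is not None:
            until[i] = nxt - i
    return since, until


def run_features(states):
    """For each day: days elapsed in the current run of identical states,
    days remaining in that run, and the run's total length."""
    n = len(states)
    elapsed, remaining, length = [0] * n, [0] * n, [0] * n
    start = 0
    for i in range(1, n + 1):
        # A run ends at i - 1 when the data ends or the state changes.
        if i == n or states[i] != states[start]:
            for j in range(start, i):
                elapsed[j] = j - start
                remaining[j] = i - 1 - j
                length[j] = i - start
            start = i
    return elapsed, remaining, length


since, until = event_distances([0, 1, 0, 0, 1, 0])
print(since)  # [None, 0, 1, 2, 0, 1]
print(until)  # [1, 0, 2, 1, 0, None]
```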
past data. But one key element is missing: current trends.
• “I fit a store-specific linear model on the day number to extrapolate the trend into the six-week period.” -- winner Gert
The features: Trends
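A minimal sketch of a store-specific trend feature in the spirit of the quote, assuming one (day number, sales) pair per training day for each store and a 42-day (six-week) horizon. Gert's exact setup (target scale, training window, how the trend enters the main model) isn't given here, so `fit_line` and `trend_features` are hypothetical helpers using plain ordinary least squares.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0  # flat line if all x are identical
    return mean_y - b * mean_x, b


def trend_features(train, horizon_days=42):
    """Extrapolate a per-store linear trend past the last training day.

    train: dict store_id -> list of (day_number, sales) pairs,
           e.g. open days only.
    Returns dict store_id -> trend values for the next `horizon_days` days.
    """
    out = {}
    for store, rows in train.items():
        days = [d for d, _ in rows]
        sales = [s for _, s in rows]
        a, b = fit_line(days, sales)
        last = max(days)
        out[store] = [a + b * (last + k) for k in range(1, horizon_days + 1)]
    return out


toy = {1: [(d, 10 + 2 * d) for d in range(10)]}  # perfectly linear toy series
print(trend_features(toy, horizon_days=3)[1])  # [30.0, 32.0, 34.0]
```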
“As for other interesting tricks, one was "payday loans" feature engineering. We noticed that if day {28,29,30} of the month is a Monday, OR day {28} is either a Thursday or a Friday, there was an evident increase in Sales. So one could reason that people are taking short-term loans before their paydays to buy stuff.”
• Refurbishment: after looking at the data, my guess is that the competition data does not mark all promotional events, e.g. clearance sales and grand-opening sales.
The features: Some more creative features
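Both ideas above can be turned into simple 0/1 flags. This is a sketch under stated assumptions: `payday_flag` encodes the quoted rule literally, and `refurbishment_flags` marks the days just before and after a long closure. The names and the `min_closed` and `window` thresholds are made up, since the source doesn't say how the unlabeled clearance and grand-opening periods were identified.

```python
from datetime import date


def payday_flag(d):
    """1 if `d` matches the quoted "payday" rule, else 0.

    Rule as read from the quote: day 28, 29 or 30 of the month falls on
    a Monday, or day 28 falls on a Thursday or Friday.
    """
    monday, thursday, friday = 0, 3, 4  # date.weekday(): Monday == 0
    if d.day in (28, 29, 30) and d.weekday() == monday:
        return 1
    if d.day == 28 and d.weekday() in (thursday, friday):
        return 1
    return 0


def refurbishment_flags(open_flags, min_closed=7, window=14):
    """Flag days just before and just after long closures for one store.

    open_flags: 0/1 per consecutive calendar day. A run of at least
    `min_closed` closed days is treated as a possible refurbishment; the
    `window` days before it (possible clearance sale) and after it
    (possible grand opening) are flagged. Both thresholds are guesses.
    """
    n = len(open_flags)
    before, after = [0] * n, [0] * n
    i = 0
    while i < n:
        if open_flags[i]:
            i += 1
            continue
        j = i
        while j < n and not open_flags[j]:
            j += 1
        # Days i .. j-1 form one closed run.
        if j - i >= min_closed:
            for k in range(max(0, i - window), i):
                before[k] = 1
            for k in range(j, min(n, j + window)):
                after[k] = 1
        i = j
    return before, after


print(payday_flag(date(2015, 9, 28)))  # 1 (a Monday)
print(payday_flag(date(2015, 5, 29)))  # 0 (a Friday, but the 29th)
```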
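Holiday
• Weather
• Google Trends
• Sporting events
• Price indices
• Stock market
• Unemployment rate
• and more…
So the data for this competition is effectively unlimited: time to send out your army of scrapers.
The catch is that the stores are anonymized, so much of this data can't be used directly. Fortunately, someone recovered the state each store is in, so plenty of external data can still be joined in.
The features: External data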
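Once each store has a state, attaching external series is a keyed lookup on (state, date). A stdlib-only sketch; the input shapes, the function name `attach_external`, the state code and the temperature value are all hypothetical, and real values would come from whatever source was scraped.

```python
from datetime import date


def attach_external(rows, store_state, external, fields):
    """Join per-(state, date) external data onto per-(store, date) rows.

    rows:        list of dicts with at least "Store" and "Date" keys.
    store_state: dict Store -> state code (the recovered mapping).
    external:    dict (state, date) -> dict of external values.
    fields:      external column names to copy; missing values become None.
    """
    joined = []
    for row in rows:
        state = store_state.get(row["Store"])
        extra = external.get((state, row["Date"]), {})
        joined.append({**row, "State": state, **{f: extra.get(f) for f in fields}})
    return joined


# Toy inputs; the state code and temperature are made up.
rows = [{"Store": 1, "Date": date(2015, 7, 31)}]
store_state = {1: "HE"}
weather = {("HE", date(2015, 7, 31)): {"Max_TemperatureC": 25}}
print(attach_external(rows, store_state, weather, ["Max_TemperatureC"]))
```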
Adding RF made the score worse.
◦ Adding luckyGB made the score better.
• Features are still what matters most: the winner said his best single model alone would have placed in the top three.
• The top models can actually be quite lean. The 4th-place team used a weighted average of 6 GBTs plus a special post-processing step; their best single model uses only 22 hand-picked features and would rank 5th on its own:
c("WeekOfMonth", "month", "week", "day", "Store", "Promo", "DayOfWeek", "year", "SchoolHoliday", "CompDist0", "CompOpenSince0", "Promo2Since0", "MeanLogSalesByStore", "MeanLogSalesByState", "MeanLogSalesByStateHoliday", "MeanLogSalesByAssortment", "MeanLogSalesByPromoInterval", "MeanLogSalesByStorePromoDOW", "MeanLogCustByStorePromoDOW", "MeanLogSalesBySchoolHoliday2Type", "Max_TemperatureC", "SONNENSCHEINDAUER")
Ensemble
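The `MeanLogSalesBy...` names suggest group averages of log sales. One plausible reading, sketched below as an assumption (the 4th-place team's exact definition isn't given): average `log1p(Sales)` over training rows for each combination of grouping keys, skipping closed days with zero sales, then look the value up for every row to be predicted. `mean_log_sales_by` and `add_group_feature` are hypothetical names.

```python
from collections import defaultdict
from math import log1p


def mean_log_sales_by(rows, keys, target="Sales"):
    """Mean of log1p(target) for each combination of `keys`.

    rows: training rows as dicts. Rows with target <= 0 (closed days)
    are skipped, which is an assumption rather than a documented choice.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        if row[target] <= 0:
            continue
        group = tuple(row[k] for k in keys)
        sums[group] += log1p(row[target])
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}


def add_group_feature(rows, table, keys, name):
    """Attach the looked-up group mean to each row (None for unseen groups)."""
    for row in rows:
        row[name] = table.get(tuple(row[k] for k in keys))
    return rows
```

For a `MeanLogSalesByStorePromoDOW`-style feature one would call `mean_log_sales_by(train_rows, ["Store", "Promo", "DayOfWeek"])` on the training period only and then `add_group_feature` on the test rows, so the test targets never feed into the averages.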
* all features (different seeds) and 2 * all features (using month-ahead features)
• 1 * sales model
• 1 * customer model
• REPEAT all six models for the months May to September
• For September, all of the 2*6 models used month-ahead features