Takanobu Nozawa
November 05, 2019

# A Message to All of Humanity Exhausted by Feature Management in Data Analysis Competitions

## Transcript

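8. ### Common pattern (ipynb)

Slide footer (repeated on every slide): ママの一歩を支える ("Supporting every mom's step")

Create features: `col = [y, A, B, C, D]` → `[y, A, B, C, D, E, F]`

    # e.g.
    train['A'] = train['A'].fillna(0)           # fill missing values with 0
    train['B'] = np.log1p(train['B'])           # log(1 + x) transform
    train['E'] = train['A'] + train['B']        # new feature E
    df_group = train.groupby('D')['E'].mean()   # mean of E for each value of D
    train['F'] = train['D'].map(df_group)       # new feature F
    # ...
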
10. ### Common pattern (ipynb)

Specify only the feature columns to use, then train:

    # e.g.
    feat_col = ['A', 'C', 'D', 'E', 'F', 'J']
    x_train = train[feat_col]
    y_train = train['y']

    # e.g.
    clf.fit(x_train, y_train)
    # ...

"Wait, what kind of feature was 'F' again?"

13. ### Common pattern (ipynb)

Feature creation, column selection, and training all live in one notebook:

    # e.g.
    train['A'] = train['A'].fillna(0)
    train['B'] = np.log1p(train['B'])
    train['E'] = train['A'] + train['B']
    df_group = train.groupby('D')['E'].mean()
    train['F'] = train['D'].map(df_group)

    # e.g.
    feat_col = ['A', 'C', 'D', 'E', 'F', 'J']
    x_train = train[feat_col]
    y_train = train['y']

    # e.g.
    clf.fit(x_train, y_train)

"Found it!" (near the top of the notebook)

With only a few features this is still manageable, but as the number grows, working out (or hunting for) how each feature was computed every single time becomes a real chore and takes time.

20. ### Another common pattern (ipynb)

Contents of the notebook:

    import numpy as np
    import pandas as pd
    # ...
    submission.to_csv('submission.csv', index=False)

The same computations have to be run again and again → wasted effort.

21. ### Another common pattern (ipynb)

Repeatedly duplicating notebooks can also land you in "notebook hell"…

(The slide shows a long listing of near-identical notebook files, each named like `LightGBM_score_….ipynb`.)

26. ### Agenda

- Self-introduction
- Feature management
  - Manage features as pickle files, one per column
  - Generate a memo file at the same time the features are generated
- Training and inference pipeline
  - Run everything from training through creating the submit file with a single command
  - Save the features and model parameters used for training together with the log
  - Visualize feature contributions with shap to find what to focus on in the next training run

I'll talk about how borrowing the wisdom of the sages worked out really well (*).

(*) This is purely my subjective opinion.

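28. ### Self-introduction

- Name: 野澤哲照 (Nozawa Takanobu)
- Affiliation: Connehito Inc. (コネヒト株式会社)
- Handle: たかぱい (@takapy)
- Joined Connehito as an ML engineer
- Mainly working on machine learning (NLP, recommender systems) while also studying infrastructure (AWS)
- I do Kaggle, write a blog (https://www.takapy.work), play baseball, and eat ramen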

33. ### Feature management

Manage features as pickle files "column by column".

The example data has the columns `Survived`, `Pclass`, `Sex` (male / female), `Age`, and `Embarked` (S / C). Each column is stored in its own file:

- `survived_train.pkl`
- `pclass_train.pkl`, `pclass_test.pkl`
- `sex_train.pkl`, `sex_test.pkl`
- `age_train.pkl`, `age_test.pkl`
- `embarked_train.pkl`, `embarked_test.pkl`

41. ### Feature management

Just run `hoge.py` from the command line. This generates each feature and the feature memo file.

    # `train` and `test` refer to the raw input DataFrames (not shown on the slide).
    class Pclass(Feature):
        def create_features(self):
            self.train['Pclass'] = train['Pclass']
            self.test['Pclass'] = test['Pclass']
            create_memo('Pclass', 'Ticket class: one of 1st, 2nd, 3rd')


    class Sex(Feature):
        def create_features(self):
            self.train['Sex'] = train['Sex']
            self.test['Sex'] = test['Sex']
            create_memo('Sex', 'Sex')


    class Age(Feature):
        def create_features(self):
            self.train['Age'] = train['Age']
            self.test['Age'] = test['Age']
            create_memo('Age', 'Age')


    class Age_mis_val_median(Feature):
        def create_features(self):
            # Each split is imputed with its own median, as on the slide.
            self.train['Age_mis_val_median'] = train['Age'].fillna(train['Age'].median())
            self.test['Age_mis_val_median'] = test['Age'].fillna(test['Age'].median())
            create_memo('Age_mis_val_median', 'Age with missing values imputed by the median')

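
The slides subclass a `Feature` base class but never show it. The following is a minimal sketch of what such a base class might look like, not the author's implementation. It assumes each subclass fills `self.train` and `self.test`, and that the base class derives a file name from the class name and pickles each split. Plain dicts of lists stand in for pandas DataFrames to keep the sketch dependency-free, and the names `snake_name`, `run`, `save`, and `out_dir` are hypothetical.

```python
import pickle
import re
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path

# Toy raw data standing in for the train/test DataFrames (assumption).
train = {"Parch": [0, 1, 2], "SibSp": [1, 0, 3]}
test = {"Parch": [0, 0], "SibSp": [2, 1]}


def snake_name(class_name: str) -> str:
    """Prefix each capital letter with '_' and lowercase it, then drop leading '_'."""
    return re.sub("([A-Z])", lambda m: "_" + m.group(1).lower(), class_name).lstrip("_")


class Feature(ABC):
    """Hypothetical base class: one subclass per feature column."""

    def __init__(self, out_dir: Path):
        self.name = snake_name(type(self).__name__)
        self.train = {}
        self.test = {}
        self.train_path = out_dir / f"{self.name}_train.pkl"
        self.test_path = out_dir / f"{self.name}_test.pkl"

    @abstractmethod
    def create_features(self) -> None:
        """Fill self.train and self.test with the new column."""

    def run(self) -> "Feature":
        self.create_features()
        return self

    def save(self) -> None:
        # One pickle file per split, named after the feature.
        for path, data in ((self.train_path, self.train), (self.test_path, self.test)):
            with open(path, "wb") as f:
                pickle.dump(data, f)


class Family_Size(Feature):
    def create_features(self) -> None:
        self.train["Family_Size"] = [p + s for p, s in zip(train["Parch"], train["SibSp"])]
        self.test["Family_Size"] = [p + s for p, s in zip(test["Parch"], test["SibSp"])]


out_dir = Path(tempfile.mkdtemp())
feature = Family_Size(out_dir).run()
feature.save()
with open(feature.train_path, "rb") as f:
    loaded_train = pickle.load(f)
```

Under this naming rule, a class called `Family_Size` is saved as `family__size_train.pkl`, which is consistent with the double underscore in the `family__size` entry of the feature lists used elsewhere in the deck.
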
45. ### Feature management

Overview of what `create_memo` does. It is called from each feature class, e.g. `create_memo('Pclass', 'Ticket class: one of 1st, 2nd, 3rd')`.

    import csv
    import os


    # Create the feature memo CSV file
    def create_memo(col_name, desc):
        file_path = Feature.dir + '/_features_memo.csv'
        # Create an empty file on first use
        if not os.path.isfile(file_path):
            with open(file_path, "w"):
                pass

        # newline='' lets the csv module control line endings
        with open(file_path, 'r+', newline='') as f:
            lines = f.readlines()
            lines = [line.strip() for line in lines]

            # Check whether the feature about to be written is already recorded
            col = [line for line in lines if line.split(',')[0] == col_name]
            if len(col) != 0:
                return

            # readlines() left the file position at the end, so this appends
            writer = csv.writer(f)
            writer.writerow([col_name, desc])

Saving the memo in CSV format makes it easy to view on GitHub (and of course it also displays cleanly in applications such as Excel and Numbers).
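
Once the memo CSV exists, answering "what was this feature again?" becomes a lookup. The following is a minimal sketch, assuming the two-column `name,description` layout that `create_memo` writes. The helper name `load_memo` and the sample rows are illustrative only.

```python
import csv
import tempfile
from pathlib import Path


def load_memo(file_path):
    """Return {feature_name: description} from a two-column memo CSV."""
    with open(file_path, newline="", encoding="utf-8") as f:
        return {row[0]: row[1] for row in csv.reader(f) if len(row) >= 2}


# Write a small sample memo file (illustrative content).
memo_path = Path(tempfile.mkdtemp()) / "_features_memo.csv"
with open(memo_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Pclass", "Ticket class: one of 1st, 2nd, 3rd"])
    writer.writerow(["Sex", "Sex"])

memo = load_memo(memo_path)
print(memo["Pclass"])
```

Parsing with `csv.reader` rather than `str.split(',')` keeps descriptions that contain commas intact.
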

49. ### Feature management

When creating a new feature, write the new feature-generation code in `hoge.py`:

    class Family_Size(Feature):
        def create_features(self):
            self.train['Family_Size'] = train['Parch'] + train['SibSp']
            self.test['Family_Size'] = test['Parch'] + test['SibSp']
            create_memo('Family_Size', 'Total number of family members')

Then run `python hoge.py`. Only the new feature is generated.

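
The deck says that rerunning `python hoge.py` generates only the new feature, but does not show how. One way this might work is sketched below, under the assumption that a feature is skipped when its pickle file already exists. The function `generate_all`, its `overwrite` flag, and the toy feature functions are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def generate_all(feature_funcs, out_dir, overwrite=False):
    """Compute and pickle each feature unless its file already exists.

    Returns the names of the features that were actually computed.
    """
    computed = []
    for name, func in feature_funcs.items():
        path = Path(out_dir) / f"{name}_train.pkl"
        if path.exists() and not overwrite:
            continue  # already generated in an earlier run
        with open(path, "wb") as f:
            pickle.dump(func(), f)
        computed.append(name)
    return computed


out_dir = tempfile.mkdtemp()
features = {"pclass": lambda: [3, 1, 3], "sex": lambda: ["male", "female", "female"]}
first_run = generate_all(features, out_dir)

# Add a new feature and run again: only the new one is computed.
features["family__size"] = lambda: [1, 1, 0]
second_run = generate_all(features, out_dir)
```

Checking for the output file is simple but cannot tell when a feature's definition has changed, so an explicit `overwrite` switch is a common companion to this kind of skip logic.
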
50. ### Feature management

To read the data, just specify the features and load them:

    # Specify the features
    features = [
        "age_mis_val_median",
        "family__size",
        "cabin",
        "fare_mis_val_median"
    ]
    # Load one pickle per feature, then join them column-wise
    df = [pd.read_pickle(FEATURE_DIR_NAME + f'{f}_train.pkl') for f in features]
    df = pd.concat(df, axis=1)

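
The column-wise layout can also be sketched without pandas. The example below uses plain lists in place of Series and a hypothetical `load_features` helper: it writes one pickle per feature and then assembles only the requested columns, mirroring the `read_pickle` and `concat` step above. The directory, values, and helper name are assumptions made for illustration.

```python
import pickle
import tempfile
from pathlib import Path

feature_dir = Path(tempfile.mkdtemp())

# One pickle file per feature column (toy values, illustrative only).
columns = {
    "age_mis_val_median": [22.0, 38.0, 28.0],
    "family__size": [1, 1, 0],
    "cabin": [None, "C85", None],
}
for name, values in columns.items():
    with open(feature_dir / f"{name}_train.pkl", "wb") as f:
        pickle.dump(values, f)


def load_features(names, directory, split="train"):
    """Load only the requested feature columns into a {name: values} table."""
    table = {}
    for name in names:
        with open(Path(directory) / f"{name}_{split}.pkl", "rb") as f:
            table[name] = pickle.load(f)
    return table


selected = load_features(["age_mis_val_median", "family__size"], feature_dir)
```

Because each column is its own file, trying a different feature set only means changing the list of names; nothing is recomputed.
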
52. ### Feature management

What was good about this:

- Consolidating feature generation into a single script file avoids running the same computation multiple times, so time is used effectively.
  - → It also ensures that the features are reproducible.
- Generating the feature memo at the same time meant I no longer had to rack my brain over "what was this feature again?"
- Managing features column by column makes them easier to handle.
  - → With pickle files, both saving and loading are fast!
  - → When the number of features becomes huge, it may be better to manage them grouped into units of some size.

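53. ### Training and inference pipeline

Note: I used the pipeline published in the following book as a reference.

- Kaggleで勝つデータ分析の技術 (Data Analysis Techniques to Win Kaggle): https://gihyo.jp/book/…

The pipeline:

- Runs everything from training through creating the submit file with a single command
- Saves the features and model parameters used for training together with the log
- Visualizes feature contributions with shap to find what to focus on in the next training run
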
56. ### Training and inference pipeline

Running `run.py` performs training and inference and creates the submit file:

    # Specify the features
    features = [
        "age_mis_val_median",
        "family__size",
        "cabin",
        "fare_mis_val_median"
    ]

    run_name = 'lgb_1102'

    # Save the list of features used
    with open(LOG_DIR_NAME + run_name + "_features.txt", 'wt') as f:
        for ele in features:
            f.write(ele + '\n')

    params_lgb = {
        'boosting_type': 'gbdt',
        'objective': 'binary',
        'early_stopping_rounds': 20,
        'verbose': 10,
        'random_state': 99,
        'num_round': 100
    }

    # Save the parameters used
    with open(LOG_DIR_NAME + run_name + "_param.txt", 'wt') as f:
        for key, value in sorted(params_lgb.items()):
            f.write(f'{key}:{value}\n')

    runner = Runner(run_name, ModelLGB, features, params_lgb, n_fold, name_prefix)
    runner.run_train_cv()                    # training
    runner.run_predict_cv()                  # inference
    Submission.create_submission(run_name)   # create the submit file

Files and models are saved with this `run_name` as a prefix. Examples of generated files:

- the list of features used
- the hyperparameters used
- the model for each fold
- the inference results
- the submit file
- image files of the shap results, and so on

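
Because `run.py` writes the feature list and the parameters as plain text, a past run can be read back for comparison or reproduction. The following is a minimal sketch, assuming the `<run_name>_features.txt` (one name per line) and `<run_name>_param.txt` (`key:value` per line) layouts written by the code above. `load_run_config` is a hypothetical helper, and values come back as strings because the text log does not record their types.

```python
import tempfile
from pathlib import Path


def load_run_config(log_dir, run_name):
    """Read back the feature list and parameters saved for one run."""
    log_dir = Path(log_dir)
    features = (log_dir / f"{run_name}_features.txt").read_text().split()
    params = {}
    for line in (log_dir / f"{run_name}_param.txt").read_text().splitlines():
        key, _, value = line.partition(":")
        params[key] = value  # kept as a string; types are not recorded in the log
    return features, params


# Recreate the two log files the way run.py writes them (sample values).
log_dir = Path(tempfile.mkdtemp())
run_name = "lgb_1102"
(log_dir / f"{run_name}_features.txt").write_text("age_mis_val_median\nfamily__size\n")
(log_dir / f"{run_name}_param.txt").write_text("boosting_type:gbdt\nnum_round:100\n")

features, params = load_run_config(log_dir, run_name)
```

If exact types matter when rerunning an experiment, writing the parameters as JSON instead of `key:value` text would be one alternative; that is a design suggestion, not something the deck describes.
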
62. ### Training and inference pipeline

Examples of generated files (placed in separate folders as appropriate):

- `lgb_…_fold….model` (the model created in each fold; the slide lists three, one per fold)
- `lgb_…_pred.pkl` (inference results on the test data)
- `lgb_…_submission.csv` (the inference results converted to a CSV that can be submitted to Kaggle)
- `lgb_…_features.txt` (the list of features used for this training run)
- `lgb_…_param.txt` (the hyperparameters used for this training run)
- `lgb_…_shap.png` (the visualization computed with shap)
- `general.log` (the computation log file)
- `result.log` (a log file containing only the model scores)

What was good about this?

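65. ### Summary

- Feature management is great!
  - Consolidating feature generation into a single script file avoids running the same computation multiple times!
  - Generating the feature memo at the same time cuts down how often you have to think "what was this feature again?"
  - Managing features column by column made them easier to handle! (Though when there are a huge number of features, it may be better to manage them in groups of some size.)
- Pipelines are great!
  - Building a pipeline enables a fast PDCA cycle!
  - Managing the features and parameters used for training ensures reproducibility, and psychological safety also…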